Repository: yihong0618/bilingual_book_maker
Branch: main
Commit: d66c685c24b4
Files: 60
Total size: 264.8 KB

Directory structure:
gitextract_6fkzntyz/

├── .dockerignore
├── .github/
│   └── workflows/
│       ├── docs.yaml
│       ├── make_test_ebook.yaml
│       └── release.yaml
├── .gitignore
├── Dockerfile
├── LICENSE
├── Makefile
├── README-CN.md
├── README.md
├── book_maker/
│   ├── __init__.py
│   ├── __main__.py
│   ├── cli.py
│   ├── config.py
│   ├── loader/
│   │   ├── __init__.py
│   │   ├── base_loader.py
│   │   ├── epub_loader.py
│   │   ├── helper.py
│   │   ├── md_loader.py
│   │   ├── pdf_loader.py
│   │   ├── srt_loader.py
│   │   └── txt_loader.py
│   ├── obok.py
│   ├── translator/
│   │   ├── __init__.py
│   │   ├── base_translator.py
│   │   ├── caiyun_translator.py
│   │   ├── chatgptapi_translator.py
│   │   ├── claude_translator.py
│   │   ├── custom_api_translator.py
│   │   ├── deepl_free_translator.py
│   │   ├── deepl_translator.py
│   │   ├── gemini_translator.py
│   │   ├── google_translator.py
│   │   ├── groq_translator.py
│   │   ├── litellm_translator.py
│   │   ├── qwen_translator.py
│   │   ├── tencent_transmart_translator.py
│   │   └── xai_translator.py
│   └── utils.py
├── disclaimer.md
├── docs/
│   ├── book_source.md
│   ├── cmd.md
│   ├── disclaimer.md
│   ├── env_settings.md
│   ├── index.md
│   ├── installation.md
│   ├── model_lang.md
│   ├── prompt.md
│   └── quickstart.md
├── make_book.py
├── mkdocs.yml
├── prompt_md.json
├── prompt_md.prompt.md
├── prompt_template_sample.json
├── pyproject.toml
├── tests/
│   ├── test_epub_metadata.py
│   ├── test_integration.py
│   ├── test_pdf_cli.py
│   └── test_pdf_loader.py
└── typos.toml

================================================
FILE CONTENTS
================================================

================================================
FILE: .dockerignore
================================================
Dockerfile*
docker-compose*
LICENSE
test_books
README*
.dockerignore
.git
.github
.gitignore
.vscode

================================================
FILE: .github/workflows/docs.yaml
================================================
name: Publish docs
on:
  push:
    branches:
      - main

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.10'
      - run: pip install mkdocs mkdocs-material
      - run: mkdocs gh-deploy --force


================================================
FILE: .github/workflows/make_test_ebook.yaml
================================================
name: CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  workflow_dispatch:
  
env:
  ACTIONS_ALLOW_UNSECURE_COMMANDS: true
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  BBM_CAIYUN_API_KEY: ${{ secrets.BBM_CAIYUN_API_KEY }}

jobs:
  typos-check:
    name: Spell Check with Typos
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Actions Repository
        uses: actions/checkout@v3
      - name: Check spelling with custom config file
        uses: crate-ci/typos@v1.16.6
        with:
          config: ./typos.toml
  testing:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: install python 3.10
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
          cache: 'pip' # caching pip dependencies
      - name: Check formatting (black)
        run: |
            pip install black
            black . --check
      - name: install python requirements
        run: pip install -r requirements.txt

      - name: Test install
        run: |
            pip install .

      - name: make normal ebook test using google translate and cli
        run: |
            bbook_maker --book_name "test_books/Liber_Esther.epub" --test --test_num 10 --model google --translate-tags div,p
            bbook_maker --book_name "test_books/Liber_Esther.epub" --test --test_num 20 --model google

      - name: make txt book test using google translate
        run: |
          python3 make_book.py --book_name "test_books/the_little_prince.txt" --test --test_num 20 --model google

      - name: make txt book test with batch_size
        run: |
          python3 make_book.py --book_name "test_books/the_little_prince.txt" --test --batch_size 30 --test_num 20 --model google
  
      - name: make caiyun translator test
        if: env.BBM_CAIYUN_API_KEY != null
        run: |
          python3 make_book.py --book_name "test_books/the_little_prince.txt" --test --batch_size 30 --test_num 100 --model caiyun

      - name: make openai key ebook test
        if: env.OPENAI_API_KEY != null
        run: |
            python3 make_book.py --book_name "test_books/lemo.epub" --test --test_num 5 --language zh-hans
            python3 make_book.py --book_name "test_books/animal_farm.epub" --test --test_num 5 --language ja --model gpt3 --prompt prompt_template_sample.txt
            python3 make_book.py --book_name "test_books/animal_farm.epub" --test --test_num 5 --language ja --prompt prompt_template_sample.json
            python3 make_book.py --book_name test_books/Lex_Fridman_episode_322.srt --test --test_num 20
            
      - name: Rename and Upload ePub
        if: env.OPENAI_API_KEY != null
        uses: actions/upload-artifact@v4
        with:
          name: epub_output
          path: "test_books/lemo_bilingual.epub"


================================================
FILE: .github/workflows/release.yaml
================================================
name: Release and Build Docker Image

permissions:
  contents: write

on:
  push:
    tags:
      - "*"

jobs:
  release-pypi:
    name: Build and Release PyPI
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - uses: actions/setup-node@v3
        with:
          node-version: 16

      - name: Build artifacts
        run: |
          pip install build
          python -m build

      - uses: pypa/gh-action-pypi-publish@release/v1
        with:
          password: ${{ secrets.PYPI_API_TOKEN }}



================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
.idea/
.DS_Store
test_books/

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

/test_books/*.epub
log/
.chatgpt_cache.json
# for user do not want to push
*.srt
*.txt
*.bin
*.epub

# For markdown files in user directories
.cursorrules
books/
prompts/
.pdm-python


================================================
FILE: Dockerfile
================================================
FROM python:3.10-slim

RUN apt-get update

WORKDIR /app

COPY requirements.txt .

RUN pip install -r /app/requirements.txt

COPY . .

ENTRYPOINT ["python3", "make_book.py"]


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2023 yihong

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: Makefile
================================================
SHELL := /bin/bash

fmt:
	@echo "Running formatter ..."
	venv/bin/black .

.PHONY:tests
tests:
	@echo "Running tests ..."
	venv/bin/pytest tests/test_integration.py

serve-docs:
	mkdocs serve


================================================
FILE: README-CN.md
================================================
# bilingual_book_maker

bilingual_book_maker 是一个 AI 翻译工具，使用 ChatGPT 帮助用户制作多语言版本的 epub/txt/srt 文件和图书。该工具仅适用于翻译进入公共版权领域的 epub/txt 图书，不适用于有版权的书籍。请在使用之前阅读项目的 **[免责声明](./disclaimer.md)**。

![image](https://user-images.githubusercontent.com/15976103/222317531-a05317c5-4eee-49de-95cd-04063d9539d9.png)

## 准备

1. ChatGPT or OpenAI token [^token]
2. epub/txt books
3. 能正常联网的环境或 proxy
4. python3.8+

## 快速开始

本地放了一个 `test_books/animal_farm.epub` 给大家测试

```shell
pip install -r requirements.txt
python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${openai_key} --test
或
pip install -U bbook_maker
bbook_maker --book_name test_books/animal_farm.epub --openai_key ${openai_key} --test
```

## 翻译服务

- 使用 `--openai_key` 指定 OpenAI API key，如果有多个可以用英文逗号分隔(xxx,xxx,xxx)，可以减少接口调用次数限制带来的错误。
  或者，指定环境变量 `BBM_OPENAI_API_KEY` 来略过这个选项。
- 默认用了 [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) 模型，也就是 ChatGPT 正在使用的模型。

* DeepL

  使用 DeepL 封装的 api 进行翻译，需要付费。[DeepL Translator](https://rapidapi.com/splintPRO/api/dpl-translator) 来获得 token

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model deepl --deepl_key ${deepl_key}
  ```

* DeepL free

  使用 DeepL free

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model deeplfree
  ```

* Claude

  使用 [Claude](https://console.anthropic.com/docs) 模型进行翻译

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model claude --claude_key ${claude_key}
  ```

* 谷歌翻译

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model google
  ```

* 彩云小译

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model caiyun --caiyun_key ${caiyun_key}
  ```

* Gemini

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model gemini --gemini_key ${gemini_key}
  ```

* Qwen

  使用 [Qwen](https://www.aliyun.com/product/dashscope) 模型进行翻译，支持 qwen-mt-turbo 和 qwen-mt-plus 模型。

  使用 `--source_lang` 指定源语言，留空为自动检测。

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --qwen_key ${qwen_key} --model qwen-mt-turbo --language "Simplified Chinese"
  python3 make_book.py --book_name test_books/animal_farm.epub --qwen_key ${qwen_key} --model qwen-mt-plus --language "Japanese" --source_lang "English"
  ```

* 腾讯交互翻译

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model tencentransmart
  ```

* [xAI](https://x.ai)

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model xai --xai_key ${xai_key}
  ```

* [Ollama](https://github.com/ollama/ollama)

  使用 [Ollama](https://github.com/ollama/ollama) 自托管模型进行翻译。
  如果 ollama server 不运行在本地，使用 `--api_base http://x.x.x.x:port/v1` 指向 ollama server 地址

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --ollama_model ${ollama_model_name}
  ```

* [Groq](https://console.groq.com/keys)

  GroqCloud 当前支持的模型可以查看[Supported Models](https://console.groq.com/docs/models)

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --groq_key [your_key] --model groq --model_list llama3-8b-8192
  ```

## 使用说明

- 翻译完会生成一本 `{book_name}_bilingual.epub` 的双语书
- 如果出现了错误或使用 `CTRL+C` 中断命令，不想接下来继续翻译了，会生成一本 `{book_name}_bilingual_temp.epub` 的书，直接改成你想要的名字就可以了

## 参数说明

- `--test`:

  如果大家没付费可以加上这个先看看效果（有 limit 稍微有些慢）

- `--language`: 指定目标语言

  - 例如： `--language "Simplified Chinese"`，预设值为 `"Simplified Chinese"`.
  - 请阅读 helper message 来查找可用的目标语言： `python make_book.py --help`

- `--proxy`

  方便中国大陆的用户在本地测试时使用代理，传入类似 `http://127.0.0.1:7890` 的字符串

- `--resume`

  手动中断后，加入命令可以从之前中断的位置继续执行。

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model google --resume
  ```

- `--translate-tags`

  指定需要翻译的标签，使用逗号分隔多个标签。epub 由 html 文件组成，默认情况下，只翻译 `<p>` 中的内容。例如: `--translate-tags h1,h2,h3,p,div`

- `--book_from`

  选项指定电子阅读器类型（现在只有 kobo 可用），并使用 `--device_path` 指定挂载点。

- `--api_base ${url}`

  如果你遇到了墙需要用 Cloudflare Workers 替换 api_base 请使用 `--api_base ${url}` 来替换。
  **请注意，此处你输入的 api 应该是'`https://xxxx/v1`'的字样，域名需要用引号包裹**

- `--allow_navigable_strings`

  如果你想要翻译电子书中的无标签字符串，可以使用 `--allow_navigable_strings` 参数，会将可遍历字符串加入翻译队列，**注意，在条件允许情况下，请寻找更规范的电子书**

- `--prompt`

  如果你想调整 prompt，你可以使用 `--prompt` 参数。有效的占位符包括 `{text}` 和 `{language}`。你可以用以下方式配置 prompt:

  - 如果您不需要设置 `system` 角色，可以这样：`--prompt "Translate {text} to {language}"` 或者 `--prompt prompt_template_sample.txt`（示例文本文件可以在 [./prompt_template_sample.txt](./prompt_template_sample.txt) 找到）。

  - 如果您需要设置 `system` 角色，可以使用以下方式配置：`--prompt '{"user":"Translate {text} to {language}", "system": "You are a professional translator."}'`，或者 `--prompt prompt_template_sample.json`（示例 JSON 文件可以在 [./prompt_template_sample.json](./prompt_template_sample.json) 找到）。

  - 你也可以用以下环境变量来配置 `system` 和 `user` 角色 prompt：`BBM_CHATGPTAPI_USER_MSG_TEMPLATE` 和 `BBM_CHATGPTAPI_SYS_MSG`。
  该参数可以是提示模板字符串，也可以是模板 `.txt` 文件的路径。

- `--batch_size`

  指定批量翻译的行数(默认行数为 10，目前只对 txt 生效)

- `--accumulated_num`:

  达到累计token数开始进行翻译。gpt3.5将total_token限制为4090。
  例如，如果您使用`--accumulated_num 1600`，则可能会输出2200个令牌，另外200个令牌用于系统指令（system_message）和用户指令（user_message），1600+2200+200 = 4000，所以token接近极限。你必须选择一个自己合适的值，我们无法在发送之前判断是否达到限制

- `--use_context`:

  prompts the model to create a three-paragraph summary. If it's the beginning of the translation, it will summarize the entire passage sent (the size depending on `--accumulated_num`).
  For subsequent passages, it will amend the summary to include details from the most recent passage, creating a running one-paragraph context payload of the important details of the entire translated work. This improves consistency of flow and tone throughout the translation. This option is available for all ChatGPT-compatible models and Gemini models.

  模型提示词将创建三段摘要。如果是翻译的开始，它将总结发送的整个段落（大小取决于`--accumulated_num`）。
  对于后续的段落，它将修改摘要，以包括最近段落的细节，创建一个完整的段落上下文负载，包含整个翻译作品的重要细节。 这提高了整个翻译过程中的流畅性和语气的一致性。 此选项适用于所有ChatGPT兼容型号和Gemini型号。

  - `--context_paragraph_limit`:

    使用`--use_context`选项时，使用`--context_paragraph_limit`设置上下文段落数限制。

- `--temperature`:

  使用 `--temperature` 设置 `chatgptapi`/`gpt4`/`claude`模型的temperature值.
  如 `--temperature 0.7`.

- `--block_size`:

  使用`--block_size`将多个段落合并到一个块中。这可能会提高准确性并加快处理速度，但可能会干扰原始格式。必须与`--single_translate`一起使用。
  例如：`--block_size 5 --single_translate`。

- `--single_translate`:

  使用`--single_translate`只输出翻译后的图书，不创建双语版本。

- `--translation_style`:

  如: `--translation_style "color: #808080; font-style: italic;"`

- `--retranslate "$translated_filepath" "file_name_in_epub" "start_str" "end_str"(optional)`:

  - 重新翻译，从 start_str 到 end_str 的标记:

  ```shell
  python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which' 'This kind of thing is not a good symptom. Obviously'
  ```

  - 重新翻译, 从start_str 的标记开始:

  ```shell
  python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which'
  ```

### 示范用例

**如果使用 `pip install bbook_maker` 以下命令都可以改成 `bbook_maker args`**

```shell
# 如果你想快速测一下
python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${openai_key} --test

# 或翻译完整本书
python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${openai_key} --language zh-hans

# Or translate the whole book using Gemini
python3 make_book.py --book_name test_books/animal_farm.epub --gemini_key ${gemini_key} --model gemini

# 指定环境变量来略过 --openai_key
export OPENAI_API_KEY=${your_api_key}

# Use the DeepL model with Japanese
python3 make_book.py --book_name test_books/animal_farm.epub --model deepl --deepl_key ${deepl_key} --language ja

# Use the Claude model with Japanese
python3 make_book.py --book_name test_books/animal_farm.epub --model claude --claude_key ${claude_key} --language ja

# Use the CustomAPI model with Japanese
python3 make_book.py --book_name test_books/animal_farm.epub --model customapi --custom_api ${custom_api} --language ja

# Translate contents in <div> and <p>
python3 make_book.py --book_name test_books/animal_farm.epub --translate-tags div,p

# 修改prompt
python3 make_book.py --book_name test_books/animal_farm.epub --prompt prompt_template_sample.txt
# 或者
python3 make_book.py --book_name test_books/animal_farm.epub --prompt "Please translate \`{text}\` to {language}"
# 翻译 kobo e-reader 中，来自 Rakuten Kobo 的书籍
python3 make_book.py --book_from kobo --device_path /tmp/kobo

# 翻译 txt 文件
python3 make_book.py --book_name test_books/the_little_prince.txt --test
# 聚合多行翻译 txt 文件
python3 make_book.py --book_name test_books/the_little_prince.txt --test --batch_size 20


# 使用彩云小译翻译(彩云api目前只支持: 简体中文 <-> 英文， 简体中文 <-> 日语)
# 彩云提供了测试token（3975l6lr5pcbvidl6jl2）
# 你可以参考这个教程申请自己的token (https://bobtranslate.com/service/translate/caiyun.html)
python3 make_book.py --model caiyun --caiyun_key 3975l6lr5pcbvidl6jl2 --book_name test_books/animal_farm.epub
# 可以在环境变量中设置BBM_CAIYUN_API_KEY，略过--caiyun_key
export BBM_CAIYUN_API_KEY=${your_api_key}
```

更加小白的示例

```shell
python3 make_book.py --book_name 'animal_farm.epub' --openai_key sk-XXXXX --api_base 'https://xxxxx/v1'

# 有可能你不需要 python3 而是python
python make_book.py --book_name 'animal_farm.epub' --openai_key sk-XXXXX --api_base 'https://xxxxx/v1'
```

[演示视频](https://www.bilibili.com/video/BV1XX4y1d75D/?t=0h07m08s)
[演示视频 2](https://www.bilibili.com/video/BV1T8411c7iU/)

使用 Azure OpenAI service

```shell
python3 make_book.py --book_name 'animal_farm.epub' --openai_key XXXXX --api_base 'https://example-endpoint.openai.azure.com' --deployment_id 'deployment-name'

# Or python3 is not in your PATH
python make_book.py --book_name 'animal_farm.epub' --openai_key XXXXX --api_base 'https://example-endpoint.openai.azure.com' --deployment_id 'deployment-name'
```

## 注意

1. Free trial 的 API token 有所限制，如果想要更快的速度，可以考虑付费方案
2. 欢迎提交 PR

# 感谢

- @[yetone](https://github.com/yetone)

# 贡献

- 任何 issue PR 都欢迎
- Issue 中有些 TODO 没做的都可以选
- 提交代码前请先执行 `black make_book.py` [^black]

# 其它推荐项目

- 书译 BookTranslator -> [Book Translator](https://www.booktranslator.app)

## 赞赏

谢谢就够了

![image](https://user-images.githubusercontent.com/15976103/222407199-1ed8930c-13a8-402b-9993-aaac8ee84744.png)

[^token]: https://platform.openai.com/account/api-keys
[^black]: https://github.com/psf/black


================================================
FILE: README.md
================================================
**[中文](./README-CN.md) | English**
[![litellm](https://img.shields.io/badge/%20%F0%9F%9A%85%20liteLLM-OpenAI%7CAzure%7CAnthropic%7CPalm%7CCohere%7CReplicate%7CHugging%20Face-blue?color=green)](https://github.com/BerriAI/litellm)

# bilingual_book_maker

bilingual_book_maker is an AI translation tool that uses ChatGPT to help users create multi-language versions of epub/txt/srt/pdf files and books. It is designed exclusively for translating public domain works (epub and the other supported formats) and is not intended for copyrighted works. Before using this tool, please review the project's **[disclaimer](./disclaimer.md)**.

![image](https://user-images.githubusercontent.com/15976103/222317531-a05317c5-4eee-49de-95cd-04063d9539d9.png)

## Supported Models

gpt-5-mini, gpt-4, gpt-3.5-turbo, claude-2, palm, llama-2, azure-openai, command-nightly, gemini, qwen-mt-turbo, qwen-mt-plus
To use non-OpenAI models, use the `liteLLM()` class; liteLLM supports all of the models above.
More information on liteLLM: https://github.com/BerriAI/litellm/blob/main/setup.py

## Preparation

1. ChatGPT or OpenAI token [^token]
2. epub/txt/pdf books
3. Environment with internet access or proxy
4. Python 3.8+

## Quick Start

A sample book, `test_books/animal_farm.epub`, is provided for testing purposes.

```shell
pip install -r requirements.txt
python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${openai_key} --test
OR
pip install -U bbook_maker
bbook_maker --book_name test_books/animal_farm.epub --openai_key ${openai_key} --test
```

## Translate Service

- Use `--openai_key` option to specify OpenAI API key. If you have multiple keys, separate them by commas (xxx,xxx,xxx) to reduce errors caused by API call limits.
  Or, just set environment variable `BBM_OPENAI_API_KEY` instead.
- The default underlying model is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), which is used by ChatGPT currently. Use `--model gpt4` to change the underlying model to `GPT4`. You can also use `GPT4omini`.
- Note that `gpt-4` is significantly more expensive than `gpt-4-turbo`. To avoid hitting rate limits, queries are automatically balanced across `gpt-4-1106-preview`, `gpt-4`, `gpt-4-32k`, `gpt-4-0613`, and `gpt-4-32k-0613`.
- To use a specific model alias with OpenAI (e.g. `gpt-4-1106-preview` or `gpt-3.5-turbo-0125`), use `--model openai --model_list gpt-4-1106-preview,gpt-3.5-turbo-0125`. `--model_list` takes a comma-separated list of model aliases.
- If using chatgptapi, you can add `--use_context` to add a context paragraph to each passage sent to the model for translation (see below).
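
The comma-separated key list described above can be pictured as a simple round-robin over the keys. The following is a minimal sketch under that assumption, not the tool's actual implementation; `parse_keys` and `KeyRotator` are hypothetical names used only for illustration.

```python
from itertools import cycle
from typing import List


def parse_keys(raw: str) -> List[str]:
    """Split a comma-separated key string, dropping blanks and whitespace."""
    return [key.strip() for key in raw.split(",") if key.strip()]


class KeyRotator:
    """Hand out keys in round-robin order (hypothetical helper)."""

    def __init__(self, raw: str) -> None:
        keys = parse_keys(raw)
        if not keys:
            raise ValueError("at least one API key is required")
        self._keys = cycle(keys)

    def next_key(self) -> str:
        """Return the next key, wrapping around after the last one."""
        return next(self._keys)


rotator = KeyRotator("key-a, key-b,key-c")
print([rotator.next_key() for _ in range(4)])
```

Spreading consecutive requests over several keys is one way the per-key call limit could be hit less often; how the tool actually picks a key for each request may differ.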

* DeepL
  Supports the DeepL model through [DeepL Translator](https://rapidapi.com/splintPRO/api/dpl-translator); a paid token is required.

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model deepl --deepl_key ${deepl_key}
  ```

* DeepL free

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model deeplfree
  ```

* [Claude](https://console.anthropic.com/docs)

  Use [Claude](https://console.anthropic.com/docs) model to translate

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model claude --claude_key ${claude_key}
  ```

* Google Translate

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model google
  ```

* Caiyun Translate

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model caiyun --caiyun_key ${caiyun_key}
  ```

* Gemini

  Supports Google [Gemini](https://aistudio.google.com/app/apikey) models. Use `--model gemini` for Gemini Flash or `--model geminipro` for Gemini Pro.
  To use a specific model alias with Gemini (e.g. `gemini-1.5-flash-002` or `gemini-1.5-flash-8b-exp-0924`), use `--model gemini --model_list gemini-1.5-flash-002,gemini-1.5-flash-8b-exp-0924`. `--model_list` takes a comma-separated list of model aliases.

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model gemini --gemini_key ${gemini_key}
  ```

* Qwen

  Supports the Alibaba Cloud [Qwen-MT](https://bailian.console.aliyun.com/) specialized translation model, which covers 92 languages and offers features such as terminology intervention and translation memory.
  Use `--model qwen-mt-turbo` for faster/cheaper translation, or `--model qwen-mt-plus` for higher quality.

  Use `--source_lang` to specify the source language explicitly, or leave it empty for auto-detection.

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --qwen_key ${qwen_key} --model qwen-mt-turbo --language "Simplified Chinese"
  python3 make_book.py --book_name test_books/animal_farm.epub --qwen_key ${qwen_key} --model qwen-mt-plus --language "Japanese" --source_lang "English"
  ```

* [Tencent TranSmart](https://transmart.qq.com)

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model tencentransmart
  ```

* [xAI](https://x.ai)

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model xai --xai_key ${xai_key}
  ```

* [Ollama](https://github.com/ollama/ollama)

  Supports self-hosted [Ollama](https://github.com/ollama/ollama) models.
  If the Ollama server is not running on localhost, use `--api_base http://x.x.x.x:port/v1` to point to the server address.

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --ollama_model ${ollama_model_name}
  ```

* [groq](https://console.groq.com/keys)

  The models GroqCloud currently supports are listed under [Supported Models](https://console.groq.com/docs/models).

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --groq_key [your_key] --model groq --model_list llama3-8b-8192
  ```
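
The service examples above follow one pattern: a `--model` value plus, for some services, a key flag. The sketch below collects that pattern in a small lookup table. It is only an illustration; `KEY_FLAGS` and `build_command` are hypothetical names, the table contains just the flags shown in the examples above, and extra flags such as `--model_list` or `--language` are left out. Ollama is omitted because its example uses `--ollama_model` instead of `--model`.

```python
import shlex
from typing import Dict, List, Optional

# Key flag per --model value, copied from the examples above.
# None means the example for that service shows no key flag.
KEY_FLAGS: Dict[str, Optional[str]] = {
    "deepl": "--deepl_key",
    "deeplfree": None,
    "claude": "--claude_key",
    "google": None,
    "caiyun": "--caiyun_key",
    "gemini": "--gemini_key",
    "qwen-mt-turbo": "--qwen_key",
    "qwen-mt-plus": "--qwen_key",
    "tencentransmart": None,
    "xai": "--xai_key",
    "groq": "--groq_key",
}


def build_command(book: str, model: str, key: Optional[str] = None) -> List[str]:
    """Return the argv list for one translation run (hypothetical helper)."""
    if model not in KEY_FLAGS:
        raise ValueError(f"unknown model: {model}")
    argv = ["python3", "make_book.py", "--book_name", book, "--model", model]
    flag = KEY_FLAGS[model]
    if flag is not None:
        if not key:
            raise ValueError(f"{model} needs {flag}")
        argv += [flag, key]
    return argv


# Print a shell-ready command line for the Google Translate example.
print(shlex.join(build_command("test_books/animal_farm.epub", "google")))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues; `shlex.join` is used only for display.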

## Use

- Once the translation is complete, EPUB inputs produce a bilingual book named `${book_name}_bilingual.epub`; TXT/MD/SRT inputs produce a bilingual text (or subtitle) file named `${book_name}_bilingual.txt` (or `_bilingual.srt`). For **PDF inputs** the tool produces a bilingual `.txt` fallback and also attempts to create `${book_name}_bilingual.epub`; if EPUB creation fails, the TXT fallback remains, so you do not need to retranslate.
- If an error occurs or you interrupt the translation with `CTRL+C`, a temporary bilingual file (for example `${book_name}_bilingual_temp.epub` or `${book_name}_bilingual_temp.txt`) is generated. You can rename it as you like.
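
The naming rules above can be summarized in a few lines of Python. This is a sketch of the stated convention, not code from the project: `OUTPUT_SUFFIXES` and `expected_outputs` are hypothetical names, Markdown inputs are left out because the text does not spell out their output extension, and the `_bilingual_temp` name for SRT and PDF inputs is an assumption extrapolated from the EPUB and TXT examples.

```python
from pathlib import Path
from typing import Dict, List

# Output extensions per input type, following the description above.
OUTPUT_SUFFIXES: Dict[str, List[str]] = {
    ".epub": [".epub"],
    ".txt": [".txt"],
    ".srt": [".srt"],
    ".pdf": [".txt", ".epub"],  # TXT fallback first, then the EPUB attempt
}


def expected_outputs(book: str, temp: bool = False) -> List[str]:
    """List the output paths expected for one input file (hypothetical helper)."""
    source = Path(book)
    suffixes = OUTPUT_SUFFIXES.get(source.suffix.lower())
    if suffixes is None:
        raise ValueError(f"unsupported input type: {source.suffix}")
    tag = "_bilingual_temp" if temp else "_bilingual"
    # Keep the input's directory and swap in the tagged file name.
    return [source.with_name(source.stem + tag + suffix).as_posix() for suffix in suffixes]


print(expected_outputs("test_books/lemo.epub"))
print(expected_outputs("test_books/sample.pdf"))
```

A check like this can be handy in scripts that need to know whether a finished or an interrupted run left a file behind before starting a new translation.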

## Params

- `--test`:

  Use `--test` option to preview the result if you haven't paid for the service. Note that there is a limit and it may take some time.

- `--language`:

  Set the target language like `--language "Simplified Chinese"`. Default target language is `"Simplified Chinese"`.
  List the available languages with the help message: `python make_book.py --help`

- `--proxy`:

  Use `--proxy` option to specify proxy server for internet access. Enter a string such as `http://127.0.0.1:7890`.

- `--resume`:

  Use `--resume` option to manually resume the process after an interruption.

  ```shell
  python3 make_book.py --book_name test_books/animal_farm.epub --model google --resume
  ```

- `--translate-tags`:

  An epub is made of HTML files. By default, only the contents of `<p>` tags are translated.
  Use `--translate-tags` to specify which tags to translate, separating multiple tags with commas.
  For example: `--translate-tags h1,h2,h3,p,div`

- `--book_from`:

  Use the `--book_from` option to specify the e-reader type (currently only `kobo` is available), and use `--device_path` to specify the mount point.

- `--api_base`:

  To change the API base URL, for example to route requests through Cloudflare Workers, use `--api_base <URL>`.
  **Note: the api url should be '`https://xxxx/v1`'. Quotation marks are required.**

- `--allow_navigable_strings`:

  If you want to translate strings in an e-book that aren't labeled with any tags, you can use the `--allow_navigable_strings` parameter. This will add the strings to the translation queue. **Note that it's best to look for e-books that are more standardized if possible.**

- `--prompt`:

  To tweak the prompt, use the `--prompt` parameter. Valid placeholders for the `user` role template include `{text}` and `{language}`. It supports a few ways to configure the prompt:

  - If you don't need to set the `system` role content, you can simply set it up like this: `--prompt "Translate {text} to {language}."` or `--prompt prompt_template_sample.txt` (example of a text file can be found at [./prompt_template_sample.txt](./prompt_template_sample.txt)).

  - If you need to set the `system` role content, you can use the following format: `--prompt '{"user":"Translate {text} to {language}", "system": "You are a professional translator."}'` or `--prompt prompt_template_sample.json` (example of a JSON file can be found at [./prompt_template_sample.json](./prompt_template_sample.json)).
  
  - You can now use [PromptDown](https://github.com/btfranklin/promptdown) format (`.md` files) for more structured prompts: `--prompt prompt_md.prompt.md`. PromptDown supports both traditional system messages and developer messages (used by newer AI models). Example:
  
      ```markdown
      # Translation Prompt
      
      ## Developer Message
      You are a professional translator who specializes in accurate translations.
      
      ## Conversation
      
      | Role | Content                                                        |
      | ---- | -------------------------------------------------------------- |
      | User | Please translate the following text into {language}:\n\n{text} |
      ```

  - You can also set the `user` and `system` role prompt by setting environment variables: `BBM_CHATGPTAPI_USER_MSG_TEMPLATE` and `BBM_CHATGPTAPI_SYS_MSG`.

- `--batch_size`:

  Use the `--batch_size` parameter to specify the number of lines for batch translation (default is 10, currently only effective for txt files).

- `--accumulated_num`:

  Sets how many tokens to accumulate before a translation request is sent. gpt3.5 limits total_token to 4090. For example, with `--accumulated_num 1600`, OpenAI may output about 2200 tokens, and the system and user messages may take about another 200, so 1600+2200+200=4000, which is close to the limit. Choose a value that suits your text;
  there is no way to know whether the limit will be reached before the request is sent.

- `--use_context`:

  Prompts the model to create a three-paragraph summary. If it's the beginning of the translation, it will summarize the entire passage sent (the size depending on `--accumulated_num`).
  For subsequent passages, it will amend the summary to include details from the most recent passage, creating a running one-paragraph context payload of the important details of the entire translated work. This improves consistency of flow and tone throughout the translation. This option is available for all ChatGPT-compatible models and Gemini models.

- `--context_paragraph_limit`:

  Use `--context_paragraph_limit` to set a limit on the number of context paragraphs when using the `--use_context` option.

- `--parallel-workers`:

  Use `--parallel-workers` to enable parallel EPUB chapter processing. Values greater than `1` spin up multiple workers (recommended: `2-4`) and automatically fall back to sequential mode for single-chapter books.

- `--temperature`:

  Use `--temperature` to set the temperature parameter for `chatgptapi`/`gpt4`/`gpt4omini`/`gpt4o`/`gpt5mini`/`claude`/`gemini` models.
  For example: `--temperature 0.7`.

- `--block_size`:

  Use `--block_size` to merge multiple paragraphs into one block. This may increase accuracy and speed up the process but can disturb the original format. Must be used with `--single_translate`.
  For example: `--block_size 5 --single_translate`.

- `--single_translate`:

  Use `--single_translate` to output only the translated book without creating a bilingual version.

- `--translation_style`:

  Use `--translation_style` to set the inline CSS style applied to the translated text.
  For example: `--translation_style "color: #808080; font-style: italic;"`

- `--retranslate "$translated_filepath" "file_name_in_epub" "start_str" "end_str"(optional)`:

  Retranslate from start_str to end_str's tag:

  ```shell
  python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which' 'This kind of thing is not a good symptom. Obviously'
  ```

  Retranslate start_str's tag:

  ```shell
  python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which'
  ```
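
The token arithmetic described under `--accumulated_num` can be sketched in a few lines of Python. This is only an illustration and not code from this project: `estimated_total` and `fits_budget` are hypothetical helpers, the 200-token overhead is the rough figure from the example above, and the 4090 limit is the one quoted for gpt3.5.

```python
# Hypothetical helpers for illustration; not part of bilingual_book_maker.
TOTAL_TOKEN_LIMIT = 4090  # limit quoted above for gpt3.5


def estimated_total(accumulated_num, expected_output, overhead=200):
    """Rough size of one request: input + expected output + message overhead."""
    return accumulated_num + expected_output + overhead


def fits_budget(accumulated_num, expected_output, overhead=200, limit=TOTAL_TOKEN_LIMIT):
    """Return True if the estimated total stays within the limit."""
    return estimated_total(accumulated_num, expected_output, overhead) <= limit


# The example from the text: 1600 input + 2200 output + 200 overhead
print(estimated_total(1600, 2200))  # 4000
print(fits_budget(1600, 2200))      # True, with only 90 tokens of headroom
print(fits_budget(2000, 2200))      # False: 4400 exceeds 4090
```

Because the output length is only known after the request returns, the estimate is a planning aid, not a guarantee.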

### Examples

**Note: if you installed with `pip install bbook_maker`, every command can be written as `bbook_maker args` instead.**

```shell
# Test quickly
python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${openai_key}  --test --language zh-hans

# Test quickly with an srt file
python3 make_book.py --book_name test_books/Lex_Fridman_episode_322.srt --openai_key ${openai_key}  --test

# Or translate the whole book
python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${openai_key} --language zh-hans

# Or translate the whole book using Gemini flash
python3 make_book.py --book_name test_books/animal_farm.epub --gemini_key ${gemini_key} --model gemini

# Translate an EPUB with parallel chapter processing
python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${openai_key} --parallel-workers 4

# Use a specific list of Gemini model aliases
python3 make_book.py --book_name test_books/animal_farm.epub --gemini_key ${gemini_key} --model gemini --model_list gemini-1.5-flash-002,gemini-1.5-flash-8b-exp-0924

# Set env OPENAI_API_KEY to ignore option --openai_key
export OPENAI_API_KEY=${your_api_key}

# Use the GPT-4 model with context to Japanese
python3 make_book.py --book_name test_books/animal_farm.epub --model gpt4 --use_context --language ja

# Use a specific OpenAI model alias
python3 make_book.py --book_name test_books/animal_farm.epub --model openai --model_list gpt-4-1106-preview --openai_key ${openai_key}

# Note: you can use other OpenAI-compatible models in this way
python3 make_book.py --book_name test_books/animal_farm.epub --model openai --model_list yi-34b-chat-0205 --openai_key ${openai_key} --api_base "https://api.lingyiwanwu.com/v1"

# Use a specific list of OpenAI model aliases
python3 make_book.py --book_name test_books/animal_farm.epub --model openai --model_list gpt-4-1106-preview,gpt-4-0125-preview,gpt-3.5-turbo-0125 --openai_key ${openai_key}

# Use the DeepL model with Japanese
python3 make_book.py --book_name test_books/animal_farm.epub --model deepl --deepl_key ${deepl_key} --language ja

# Use the Claude model with Japanese
python3 make_book.py --book_name test_books/animal_farm.epub --model claude --claude_key ${claude_key} --language ja

# Use the CustomAPI model with Japanese
python3 make_book.py --book_name test_books/animal_farm.epub --model customapi --custom_api ${custom_api} --language ja

# Translate contents in <div> and <p>
python3 make_book.py --book_name test_books/animal_farm.epub --translate-tags div,p

# Tweaking the prompt
python3 make_book.py --book_name test_books/animal_farm.epub --prompt prompt_template_sample.txt
# or
python3 make_book.py --book_name test_books/animal_farm.epub --prompt prompt_template_sample.json
# or
python3 make_book.py --book_name test_books/animal_farm.epub --prompt "Please translate \`{text}\` to {language}"

# Translate books downloaded from Rakuten Kobo on a Kobo e-reader
python3 make_book.py --book_from kobo --device_path /tmp/kobo

# translate txt file
python3 make_book.py --book_name test_books/the_little_prince.txt --test --language zh-hans
# aggregated translation txt file
python3 make_book.py --book_name test_books/the_little_prince.txt --test --batch_size 20

# Using Caiyun model to translate
# (the api currently only support: simplified chinese <-> english, simplified chinese <-> japanese)
# the official Caiyun has provided a test token (3975l6lr5pcbvidl6jl2)
# you can apply your own token by following this tutorial(https://bobtranslate.com/service/translate/caiyun.html)
python3 make_book.py --model caiyun --caiyun_key 3975l6lr5pcbvidl6jl2 --book_name test_books/animal_farm.epub


# Set env BBM_CAIYUN_API_KEY to ignore option --caiyun_key
export BBM_CAIYUN_API_KEY=${your_api_key}

```
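
The `--prompt` examples above rely on the `{text}` and `{language}` placeholders. Here is a minimal sketch of how such a template could be filled, assuming plain `str.format` substitution. `fill_prompt` is a hypothetical helper written for this illustration, so the project's translators may do this differently.

```python
# Hypothetical helper for illustration; not the project's API.
def fill_prompt(template, text, language):
    """Substitute the paragraph and target language into a prompt template."""
    if "{text}" not in template:
        # Mirrors the CLI rule that a prompt must contain `{text}`.
        raise ValueError("prompt must contain `{text}`")
    return template.format(text=text, language=language)


print(fill_prompt("Please translate `{text}` to {language}", "Hello", "Japanese"))
# Please translate `Hello` to Japanese
```

A template without `{language}` still works under this sketch, because `str.format` ignores unused keyword arguments.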

More understandable example

```shell
python3 make_book.py --book_name 'animal_farm.epub' --openai_key sk-XXXXX --api_base 'https://xxxxx/v1'

# Or, if python3 is not in your PATH
python make_book.py --book_name 'animal_farm.epub' --openai_key sk-XXXXX --api_base 'https://xxxxx/v1'
```

Microsoft Azure Endpoints

```shell
python3 make_book.py --book_name 'animal_farm.epub' --openai_key XXXXX --api_base 'https://example-endpoint.openai.azure.com' --deployment_id 'deployment-name'

# Or, if python3 is not in your PATH
python make_book.py --book_name 'animal_farm.epub' --openai_key XXXXX --api_base 'https://example-endpoint.openai.azure.com' --deployment_id 'deployment-name'
```

## Docker

You can use [Docker](https://www.docker.com/) if you don't want to deal with setting up the environment.

```shell
# Build image
docker build --tag bilingual_book_maker .

# Run container
# "$folder_path" is the folder where your book file is located. The processed file is also stored there.

# Windows PowerShell
$folder_path=your_folder_path # $folder_path="C:\Users\user\mybook\"
$book_name=your_book_name # $book_name="animal_farm.epub"
$openai_key=your_api_key # $openai_key="sk-xxx"
$language=your_language # see utils.py

docker run --rm --name bilingual_book_maker --mount type=bind,source=$folder_path,target='/app/test_books' bilingual_book_maker --book_name "/app/test_books/$book_name" --openai_key $openai_key --language $language

# Linux
export folder_path=${your_folder_path}
export book_name=${your_book_name}
export openai_key=${your_api_key}
export language=${your_language}

docker run --rm --name bilingual_book_maker --mount type=bind,source=${folder_path},target='/app/test_books' bilingual_book_maker --book_name "/app/test_books/${book_name}" --openai_key ${openai_key} --language "${language}"
```

For example:

```shell
# Linux
docker run --rm --name bilingual_book_maker --mount type=bind,source=/home/user/my_books,target='/app/test_books' bilingual_book_maker --book_name /app/test_books/animal_farm.epub --openai_key sk-XXX --test --test_num 1 --language zh-hant
```

## Notes

1. API tokens from the free trial are limited. If you want to speed up the process, consider paying for the service or using multiple OpenAI tokens.
2. PRs are welcome.

# Thanks

- @[yetone](https://github.com/yetone)

# Contribution

- Any issues or PRs are welcome.
- You can also pick up TODOs listed in the issues.
- Please run `black make_book.py`[^black] before submitting the code.

# Better alternatives

- 书译 BookTranslator -> [Book Translator](https://www.booktranslator.app)

## Appreciation

Thank you, that's enough.

![image](https://user-images.githubusercontent.com/15976103/222407199-1ed8930c-13a8-402b-9993-aaac8ee84744.png)

[^token]: https://platform.openai.com/account/api-keys
[^black]: https://github.com/psf/black


================================================
FILE: book_maker/__init__.py
================================================


================================================
FILE: book_maker/__main__.py
================================================
# Import through the package so `python -m book_maker` can resolve the module.
from book_maker.cli import main

if __name__ == "__main__":
    main()


================================================
FILE: book_maker/cli.py
================================================
import argparse
import json
import os
from os import environ as env

from book_maker.loader import BOOK_LOADER_DICT
from book_maker.translator import MODEL_DICT
from book_maker.utils import LANGUAGES, TO_LANGUAGE_CODE


def parse_prompt_arg(prompt_arg):
    prompt = None
    if prompt_arg is None:
        return prompt

    # Check if it's a path to a markdown file (PromptDown format)
    if prompt_arg.endswith(".md") and os.path.exists(prompt_arg):
        try:
            from promptdown import StructuredPrompt

            structured_prompt = StructuredPrompt.from_promptdown_file(prompt_arg)

            # Initialize our prompt structure
            prompt = {}

            # Handle developer_message or system_message
            # Developer message takes precedence if both are present
            if (
                hasattr(structured_prompt, "developer_message")
                and structured_prompt.developer_message
            ):
                prompt["system"] = structured_prompt.developer_message
            elif (
                hasattr(structured_prompt, "system_message")
                and structured_prompt.system_message
            ):
                prompt["system"] = structured_prompt.system_message

            # Extract user message from conversation
            if (
                hasattr(structured_prompt, "conversation")
                and structured_prompt.conversation
            ):
                for message in structured_prompt.conversation:
                    if message.role.lower() == "user":
                        prompt["user"] = message.content
                        break

            # Ensure we found a user message
            if "user" not in prompt or not prompt["user"]:
                raise ValueError(
                    "PromptDown file must contain at least one user message"
                )

            print(f"Successfully loaded PromptDown file: {prompt_arg}")

            # Validate required placeholders
            if any(c not in prompt["user"] for c in ["{text}"]):
                raise ValueError(
                    "User message in PromptDown must contain `{text}` placeholder"
                )

            return prompt
        except Exception as e:
            print(f"Error parsing PromptDown file: {e}")
            # Fall through to other parsing methods

    # Existing parsing logic for JSON strings and other formats
    if not any(prompt_arg.endswith(ext) for ext in [".json", ".txt", ".md"]):
        try:
            # user can define prompt by passing a json string
            # eg: --prompt '{"system": "You are a professional translator who translates computer technology books", "user": "Translate \`{text}\` to {language}"}'
            prompt = json.loads(prompt_arg)
        except json.JSONDecodeError:
            # if not a json string, treat it as a template string
            prompt = {"user": prompt_arg}

    elif os.path.exists(prompt_arg):
        if prompt_arg.endswith(".txt"):
            # if it's a txt file, treat it as a template string
            with open(prompt_arg, encoding="utf-8") as f:
                prompt = {"user": f.read()}
        elif prompt_arg.endswith(".json"):
            # if it's a json file, treat it as a json object
            # eg: --prompt prompt_template_sample.json
            with open(prompt_arg, encoding="utf-8") as f:
                prompt = json.load(f)
    else:
        raise FileNotFoundError(f"{prompt_arg} not found")

    # Validate the structure first, so a missing `user` key raises a clear
    # ValueError instead of a KeyError.
    if not isinstance(prompt, dict) or "user" not in prompt:
        raise ValueError("prompt must contain the key of `user`")

    if "{text}" not in prompt["user"]:
        raise ValueError("prompt must contain `{text}`")

    if (prompt.keys() - {"user", "system"}) != set():
        raise ValueError("prompt can only contain the keys of `user` and `system`")

    print("prompt config:", prompt)
    return prompt


def main():
    translate_model_list = list(MODEL_DICT.keys())
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--book_name",
        dest="book_name",
        type=str,
        help="path of the epub file to be translated",
    )
    parser.add_argument(
        "--book_from",
        dest="book_from",
        type=str,
        choices=["kobo"],  # support kindle later
        metavar="E-READER",
        help="e-reader type, available: {%(choices)s}",
    )
    parser.add_argument(
        "--device_path",
        dest="device_path",
        type=str,
        help="Path of e-reader device",
    )
    ########## KEYS ##########
    parser.add_argument(
        "--openai_key",
        dest="openai_key",
        type=str,
        default="",
        help="OpenAI api key, if you have more than one key, please use comma"
        " to split them to go beyond the rate limits",
    )
    parser.add_argument(
        "--caiyun_key",
        dest="caiyun_key",
        type=str,
        help="you can apply caiyun key from here (https://dashboard.caiyunapp.com/user/sign_in/)",
    )
    parser.add_argument(
        "--deepl_key",
        dest="deepl_key",
        type=str,
        help="you can apply deepl key from here (https://rapidapi.com/splintPRO/api/dpl-translator)",
    )
    parser.add_argument(
        "--claude_key",
        dest="claude_key",
        type=str,
        help="you can find claude key from here (https://console.anthropic.com/account/keys)",
    )

    parser.add_argument(
        "--custom_api",
        dest="custom_api",
        type=str,
        help="you should build your own translation api",
    )

    # for Google Gemini
    parser.add_argument(
        "--gemini_key",
        dest="gemini_key",
        type=str,
        help="You can get Gemini Key from  https://makersuite.google.com/app/apikey",
    )

    # for Groq
    parser.add_argument(
        "--groq_key",
        dest="groq_key",
        type=str,
        help="You can get Groq Key from  https://console.groq.com/keys",
    )

    # for xAI
    parser.add_argument(
        "--xai_key",
        dest="xai_key",
        type=str,
        help="You can get xAI Key from  https://console.x.ai/",
    )

    # for Qwen
    parser.add_argument(
        "--qwen_key",
        dest="qwen_key",
        type=str,
        help="You can get Qwen Key from  https://bailian.console.aliyun.com/?tab=model#/api-key",
    )

    parser.add_argument(
        "--test",
        dest="test",
        action="store_true",
        help="only the first 10 paragraphs will be translated, for testing",
    )
    parser.add_argument(
        "--test_num",
        dest="test_num",
        type=int,
        default=10,
        help="how many paragraphs will be translated for testing",
    )
    parser.add_argument(
        "-m",
        "--model",
        dest="model",
        type=str,
        default="chatgptapi",
        choices=translate_model_list,  # support DeepL later
        metavar="MODEL",
        help="model to use, available: {%(choices)s}",
    )
    parser.add_argument(
        "--ollama_model",
        dest="ollama_model",
        type=str,
        default="",
        metavar="MODEL",
        help="use ollama",
    )
    parser.add_argument(
        "--language",
        type=str,
        choices=sorted(LANGUAGES.keys())
        + sorted([k.title() for k in TO_LANGUAGE_CODE]),
        default="zh-hans",
        metavar="LANGUAGE",
        help="language to translate to, available: {%(choices)s}",
    )
    parser.add_argument(
        "--resume",
        dest="resume",
        action="store_true",
        help="if the program stops unexpectedly, you can use this to resume",
    )
    parser.add_argument(
        "-p",
        "--proxy",
        dest="proxy",
        type=str,
        default="",
        help="use proxy like http://127.0.0.1:7890",
    )
    parser.add_argument(
        "--deployment_id",
        dest="deployment_id",
        type=str,
        help="the deployment name you chose when you deployed the model",
    )
    # args to change api_base
    parser.add_argument(
        "--api_base",
        metavar="API_BASE_URL",
        dest="api_base",
        type=str,
        help="specify base url other than the OpenAI's official API address",
    )
    parser.add_argument(
        "--exclude_filelist",
        dest="exclude_filelist",
        type=str,
        default="",
        help="if you have more than one file to exclude, please use comma to split them, example: --exclude_filelist 'nav.xhtml,cover.xhtml'",
    )
    parser.add_argument(
        "--only_filelist",
        dest="only_filelist",
        type=str,
        default="",
        help="if you only have a few files with translations, please use comma to split them, example: --only_filelist 'nav.xhtml,cover.xhtml'",
    )
    parser.add_argument(
        "--translate-tags",
        dest="translate_tags",
        type=str,
        default="p",
        help="example --translate-tags p,blockquote",
    )
    parser.add_argument(
        "--exclude_translate-tags",
        dest="exclude_translate_tags",
        type=str,
        default="sup",
        help="example --exclude_translate-tags table,sup",
    )
    parser.add_argument(
        "--allow_navigable_strings",
        dest="allow_navigable_strings",
        action="store_true",
        default=False,
        help="allow NavigableStrings to be translated",
    )
    parser.add_argument(
        "--prompt",
        dest="prompt_arg",
        type=str,
        metavar="PROMPT_ARG",
        help="used for customizing the prompt. It can be the prompt template string, or a path to the template file. The valid placeholders are `{text}` and `{language}`.",
    )
    parser.add_argument(
        "--accumulated_num",
        dest="accumulated_num",
        type=int,
        default=1,
        help="""Wait for how many tokens have been accumulated before starting the translation.
gpt3.5 limits the total_token to 4090.
For example, if you use --accumulated_num 1600, maybe openai will output 2200 tokens
and maybe 200 tokens for other messages in the system messages user messages, 1600+2200+200=4000,
So you are close to reaching the limit. You have to choose your own value, there is no way to know if the limit is reached before sending
""",
    )
    parser.add_argument(
        "--translation_style",
        dest="translation_style",
        type=str,
        help="""ex: --translation_style "color: #808080; font-style: italic;" """,
    )
    parser.add_argument(
        "--batch_size",
        dest="batch_size",
        type=int,
        help="how many lines will be translated by aggregated translation(This options currently only applies to txt files)",
    )
    parser.add_argument(
        "--retranslate",
        dest="retranslate",
        nargs=4,
        type=str,
        help="""--retranslate "$translated_filepath" "file_name_in_epub" "start_str" "end_str"(optional)
        Retranslate from start_str to end_str's tag:
        python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which' 'This kind of thing is not a good symptom. Obviously'
        Retranslate start_str's tag:
        python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which'
""",
    )
    parser.add_argument(
        "--single_translate",
        action="store_true",
        help="output translated book, no bilingual",
    )
    parser.add_argument(
        "--use_context",
        dest="context_flag",
        action="store_true",
        help="adds an additional paragraph for global, updating historical context of the story to the model's input, improving the narrative consistency for the AI model (this uses ~200 more tokens each time)",
    )
    parser.add_argument(
        "--context_paragraph_limit",
        dest="context_paragraph_limit",
        type=int,
        default=0,
        help="if use --use_context, set context paragraph limit",
    )
    parser.add_argument(
        "--temperature",
        type=float,
        default=1.0,
        help="temperature parameter for `chatgptapi`/`gpt4`/`gpt4omini`/`gpt4o`/`gpt5mini`/`claude`/`gemini`",
    )
    parser.add_argument(
        "--source_lang",
        type=str,
        default="auto",
        help="source language for translation models like `qwen` (default: auto-detect)",
    )
    parser.add_argument(
        "--block_size",
        type=int,
        default=-1,
        help="merge multiple paragraphs into one block, may increase accuracy and speed up the process, but disturb the original format, must be used with `--single_translate`",
    )
    parser.add_argument(
        "--model_list",
        type=str,
        dest="model_list",
        help="Rather than using our preset lists of models, specify exactly the models you want as a comma separated list `gpt-4-32k,gpt-3.5-turbo-0125` (Currently only supports: `openai`)",
    )
    parser.add_argument(
        "--batch",
        dest="batch_flag",
        action="store_true",
        help="Enable batch translation using ChatGPT's batch API for improved efficiency",
    )
    parser.add_argument(
        "--batch-use",
        dest="batch_use_flag",
        action="store_true",
        help="Use pre-generated batch translations to create files. Run with --batch first before using this option",
    )
    parser.add_argument(
        "--interval",
        type=float,
        default=0.01,
        help="Request interval in seconds (e.g., 0.1 for 100ms). Currently only supported for Gemini models. Default: 0.01",
    )
    parser.add_argument(
        "--parallel-workers",
        dest="parallel_workers",
        type=int,
        default=1,
        help="Number of parallel workers for EPUB chapter processing. Use 2-4 for better performance. Default: 1",
    )

    options = parser.parse_args()

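    # With --book_from, the book path is resolved later from --device_path,
    # so --book_name is only required for the regular file workflow.
    if not options.book_from:
        if not options.book_name:
            print("Error: please provide the path of your book using --book_name <path>")
            exit(1)
        if not os.path.isfile(options.book_name):
            print(f"Error: the book {options.book_name!r} does not exist.")
            exit(1)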

    PROXY = options.proxy
    if PROXY != "":
        os.environ["http_proxy"] = PROXY
        os.environ["https_proxy"] = PROXY

    translate_model = MODEL_DICT.get(options.model)
    assert translate_model is not None, "unsupported model"
    API_KEY = ""
    if options.model in [
        "openai",
        "chatgptapi",
        "gpt4",
        "gpt4omini",
        "gpt4o",
        "gpt5mini",
        "o1preview",
        "o1",
        "o1mini",
        "o3mini",
    ]:
        if OPENAI_API_KEY := (
            options.openai_key
            or env.get(
                "OPENAI_API_KEY",
            )  # XXX: for backward compatibility, deprecate soon
            or env.get(
                "BBM_OPENAI_API_KEY",
            )  # suggest adding `BBM_` prefix for all the bilingual_book_maker ENVs.
        ):
            API_KEY = OPENAI_API_KEY
            # patch
        elif options.ollama_model:
            # any string is ok, can't be empty
            API_KEY = "ollama"
        else:
            raise Exception(
                "OpenAI API key not provided, please google how to obtain it",
            )
    elif options.model == "caiyun":
        API_KEY = options.caiyun_key or env.get("BBM_CAIYUN_API_KEY")
        if not API_KEY:
            raise Exception("Please provide caiyun key")
    elif options.model == "deepl":
        API_KEY = options.deepl_key or env.get("BBM_DEEPL_API_KEY")
        if not API_KEY:
            raise Exception("Please provide deepl key")
    elif options.model.startswith("claude"):
        API_KEY = options.claude_key or env.get("BBM_CLAUDE_API_KEY")
        if not API_KEY:
            raise Exception("Please provide claude key")
    elif options.model == "customapi":
        API_KEY = options.custom_api or env.get("BBM_CUSTOM_API")
        if not API_KEY:
            raise Exception("Please provide custom translate api")
    elif options.model in ["gemini", "geminipro"]:
        API_KEY = options.gemini_key or env.get("BBM_GOOGLE_GEMINI_KEY")
    elif options.model == "groq":
        API_KEY = options.groq_key or env.get("BBM_GROQ_API_KEY")
    elif options.model == "xai":
        API_KEY = options.xai_key or env.get("BBM_XAI_API_KEY")
    elif options.model.startswith("qwen-"):
        API_KEY = options.qwen_key or env.get("BBM_QWEN_API_KEY")
    else:
        API_KEY = ""

    if options.book_from == "kobo":
        from book_maker import obok

        device_path = options.device_path
        if device_path is None:
            raise Exception(
                "Device path is not given, please specify the path by --device_path <DEVICE_PATH>",
            )
        options.book_name = obok.cli_main(device_path)

    book_type = options.book_name.split(".")[-1]
    support_type_list = list(BOOK_LOADER_DICT.keys())
    if book_type not in support_type_list:
        raise Exception(
            f"now only support files of these formats: {','.join(support_type_list)}",
        )

    if options.block_size > 0 and not options.single_translate:
        raise Exception(
            "block_size must be used with `--single_translate` because it disturbs the original format",
        )

    book_loader = BOOK_LOADER_DICT.get(book_type)
    assert book_loader is not None, "unsupported loader"
    language = options.language
    if options.language in LANGUAGES:
        # use the value for prompt
        language = LANGUAGES.get(language, language)

    # change api_base for issue #42
    model_api_base = options.api_base

    if options.ollama_model and not model_api_base:
        # ollama default api_base
        model_api_base = "http://localhost:11434/v1"

    e = book_loader(
        options.book_name,
        translate_model,
        API_KEY,
        options.resume,
        language=language,
        model_api_base=model_api_base,
        is_test=options.test,
        test_num=options.test_num,
        prompt_config=parse_prompt_arg(options.prompt_arg),
        single_translate=options.single_translate,
        context_flag=options.context_flag,
        context_paragraph_limit=options.context_paragraph_limit,
        temperature=options.temperature,
        source_lang=options.source_lang,
        parallel_workers=options.parallel_workers,
    )
    # other options
    if options.allow_navigable_strings:
        e.allow_navigable_strings = True
    if options.translate_tags:
        e.translate_tags = options.translate_tags
    if options.exclude_translate_tags:
        e.exclude_translate_tags = options.exclude_translate_tags
    if options.exclude_filelist:
        e.exclude_filelist = options.exclude_filelist
    if options.only_filelist:
        e.only_filelist = options.only_filelist
    if options.accumulated_num > 1:
        e.accumulated_num = options.accumulated_num
    if options.translation_style:
        e.translation_style = options.translation_style
    if options.batch_size:
        e.batch_size = options.batch_size
    if options.retranslate:
        e.retranslate = options.retranslate
    if options.deployment_id:
        # only work for ChatGPT api for now
        # later maybe support others
        assert options.model in [
            "chatgptapi",
            "gpt4",
            "gpt4omini",
            "gpt4o",
            "gpt5mini",
            "o1",
            "o1preview",
            "o1mini",
            "o3mini",
        ], "only support chatgptapi for deployment_id"
        if not options.api_base:
            raise ValueError("`api_base` must be provided when using `deployment_id`")
        e.translate_model.set_deployment_id(options.deployment_id)
    if options.model in ("openai", "groq"):
        # Currently only supports `openai` when you also have --model_list set
        if options.model_list:
            e.translate_model.set_model_list(options.model_list.split(","))
        else:
            raise ValueError(
                "When using `openai` model, you must also provide `--model_list`. For default model sets use `--model chatgptapi` or `--model gpt4` or `--model gpt4omini` or `--model gpt5mini`",
            )
    # TODO refactor, quick fix for gpt4 model
    if options.model == "chatgptapi":
        if options.ollama_model:
            e.translate_model.set_gpt35_models(ollama_model=options.ollama_model)
        else:
            e.translate_model.set_gpt35_models()
    if options.model == "gpt4":
        e.translate_model.set_gpt4_models()
    if options.model == "gpt4omini":
        e.translate_model.set_gpt4omini_models()
    if options.model == "gpt4o":
        e.translate_model.set_gpt4o_models()
    if options.model == "gpt5mini":
        e.translate_model.set_gpt5mini_models()
    if options.model == "o1preview":
        e.translate_model.set_o1preview_models()
    if options.model == "o1":
        e.translate_model.set_o1_models()
    if options.model == "o1mini":
        e.translate_model.set_o1mini_models()
    if options.model == "o3mini":
        e.translate_model.set_o3mini_models()
    if options.model.startswith("claude-"):
        e.translate_model.set_claude_model(options.model)
    if options.model.startswith("qwen-"):
        e.translate_model.set_qwen_model(options.model)
    if options.block_size > 0:
        e.block_size = options.block_size
    if options.batch_flag:
        e.batch_flag = options.batch_flag
    if options.batch_use_flag:
        e.batch_use_flag = options.batch_use_flag

    if options.model in ("gemini", "geminipro"):
        e.translate_model.set_interval(options.interval)
    if options.model == "gemini":
        if options.model_list:
            e.translate_model.set_model_list(options.model_list.split(","))
        else:
            e.translate_model.set_geminiflash_models()
    if options.model == "geminipro":
        e.translate_model.set_geminipro_models()

    e.make_bilingual_book()


if __name__ == "__main__":
    main()


================================================
FILE: book_maker/config.py
================================================
config = {
    "translator": {
        "chatgptapi": {
            "context_paragraph_limit": 3,
            "batch_context_update_interval": 50,
        }
    },
}


================================================
FILE: book_maker/loader/__init__.py
================================================
from book_maker.loader.epub_loader import EPUBBookLoader
from book_maker.loader.txt_loader import TXTBookLoader
from book_maker.loader.srt_loader import SRTBookLoader
from book_maker.loader.md_loader import MarkdownBookLoader
from book_maker.loader.pdf_loader import PDFBookLoader

BOOK_LOADER_DICT = {
    "epub": EPUBBookLoader,
    "txt": TXTBookLoader,
    "srt": SRTBookLoader,
    "md": MarkdownBookLoader,
    "pdf": PDFBookLoader,
    # TODO add more here
}


================================================
FILE: book_maker/loader/base_loader.py
================================================
from abc import ABC, abstractmethod


class BaseBookLoader(ABC):
    @staticmethod
    def _is_special_text(text):
        return text.isdigit() or text.isspace()

    @abstractmethod
    def _make_new_book(self, book):
        pass

    @abstractmethod
    def make_bilingual_book(self):
        pass

    @abstractmethod
    def load_state(self):
        pass

    @abstractmethod
    def _save_temp_book(self):
        pass

    @abstractmethod
    def _save_progress(self):
        pass


================================================
FILE: book_maker/loader/epub_loader.py
================================================
import os
import pickle
import string
import sys
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from copy import copy
from pathlib import Path
import traceback
from threading import Lock

from bs4 import BeautifulSoup as bs
from bs4 import Tag
from bs4.element import NavigableString
from ebooklib import ITEM_DOCUMENT, epub
from rich import print
from tqdm import tqdm

from book_maker.utils import num_tokens_from_text, prompt_config_to_kwargs

from .base_loader import BaseBookLoader
from .helper import EPUBBookLoaderHelper, is_text_link, not_trans


class EPUBBookLoader(BaseBookLoader):
    def __init__(
        self,
        epub_name,
        model,
        key,
        resume,
        language,
        model_api_base=None,
        is_test=False,
        test_num=5,
        prompt_config=None,
        single_translate=False,
        context_flag=False,
        context_paragraph_limit=0,
        temperature=1.0,
        source_lang="auto",
        parallel_workers=1,
    ):
        self.epub_name = epub_name
        self.new_epub = epub.EpubBook()
        self.translate_model = model(
            key,
            language,
            api_base=model_api_base,
            context_flag=context_flag,
            context_paragraph_limit=context_paragraph_limit,
            temperature=temperature,
            source_lang=source_lang,
            **prompt_config_to_kwargs(prompt_config),
        )
        self.is_test = is_test
        self.test_num = test_num
        self.translate_tags = "p"
        self.exclude_translate_tags = "sup"
        self.allow_navigable_strings = False
        self.accumulated_num = 1
        self.translation_style = ""
        self.context_flag = context_flag
        self.helper = EPUBBookLoaderHelper(
            self.translate_model,
            self.accumulated_num,
            self.translation_style,
            self.context_flag,
        )
        self.retranslate = None
        self.exclude_filelist = ""
        self.only_filelist = ""
        self.single_translate = single_translate
        self.block_size = -1
        self.batch_use_flag = False
        self.batch_flag = False
        self.parallel_workers = 1
        self.enable_parallel = False
        self._progress_lock = Lock()
        self._translation_index = 0
        self.set_parallel_workers(parallel_workers)

        # monkey patch for issue #173
        def _write_items_patch(obj):
            for item in obj.book.get_items():
                if isinstance(item, epub.EpubNcx):
                    obj.out.writestr(
                        "%s/%s" % (obj.book.FOLDER_NAME, item.file_name), obj._get_ncx()
                    )
                elif isinstance(item, epub.EpubNav):
                    obj.out.writestr(
                        "%s/%s" % (obj.book.FOLDER_NAME, item.file_name),
                        obj._get_nav(item),
                    )
                elif item.manifest:
                    obj.out.writestr(
                        "%s/%s" % (obj.book.FOLDER_NAME, item.file_name), item.content
                    )
                else:
                    obj.out.writestr("%s" % item.file_name, item.content)

        def _check_deprecated(obj):
            pass

        epub.EpubWriter._write_items = _write_items_patch
        epub.EpubReader._check_deprecated = _check_deprecated

        try:
            self.origin_book = epub.read_epub(self.epub_name)
        except Exception:
            # Tricky monkey patch for #71; if the reason is unclear, see the issue.
            # TODO: remove this once upstream changes.
            def _load_spine(obj):
                spine = obj.container.find("{%s}%s" % (epub.NAMESPACES["OPF"], "spine"))

                obj.book.spine = [
                    (t.get("idref"), t.get("linear", "yes")) for t in spine
                ]
                obj.book.set_direction(spine.get("page-progression-direction", None))

            epub.EpubReader._load_spine = _load_spine
            self.origin_book = epub.read_epub(self.epub_name)

        self.p_to_save = []
        self.resume = resume
        self.bin_path = f"{Path(epub_name).parent}/.{Path(epub_name).stem}.temp.bin"
        if self.resume:
            self.load_state()

    @staticmethod
    def _is_special_text(text):
        return (
            text.isdigit()
            or text.isspace()
            or is_text_link(text)
            or all(char in string.punctuation for char in text)
        )

    def _make_new_book(self, book):
        new_book = epub.EpubBook()
        allowed_ns = set(epub.NAMESPACES.keys()) | set(epub.NAMESPACES.values())

        for namespace, metas in book.metadata.items():
            # Only keep namespaces recognized by ebooklib
            if namespace not in allowed_ns:
                continue

            if isinstance(metas, dict):
                entries = (
                    (name, value, others)
                    for name, values in metas.items()
                    for value, others in (
                        (item if isinstance(item, tuple) else (item, None))
                        for item in values
                    )
                )
            else:
                entries = metas

            for entry in entries:
                if not entry:
                    continue

                if isinstance(entry, tuple):
                    if len(entry) == 3:
                        name, value, others = entry
                    elif len(entry) == 2:
                        name, value = entry
                        others = None
                    else:
                        continue
                else:
                    # Unexpected metadata format; skip gracefully
                    continue

                # `others` can be {} or None
                if others:
                    new_book.add_metadata(namespace, name, value, others)
                else:
                    new_book.add_metadata(namespace, name, value)

        new_book.spine = book.spine
        new_book.toc = self._fix_toc_uids(book.toc)
        return new_book

    def _fix_toc_uids(self, toc, counter=None):
        """Fix TOC items that have uid=None to prevent TypeError when writing NCX."""
        if counter is None:
            counter = [0]  # Use list to allow mutation in nested calls

        fixed_toc = []
        for item in toc:
            if isinstance(item, tuple):
                # Section with sub-items: (Section, [sub-items])
                section, sub_items = item
                if hasattr(section, "uid") and section.uid is None:
                    section.uid = f"navpoint-{counter[0]}"
                    counter[0] += 1
                fixed_sub_items = self._fix_toc_uids(sub_items, counter)
                fixed_toc.append((section, fixed_sub_items))
            elif hasattr(item, "uid"):
                # Link or EpubHtml item
                if item.uid is None:
                    item.uid = f"navpoint-{counter[0]}"
                    counter[0] += 1
                fixed_toc.append(item)
            else:
                fixed_toc.append(item)

        return fixed_toc

    def _extract_paragraph(self, p):
        for p_exclude in self.exclude_translate_tags.split(","):
            # for issue #280
            if type(p) is NavigableString:
                continue
            for pt in p.find_all(p_exclude):
                pt.extract()
        return p

    def _process_paragraph(self, p, new_p, index, p_to_save_len, thread_safe=False):
        if self.resume and index < p_to_save_len:
            p.string = self.p_to_save[index]
            new_p.string = self.p_to_save[
                index
            ]  # Fix: also update new_p to cached translation
        else:
            t_text = ""
            if self.batch_flag:
                self.translate_model.add_to_batch_translate_queue(index, new_p.text)
            elif self.batch_use_flag:
                t_text = self.translate_model.batch_translate(index)
            else:
                t_text = self.translate_model.translate(new_p.text)
            if t_text is None:
                raise RuntimeError(
                    "`t_text` is None: your translation model is not working as expected. Please check your translation model configuration."
                )
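            if type(p) is NavigableString:
                new_p = t_text
                self.p_to_save.append(new_p)
            else:
                new_p.string = t_text
                self.p_to_save.append(new_p.text)

        # For a NavigableString paragraph `new_p` is now a plain `str`, which has
        # no `.string` attribute, so pass it through directly in that case.
        trans_text = new_p if type(new_p) is str else new_p.string
        self.helper.insert_trans(
            p, trans_text, self.translation_style, self.single_translate
        )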
        index += 1

        if thread_safe:
            with self._progress_lock:
                if index % 20 == 0:
                    self._save_progress()
        else:
            if index % 20 == 0:
                self._save_progress()
        return index

    def _process_combined_paragraph(
        self, p_block, index, p_to_save_len, thread_safe=False
    ):
        text = []

        for p in p_block:
            if self.resume and index < p_to_save_len:
                p.string = self.p_to_save[index]
            else:
                p_text = p.text.rstrip()
                text.append(p_text)

            if self.is_test and index >= self.test_num:
                break

            index += 1

        if len(text) > 0:
            translated_text = self.translate_model.translate("\n".join(text))
            translated_text = translated_text.split("\n")
            text_len = len(translated_text)

            for i in range(text_len):
                t = translated_text[i]

                if i >= len(p_block):
                    p = p_block[-1]
                else:
                    p = p_block[i]

                if type(p) is NavigableString:
                    p = t
                else:
                    p.string = t

                self.helper.insert_trans(
                    p, p.string, self.translation_style, self.single_translate
                )

        if thread_safe:
            with self._progress_lock:
                self._save_progress()
        else:
            self._save_progress()
        return index

    def translate_paragraphs_acc(self, p_list, send_num):
        count = 0
        wait_p_list = []
        for i in range(len(p_list)):
            p = p_list[i]
            print(f"translating {i}/{len(p_list)}")
            temp_p = copy(p)

            for p_exclude in self.exclude_translate_tags.split(","):
                # for issue #280
                if type(p) is NavigableString:
                    continue
                for pt in temp_p.find_all(p_exclude):
                    pt.extract()

            if any(
                [not p.text, self._is_special_text(temp_p.text), not_trans(temp_p.text)]
            ):
                if i == len(p_list) - 1:
                    self.helper.deal_old(wait_p_list, self.single_translate)
                continue
            length = num_tokens_from_text(temp_p.text)
            if length > send_num:
                self.helper.deal_new(p, wait_p_list, self.single_translate)
                continue
            if i == len(p_list) - 1:
                if count + length < send_num:
                    wait_p_list.append(p)
                    self.helper.deal_old(wait_p_list, self.single_translate)
                else:
                    self.helper.deal_new(p, wait_p_list, self.single_translate)
                break
            if count + length < send_num:
                count += length
                wait_p_list.append(p)
            else:
                self.helper.deal_old(wait_p_list, self.single_translate)
                wait_p_list.append(p)
                count = length

    def get_item(self, book, name):
        for item in book.get_items():
            if item.file_name == name:
                return item

    def find_items_containing_string(self, book, search_string):
        matching_items = []

        for item in book.get_items_of_type(ITEM_DOCUMENT):
            content = item.get_content()
            soup = bs(content, "html.parser")
            if search_string in soup.get_text():
                matching_items.append(item)

        return matching_items

    def retranslate_book(self, index, p_to_save_len, pbar, trans_taglist, retranslate):
        complete_book_name = retranslate[0]
        fixname = retranslate[1]
        fixstart = retranslate[2]
        fixend = retranslate[3]

        if fixend == "":
            fixend = fixstart

        name_fix = complete_book_name

        complete_book = epub.read_epub(complete_book_name)

        if fixname == "":
            fixname = self.find_items_containing_string(complete_book, fixstart)[
                0
            ].file_name
            print(f"auto find fixname: {fixname}")

        new_book = self._make_new_book(complete_book)

        complete_item = self.get_item(complete_book, fixname)
        if complete_item is None:
            return

        ori_item = self.get_item(self.origin_book, fixname)
        if ori_item is None:
            return

        content_complete = complete_item.content
        content_ori = ori_item.content
        soup_complete = bs(content_complete, "html.parser")
        soup_ori = bs(content_ori, "html.parser")

        p_list_complete = soup_complete.find_all(trans_taglist)
        p_list_ori = soup_ori.find_all(trans_taglist)

        target = None
        tagl = []

        # extract from range
        find_end = False
        find_start = False
        for tag in p_list_complete:
            if find_end:
                tagl.append(tag)
                break

            if fixend in tag.text:
                find_end = True
            if fixstart in tag.text:
                find_start = True

            if find_start:
                if not target:
                    target = tag.previous_sibling
                tagl.append(tag)

        for t in tagl:
            t.extract()

        flag = False
        extract_p_list_ori = []
        for p in p_list_ori:
            if fixstart in p.text:
                flag = True
            if flag:
                extract_p_list_ori.append(p)
            if fixend in p.text:
                break

        for t in extract_p_list_ori:
            if target:
                target.insert_after(t)
                target = t

        for item in complete_book.get_items():
            if item.file_name != fixname:
                new_book.add_item(item)
        if soup_complete:
            complete_item.content = soup_complete.encode()

        index = self.process_item(
            complete_item,
            index,
            p_to_save_len,
            pbar,
            new_book,
            trans_taglist,
            fixstart,
            fixend,
        )
        epub.write_epub(f"{name_fix}", new_book, {})

    def has_nest_child(self, element, trans_taglist):
        if isinstance(element, Tag):
            for child in element.children:
                if child.name in trans_taglist:
                    return True
                if self.has_nest_child(child, trans_taglist):
                    return True
        return False

    def filter_nest_list(self, p_list, trans_taglist):
        filtered_list = [p for p in p_list if not self.has_nest_child(p, trans_taglist)]
        return filtered_list

    def process_item(
        self,
        item,
        index,
        p_to_save_len,
        pbar,
        new_book,
        trans_taglist,
        fixstart=None,
        fixend=None,
    ):
        if self.only_filelist != "" and item.file_name not in self.only_filelist.split(
            ","
        ):
            return index
        elif self.only_filelist == "" and item.file_name in self.exclude_filelist.split(
            ","
        ):
            new_book.add_item(item)
            return index

        # exist_ok avoids a race when several callers create the directory at once
        os.makedirs("log", exist_ok=True)

        content = item.content
        soup = bs(content, "html.parser")
        p_list = soup.find_all(trans_taglist)

        p_list = self.filter_nest_list(p_list, trans_taglist)

        if self.retranslate:
            new_p_list = []

            if fixstart is None or fixend is None:
                return index

            start_append = False
            for p in p_list:
                text = p.get_text()
                if fixstart in text or fixend in text or start_append:
                    start_append = True
                    new_p_list.append(p)
                if fixend in text:
                    p_list = new_p_list
                    break

        if self.allow_navigable_strings:
            p_list.extend(soup.find_all(string=True))

        send_num = self.accumulated_num
        if send_num > 1:
            with open("log/buglog.txt", "a") as f:
                print(f"------------- {item.file_name} -------------", file=f)

            print("------------------------------------------------------")
            print(f"dealing {item.file_name} ...")
            self.translate_paragraphs_acc(p_list, send_num)
        else:
            is_test_done = self.is_test and index > self.test_num
            p_block = []
            block_len = 0
            for p in p_list:
                if is_test_done:
                    break
                if not p.text or self._is_special_text(p.text):
                    pbar.update(1)
                    continue

                new_p = self._extract_paragraph(copy(p))
                if self.single_translate and self.block_size > 0:
                    p_len = num_tokens_from_text(new_p.text)
                    block_len += p_len
                    if block_len > self.block_size:
                        index = self._process_combined_paragraph(
                            p_block, index, p_to_save_len, thread_safe=False
                        )
                        p_block = [p]
                        block_len = p_len
                        print()
                    else:
                        p_block.append(p)
                else:
                    index = self._process_paragraph(
                        p, new_p, index, p_to_save_len, thread_safe=False
                    )
                    print()

                # pbar.update(delta) not pbar.update(index)?
                pbar.update(1)

                if self.is_test and index >= self.test_num:
                    break
            if self.single_translate and self.block_size > 0 and len(p_block) > 0:
                index = self._process_combined_paragraph(
                    p_block, index, p_to_save_len, thread_safe=False
                )

        if soup:
            item.content = soup.encode(encoding="utf-8")
        new_book.add_item(item)

        return index

    def set_parallel_workers(self, workers):
        """Set number of parallel workers for chapter processing.

        Args:
            workers (int): Number of parallel workers. Will be automatically
                         optimized based on actual chapter count during processing.
        """
        self.parallel_workers = max(1, workers)
        self.enable_parallel = workers > 1

        if workers > 8:
            print(
                f"⚠️  Warning: {workers} workers is quite high. Consider using 2-8 workers for optimal performance."
            )

    def _get_next_translation_index(self):
        """Thread-safe method to get next translation index."""
        with self._progress_lock:
            index = self._translation_index
            self._translation_index += 1
            return index

    def _process_chapter_parallel(self, chapter_data):
        """Process a single chapter in parallel mode with proper accumulated_num handling."""
        item, trans_taglist, p_to_save_len = chapter_data
        chapter_result = {
            "item": item,
            "processed_content": None,
            "success": False,
            "error": None,
        }

        try:
            # NOTE: `_create_chapter_translator` currently returns the shared translator;
            # per-chapter context is kept in the chapter-local lists created below.
            thread_translator = self._create_chapter_translator()

            content = item.content
            soup = bs(content, "html.parser")
            p_list = soup.find_all(trans_taglist)
            p_list = self.filter_nest_list(p_list, trans_taglist)

            if self.allow_navigable_strings:
                p_list.extend(soup.find_all(string=True))

            # Initialize chapter-specific context lists
            chapter_context_list = []
            chapter_translated_list = []

            # Apply accumulated_num logic for this chapter independently
            send_num = self.accumulated_num
            if send_num > 1:
                # Use accumulated translation logic for this chapter
                self._translate_paragraphs_acc_parallel(
                    p_list,
                    send_num,
                    thread_translator,
                    chapter_context_list,
                    chapter_translated_list,
                )
            else:
                # Process paragraphs individually for this chapter
                for p in p_list:
                    if not p.text or self._is_special_text(p.text):
                        continue

                    new_p = self._extract_paragraph(copy(p))
                    index = self._get_next_translation_index()

                    if self.resume and index < p_to_save_len:
                        t_text = self.p_to_save[index]
                    else:
                        # Use chapter-specific context for translation
                        t_text = self._translate_with_chapter_context(
                            thread_translator,
                            new_p.text,
                            chapter_context_list,
                            chapter_translated_list,
                        )
                        t_text = "" if t_text is None else t_text
                        with self._progress_lock:
                            self.p_to_save.append(t_text)

                    if isinstance(p, NavigableString):
                        translated_node = NavigableString(t_text)
                        p.insert_after(translated_node)
                        if self.single_translate:
                            p.extract()
                    else:
                        self.helper.insert_trans(
                            p, t_text, self.translation_style, self.single_translate
                        )

                    with self._progress_lock:
                        if index % 20 == 0:
                            self._save_progress()

            if soup:
                chapter_result["processed_content"] = soup.encode(encoding="utf-8")
            chapter_result["success"] = True

        except Exception as e:
            chapter_result["error"] = str(e)
            print(f"Error processing chapter {item.file_name}: {e}")

        return chapter_result

    def _create_chapter_translator(self):
        """Return the translator used for a chapter (currently the shared instance)."""
        # No new instance is created; context is handled at the chapter level
        return self.translate_model

    def _translate_with_chapter_context(
        self, translator, text, chapter_context_list, chapter_translated_list
    ):
        """Translate text with chapter-specific context management."""
        if not translator.context_flag:
            return translator.translate(text)

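        # Temporarily replace the translator's context with the chapter context.
        # The translator is shared and this swap is not guarded by a lock, so
        # chapters translated concurrently can overwrite each other's context.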
        original_context = getattr(translator, "context_list", [])
        original_translated = getattr(translator, "context_translated_list", [])

        try:
            # Use chapter-specific context
            translator.context_list = chapter_context_list.copy()
            translator.context_translated_list = chapter_translated_list.copy()

            # Perform translation
            result = translator.translate(text)

            # Update chapter context
            chapter_context_list[:] = translator.context_list
            chapter_translated_list[:] = translator.context_translated_list

            return result

        finally:
            # Restore original context
            translator.context_list = original_context
            translator.context_translated_list = original_translated

    def _translate_paragraphs_acc_parallel(
        self,
        p_list,
        send_num,
        translator,
        chapter_context_list,
        chapter_translated_list,
    ):
        """Apply accumulated_num logic for a single chapter in parallel mode with independent context."""

        count = 0
        wait_p_list = []

        # Create chapter-specific helper instance with context-aware translation
        class ChapterHelper:
            def __init__(
                self, parent_loader, translator, context_list, translated_list
            ):
                self.parent_loader = parent_loader
                self.translator = translator
                self.context_list = context_list
                self.translated_list = translated_list

            def translate_with_context(self, text):
                return self.parent_loader._translate_with_chapter_context(
                    self.translator, text, self.context_list, self.translated_list
                )

            def deal_old(self, wait_p_list, single_translate):
                if not wait_p_list:
                    return

                # Use the same translate_list logic as sequential processing
                # Create a temporary translator with chapter context
                original_context = getattr(self.translator, "context_list", [])
                original_translated = getattr(
                    self.translator, "context_translated_list", []
                )

                try:
                    # Set chapter context to the translator
                    self.translator.context_list = self.context_list.copy()
                    self.translator.context_translated_list = (
                        self.translated_list.copy()
                    )

                    # Call translate_list for consistent batch translation logic
                    result_txt_list = self.translator.translate_list(wait_p_list)

                    # Update chapter context from translator
                    self.context_list[:] = self.translator.context_list
                    self.translated_list[:] = self.translator.context_translated_list

                    # Apply translations using the same logic as helper.deal_old.
                    # zip() stops at the shorter list, so paragraphs without a result are skipped.
                    from .helper import shorter_result_link

                    for p, result_txt in zip(wait_p_list, result_txt_list):
                        self.parent_loader.helper.insert_trans(
                            p,
                            shorter_result_link(result_txt),
                            self.parent_loader.translation_style,
                            single_translate,
                        )

                finally:
                    # Restore original context
                    self.translator.context_list = original_context
                    self.translator.context_translated_list = original_translated

                wait_p_list.clear()

            def deal_new(self, p, wait_p_list, single_translate):
                self.deal_old(wait_p_list, single_translate)
                translation = self.translate_with_context(p.text)
                self.parent_loader.helper.insert_trans(
                    p,
                    translation,
                    self.parent_loader.translation_style,
                    single_translate,
                )

        chapter_helper = ChapterHelper(
            self, translator, chapter_context_list, chapter_translated_list
        )

        for i in range(len(p_list)):
            p = p_list[i]
            temp_p = copy(p)

            for p_exclude in self.exclude_translate_tags.split(","):
                # for issue #280
                if type(p) is NavigableString:
                    continue
                for pt in temp_p.find_all(p_exclude):
                    pt.extract()

            if any(
                [not p.text, self._is_special_text(temp_p.text), not_trans(temp_p.text)]
            ):
                if i == len(p_list) - 1:
                    chapter_helper.deal_old(wait_p_list, self.single_translate)
                continue

            length = num_tokens_from_text(temp_p.text)
            if length > send_num:
                chapter_helper.deal_new(p, wait_p_list, self.single_translate)
                continue

            if i == len(p_list) - 1:
                if count + length < send_num:
                    wait_p_list.append(p)
                    chapter_helper.deal_old(wait_p_list, self.single_translate)
                else:
                    chapter_helper.deal_new(p, wait_p_list, self.single_translate)
                break

            if count + length < send_num:
                count += length
                wait_p_list.append(p)
            else:
                chapter_helper.deal_old(wait_p_list, self.single_translate)
                wait_p_list.append(p)
                count = length

    def batch_init_then_wait(self):
        name, _ = os.path.splitext(self.epub_name)
        if self.batch_flag or self.batch_use_flag:
            self.translate_model.batch_init(name)
            if self.batch_use_flag:
                start_time = time.time()
                while not self.translate_model.is_completed_batch():
                    print("Batch translation is not completed yet")
                    time.sleep(2)
                    if time.time() - start_time > 300:  # 5 minutes
                        raise TimeoutError(
                            "Batch translation timed out after 5 minutes"
                        )

    def make_bilingual_book(self):
        self.helper = EPUBBookLoaderHelper(
            self.translate_model,
            self.accumulated_num,
            self.translation_style,
            self.context_flag,
        )
        self.batch_init_then_wait()
        new_book = self._make_new_book(self.origin_book)
        all_items = list(self.origin_book.get_items())
        trans_taglist = self.translate_tags.split(",")
        all_p_length = sum(
            (
                0
                if (
                    (i.get_type() != ITEM_DOCUMENT)
                    or (i.file_name in self.exclude_filelist.split(","))
                    or (
                        self.only_filelist
                        and i.file_name not in self.only_filelist.split(",")
                    )
                )
                else len(bs(i.content, "html.parser").find_all(trans_taglist))
            )
            for i in all_items
        )
        all_p_length += self.allow_navigable_strings * sum(
            (
                0
                if (
                    (i.get_type() != ITEM_DOCUMENT)
                    or (i.file_name in self.exclude_filelist.split(","))
                    or (
                        self.only_filelist
                        and i.file_name not in self.only_filelist.split(",")
                    )
                )
                else len(bs(i.content, "html.parser").findAll(text=True))
            )
            for i in all_items
        )
        pbar = tqdm(total=self.test_num) if self.is_test else tqdm(total=all_p_length)
        print()
        index = 0
        p_to_save_len = len(self.p_to_save)
        try:
            if self.retranslate:
                self.retranslate_book(
                    index, p_to_save_len, pbar, trans_taglist, self.retranslate
                )
                sys.exit(0)
            # Add the things that don't need to be translated first, so that you can see the img after the interruption
            for item in self.origin_book.get_items():
                if item.get_type() != ITEM_DOCUMENT:
                    new_book.add_item(item)

            document_items = list(self.origin_book.get_items_of_type(ITEM_DOCUMENT))

            if self.enable_parallel and len(document_items) > 1:
                # Optimize worker count: no point having more workers than chapters
                effective_workers = min(self.parallel_workers, len(document_items))

                # Parallel processing with proper accumulated_num handling
                print(f"🚀 Parallel processing: {len(document_items)} chapters")
                if effective_workers < self.parallel_workers:
                    print(
                        f"📊 Optimized workers: {effective_workers} (reduced from {self.parallel_workers})"
                    )
                else:
                    print(f"📊 Using {effective_workers} workers")

                if self.accumulated_num > 1:
                    print(
                        f"📝 Each chapter applies accumulated_num={self.accumulated_num} independently"
                    )

                if self.context_flag:
                    print(
                        f"🔗 Context enabled: each chapter maintains independent context (limit={self.translate_model.context_paragraph_limit})"
                    )
                else:
                    print("🚫 Context disabled for this translation")

                # Create a simpler progress bar for parallel processing
                pbar.close()  # Close the original progress bar
                chapter_pbar = tqdm(
                    total=len(document_items), desc="Chapters", unit="ch"
                )

                chapter_data_list = [
                    (item, trans_taglist, p_to_save_len) for item in document_items
                ]

                with ThreadPoolExecutor(max_workers=effective_workers) as executor:
                    future_to_item = {
                        executor.submit(
                            self._process_chapter_parallel, chapter_data
                        ): chapter_data[0]
                        for chapter_data in chapter_data_list
                    }

                    for future in as_completed(future_to_item):
                        item = future_to_item[future]
                        try:
                            result = future.result()
                            if result["success"] and result["processed_content"]:
                                item.content = result["processed_content"]
                            new_book.add_item(item)
                            chapter_pbar.update(1)
                            chapter_pbar.set_postfix_str(
                                f"Latest: {item.file_name[:20]}..."
                            )

                        except Exception as e:
                            print(f"❌ Error processing {item.file_name}: {e}")
                            new_book.add_item(item)
                            chapter_pbar.update(1)

                chapter_pbar.close()
                print(f"✅ Completed all {len(document_items)} chapters")
            else:
                # Sequential processing (original behavior or single chapter)
                if len(document_items) == 1 and self.enable_parallel:
                    print("📄 Single chapter detected - using sequential processing")

                for item in document_items:
                    index = self.process_item(
                        item, index, p_to_save_len, pbar, new_book, trans_taglist
                    )

                if self.accumulated_num > 1:
                    name, _ = os.path.splitext(self.epub_name)
                    epub.write_epub(f"{name}_bilingual.epub", new_book, {})
            name, _ = os.path.splitext(self.epub_name)
            if self.batch_flag:
                self.translate_model.batch()
            else:
                epub.write_epub(f"{name}_bilingual.epub", new_book, {})
            if self.accumulated_num == 1:
                pbar.close()
        except KeyboardInterrupt as e:
            print(e)
            if self.accumulated_num == 1:
                print("you can resume it next time")
                self._save_progress()
                self._save_temp_book()
            sys.exit(0)
        except Exception:
            traceback.print_exc()
            sys.exit(1)  # non-zero exit code signals the failure

    def load_state(self):
        try:
            with open(self.bin_path, "rb") as f:
                self.p_to_save = pickle.load(f)
        except Exception as e:
            raise Exception("can not load resume file") from e

    def _save_temp_book(self):
        # TODO refactor this logic
        origin_book_temp = epub.read_epub(self.epub_name)
        new_temp_book = self._make_new_book(origin_book_temp)
        p_to_save_len = len(self.p_to_save)
        trans_taglist = self.translate_tags.split(",")
        index = 0
        try:
            for item in origin_book_temp.get_items():
                if item.get_type() == ITEM_DOCUMENT:
                    content = item.content
                    soup = bs(content, "html.parser")
                    p_list = soup.findAll(trans_taglist)
                    if self.allow_navigable_strings:
                        p_list.extend(soup.findAll(text=True))
                    for p in p_list:
                        if not p.text or self._is_special_text(p.text):
                            continue
                        # TODO batch of p to translate then combine
                        # PR welcome here
                        if index < p_to_save_len:
                            new_p = copy(p)
                            if type(p) is NavigableString:
                                new_p = self.p_to_save[index]
                            else:
                                new_p.string = self.p_to_save[index]
                            self.helper.insert_trans(
                                p,
                                new_p.string,
                                self.translation_style,
                                self.single_translate,
                            )
                            index += 1
                        else:
                            break
                    # for save temp book
                    if soup:
                        item.content = soup.encode()
                new_temp_book.add_item(item)
            name, _ = os.path.splitext(self.epub_name)
            epub.write_epub(f"{name}_bilingual_temp.epub", new_temp_book, {})
        except Exception as e:
            # TODO handle it
            print(e)

    def _save_progress(self):
        try:
            with open(self.bin_path, "wb") as f:
                pickle.dump(self.p_to_save, f)
        except Exception as e:
            raise Exception("can not save resume file") from e


================================================
FILE: book_maker/loader/helper.py
================================================
import re
import backoff
import logging
from copy import copy

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)


class EPUBBookLoaderHelper:
    def __init__(
        self, translate_model, accumulated_num, translation_style, context_flag
    ):
        self.translate_model = translate_model
        self.accumulated_num = accumulated_num
        self.translation_style = translation_style
        self.context_flag = context_flag

    def insert_trans(self, p, text, translation_style="", single_translate=False):
        if text is None:
            text = ""
        if (
            p.string is not None
            and p.string.replace(" ", "").strip() == text.replace(" ", "").strip()
        ):
            return
        new_p = copy(p)
        new_p.string = text
        if translation_style != "":
            new_p["style"] = translation_style
        p.insert_after(new_p)
        if single_translate:
            p.extract()

    @backoff.on_exception(
        backoff.expo,
        Exception,
        on_backoff=lambda details: logger.warning(f"retry backoff: {details}"),
        on_giveup=lambda details: logger.warning(f"retry abort: {details}"),
        jitter=None,
    )
    def translate_with_backoff(self, text, context_flag=False):
        return self.translate_model.translate(text, context_flag)

    def deal_new(self, p, wait_p_list, single_translate=False):
        self.deal_old(wait_p_list, single_translate, self.context_flag)
        self.insert_trans(
            p,
            shorter_result_link(self.translate_with_backoff(p.text, self.context_flag)),
            self.translation_style,
            single_translate,
        )

    def deal_old(self, wait_p_list, single_translate=False, context_flag=False):
        if not wait_p_list:
            return

        result_txt_list = self.translate_model.translate_list(wait_p_list)

        for i in range(len(wait_p_list)):
            if i < len(result_txt_list):
                p = wait_p_list[i]
                self.insert_trans(
                    p,
                    shorter_result_link(result_txt_list[i]),
                    self.translation_style,
                    single_translate,
                )

        wait_p_list.clear()


url_pattern = r"(http[s]?://|www\.)+(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"


def is_text_link(text):
    return bool(re.compile(url_pattern).match(text.strip()))


def is_text_tail_link(text, num=80):
    text = text.strip()
    pattern = r".*" + url_pattern + r"$"
    return bool(re.compile(pattern).match(text)) and len(text) < num


def shorter_result_link(text, num=20):
    match = re.search(url_pattern, text)

    if not match or len(match.group()) < num:
        return text

    return re.compile(url_pattern).sub("...", text)


def is_text_source(text):
    return text.strip().startswith("Source: ")


def is_text_list(text, num=80):
    text = text.strip()
    return bool(re.match(r"^Listing\s*\d+", text)) and len(text) < num


def is_text_figure(text, num=80):
    text = text.strip()
    return bool(re.match(r"^Figure\s*\d+", text)) and len(text) < num


def is_text_digit_and_space(s):
    for c in s:
        if not c.isdigit() and not c.isspace():
            return False
    return True


def is_text_isbn(s):
    pattern = r"^[Ee]?ISBN\s*\d[\d\s]*$"
    return bool(re.match(pattern, s))


def not_trans(s):
    return any(
        [
            is_text_link(s),
            is_text_tail_link(s),
            is_text_source(s),
            is_text_list(s),
            is_text_figure(s),
            is_text_digit_and_space(s),
            is_text_isbn(s),
        ]
    )


================================================
FILE: book_maker/loader/md_loader.py
================================================
import sys
from pathlib import Path

from book_maker.utils import prompt_config_to_kwargs

from .base_loader import BaseBookLoader


class MarkdownBookLoader(BaseBookLoader):
    def __init__(
        self,
        md_name,
        model,
        key,
        resume,
        language,
        model_api_base=None,
        is_test=False,
        test_num=5,
        prompt_config=None,
        single_translate=False,
        context_flag=False,
        context_paragraph_limit=0,
        temperature=1.0,
        source_lang="auto",
    ) -> None:
        self.md_name = md_name
        self.translate_model = model(
            key,
            language,
            api_base=model_api_base,
            temperature=temperature,
            source_lang=source_lang,
            **prompt_config_to_kwargs(prompt_config),
        )
        self.is_test = is_test
        self.p_to_save = []
        self.bilingual_result = []
        self.bilingual_temp_result = []
        self.test_num = test_num
        self.batch_size = 10
        self.single_translate = single_translate
        self.md_paragraphs = []

        try:
            with open(f"{md_name}", encoding="utf-8") as f:
                self.origin_book = f.read().splitlines()

        except Exception as e:
            raise Exception("can not load file") from e

        self.resume = resume
        self.bin_path = f"{Path(md_name).parent}/.{Path(md_name).stem}.temp.bin"
        if self.resume:
            self.load_state()

        self.process_markdown_content()

    def process_markdown_content(self):
        """Split the raw lines into markdown paragraphs."""
        current_paragraph = []
        for line in self.origin_book:
            # a blank line ends the current paragraph, if there is one
            if not line.strip() and current_paragraph:
                self.md_paragraphs.append("\n".join(current_paragraph))
                current_paragraph = []
            # a heading line always forms its own paragraph
            elif line.strip().startswith("#"):
                if current_paragraph:
                    self.md_paragraphs.append("\n".join(current_paragraph))
                    current_paragraph = []
                self.md_paragraphs.append(line)
            # anything else is appended to the current paragraph
            else:
                current_paragraph.append(line)

        # flush the final paragraph
        if current_paragraph:
            self.md_paragraphs.append("\n".join(current_paragraph))

    @staticmethod
    def _is_special_text(text):
        return text.isdigit() or text.isspace() or len(text) == 0

    def _make_new_book(self, book):
        pass

    def make_bilingual_book(self):
        index = 0
        p_to_save_len = len(self.p_to_save)

        try:
            sliced_list = [
                self.md_paragraphs[i : i + self.batch_size]
                for i in range(0, len(self.md_paragraphs), self.batch_size)
            ]
            for paragraphs in sliced_list:
                batch_text = "\n\n".join(paragraphs)
                if self._is_special_text(batch_text):
                    continue
                if not self.resume or index >= p_to_save_len:
                    try:
                        max_retries = 3
                        retry_count = 0
                        while retry_count < max_retries:
                            try:
                                temp = self.translate_model.translate(batch_text)
                                break
                            except AttributeError as ae:
                                print(f"Translation error: {ae}")
                                retry_count += 1
                                if retry_count == max_retries:
                                    raise Exception(
                                        "Failed to initialize the translation model"
                                    ) from ae
                    except Exception as e:
                        print(f"Error during translation: {e}")
                        raise Exception("An error occurred during translation") from e

                    self.p_to_save.append(temp)
                    if not self.single_translate:
                        self.bilingual_result.append(batch_text)
                    self.bilingual_result.append(temp)
                index += self.batch_size
                if self.is_test and index > self.test_num:
                    break

            self.save_file(
                f"{Path(self.md_name).parent}/{Path(self.md_name).stem}_bilingual.md",
                self.bilingual_result,
            )

        except (KeyboardInterrupt, Exception) as e:
            print(f"An error occurred: {e}")
            print("Progress will be saved so you can resume later")
            self._save_progress()
            self._save_temp_book()
            sys.exit(1)  # non-zero exit code signals an error

    def _save_temp_book(self):
        index = 0
        # Rebuild the same paragraph batches that make_bilingual_book translates,
        # so each saved translation is paired with the batch it came from.
        sliced_list = [
            self.md_paragraphs[i : i + self.batch_size]
            for i in range(0, len(self.md_paragraphs), self.batch_size)
        ]

        for paragraphs in sliced_list:
            batch_text = "\n\n".join(paragraphs)
            if self._is_special_text(batch_text):
                # special batches are never translated, so they consume no entry
                continue
            self.bilingual_temp_result.append(batch_text)
            if index < len(self.p_to_save):
                self.bilingual_temp_result.append(self.p_to_save[index])
            index += 1

        self.save_file(
            f"{Path(self.md_name).parent}/{Path(self.md_name).stem}_bilingual_temp.txt",
            self.bilingual_temp_result,
        )

    def _save_progress(self):
        try:
            with open(self.bin_path, "w", encoding="utf-8") as f:
                f.write("\n".join(self.p_to_save))
        except Exception as e:
            raise Exception("can not save resume file") from e

    def load_state(self):
        try:
            with open(self.bin_path, encoding="utf-8") as f:
                self.p_to_save = f.read().splitlines()
        except Exception as e:
            raise Exception("can not load resume file") from e

    def save_file(self, book_path, content):
        try:
            with open(book_path, "w", encoding="utf-8") as f:
                f.write("\n".join(content))
        except Exception as e:
            raise Exception("can not save file") from e


================================================
FILE: book_maker/loader/pdf_loader.py
================================================
import sys
from pathlib import Path

from book_maker.utils import prompt_config_to_kwargs

from .base_loader import BaseBookLoader

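try:
    import fitz  # PyMuPDF
except ImportError:  # optional dependency; __init__ raises if it is missing
    fitz = None

try:
    from ebooklib import epub
except ImportError:  # optional dependency; EPUB output is skipped without it
    epub = None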


class PDFBookLoader(BaseBookLoader):
    def __init__(
        self,
        pdf_name,
        model,
        key,
        resume,
        language,
        model_api_base=None,
        is_test=False,
        test_num=5,
        prompt_config=None,
        single_translate=False,
        context_flag=False,
        context_paragraph_limit=0,
        temperature=1.0,
        source_lang="auto",
        parallel_workers=1,
    ) -> None:
        if fitz is None:
            raise Exception("PyMuPDF (fitz) is required to use PDF loader")

        self.pdf_name = pdf_name
        self.translate_model = model(
            key,
            language,
            api_base=model_api_base,
            temperature=temperature,
            source_lang=source_lang,
            **prompt_config_to_kwargs(prompt_config),
        )
        self.is_test = is_test
        self.p_to_save = []
        self.bilingual_result = []
        self.bilingual_temp_result = []
        self.test_num = test_num
        self.batch_size = 10
        self.single_translate = single_translate
        self.parallel_workers = max(1, parallel_workers)

        try:
            doc = fitz.open(self.pdf_name)
            lines = []
            for page in doc:
                text = page.get_text("text")
                if not text:
                    continue
                lines.extend(text.splitlines())
            self.origin_book = lines
        except Exception as e:
            raise Exception("can not load file") from e

        self.resume = resume
        self.bin_path = f"{Path(pdf_name).parent}/.{Path(pdf_name).stem}.temp.bin"
        if self.resume:
            self.load_state()

    def _make_new_book(self, book):
        pass

    def _try_create_epub(self):
        """Try to create an EPUB file from translated content.

        The EPUB is created from the `self.bilingual_result` list which alternates
        original and translated strings. If EPUB creation fails for any reason,
        this function will log the error and leave the TXT fallback intact.
        """
        if epub is None:
            # ebooklib not installed; skip EPUB generation
            return False

        if not self.bilingual_result:
            return False

        try:
            book = epub.EpubBook()
            title = Path(self.pdf_name).stem
            # Minimal metadata
            try:
                book.set_identifier(title)
                book.set_title(title)
                book.set_language(
                    self.translate_model.language
                    if hasattr(self.translate_model, "language")
                    else "en"
                )
            except Exception:
                # be tolerant about metadata API differences
                pass

            from html import escape  # stdlib; escapes &, <, > and quotes

            chapters = []
            # build chapters from bilingual_result (pairs)
            for i in range(0, len(self.bilingual_result), 2):
                orig = self.bilingual_result[i]
                trans = (
                    self.bilingual_result[i + 1]
                    if i + 1 < len(self.bilingual_result)
                    else ""
                )
                # basic html content: original then translated; the text is
                # escaped so characters such as "<" and "&" cannot break the XHTML
                content = ""
                if orig:
                    content += (
                        '<div class="original">'
                        + "<p>"
                        + escape(orig).replace("\n", "<br/>")
                        + "</p></div>"
                    )
                if trans:
                    content += (
                        '<div class="translation">'
                        + "<p>"
                        + escape(trans).replace("\n", "<br/>")
                        + "</p></div>"
                    )

                chap = epub.EpubHtml(
                    title=f"part_{i//2}",
                    file_name=f"index_split_{i//2:03d}.xhtml",
                    lang=(
                        book.get_metadata("DC", "language")[0][0]
                        if book.get_metadata("DC", "language")
                        else None
                    ),
                )
                chap.content = content
                book.add_item(chap)
                chapters.append(chap)

            # table of contents and spine
            book.toc = tuple(chapters)
            book.spine = ["nav"] + chapters

            # add navigation files
            book.add_item(epub.EpubNcx())
            book.add_item(epub.EpubNav())

            out_path = f"{Path(self.pdf_name).parent}/{Path(self.pdf_name).stem}_bilingual.epub"
            epub.write_epub(out_path, book)
            return True
        except Exception as e:
            print(f"create epub failed: {e}")
            return False

    def make_bilingual_book(self):
        index = 0
        p_to_save_len = len(self.p_to_save)

        try:
            sliced_list = [
                self.origin_book[i : i + self.batch_size]
                for i in range(0, len(self.origin_book), self.batch_size)
            ]
            for i in sliced_list:
                # fix the format thanks https://github.com/tudoujunha
                batch_text = "\n".join(i)
                if not batch_text.strip():
                    continue
                if not self.resume or index >= p_to_save_len:
                    try:
                        temp = self.translate_model.translate(batch_text)
                    except Exception as e:
                        print(e)
                        raise Exception("Something is wrong when translate") from e
                    self.p_to_save.append(temp)
                    if not self.single_translate:
                        self.bilingual_result.append(batch_text)
                    self.bilingual_result.append(temp)
                index += self.batch_size
                if self.is_test and index > self.test_num:
                    break

            txt_out = (
                f"{Path(self.pdf_name).parent}/{Path(self.pdf_name).stem}_bilingual.txt"
            )
            self.save_file(txt_out, self.bilingual_result)

            # try to create an EPUB alongside the TXT fallback; if EPUB fails we still keep the TXT file
            epub_ok = self._try_create_epub()
            if epub_ok:
                print(f"created epub: {Path(self.pdf_name).stem}_bilingual.epub")
            else:
                print(
                    "epub creation skipped or failed; bilingual text saved to txt fallback"
                )

        except (KeyboardInterrupt, Exception) as e:
            print(e)
            print("you can resume it next time")
            self._save_progress()
            self._save_temp_book()
            sys.exit(0)

    def _save_temp_book(self):
        index = 0
        sliced_list = [
            self.origin_book[i : i + self.batch_size]
            for i in range(0, len(self.origin_book), self.batch_size)
        ]

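        for batch in sliced_list:
            # join with newlines, matching make_bilingual_book
            batch_text = "\n".join(batch)
            if not batch_text.strip():
                # blank batches are never translated, so they consume no entry
                continue
            self.bilingual_temp_result.append(batch_text)
            if index < len(self.p_to_save):
                self.bilingual_temp_result.append(self.p_to_save[index])
            index += 1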

        self.save_file(
            f"{Path(self.pdf_name).parent}/{Path(self.pdf_name).stem}_bilingual_temp.txt",
            self.bilingual_temp_result,
        )

    def _save_progress(self):
        try:
            with open(self.bin_path, "w", encoding="utf-8") as f:
                f.write("\n".join(self.p_to_save))
        except Exception as e:
            raise Exception("can not save resume file") from e

    def load_state(self):
        try:
            with open(self.bin_path, encoding="utf-8") as f:
                self.p_to_save = f.read().splitlines()
        except Exception as e:
            raise Exception("can not load resume file") from e

    def save_file(self, book_path, content):
        try:
            with open(book_path, "w", encoding="utf-8") as f:
                f.write("\n".join(content))
        except Exception as e:
            raise Exception("can not save file") from e


================================================
FILE: book_maker/loader/srt_loader.py
================================================
"""
inspired by: https://github.com/jesselau76/srt-gpt-translator, MIT License
"""

import re
import sys
from pathlib import Path

from book_maker.utils import prompt_config_to_kwargs

from .base_loader import BaseBookLoader


class SRTBookLoader(BaseBookLoader):
    def __init__(
        self,
        srt_name,
        model,
        key,
        resume,
        language,
        model_api_base=None,
        is_test=False,
        test_num=5,
        prompt_config=None,
        single_translate=False,
        context_flag=False,
        context_paragraph_limit=0,
        temperature=1.0,
        source_lang="auto",
    ) -> None:
        self.srt_name = srt_name
        self.translate_model = model(
            key,
            language,
            api_base=model_api_base,
            temperature=temperature,
            source_lang=source_lang,
            **prompt_config_to_kwargs(
                {
                    "system": "You are a srt subtitle file translator.",
                    "user": "Translate the following subtitle text into {language}, but keep the subtitle number and timeline and newlines unchanged: \n{text}",
                }
            ),
        )
        self.is_test = is_test
        self.p_to_save = []
        self.bilingual_result = []
        self.bilingual_temp_result = []
        self.test_num = test_num
        self.accumulated_num = 1
        self.blocks = []
        self.single_translate = single_translate

        self.resume = resume
        self.bin_path = f"{Path(srt_name).parent}/.{Path(srt_name).stem}.temp.bin"
        if self.resume:
            self.load_state()

    def _make_new_book(self, book):
        pass

    def _parse_srt(self, srt_text):
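        # blocks are separated by one or more blank lines; the raw string
        # avoids the invalid "\s" escape warning
        blocks = re.split(r"\n\s*\n", srt_text)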

        final_blocks = []
        for block in blocks:
            if block.strip() == "":
                continue

            # line 1 is the subtitle number, line 2 the timeline, the rest is text
            lines = block.strip().splitlines()
            final_blocks.append(
                {
                    "number": lines[0].strip(),
                    "time": lines[1].strip(),
                    "text": "\n".join(lines[2:]).strip(),
                }
            )

        return final_blocks

    def _get_block_text(self, block):
        return f"{block['number']}\n{block['time']}\n{block['text']}"

    def _get_block_except_text(self, block):
        return f"{block['number']}\n{block['time']}"

    def _concat_blocks(self, sliced_text: str, text: str):
        return f"{sliced_text}\n\n{text}" if sliced_text else text

    def _get_block_translate(self, block):
        return f"{block['number']}\n{block['text']}"

    def _get_block_from(self, text):
        text = text.strip()
        if not text:
            return {}

        block = text.splitlines()
        if len(block) < 2:
            return {"number": block[0], "text": ""}

        return {"number": block[0], "text": "\n".join(block[1:])}

    def _get_blocks_from(self, translate: str):
        if not translate:
            return []

        blocks = []
        blocks_text = translate.strip().split("\n\n")
        for text in blocks_text:
            blocks.append(self._get_block_from(text))

        return blocks

    def _check_blocks(self, translate_blocks, origin_blocks):
        """
        Check if the translated blocks match the original text, with only a simple check of the beginning numbers.
        """
        if len(translate_blocks) != len(origin_blocks):
            return False

        for t in zip(translate_blocks, origin_blocks):
            i = 0
            try:
                i = int(t[0].get("number", 0))
            except ValueError:
                m = re.search(r"\s*\d+", t[0].get("number"))
                if m:
                    i = int(m.group())

            j = int(t[1].get("number", -1))
            if i != j:
                print(f"check failed: {i}!={j}")
                return False

        return True

    def _get_sliced_list(self):
        sliced_list = []
        sliced_text = ""
        begin_index = 0
        for i, block in enumerate(self.blocks):
            text = self._get_block_translate(block)
            if not text:
                continue

            if len(sliced_text + text) < self.accumulated_num:
                sliced_text = self._concat_blocks(sliced_text, text)
            else:
                if sliced_text:
                    sliced_list.append((begin_index, i, sliced_text))
                sliced_text = text
                begin_index = i

        sliced_list.append((begin_index, len(self.blocks), sliced_text))
        return sliced_list

    def make_bilingual_book(self):
        if self.accumulated_num > 512:
            print(f"{self.accumulated_num} is too large, shrink it to 512.")
            self.accumulated_num = 512

        try:
            with open(f"{self.srt_name}", encoding="utf-8") as f:
                self.blocks = self._parse_srt(f.read())
        except Exception as e:
            raise Exception("can not load file") from e

        index = 0
        p_to_save_len = len(self.p_to_save)

        try:
            sliced_list = self._get_sliced_list()

            for sliced in sliced_list:
                begin, end, text = sliced

                if not self.resume or index + (end - begin) > p_to_save_len:
                    if index < p_to_save_len:
                        self.p_to_save = self.p_to_save[:index]

                    try:
                        temp = self.translate_model.translate(text)
                    except Exception as e:
                        print(e)
                        raise Exception("Something is wrong when translate") from e

                    translated_blocks = self._get_blocks_from(temp)

                    if self.accumulated_num > 1:
                        if not self._check_blocks(
                            translated_blocks, self.blocks[begin:end]
                        ):
                            translated_blocks = []
                            # try to translate one by one, so don't accumulate too much
                            print(
                                f"retry it one by one:  {self.blocks[begin]['number']} - {self.blocks[end - 1]['number']}"
                            )
                            for block in self.blocks[begin:end]:
                                try:
                                    temp = self.translate_model.translate(
                                        self._get_block_translate(block)
                                    )
                                except Exception as e:
                                    print(e)
                                    raise Exception(
                                        "Something is wrong when translate"
                                    ) from e
                                translated_blocks.append(self._get_block_from(temp))

                            if not self._check_blocks(
                                translated_blocks, self.blocks[begin:end]
                            ):
                                raise Exception(
                                    "retry failed, adjust the srt manually."
                                )

                    for i, block in enumerate(translated_blocks):
                        text = block.get("text", "")
                        self.p_to_save.append(text)
                        if self.single_translate:
                            self.bilingual_result.append(
                                f"{self._get_block_except_text(self.blocks[begin + i])}\n{text}"
                            )
                        else:
                            self.bilingual_result.append(
                                f"{self._get_block_text(self.blocks[begin + i])}\n{text}"
                            )
                else:
                    for i, block in enumerate(self.blocks[begin:end]):
                        text = self.p_to_save[begin + i]
                        if self.single_translate:
                            self.bilingual_result.append(
                                f"{self._get_block_except_text(self.blocks[begin + i])}\n{text}"
                            )
                        else:
                            self.bilingual_result.append(
                                f"{self._get_block_text(self.blocks[begin + i])}\n{text}"
                            )

                index += end - begin
                if self.is_test and index > self.test_num:
                    break

            self.save_file(
                f"{Path(self.srt_name).parent}/{Path(self.srt_name).stem}_bilingual.srt",
                self.bilingual_result,
            )

        except (KeyboardInterrupt, Exception) as e:
            print(e)
            print("you can resume it next time")
            self._save_progress()
            self._save_temp_book()
            sys.exit(0)

    def _save_temp_book(self):
        for i, block in enumerate(self.blocks):
            if i < len(self.p_to_save):
                text = self.p_to_save[i]
                self.bilingual_temp_result.append(
                    f"{self._get_block_text(block)}\n{text}"
                )
            else:
                self.bilingual_temp_result.append(f"{self._get_block_text(block)}\n")

        self.save_file(
            f"{Path(self.srt_name).parent}/{Path(self.srt_name).stem}_bilingual_temp.srt",
            self.bilingual_temp_result,
        )

    def _save_progress(self):
        try:
            with open(self.bin_path, "w", encoding="utf-8") as f:
                f.write("===".join(self.p_to_save))
        except Exception as e:
            raise Exception("can not save resume file") from e

    def load_state(self):
        try:
            with open(self.bin_path, encoding="utf-8") as f:
                text = f.read()
                if text:
                    self.p_to_save = text.split("===")
                else:
                    self.p_to_save = []

        except Exception as e:
            raise Exception("can not load resume file") from e

    def save_file(self, book_path, content):
        try:
            with open(book_path, "w", encoding="utf-8") as f:
                f.write("\n\n".join(content))
        except Exception as e:
            raise Exception("can not save file") from e


================================================
FILE: book_maker/loader/txt_loader.py
================================================
import sys
from pathlib import Path

from book_maker.utils import prompt_config_to_kwargs

from .base_loader import BaseBookLoader


class TXTBookLoader(BaseBookLoader):
    def __init__(
        self,
        txt_name,
        model,
        key,
        resume,
        language,
        model_api_base=None,
        is_test=False,
        test_num=5,
        prompt_config=None,
        single_translate=False,
        context_flag=False,
        context_paragraph_limit=0,
        temperature=1.0,
        source_lang="auto",
        parallel_workers=1,
    ) -> None:
        self.txt_name = txt_name
        self.translate_model = model(
            key,
            language,
            api_base=model_api_base,
            temperature=temperature,
            source_lang=source_lang,
            **prompt_config_to_kwargs(prompt_config),
        )
        self.is_test = is_test
        self.p_to_save = []
        self.bilingual_result = []
        self.bilingual_temp_result = []
        self.test_num = test_num
        self.batch_size = 10
        self.single_translate = single_translate
        self.parallel_workers = max(1, parallel_workers)

        try:
            with open(txt_name, encoding="utf-8") as f:
                self.origin_book = f.read().splitlines()

        except Exception as e:
            raise Exception("can not load file") from e

        self.resume = resume
        self.bin_path = f"{Path(txt_name).parent}/.{Path(txt_name).stem}.temp.bin"
        if self.resume:
            self.load_state()

    @staticmethod
    def _is_special_text(text):
        return text.isdigit() or text.isspace() or len(text) == 0

    def _make_new_book(self, book):
        pass

    def make_bilingual_book(self):
        index = 0
        p_to_save_len = len(self.p_to_save)

        try:
            sliced_list = [
                self.origin_book[i : i + self.batch_size]
                for i in range(0, len(self.origin_book), self.batch_size)
            ]
            for batch in sliced_list:
                # fix the format thanks https://github.com/tudoujunha
                batch_text = "\n".join(batch)
                if self._is_special_text(batch_text):
                    continue
                if not self.resume or index >= p_to_save_len:
                    try:
                        temp = self.translate_model.translate(batch_text)
                    except Exception as e:
                        print(e)
                        raise Exception("Something is wrong when translate") from e
                    self.p_to_save.append(temp)
                    if not self.single_translate:
                        self.bilingual_result.append(batch_text)
                    self.bilingual_result.append(temp)
                index += self.batch_size
                if self.is_test and index > self.test_num:
                    break

            self.save_file(
                f"{Path(self.txt_name).parent}/{Path(self.txt_name).stem}_bilingual.txt",
                self.bilingual_result,
            )

        except (KeyboardInterrupt, Exception) as e:
            print(e)
            print("you can resume it next time")
            self._save_progress()
            self._save_temp_book()
            sys.exit(0)

    def _save_temp_book(self):
        index = 0
        sliced_list = [
            self.origin_book[i : i + self.batch_size]
            for i in range(0, len(self.origin_book), self.batch_size)
        ]

        for batch in sliced_list:
            # join and test each batch the same way make_bilingual_book does
            batch_text = "\n".join(batch)
            self.bilingual_temp_result.append(batch_text)
            if self._is_special_text(batch_text):
                continue
            if index < len(self.p_to_save):
                self.bilingual_temp_result.append(self.p_to_save[index])
            index += 1

        self.save_file(
            f"{Path(self.txt_name).parent}/{Path(self.txt_name).stem}_bilingual_temp.txt",
            self.bilingual_temp_result,
        )

    def _save_progress(self):
        try:
            with open(self.bin_path, "w", encoding="utf-8") as f:
                f.write("\n".join(self.p_to_save))
        except Exception as e:
            raise Exception("can not save resume file") from e

    def load_state(self):
        try:
            with open(self.bin_path, encoding="utf-8") as f:
                self.p_to_save = f.read().splitlines()
        except Exception as e:
            raise Exception("can not load resume file") from e

    def save_file(self, book_path, content):
        try:
            with open(book_path, "w", encoding="utf-8") as f:
                f.write("\n".join(content))
        except Exception as e:
            raise Exception("can not save file") from e


================================================
FILE: book_maker/obok.py
================================================
# The original code comes from:
# https://github.com/apprenticeharper/DeDRM_tools

# Version 4.1.2 March 2023
# Update library for crypto for current Windows

# Version 4.1.1 March 2023
# Make obok.py work as a file selector

# Version 4.1.0 February 2021
# Add detection for Kobo directory location on Linux

# Version 4.0.0 September 2020
# Python 3.0
#
# Version 3.2.5 December 2016
# Improve detection of good text decryption.
#
# Version 3.2.4 December 2016
# Remove incorrect support for Kobo Desktop under Wine
#
# Version 3.2.3 October 2016
# Fix for windows network user and more xml fixes
#
# Version 3.2.2 October 2016
# Change to the way the new database version is handled.
#
# Version 3.2.1 September 2016
# Update for v4.0 of Windows Desktop app.
#
# Version 3.2.0 January 2016
# Update for latest version of Windows Desktop app.
# Support Kobo devices in the command line version.
#
# Version 3.1.9 November 2015
# Handle Kobo Desktop under wine on Linux
#
# Version 3.1.8 November 2015
# Handle the case of Kobo Arc or Vox device (i.e. don't crash).
#
# Version 3.1.7 October 2015
# Handle the case of no device or database more gracefully.
#
# Version 3.1.6 September 2015
# Enable support for Kobo devices
# More character encoding fixes (unicode strings)
#
# Version 3.1.5 September 2015
# Removed requirement that a purchase has been made.
# Also add in character encoding fixes
#
# Version 3.1.4 September 2015
# Updated for version 3.17 of the Windows Desktop app.
#
# Version 3.1.3 August 2015
# Add translations for Portuguese and Arabic
#
# Version 3.1.2 January 2015
# Add coding, version number and version announcement
#
# Version 3.05 October 2014
# Identifies DRM-free books in the dialog
#
# Version 3.04 September 2014
# Handles DRM-free books as well (sometimes Kobo Library doesn't
# show download link for DRM-free books)
#
# Version 3.03 August 2014
# If PyCrypto is unavailable try to use libcrypto for AES_ECB.
#
# Version 3.02 August 2014
# Relax checking of application/xhtml+xml  and image/jpeg content.
#
# Version 3.01 June 2014
# Check image/jpeg as well as application/xhtml+xml content. Fix typo
# in Windows ipconfig parsing.
#
# Version 3.0 June 2014
# Made portable for Mac and Windows, and the only module dependency
# not part of python core is PyCrypto. Major code cleanup/rewrite.
# No longer tries the first MAC address; tries them all if it detects
# the decryption failed.
#
# Updated September 2013 by Anon
# Version 2.02
# Incorporated minor fixes posted at Apprentice Alf's.
#
# Updates July 2012 by Michael Newton
# PWSD ID is no longer a MAC address, but should always
# be stored in the registry. Script now works with OS X
# and checks plist for values instead of registry. Must
# have biplist installed for OS X support.
#
# Original comments left below; note the "AUTOPSY" is inaccurate. See
# KoboLibrary.userkeys and KoboFile.decrypt()
#
##########################################################
#                    KOBO DRM CRACK BY                   #
#                      PHYSISTICATED                     #
##########################################################
# This app was made for Python 2.7 on Windows 32-bit
#
# This app needs pycrypto - get from here:
# http://www.voidspace.org.uk/python/modules.shtml
#
# Usage: obok.py
# Choose the book you want to decrypt
#
# Shouts to my krew - you know who you are - and one in
# particular who gave me a lot of help with this - thank
# you so much!
#
# Kopimi /K\
# Keep sharing, keep copying, but remember that nothing is
# for free - make sure you compensate your favorite
# authors - and cut out the middle man whenever possible
# ;) ;) ;)
#
# DRM AUTOPSY
# The Kobo DRM was incredibly easy to crack, but it took
# me months to get around to making this. Here's the
# basics of how it works:
# 1: Get MAC address of first NIC in ipconfig (sometimes
# stored in registry as pwsdid)
# 2: Get user ID (stored in tons of places, this gets it
# from HKEY_CURRENT_USER\Software\Kobo\Kobo Desktop
# Edition\Browser\cookies)
# 3: Concatenate and SHA256, take the second half - this
# is your master key
# 4: Open %LOCALAPPDATA%\Kobo Desktop Editions\Kobo.sqlite
# and dump content_keys
# 5: Unbase64 the keys, then decode these with the master
# key - these are your page keys
# 6: Unzip EPUB of your choice, decrypt each page with its
# page key, then zip back up again
#
# WHY USE THIS WHEN INEPT WORKS FINE? (adobe DRM stripper)
# Inept works very well, but authors on Kobo can choose
# what DRM they want to use - and some have chosen not to
# let people download them with Adobe Digital Editions -
# they would rather lock you into a single platform.
#
# With Obok, you can sync Kobo Desktop, decrypt all your
# ebooks, and then use them on whatever device you want
# - you bought them, you own them, you can do what you
# like with them.
#
# Obok is Kobo backwards, but it also means "next to"
# in Polish.
# When you buy a real book, it is right next to you. You
# can read it at home, at work, on a train, you can lend
# it to a friend, you can scribble on it, and add your own
# explanations/translations.
#
# Obok gives you this power over your ebooks - no longer
# are you restricted to one device. This allows you to
# embed foreign fonts into your books, as older Kobos
# can't display them properly. You can read your books
# on your phones, in different PC readers, and different
# ereader devices. You can share them with your friends
# too, if you like - you can do that with a real book
# after all.
#
"""Manage all Kobo books, either encrypted or DRM-free."""

__version__ = "4.1.2"
__about__ = f"Obok v{__version__}\nCopyright © 2012-2020 Physisticated et al."

import base64
import binascii
import contextlib
import hashlib
import os
import re
import shutil
import sqlite3
import subprocess
import sys
import tempfile
import zipfile

can_parse_xml = True
try:
    from xml.etree import ElementTree as ET

    # print "using xml.etree for xml parsing"
except ImportError:
    can_parse_xml = False
    # print "Cannot find xml.etree, disabling extraction of serial numbers"

# List of all known hash keys
KOBO_HASH_KEYS = ["88b3a2e13", "XzUhGYdFp", "NoCanLook", "QJhwzAtXL"]


class ENCRYPTIONError(Exception):
    pass


def _load_crypto_libcrypto():
    from ctypes import (
        CDLL,
        POINTER,
        Structure,
        c_char_p,
        c_int,
        c_long,
        create_string_buffer,
    )
    from ctypes.util import find_library

    if sys.platform.startswith("win"):
        libcrypto = find_library("libcrypto")
    else:
        libcrypto = find_library("crypto")

    if libcrypto is None:
        raise ENCRYPTIONError("libcrypto not found")
    libcrypto = CDLL(libcrypto)

    AES_MAXNR = 14

    class AES_KEY(Structure):
        _fields_ = [("rd_key", c_long * (4 * (AES_MAXNR + 1))), ("rounds", c_int)]

    AES_KEY_p = POINTER(AES_KEY)

    def F(restype, name, argtypes):
        func = getattr(libcrypto, name)
        func.restype = restype
        func.argtypes = argtypes
        return func

    AES_set_decrypt_key = F(c_int, "AES_set_decrypt_key", [c_char_p, c_int, AES_KEY_p])
    AES_ecb_encrypt = F(None, "AES_ecb_encrypt", [c_char_p, c_char_p, AES_KEY_p, c_int])

    class AES:
        def __init__(self, userkey) -> None:
            self._blocksize = len(userkey)
            if self._blocksize not in [16, 24, 32]:
                raise ENCRYPTIONError("AES improper key used")
            key = self._key = AES_KEY()
            rv = AES_set_decrypt_key(userkey, len(userkey) * 8, key)
            if rv < 0:
                raise ENCRYPTIONError("Failed to initialize AES key")

        def decrypt(self, data):
            clear = b""
            for i in range(0, len(data), 16):
                out = create_string_buffer(16)
                # AES_ecb_encrypt returns void, so there is no status to check
                AES_ecb_encrypt(data[i : i + 16], out, self._key, 0)
                clear += out.raw
            return clear

    return AES


def _load_crypto_pycrypto():
    from Crypto.Cipher import AES as _AES

    class AES:
        def __init__(self, key) -> None:
            self._aes = _AES.new(key, _AES.MODE_ECB)

        def decrypt(self, data):
            return self._aes.decrypt(data)

    return AES


def _load_crypto():
    AES = None
    cryptolist = (_load_crypto_pycrypto, _load_crypto_libcrypto)
    for loader in cryptolist:
        with contextlib.suppress(ImportError, ENCRYPTIONError):
            AES = loader()
            break
    return AES


AES = _load_crypto()


# Wrap a stream so that output gets flushed immediately
# and also make sure that any unicode strings get
# encoded using "replace" before writing them.
class SafeUnbuffered:
    def __init__(self, stream) -> None:
        self.stream = stream
        self.encoding = stream.encoding
        if self.encoding is None:
            self.encoding = "utf-8"

    def write(self, data):
        if isinstance(data, str):
            data = data.encode(self.encoding, "replace")
        self.stream.buffer.write(data)
        self.stream.buffer.flush()

    def __getattr__(self, attr):
        return getattr(self.stream, attr)


class KoboLibrary:
    """The Kobo library.

    This class represents all the information available from the data
    written by the Kobo Desktop Edition application, including the list
    of books, their titles, and the user's encryption key(s)."""

    def __init__(self, serials=None, device_path=None, desktopkobodir="") -> None:
        if serials is None:
            serials = []
        print(__about__)
        self.kobodir = ""
        kobodb = ""

        # Order of checks
        # 1. first check if a device_path has been passed in, and whether
        #    we can find the sqlite db in the respective place
        # 2. if 1., and we got some serials passed in (from saved
        #    settings in calibre), just use it
        # 3. if 1. worked, but we didn't get serials, try to parse them
        #    from the device, if this didn't work, unset everything
        # 4. if by now we don't have kobodir set, give up on device and
        #    try to use the Desktop app.

        # step 1. check whether this looks like a real device
        if device_path:
            # we got a device path
            self.kobodir = os.path.join(device_path, ".kobo")
            # devices use KoboReader.sqlite
            kobodb = os.path.join(self.kobodir, "KoboReader.sqlite")
            if not os.path.isfile(kobodb):
                # device path seems to be wrong, unset it
                device_path = ""
                self.kobodir = ""
                kobodb = ""

        # step 3. we found a device but didn't get serials, try to get them
        #
        # we got a device path but no saved serial
        # try to get the serial from the device
        # get serial from device_path/.adobe-digital-editions/device.xml
        if self.kobodir and len(serials) == 0 and can_parse_xml:
            # print "get_device_settings - device_path = {0}".format(device_path)
            devicexml = os.path.join(
                device_path,
                ".adobe-digital-editions",
                "device.xml",
            )
            # print "trying to load {0}".format(devicexml)
            if os.path.exists(devicexml):
                # print "trying to parse {0}".format(devicexml)
                xmltree = ET.parse(devicexml)
                for node in xmltree.iter():
                    if "deviceSerial" in node.tag:
                        serial = node.text
                        # print "found serial {0}".format(serial)
                        serials.append(serial)
                        break
            else:
                # print "cannot get serials from device."
                device_path = ""
                self.kobodir = ""
                kobodb = ""

        if self.kobodir == "":
            # step 4. we haven't found a device with serials, so try desktop apps
            if desktopkobodir != "":
                self.kobodir = desktopkobodir

            if self.kobodir == "":
                if sys.platform.startswith("win"):
                    import winreg

                    if (
                        sys.getwindowsversion().major > 5
                        and "LOCALAPPDATA" in os.environ
                    ):
                        # Python 2.x does not return unicode env. Use Python 3.x
                        self.kobodir = winreg.ExpandEnvironmentStrings("%LOCALAPPDATA%")
                    if self.kobodir == "" and "USERPROFILE" in os.environ:
                        # Python 2.x does not return unicode env. Use Python 3.x
                        self.kobodir = os.path.join(
                            winreg.ExpandEnvironmentStrings("%USERPROFILE%"),
                            "Local Settings",
                            "Application Data",
                        )
                    self.kobodir = os.path.join(
                        self.kobodir,
                        "Kobo",
                        "Kobo Desktop Edition",
                    )
                elif sys.platform.startswith("darwin"):
                    self.kobodir = os.path.join(
                        os.environ["HOME"],
                        "Library",
                        "Application Support",
                        "Kobo",
                        "Kobo Desktop Edition",
                    )
                elif sys.platform.startswith("linux"):
                    # sets ~/.config/calibre as the location to store the kobodir location info file and creates this directory if necessary
                    kobodir_cache_dir = os.path.join(
                        os.environ["HOME"],
                        ".config",
                        "calibre",
                    )
                    # also creates ~/.config when it is missing
                    os.makedirs(kobodir_cache_dir, exist_ok=True)

                    # the file in which the kobodir location is cached
                    kobodir_cache_file = os.path.join(kobodir_cache_dir, "kobo_location")

                    # If the cache file does not exist, walk the filesystem from the
                    # root looking for Kobo.sqlite and store its directory in the
                    # cache file so the walk can be skipped in the future.
                    if not os.path.isfile(kobodir_cache_file):
                        for root, _dirs, files in os.walk("/"):
                            if "Kobo.sqlite" in files:
                                with open(
                                    kobodir_cache_file,
                                    "w",
                                    encoding="utf-8",
                                ) as f:
                                    f.write(str(root))

                    # the cache file is still missing if no Kobo.sqlite was found
                    if os.path.isfile(kobodir_cache_file):
                        with open(kobodir_cache_file, encoding="utf-8") as f:
                            self.kobodir = f.read()

            # desktop versions use Kobo.sqlite
            kobodb = os.path.join(self.kobodir, "Kobo.sqlite")
            # check for existence of file
            if not os.path.isfile(kobodb):
                # give up here, we haven't found anything useful
                self.kobodir = ""
                kobodb = ""

        if self.kobodir != "":
            self.bookdir = os.path.join(self.kobodir, "kepub")
            # make a copy of the database in a temporary file
            # so we can ensure it's not using WAL logging which sqlite3 can't do.
            self.newdb = tempfile.NamedTemporaryFile(mode="wb", delete=False)
            print(self.newdb.name)
            with open(kobodb, "rb") as olddb:
                self.newdb.write(olddb.read(18))
                self.newdb.write(b"\x01\x01")
                olddb.read(2)
                self.newdb.write(olddb.read())
            self.newdb.close()
            self.__sqlite = sqlite3.connect(self.newdb.name)
            self.__cursor = self.__sqlite.cursor()
            self._userkeys = []
            self._books = []
            self._volumeID = []
            self._serials = serials

    def close(self):
        """Closes the database used by the library."""
        self.__cursor.close()
        self.__sqlite.close()
        # delete the temporary copy of the database
        os.remove(self.newdb.name)

    @property
    def userkeys(self):
        """The list of potential userkeys being used by this library.
        Only one of these will be valid.
        """
        if len(self._userkeys) != 0:
            return self._userkeys
        for macaddr in self.__getmacaddrs():
            self._userkeys.extend(self.__getuserkeys(macaddr))
        return self._userkeys

    @property
    def books(self):
        """The list of KoboBook objects in the library."""
        if len(self._books) != 0:
            return self._books
        # DRM-protected kepub
        for row in self.__cursor.execute(
            "SELECT DISTINCT volumeid, Title, Attribution, Series FROM content_keys, content WHERE contentid = volumeid",
        ):
            self._books.append(
                KoboBook(
                    row[0],
                    row[1],
                    self.__bookfile(row[0]),
                    "kepub",
                    self.__cursor,
                    author=row[2],
                    series=row[3],
                ),
            )
            self._volumeID.append(row[0])
        # DRM-free
        for f in os.listdir(self.bookdir):
            if f not in self._volumeID:
                # bind the file name as a parameter instead of concatenating it
                row = self.__cursor.execute(
                    "SELECT Title, Attribution, Series FROM content WHERE ContentID = ?",
                    (f,),
                ).fetchone()
                if row is not None:
                    fTitle = row[0]
                    self._books.append(
                        KoboBook(
                            f,
                            fTitle,
                            self.__bookfile(f),
                            "drm-free",
                            self.__cursor,
                            author=row[1],
                            series=row[2],
                        ),
                    )
                    self._volumeID.append(f)
        # Sort by title
        self._books.sort(key=lambda x: x.title)
        return self._books

    def __bookfile(self, volumeid):
        """The filename needed to open a given book."""
        return os.path.join(self.kobodir, "kepub", volumeid)

    def __getmacaddrs(self):
        """The list of all MAC addresses on this machine."""
        macaddrs = []
        if sys.platform.startswith("win"):
            c = re.compile(
                "\\s?(" + "[0-9a-f]{2}[:\\-]" * 5 + "[0-9a-f]{2})(\\s|$)",
                re.IGNORECASE,
            )
            output = subprocess.Popen(
                "wmic nic where PhysicalAdapter=True get MACAddress",
                shell=True,
                stdout=subprocess.PIPE,
                text=True,
            ).stdout
            for line in output:
                if m := c.search(line):
                    macaddrs.append(re.sub("-", ":", m[1]).upper())
        elif sys.platform.startswith("darwin"):
            c = re.compile(
                "\\s(" + "[0-9a-f]{2}:" * 5 + "[0-9a-f]{2})(\\s|$)",
                re.IGNORECASE,
            )
            output = subprocess.check_output(
                "/sbin/ifconfig -a",
                shell=True,
                encoding="utf-8",
            )
            matches = c.findall(output)
            macaddrs.extend(m[0].upper() for m in matches)
        else:
            # probably linux

            # let's try ip
            c = re.compile(
                "\\s(" + "[0-9a-f]{2}:" * 5 + "[0-9a-f]{2})(\\s|$)",
                re.IGNORECASE,
            )
            for line in os.popen("ip -br link"):
                if m := c.search(line):
                    macaddrs.append(m[1].upper())

            # let's try ipconfig under wine
            c = re.compile(
                "\\s(" + "[0-9a-f]{2}-" * 5 + "[0-9a-f]{2})(\\s|$)",
                re.IGNORECASE,
            )
            for line in os.popen("ipconfig /all"):
                if m := c.search(line):
                    macaddrs.append(re.sub("-", ":", m[1]).upper())

        # extend the list of macaddrs in any case with the serials
        # cannot hurt ;-)
        macaddrs.extend(self._serials)

        return macaddrs

    def __getuserids(self):
        userids = []
        cursor = self.__cursor.execute("SELECT UserID FROM user")
        row = cursor.fetchone()
        while row is not None:
            with contextlib.suppress(Exception):
                userid = row[0]
                userids.append(userid)
            row = cursor.fetchone()
        return userids

    def __getuserkeys(self, macaddr):
        userids = self.__getuserids()
        userkeys = []
        for hash_key in KOBO_HASH_KEYS:
            deviceid = hashlib.sha256((hash_key + macaddr).encode("ascii")).hexdigest()
            for userid in userids:
                userkey = hashlib.sha256(
                    (deviceid + userid).encode("ascii"),
                ).hexdigest()
                userkeys.append(binascii.a2b_hex(userkey[32:]))
        return userkeys


class KoboBook:
    """A Kobo book.

    A Kobo book contains a number of unencrypted and encrypted files.
    This class provides a list of the encrypted files.

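    Each book has the following instance variables:
    volumeid - a UUID which uniquely refers to the book in this library.
    title - the human-readable book title.
    author - the book's author, if known.
    series - the series the book belongs to, if any.
    series_index - the book's position in the series (None until set).
    filename - the complete path and filename of the book.
    type - either kepub or drm-free"""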

    def __init__(
        self,
        volumeid,
        title,
        filename,
        type,
        cursor,
        author=None,
        series=None,
    ) -> None:
        self.volumeid = volumeid
        self.title = title
        self.author = author
        self.series = series
        self.series_index = None
        self.filename = filename
        self.type = type
        self.__cursor = cursor
        self._encryptedfiles = {}

    @property
    def encryptedfiles(self):
        """A dictionary of KoboFiles inside the book.

        The dictionary keys are the relative pathnames, which are
        the same as the pathnames inside the book 'zip' file."""
        if self.type == "drm-free":
            return self._encryptedfiles
        if len(self._encryptedfiles) != 0:
            return self._encryptedfiles
        # Read the list of encrypted files from the DB
        for row in self.__cursor.execute(
            "SELECT elementid,elementkey FROM content_keys,content WHERE volumeid = ? AND volumeid = contentid",
            (self.volumeid,),
        ):
            self._encryptedfiles[row[0]] = KoboFile(
                row[0],
                None,
                base64.b64decode(row[1]),
            )

        # Read the list of files from the kepub OPF manifest so that
        # we can get their proper MIME type.
        # NOTE: this requires that the OPF file is unencrypted!
        zin = zipfile.ZipFile(self.filename, "r")
        xmlns = {
            "ocf": "urn:oasis:names:tc:opendocument:xmlns:container",
            "opf": "http://www.idpf.org/2007/opf",
        }
        ocf = ET.fromstring(zin.read("META-INF/container.xml"))
        opffile = ocf.find(".//ocf:rootfile", xmlns).attrib["full-path"]
        basedir = re.sub("[^/]+$", "", opffile)
        opf = ET.fromstring(zin.read(opffile))
        zin.close()

        c = re.compile("/")
        for item in opf.findall(".//opf:item", xmlns):
            # Convert relative URIs
            href = item.attrib["href"]
            if not c.match(href):
                href = "".join((basedir, href))

            # Update books we've found from the DB.
            if href in self._encryptedfiles:
                mimetype = item.attrib["media-type"]
                self._encryptedfiles[href].mimetype = mimetype
        return self._encryptedfiles

    @property
    def has_drm(self):
        return self.type != "drm-free"


class KoboFile:
    """An encrypted file in a KoboBook.

    Each file has the following instance variables:
    filename - the relative pathname inside the book zip file.
    mimetype - the file's MIME type, e.g. 'image/jpeg'
    key - the encrypted page key."""

    def __init__(self, filename, mimetype, key) -> None:
        self.filename = filename
        self.mimetype = mimetype
        self.key = key

    def decrypt(self, userkey, contents):
        """
        Decrypt the contents using the provided user key and the
        file page key. The caller must determine if the decrypted
        data is correct."""
        # The userkey decrypts the page key (self.key)
        keyenc = AES(userkey)
        decryptedkey = keyenc.decrypt(self.key)
        # The decrypted page key decrypts the content
        pageenc = AES(decryptedkey)
        return self.__removeaespadding(pageenc.decrypt(contents))

    def check(self, contents):
        """
        If the contents uses some known MIME types, check if it
        conforms to the type. Throw a ValueError exception if not.
        If the contents uses an uncheckable MIME type, don't check
        it and don't throw an exception.
        Returns True if the content was checked, False if it was not
        checked."""
        if self.mimetype == "application/xhtml+xml":
            # assume utf-8 with no BOM
            textoffset = 0
            stride = 1
            print(f"Checking text:{contents[:10]}:")
            # check for byte order mark
            if contents[:3] == b"\xef\xbb\xbf":
                # seems to be utf-8 with BOM
                print("Could be utf-8 with BOM")
                textoffset = 3
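            elif contents[:2] == b"\xfe\xff":
                # seems to be utf-16BE: after the 2-byte BOM, the ASCII
                # byte is the second byte of each pair, hence offset 3
                print("Could be utf-16BE")
                textoffset = 3
                stride = 2
            elif contents[:2] == b"\xff\xfe":
                # seems to be utf-16LE: after the 2-byte BOM, the ASCII
                # byte is the first byte of each pair, hence offset 2
                print("Could be utf-16LE")
                textoffset = 2
                stride = 2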
            else:
                print("Perhaps utf-8 without BOM")

            # now check that the first few characters are in the ASCII range
            for i in range(textoffset, textoffset + 5 * stride, stride):
                if contents[i] < 32 or contents[i] > 127:
                    # Non-ascii, so decryption probably failed
                    print(f"Bad character at {i}, value {contents[i]}")
                    raise ValueError
            print("Seems to be good text")
            return True
        if self.mimetype == "image/jpeg":
            if contents[:3] == b"\xff\xd8\xff":
                return True
            print(f"Bad JPEG: {contents[:3].hex()}")
            raise ValueError
        return False

    def __removeaespadding(self, contents):
        """
        Remove the trailing padding, using what appears to be the CMS
        algorithm from RFC 5652 6.3"""
        lastchar = binascii.b2a_hex(contents[-1:])
        strlen = int(lastchar, 16)
        padding = strlen
        if strlen == 1:
            return contents[:-1]
        if strlen < 16:
            # Every one of the trailing `strlen` bytes must equal the pad
            # value; otherwise this is not padding and nothing is stripped.
            if contents[-strlen:] != contents[-1:] * strlen:
                padding = 0
        if padding > 0:
            contents = contents[:-padding]
        return contents


def decrypt_book(book, lib):
    print(f"Converting {book.title}")
    zin = zipfile.ZipFile(book.filename, "r")
    # make filename out of Unicode alphanumeric and whitespace equivalents from title
    outname = "{}.epub".format(re.sub(r"[^\s\w]", "_", book.title, flags=re.UNICODE))
    if book.type == "drm-free":
        print("DRM-free book, conversion is not needed")
        # the source archive is not read on this path, so release it
        zin.close()
        shutil.copyfile(book.filename, outname)
        print(f"Book saved as {os.path.join(os.getcwd(), outname)}")
        return os.path.join(os.getcwd(), outname)
    for userkey in lib.userkeys:
        print(f"Trying key: {userkey.hex()}")
        try:
            zout = zipfile.ZipFile(outname, "w", zipfile.ZIP_DEFLATED)
            for filename in zin.namelist():
                contents = zin.read(filename)
                if filename in book.encryptedfiles:
                    file = book.encryptedfiles[filename]
                    contents = file.decrypt(userkey, contents)
                    # Parse failures mean the key is probably wrong.
                    file.check(contents)
                zout.writestr(filename, contents)
            zout.close()
            print("Decryption succeeded.")
            print(f"Book saved as {os.path.join(os.getcwd(), outname)}")
            break
        except ValueError:
            print("Decryption failed.")
            zout.close()
            os.remove(outname)
    zin.close()
    return os.path.join(os.getcwd(), outname)


def cli_main(devicedir):
    serials = []

    lib = KoboLibrary(serials, devicedir)

    for i, book in enumerate(lib.books):
        print(f"{i + 1}: {book.title}")

    choice = input("Convert book number... ")
    try:
        num = int(choice)
        # reject 0 and negatives, which would otherwise index from the end
        if num < 1:
            raise IndexError
        books = [lib.books[num - 1]]
    except (ValueError, IndexError):
        print("Invalid choice. Exiting...")
        sys.exit()

    results = [decrypt_book(book, lib) for book in books]
    lib.close()
    return results[0]


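if __name__ == "__main__":
    sys.stdout = SafeUnbuffered(sys.stdout)
    sys.stderr = SafeUnbuffered(sys.stderr)
    # cli_main() requires the Kobo device directory; read it from argv
    if len(sys.argv) < 2:
        sys.exit(f"usage: {sys.argv[0]} <kobo-device-dir>")
    cli_main(sys.argv[1])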


================================================
FILE: book_maker/translator/__init__.py
================================================
from book_maker.translator.caiyun_translator import Caiyun
from book_maker.translator.chatgptapi_translator import ChatGPTAPI
from book_maker.translator.deepl_translator import DeepL
from book_maker.translator.deepl_free_translator import DeepLFree
from book_maker.translator.google_translator import Google
from book_maker.translator.claude_translator import Claude
from book_maker.translator.gemini_translator import Gemini
from book_maker.translator.groq_translator import GroqClient
from book_maker.translator.tencent_transmart_translator import TencentTranSmart
from book_maker.translator.custom_api_translator import CustomAPI
from book_maker.translator.xai_translator import XAIClient
from book_maker.translator.qwen_translator import QwenTranslator

MODEL_DICT = {
    "openai": ChatGPTAPI,
    "chatgptapi": ChatGPTAPI,
    "gpt4": ChatGPTAPI,
    "gpt4omini": ChatGPTAPI,
    "gpt4o": ChatGPTAPI,
    "gpt5mini": ChatGPTAPI,
    "o1preview": ChatGPTAPI,
    "o1": ChatGPTAPI,
    "o1mini": ChatGPTAPI,
    "o3mini": ChatGPTAPI,
    "google": Google,
    "caiyun": Caiyun,
    "deepl": DeepL,
    "deeplfree": DeepLFree,
    "claude": Claude,
    "claude-sonnet-4-6": Claude,
    "claude-opus-4-6": Claude,
    "claude-opus-4-5-20251101": Claude,
    "claude-haiku-4-5-20251001": Claude,
    "claude-sonnet-4-5-20250929": Claude,
    "claude-opus-4-1-20250805": Claude,
    "claude-opus-4-20250514": Claude,
    "claude-sonnet-4-20250514": Claude,
    "gemini": Gemini,
    "geminipro": Gemini,
    "groq": GroqClient,
    "tencentransmart": TencentTranSmart,
    "customapi": CustomAPI,
    "xai": XAIClient,
    "qwen": QwenTranslator,
    "qwen-mt-turbo": QwenTranslator,
    "qwen-mt-plus": QwenTranslator,
    # add more here
}


================================================
FILE: book_maker/translator/base_translator.py
================================================
import itertools
from abc import ABC, abstractmethod


class Base(ABC):
    def __init__(self, key, language) -> None:
        self.keys = itertools.cycle(key.split(","))
        self.language = language

    @abstractmethod
    def rotate_key(self):
        pass

    @abstractmethod
    def translate(self, text):
        pass

    def set_deployment_id(self, deployment_id):
        pass


================================================
FILE: book_maker/translator/caiyun_translator.py
================================================
import json
import re
import time

import requests
from rich import print

from .base_translator import Base


class Caiyun(Base):
    """
    caiyun translator
    """

    def __init__(self, key, language, **kwargs) -> None:
        super().__init__(key, language)
        self.api_url = "https://api.interpreter.caiyunai.com/v1/translator"
        self.headers = {
            "content-type": "application/json",
            "x-authorization": f"token {key}",
        }
        # caiyun api only supports: zh2en, zh2ja, en2zh, ja2zh
        self.translate_type = "auto2zh"
        if self.language == "english":
            self.translate_type = "auto2en"
        elif self.language == "japanese":
            self.translate_type = "auto2ja"

    def rotate_key(self):
        pass

    def translate(self, text):
        print(text)
        # for caiyun translate src issue #279
        text_list = text.splitlines()
        num = None
        if len(text_list) > 1:
            if text_list[0].isdigit():
                num = text_list[0]
        payload = {
            "source": text,
            "trans_type": self.translate_type,
            "request_id": "demo",
            "detect": True,
        }
        response = requests.request(
            "POST",
            self.api_url,
            data=json.dumps(payload),
            headers=self.headers,
        )
        try:
            t_text = response.json()["target"]
        except Exception as e:
            # Most failures here are rate limits: wait a minute, then retry once.
            print(str(e), response.text, "will sleep 60s for the time limit")
            time.sleep(60)
            response = requests.request(
                "POST",
                self.api_url,
                data=json.dumps(payload),
                headers=self.headers,
            )
            t_text = response.json()["target"]

        print("[bold green]" + re.sub("\n{3,}", "\n\n", t_text) + "[/bold green]")
        # for issue #279
        if num:
            t_text = str(num) + "\n" + t_text
        return t_text


================================================
FILE: book_maker/translator/chatgptapi_translator.py
================================================
import re
import time
import os
import shutil
from copy import copy
from os import environ
from itertools import cycle
import json
from threading import Lock

from openai import AzureOpenAI, OpenAI, RateLimitError
from rich import print

from .base_translator import Base
from ..config import config

CHATGPT_CONFIG = config["translator"]["chatgptapi"]

PROMPT_ENV_MAP = {
    "user": "BBM_CHATGPTAPI_USER_MSG_TEMPLATE",
    "system": "BBM_CHATGPTAPI_SYS_MSG",
}

GPT35_MODEL_LIST = [
    "gpt-3.5-turbo",
    "gpt-3.5-turbo-1106",
    "gpt-3.5-turbo-16k",
    "gpt-3.5-turbo-0613",
    "gpt-3.5-turbo-16k-0613",
    "gpt-3.5-turbo-0301",
    "gpt-3.5-turbo-0125",
]
GPT4_MODEL_LIST = [
    "gpt-4-1106-preview",
    "gpt-4",
    "gpt-4-32k",
    "gpt-4o-2024-05-13",
    "gpt-4-0613",
    "gpt-4-32k-0613",
]

GPT4oMINI_MODEL_LIST = [
    "gpt-4o-mini",
    "gpt-4o-mini-2024-07-18",
]
GPT4o_MODEL_LIST = [
    "gpt-4o",
    "gpt-4o-2024-05-13",
    "gpt-4o-2024-08-06",
    "chatgpt-4o-latest",
]
GPT5MINI_MODEL_LIST = [
    "gpt-5-mini",
]
O1PREVIEW_MODEL_LIST = [
    "o1-preview",
    "o1-preview-2024-09-12",
]
O1_MODEL_LIST = [
    "o1",
    "o1-2024-12-17",
]
O1MINI_MODEL_LIST = [
    "o1-mini",
    "o1-mini-2024-09-12",
]
O3MINI_MODEL_LIST = [
    "o3-mini",
]


class ChatGPTAPI(Base):
    DEFAULT_PROMPT = "Please help me to translate,`{text}` to {language}, please return only translated content not include the origin text"

    def __init__(
        self,
        key,
        language,
        api_base=None,
        prompt_template=None,
        prompt_sys_msg=None,
        temperature=1.0,
        context_flag=False,
        context_paragraph_limit=0,
        **kwargs,
    ) -> None:
        super().__init__(key, language)
        self.key_len = len(key.split(","))
        self.openai_client = OpenAI(api_key=next(self.keys), base_url=api_base)
        self.api_base = api_base

        self.prompt_template = (
            prompt_template
            or environ.get(PROMPT_ENV_MAP["user"])
            or self.DEFAULT_PROMPT
        )
        self.prompt_sys_msg = (
            prompt_sys_msg
            or environ.get(
                "OPENAI_API_SYS_MSG",
            )  # XXX: for backward compatibility, deprecate soon
            or environ.get(PROMPT_ENV_MAP["system"])
            or ""
        )
        self.system_content = environ.get("OPENAI_API_SYS_MSG") or ""
        self.deployment_id = None
        self.temperature = temperature
        self.model_list = None
        self.context_flag = context_flag
        self.context_list = []
        self.context_translated_list = []
        if context_paragraph_limit > 0:
            # set by user, use user's value
            self.context_paragraph_limit = context_paragraph_limit
        else:
            # not set by user, use default
            self.context_paragraph_limit = CHATGPT_CONFIG["context_paragraph_limit"]
        self.batch_text_list = []
        self.batch_info_cache = None
        self.result_content_cache = {}
        self._api_lock = Lock()

    def rotate_key(self):
        with self._api_lock:
            self.openai_client.api_key = next(self.keys)

    def rotate_model(self):
        with self._api_lock:
            if self.model_list:
                self.model = next(self.model_list)

    def create_messages(self, text, intermediate_messages=None):
        content = self.prompt_template.format(
            text=text, language=self.language, crlf="\n"
        )

        sys_content = self.system_content or self.prompt_sys_msg.format(crlf="\n")
        messages = [
            {"role": "system", "content": sys_content},
        ]

        if intermediate_messages:
            messages.extend(intermediate_messages)

        messages.append({"role": "user", "content": content})
        return messages

    def create_context_messages(self):
        messages = []
        if self.context_flag:
            messages.append({"role": "user", "content": "\n".join(self.context_list)})
            messages.append(
                {
                    "role": "assistant",
                    "content": "\n".join(self.context_translated_list),
                }
            )
        return messages

    def create_chat_completion(self, text):
        messages = self.create_messages(text, self.create_context_messages())
        completion = self.openai_client.chat.completions.create(
            model=self.model,
            messages=messages,
            temperature=self.temperature,
        )
        return completion

    def get_translation(self, text):
        self.rotate_key()
        self.rotate_model()  # rotate through the models to avoid rate limits

        completion = self.create_chat_completion(text)

        # TODO work well or exception finish by length limit
        # Check if content is not None before encoding
        if completion.choices[0].message.content is not None:
            t_text = completion.choices[0].message.content.encode("utf8").decode() or ""
        else:
            t_text = ""

        if self.context_flag:
            self.save_context(text, t_text)

        return t_text

    def save_context(self, text, t_text):
        if self.context_paragraph_limit > 0:
            self.context_list.append(text)
            self.context_translated_list.append(t_text)
            # Remove the oldest context
            if len(self.context_list) > self.context_paragraph_limit:
                self.context_list.pop(0)
                self.context_translated_list.pop(0)

    def translate(self, text, needprint=True):
        start_time = time.time()
        # todo: Determine whether to print according to the cli option
        if needprint:
            print(re.sub("\n{3,}", "\n\n", text))

        attempt_count = 0
        max_attempts = 3
        t_text = ""

        while attempt_count < max_attempts:
            try:
                t_text = self.get_translation(text)
                break
            except RateLimitError as e:
                # todo: better sleep time? why always sleep based on key_len
                # 1. openai server error or own network interruption, sleep for a fixed time
                # 2. an apikey has no money or reached its limit, don't sleep, just replace it with another apikey
                # 3. all apikeys reached their limit, then use the current sleep
                sleep_time = int(60 / self.key_len)
                print(e, f"will sleep {sleep_time} seconds")
                time.sleep(sleep_time)
                attempt_count += 1
                if attempt_count == max_attempts:
                    print(f"Get {attempt_count} consecutive exceptions")
                    raise
            except Exception as e:
                print(str(e))
                return

        # todo: Determine whether to print according to the cli option
        if needprint:
            print("[bold green]" + re.sub("\n{3,}", "\n\n", t_text) + "[/bold green]")

        time.time() - start_time
        # print(f"translation time: {elapsed_time:.1f}s")

        return t_text

    def translate_and_split_lines(self, text):
        # translate() returns None on unexpected errors; fall back to ""
        result_str = self.translate(text, False) or ""
        lines = result_str.splitlines()
        lines = [line.strip() for line in lines if line.strip() != ""]
        return lines

    def log_retry(self, state, retry_count, elapsed_time, log_path="log/buglog.txt"):
        if retry_count == 0:
            return
        print(f"retry {state}")
        with open(log_path, "a", encoding="utf-8") as f:
            print(
                f"retry {state}, count = {retry_count}, time = {elapsed_time:.1f}s",
                file=f,
            )

    def log_translation_mismatch(
        self,
        plist_len,
        result_list,
        new_str,
        sep,
        log_path="log/buglog.txt",
    ):
        if len(result_list) == plist_len:
            return
        newlist = new_str.split(sep)
        with open(log_path, "a", encoding="utf-8") as f:
            print(f"problem size: {plist_len - len(result_list)}", file=f)
            for i in range(len(newlist)):
                print(newlist[i], file=f)
                print(file=f)
                if i < len(result_list):
                    print("............................................", file=f)
                    print(result_list[i], file=f)
                    print(file=f)
                print("=============================", file=f)

        print(
            f"bug: {plist_len} paragraphs of text translated into {len(result_list)} paragraphs",
        )
        print("continue")

    def join_lines(self, text):
        lines = text.splitlines()
        new_lines = []
        temp_line = []

        # join
        for line in lines:
            if line.strip():
                temp_line.append(line.strip())
            else:
                if temp_line:
                    new_lines.append(" ".join(temp_line))
                    temp_line = []
                new_lines.append(line)

        if temp_line:
            new_lines.append(" ".join(temp_line))

        text = "\n".join(new_lines)
        # try to fix #372
        if not text:
            return ""

        # del ^M
        text = text.replace("^M", "\r")
        lines = text.splitlines()
        filtered_lines = [line for line in lines if line.strip() != "\r"]
        new_text = "\n".join(filtered_lines)

        return new_text

    def translate_list(self, plist):
        plist_len = len(plist)

        # Create a list of original texts and add clear numbering markers to each paragraph
        formatted_text = ""
        for i, p in enumerate(plist, 1):
            temp_p = copy(p)
            for sup in temp_p.find_all("sup"):
                sup.extract()
            para_text = temp_p.get_text().strip()
            # Using special delimiters and clear numbering
            formatted_text += f"PARAGRAPH {i}:\n{para_text}\n\n"

        print(f"plist len = {plist_len}")

        original_prompt_template = self.prompt_template

        structured_prompt = (
            f"Translate the following {plist_len} paragraphs to {{language}}. "
            f"CRUCIAL INSTRUCTION: Format your response using EXACTLY this structure:\n\n"
            f"TRANSLATION OF PARAGRAPH 1:\n[Your translation of paragraph 1 here]\n\n"
            f"TRANSLATION OF PARAGRAPH 2:\n[Your translation of paragraph 2 here]\n\n"
            f"... and so on for all {plist_len} paragraphs.\n\n"
            f"You MUST provide EXACTLY {plist_len} translated paragraphs. "
            f"Do not merge, split, or rearrange paragraphs. "
            f"Translate each paragraph independently but consistently. "
            f"Keep all numbers and special formatting in your translation. "
            f"Each original paragraph must correspond to exactly one translated paragraph."
        )

        self.prompt_template = structured_prompt + " ```{text}```"

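        try:
            # translate() returns None on unexpected errors; fall back to ""
            # so the regex extraction below still works
            translated_text = self.translate(formatted_text, False) or ""
        finally:
            # restore the caller's template even if translate() raises
            self.prompt_template = original_prompt_template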

        # Extract translations from structured output
        translated_paragraphs = []
        for i in range(1, plist_len + 1):
            pattern = (
                r"TRANSLATION OF PARAGRAPH "
                + str(i)
                + r":(.*?)(?=TRANSLATION OF PARAGRAPH \d+:|\Z)"
            )
            matches = re.findall(pattern, translated_text, re.DOTALL)

            if matches:
                translated_paragraph = matches[0].strip()
                translated_paragraphs.append(translated_paragraph)
            else:
                print(f"Warning: Could not find translation for paragraph {i}")
                loose_pattern = (
                    r"(?:TRANSLATION|PARAGRAPH|PARA).*?"
                    + str(i)
                    + r".*?:(.*?)(?=(?:TRANSLATION|PARAGRAPH|PARA).*?\d+.*?:|\Z)"
                )
                loose_matches = re.findall(loose_pattern, translated_text, re.DOTALL)
                if loose_matches:
                    translated_paragraphs.append(loose_matches[0].strip())
                else:
                    translated_paragraphs.append("")

        self.prompt_template = original_prompt_template

        # If the number of extracted paragraphs is incorrect, try the alternative extraction method.
        if len(translated_paragraphs) != plist_len:
            print(
                f"Warning: Extracted {len(translated_paragraphs)}/{plist_len} paragraphs. Using fallback extraction."
            )

            all_para_pattern = r"(?:TRANSLATION|PARAGRAPH|PARA).*?(\d+).*?:(.*?)(?=(?:TRANSLATION|PARAGRAPH|PARA).*?\d+.*?:|\Z)"
            all_matches = re.findall(all_para_pattern, translated_text, re.DOTALL)

            if all_matches:
                # Create a dictionary to map translation content based on paragraph numbers
                para_dict = {}
                for num_str, content in all_matches:
                    try:
                        num = int(num_str)
                        if 1 <= num <= plist_len:
                            para_dict[num] = content.strip()
                    except ValueError:
                        continue

                # Rebuild the translation list in the original order
                new_translated_paragraphs = []
                for i in range(1, plist_len + 1):
                    if i in para_dict:
                        new_translated_paragraphs.append(para_dict[i])
                    else:
                        new_translated_paragraphs.append("")

                if len(new_translated_paragraphs) == plist_len:
                    translated_paragraphs = new_translated_paragraphs

        if len(translated_paragraphs) < plist_len:
            translated_paragraphs.extend(
                [""] * (plist_len - len(translated_paragraphs))
            )
        elif len(translated_paragraphs) > plist_len:
            translated_paragraphs = translated_paragraphs[:plist_len]

        return translated_paragraphs

    def extract_paragraphs(self, text, paragraph_count):
        """Extract paragraphs from translated text, ensuring paragraph count is preserved."""
        # First try to extract by paragraph numbers (1), (2), etc.
        result_list = []
        for i in range(1, paragraph_count + 1):
            pattern = rf"\({i}\)\s*(.*?)(?=\s*\({i + 1}\)|\Z)"
            match = re.search(pattern, text, re.DOTALL)
            if match:
                result_list.append(match.group(1).strip())

        # If exact pattern matching failed, try another approach
        if len(result_list) != paragraph_count:
            pattern = r"\((\d+)\)\s*(.*?)(?=\s*\(\d+\)|\Z)"
            matches = re.findall(pattern, text, re.DOTALL)
            if matches:
                # Sort by paragraph number
                matches.sort(key=lambda x: int(x[0]))
                result_list = [match[1].strip() for match in matches]

        # Fallback to original line-splitting approach
        if len(result_list) != paragraph_count:
            lines = text.splitlines()
            result_list = [line.strip() for line in lines if line.strip() != ""]

        return result_list

    def set_deployment_id(self, deployment_id):
        self.deployment_id = deployment_id
        self.openai_client = AzureOpenAI(
            api_key=next(self.keys),
            azure_endpoint=self.api_base,
            api_version="2023-07-01-preview",
            azure_deployment=self.deployment_id,
        )

    def set_gpt35_models(self, ollama_model=""):
        if ollama_model:
            self.model_list = cycle([ollama_model])
            return
        # use every available GPT-3.5 model to spread requests across rate limits
        if self.deployment_id:
            self.model_list = cycle(["gpt-35-turbo"])
        else:
            my_model_list = [
                i["id"] for i in self.openai_client.models.list().model_dump()["data"]
            ]
            model_list = list(set(my_model_list) & set(GPT35_MODEL_LIST))
            print(f"Using model list {model_list}")
            self.model_list = cycle(model_list)

    def set_gpt4_models(self):
        # for issue #375 azure can not use model list
        if self.deployment_id:
            self.model_list = cycle(["gpt-4"])
        else:
            my_model_list = [
                i["id"] for i in self.openai_client.models.list().model_dump()["data"]
            ]
            model_list = list(set(my_model_list) & set(GPT4_MODEL_LIST))
            print(f"Using model list {model_list}")
            self.model_list = cycle(model_list)

    def set_gpt4omini_models(self):
        # for issue #375 azure can not use model list
        if self.deployment_id:
            self.model_list = cycle(["gpt-4o-mini"])
        else:
            my_model_list = [
                i["id"] for i in self.openai_client.models.list().model_dump()["data"]
            ]
            model_list = list(set(my_model_list) & set(GPT4oMINI_MODEL_LIST))
            print(f"Using model list {model_list}")
            self.model_list = cycle(model_list)

    def set_gpt4o_models(self):
        # for issue #375 azure can not use model list
        if self.deployment_id:
            self.model_list = cycle(["gpt-4o"])
        else:
            my_model_list = [
                i["id"] for i in self.openai_client.models.list().model_dump()["data"]
            ]
            model_list = list(set(my_model_list) & set(GPT4o_MODEL_LIST))
            print(f"Using model list {model_list}")
            self.model_list = cycle(model_list)

    def set_gpt5mini_models(self):
        # for issue #375 azure can not use model list
        if self.deployment_id:
            self.model_list = cycle(["gpt-5-mini"])
        else:
            my_model_list = [
                i["id"] for i in self.openai_client.models.list().model_dump()["data"]
            ]
            model_list = list(set(my_model_list) & set(GPT5MINI_MODEL_LIST))
            print(f"Using model list {model_list}")
            self.model_list = cycle(model_list)

    def set_o1preview_models(self):
        # for issue #375 azure can not use model list
        if self.deployment_id:
            self.model_list = cycle(["o1-preview"])
        else:
            my_model_list = [
                i["id"] for i in self.openai_client.models.list().model_dump()["data"]
            ]
            model_list = list(set(my_model_list) & set(O1PREVIEW_MODEL_LIST))
            print(f"Using model list {model_list}")
            self.model_list = cycle(model_list)

    def set_o1_models(self):
        # for issue #375 azure can not use model list
        if self.deployment_id:
            self.model_list = cycle(["o1"])
        else:
            my_model_list = [
                i["id"] for i in self.openai_client.models.list().model_dump()["data"]
            ]
            model_list = list(set(my_model_list) & set(O1_MODEL_LIST))
            print(f"Using model list {model_list}")
            self.model_list = cycle(model_list)

    def set_o1mini_models(self):
        # For issue #375: Azure cannot use the model list
        if self.deployment_id:
            self.model_list = cycle(["o1-mini"])
        else:
            my_model_list = [
                i["id"] for i in self.openai_client.models.list().model_dump()["data"]
            ]
            model_list = list(set(my_model_list) & set(O1MINI_MODEL_LIST))
            print(f"Using model list {model_list}")
            self.model_list = cycle(model_list)

    def set_o3mini_models(self):
        # For issue #375: Azure cannot use the model list
        if self.deployment_id:
            self.model_list = cycle(["o3-mini"])
        else:
            my_model_list = [
                i["id"] for i in self.openai_client.models.list().model_dump()["data"]
            ]
            model_list = list(set(my_model_list) & set(O3MINI_MODEL_LIST))
            print(f"Using model list {model_list}")
            self.model_list = cycle(model_list)

    def set_model_list(self, model_list):
        model_list = list(set(model_list))
        print(f"Using model list {model_list}")
        self.model_list = cycle(model_list)

    def batch_init(self, book_name):
        self.book_name = self.sanitize_book_name(book_name)

    def add_to_batch_translate_queue(self, book_index, text):
        self.batch_text_list.append({"book_index": book_index, "text": text})

    def sanitize_book_name(self, book_name):
        # Replace any characters that are not alphanumeric, underscore, hyphen, or dot with an underscore
        sanitized_book_name = re.sub(r"[^\w\-_\.]", "_", book_name)
        # Remove leading and trailing underscores and dots
        sanitized_book_name = sanitized_book_name.strip("._")
        return sanitized_book_name

    def batch_metadata_file_path(self):
        return os.path.join(os.getcwd(), "batch_files", f"{self.book_name}_info.json")

    def batch_dir(self):
        return os.path.join(os.getcwd(), "batch_files", self.book_name)

    def custom_id(self, book_index):
        return f"{self.book_name}-{book_index}"

    def is_completed_batch(self):
        batch_metadata_file_path = self.batch_metadata_file_path()

        if not os.path.exists(batch_metadata_file_path):
            print("Batch result file does not exist")
            raise Exception("Batch result file does not exist")

        with open(batch_metadata_file_path, "r", encoding="utf-8") as f:
            batch_info = json.load(f)

        for batch_file in batch_info["batch_files"]:
            batch_status = self.check_batch_status(batch_file["batch_id"])
            if batch_status.status != "completed":
                return False

        return True

    def batch_translate(self, book_index):
        if self.batch_info_cache is None:
            batch_metadata_file_path = self.batch_metadata_file_path()
            with open(batch_metadata_file_path, "r", encoding="utf-8") as f:
                self.batch_info_cache = json.load(f)

        batch_info = self.batch_info_cache
        target_batch = None
        for batch in batch_info["batch_files"]:
            if batch["start_index"] <= book_index < batch["end_index"]:
                target_batch = batch
                break

        if not target_batch:
            raise ValueError(f"No batch found for book_index {book_index}")

        if target_batch["batch_id"] in self.result_content_cache:
            result_content = self.result_content_cache[target_batch["batch_id"]]
        else:
            batch_status = self.check_batch_status(target_batch["batch_id"])
            if batch_status.output_file_id is None:
                raise ValueError(f"Batch {target_batch['batch_id']} is not completed")
            result_content = self.get_batch_result(batch_status.output_file_id)
            self.result_content_cache[target_batch["batch_id"]] = result_content

        result_lines = result_content.text.split("\n")
        custom_id = self.custom_id(book_index)
        for line in result_lines:
            if line.strip():
                result = json.loads(line)
                if result["custom_id"] == custom_id:
                    return result["response"]["body"]["choices"][0]["message"][
                        "content"
                    ]

        raise ValueError(f"No result found for custom_id {custom_id}")

    def create_batch_context_messages(self, index):
        messages = []
        if self.context_flag:
            if index % CHATGPT_CONFIG[
                "batch_context_update_interval"
            ] == 0 or not hasattr(self, "cached_context_messages"):
                context_messages = []
                for i in range(index - 1, -1, -1):
                    item = self.batch_text_list[i]
                    if len(item["text"].split()) >= 100:
                        context_messages.append(item["text"])
                        if len(context_messages) == self.context_paragraph_limit:
                            break

                if len(context_messages) == self.context_paragraph_limit:
                    print("Creating cached context messages")
                    self.cached_context_messages = [
                        {"role": "user", "content": "\n".join(context_messages)},
                        {
                            "role": "assistant",
                            "content": self.get_translation(
                                "\n".join(context_messages)
                            ),
                        },
                    ]

            if hasattr(self, "cached_context_messages"):
                messages.extend(self.cached_context_messages)

        return messages

    def make_batch_request(self, book_index, text):
        messages = self.create_messages(
            text, self.create_batch_context_messages(book_index)
        )
        return {
            "custom_id": self.custom_id(book_index),
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                # model should not be rotated
                "model": self.batch_model,
                "messages": messages,
                "temperature": self.temperature,
            },
        }

    def create_batch_files(self, dest_file_path):
        file_paths = []
        # limits per batch file: max 50,000 requests and max size 100MB
        lines_per_file = 40000
        current_file = 0

        for i in range(0, len(self.batch_text_list), lines_per_file):
            current_file += 1
            file_path = os.path.join(dest_file_path, f"{current_file}.jsonl")
            start_index = i
            end_index = i + lines_per_file

            # TODO: Split the file if it exceeds 100MB
            with open(file_path, "w", encoding="utf-8") as f:
                for text in self.batch_text_list[i : i + lines_per_file]:
                    batch_req = self.make_batch_request(
                        text["book_index"], text["text"]
                    )
                    json.dump(batch_req, f, ensure_ascii=False)
                    f.write("\n")
            file_paths.append(
                {
                    "file_path": file_path,
                    "start_index": start_index,
                    "end_index": end_index,
                }
            )

        return file_paths

    def batch(self):
        self.rotate_model()
        self.batch_model = self.model
        # current working directory
        batch_dir = self.batch_dir()
        batch_metadata_file_path = self.batch_metadata_file_path()
        # cleanup batch dir and result file
        if os.path.exists(batch_dir):
            shutil.rmtree(batch_dir)
        if os.path.exists(batch_metadata_file_path):
            os.remove(batch_metadata_file_path)
        os.makedirs(batch_dir, exist_ok=True)
        # batch execute
        batch_files = self.create_batch_files(batch_dir)
        batch_info = []
        for batch_file in batch_files:
            file_id = self.upload_batch_file(batch_file["file_path"])
            batch = self.batch_execute(file_id)
            batch_info.append(
                self.create_batch_info(
                    file_id, batch, batch_file["start_index"], batch_file["end_index"]
                )
            )
        # save batch info
        batch_info_json = {
            "book_id": self.book_name,
            "batch_date": time.strftime("%Y-%m-%d %H:%M:%S"),
            "batch_files": batch_info,
        }
        with open(batch_metadata_file_path, "w", encoding="utf-8") as f:
            json.dump(batch_info_json, f, ensure_ascii=False, indent=2)

    def create_batch_info(self, file_id, batch, start_index, end_index):
        return {
            "input_file_id": file_id,
            "batch_id": batch.id,
            "start_index": start_index,
            "end_index": end_index,
            "prefix": self.book_name,
        }

    def upload_batch_file(self, file_path):
        # Open the file in a context manager so the handle is closed after upload
        with open(file_path, "rb") as f:
            batch_input_file = self.openai_client.files.create(
                file=f, purpose="batch"
            )
        return batch_input_file.id

    def batch_execute(self, file_id):
        current_time = time.strftime("%Y-%m-%d %H:%M:%S")
        res = self.openai_client.batches.create(
            input_file_id=file_id,
            endpoint="/v1/chat/completions",
            completion_window="24h",
            metadata={
                "description": f"Batch job for {self.book_name} at {current_time}"
            },
        )
        if res.errors:
            print(res.errors)
            raise Exception(f"Batch execution failed: {res.errors}")
        return res

    def check_

SYMBOL INDEX (258 symbols across 28 files)

FILE: book_maker/cli.py
  function parse_prompt_arg (line 11) | def parse_prompt_arg(prompt_arg):
  function main (line 105) | def main():

FILE: book_maker/loader/base_loader.py
  class BaseBookLoader (line 4) | class BaseBookLoader(ABC):
    method _is_special_text (line 6) | def _is_special_text(text):
    method _make_new_book (line 10) | def _make_new_book(self, book):
    method make_bilingual_book (line 14) | def make_bilingual_book(self):
    method load_state (line 18) | def load_state(self):
    method _save_temp_book (line 22) | def _save_temp_book(self):
    method _save_progress (line 26) | def _save_progress(self):

FILE: book_maker/loader/epub_loader.py
  class EPUBBookLoader (line 25) | class EPUBBookLoader(BaseBookLoader):
    method __init__ (line 26) | def __init__(
    method _is_special_text (line 131) | def _is_special_text(text):
    method _make_new_book (line 139) | def _make_new_book(self, book):
    method _fix_toc_uids (line 186) | def _fix_toc_uids(self, toc, counter=None):
    method _extract_paragraph (line 212) | def _extract_paragraph(self, p):
    method _process_paragraph (line 221) | def _process_paragraph(self, p, new_p, index, p_to_save_len, thread_sa...
    method _process_combined_paragraph (line 260) | def _process_combined_paragraph(
    method translate_paragraphs_acc (line 306) | def translate_paragraphs_acc(self, p_list, send_num):
    method get_item (line 346) | def get_item(self, book, name):
    method find_items_containing_string (line 351) | def find_items_containing_string(self, book, search_string):
    method retranslate_book (line 362) | def retranslate_book(self, index, p_to_save_len, pbar, trans_taglist, ...
    method has_nest_child (line 456) | def has_nest_child(self, element, trans_taglist):
    method filter_nest_list (line 465) | def filter_nest_list(self, p_list, trans_taglist):
    method process_item (line 469) | def process_item(
    method set_parallel_workers (line 572) | def set_parallel_workers(self, workers):
    method _get_next_translation_index (line 587) | def _get_next_translation_index(self):
    method _process_chapter_parallel (line 594) | def _process_chapter_parallel(self, chapter_data):
    method _create_chapter_translator (line 679) | def _create_chapter_translator(self):
    method _translate_with_chapter_context (line 684) | def _translate_with_chapter_context(
    method _translate_paragraphs_acc_parallel (line 714) | def _translate_paragraphs_acc_parallel(
    method batch_init_then_wait (line 841) | def batch_init_then_wait(self):
    method make_bilingual_book (line 853) | def make_bilingual_book(self):
    method load_state (line 1004) | def load_state(self):
    method _save_temp_book (line 1011) | def _save_temp_book(self):
    method _save_progress (line 1056) | def _save_progress(self):

FILE: book_maker/loader/helper.py
  class EPUBBookLoaderHelper (line 10) | class EPUBBookLoaderHelper:
    method __init__ (line 11) | def __init__(
    method insert_trans (line 19) | def insert_trans(self, p, text, translation_style="", single_translate...
    method translate_with_backoff (line 42) | def translate_with_backoff(self, text, context_flag=False):
    method deal_new (line 45) | def deal_new(self, p, wait_p_list, single_translate=False):
    method deal_old (line 54) | def deal_old(self, wait_p_list, single_translate=False, context_flag=F...
  function is_text_link (line 76) | def is_text_link(text):
  function is_text_tail_link (line 80) | def is_text_tail_link(text, num=80):
  function shorter_result_link (line 86) | def shorter_result_link(text, num=20):
  function is_text_source (line 95) | def is_text_source(text):
  function is_text_list (line 99) | def is_text_list(text, num=80):
  function is_text_figure (line 104) | def is_text_figure(text, num=80):
  function is_text_digit_and_space (line 109) | def is_text_digit_and_space(s):
  function is_text_isbn (line 116) | def is_text_isbn(s):
  function not_trans (line 121) | def not_trans(s):

FILE: book_maker/loader/md_loader.py
  class MarkdownBookLoader (line 9) | class MarkdownBookLoader(BaseBookLoader):
    method __init__ (line 10) | def __init__(
    method process_markdown_content (line 59) | def process_markdown_content(self):
    method _is_special_text (line 82) | def _is_special_text(text):
    method _make_new_book (line 85) | def _make_new_book(self, book):
    method make_bilingual_book (line 88) | def make_bilingual_book(self):
    method _save_temp_book (line 138) | def _save_temp_book(self):
    method _save_progress (line 159) | def _save_progress(self):
    method load_state (line 166) | def load_state(self):
    method save_file (line 173) | def save_file(self, book_path, content):

FILE: book_maker/loader/pdf_loader.py
  class PDFBookLoader (line 13) | class PDFBookLoader(BaseBookLoader):
    method __init__ (line 14) | def __init__(
    method _make_new_book (line 70) | def _make_new_book(self, book):
    method _try_create_epub (line 73) | def _try_create_epub(self):
    method make_bilingual_book (line 157) | def make_bilingual_book(self):
    method _save_temp_book (line 206) | def _save_temp_book(self):
    method _save_progress (line 225) | def _save_progress(self):
    method load_state (line 232) | def load_state(self):
    method save_file (line 239) | def save_file(self, book_path, content):

FILE: book_maker/loader/srt_loader.py
  class SRTBookLoader (line 14) | class SRTBookLoader(BaseBookLoader):
    method __init__ (line 15) | def __init__(
    method _make_new_book (line 60) | def _make_new_book(self, book):
    method _parse_srt (line 63) | def _parse_srt(self, srt_text):
    method _get_block_text (line 84) | def _get_block_text(self, block):
    method _get_block_except_text (line 87) | def _get_block_except_text(self, block):
    method _concat_blocks (line 90) | def _concat_blocks(self, sliced_text: str, text: str):
    method _get_block_translate (line 93) | def _get_block_translate(self, block):
    method _get_block_from (line 96) | def _get_block_from(self, text):
    method _get_blocks_from (line 107) | def _get_blocks_from(self, translate: str):
    method _check_blocks (line 118) | def _check_blocks(self, translate_blocks, origin_blocks):
    method _get_sliced_list (line 141) | def _get_sliced_list(self):
    method make_bilingual_book (line 161) | def make_bilingual_book(self):
    method _save_temp_book (line 260) | def _save_temp_book(self):
    method _save_progress (line 275) | def _save_progress(self):
    method load_state (line 282) | def load_state(self):
    method save_file (line 294) | def save_file(self, book_path, content):

FILE: book_maker/loader/txt_loader.py
  class TXTBookLoader (line 9) | class TXTBookLoader(BaseBookLoader):
    method __init__ (line 10) | def __init__(
    method _is_special_text (line 59) | def _is_special_text(text):
    method _make_new_book (line 62) | def _make_new_book(self, book):
    method make_bilingual_book (line 65) | def make_bilingual_book(self):
    method _save_temp_book (line 105) | def _save_temp_book(self):
    method _save_progress (line 126) | def _save_progress(self):
    method load_state (line 133) | def load_state(self):
    method save_file (line 140) | def save_file(self, book_path, content):

FILE: book_maker/obok.py
  class ENCRYPTIONError (line 196) | class ENCRYPTIONError(Exception):
  function _load_crypto_libcrypto (line 200) | def _load_crypto_libcrypto():
  function _load_crypto_pycrypto (line 263) | def _load_crypto_pycrypto():
  function _load_crypto (line 276) | def _load_crypto():
  class SafeUnbuffered (line 292) | class SafeUnbuffered:
    method __init__ (line 293) | def __init__(self, stream) -> None:
    method write (line 299) | def write(self, data):
    method __getattr__ (line 305) | def __getattr__(self, attr):
  class KoboLibrary (line 309) | class KoboLibrary:
    method __init__ (line 316) | def __init__(self, serials=None, device_path=None, desktopkobodir="") ...
    method close (line 469) | def close(self):
    method userkeys (line 477) | def userkeys(self):
    method books (line 488) | def books(self):
    method __bookfile (line 534) | def __bookfile(self, volumeid):
    method __getmacaddrs (line 538) | def __getmacaddrs(self):
    method __getuserids (line 594) | def __getuserids(self):
    method __getuserkeys (line 605) | def __getuserkeys(self, macaddr):
  class KoboBook (line 618) | class KoboBook:
    method __init__ (line 630) | def __init__(
    method encryptedfiles (line 651) | def encryptedfiles(self):
    method has_drm (line 699) | def has_drm(self):
  class KoboFile (line 703) | class KoboFile:
    method __init__ (line 711) | def __init__(self, filename, mimetype, key) -> None:
    method decrypt (line 716) | def decrypt(self, userkey, contents):
    method check (line 728) | def check(self, contents):
    method __removeaespadding (line 774) | def __removeaespadding(self, contents):
  function decrypt_book (line 793) | def decrypt_book(book, lib):
  function cli_main (line 827) | def cli_main(devicedir):

FILE: book_maker/translator/base_translator.py
  class Base (line 5) | class Base(ABC):
    method __init__ (line 6) | def __init__(self, key, language) -> None:
    method rotate_key (line 11) | def rotate_key(self):
    method translate (line 15) | def translate(self, text):
    method set_deployment_id (line 18) | def set_deployment_id(self, deployment_id):

FILE: book_maker/translator/caiyun_translator.py
  class Caiyun (line 11) | class Caiyun(Base):
    method __init__ (line 16) | def __init__(self, key, language, **kwargs) -> None:
    method rotate_key (line 30) | def rotate_key(self):
    method translate (line 33) | def translate(self, text):

FILE: book_maker/translator/chatgptapi_translator.py
  class ChatGPTAPI (line 72) | class ChatGPTAPI(Base):
    method __init__ (line 75) | def __init__(
    method rotate_key (line 123) | def rotate_key(self):
    method rotate_model (line 127) | def rotate_model(self):
    method create_messages (line 132) | def create_messages(self, text, intermediate_messages=None):
    method create_context_messages (line 148) | def create_context_messages(self):
    method create_chat_completion (line 160) | def create_chat_completion(self, text):
    method get_translation (line 169) | def get_translation(self, text):
    method save_context (line 187) | def save_context(self, text, t_text):
    method translate (line 196) | def translate(self, text, needprint=True):
    method translate_and_split_lines (line 235) | def translate_and_split_lines(self, text):
    method log_retry (line 241) | def log_retry(self, state, retry_count, elapsed_time, log_path="log/bu...
    method log_translation_mismatch (line 251) | def log_translation_mismatch(
    method join_lines (line 278) | def join_lines(self, text):
    method translate_list (line 309) | def translate_list(self, plist):
    method extract_paragraphs (line 411) | def extract_paragraphs(self, text, paragraph_count):
    method set_deployment_id (line 437) | def set_deployment_id(self, deployment_id):
    method set_gpt35_models (line 446) | def set_gpt35_models(self, ollama_model=""):
    method set_gpt4_models (line 461) | def set_gpt4_models(self):
    method set_gpt4omini_models (line 473) | def set_gpt4omini_models(self):
    method set_gpt4o_models (line 485) | def set_gpt4o_models(self):
    method set_gpt5mini_models (line 497) | def set_gpt5mini_models(self):
    method set_o1preview_models (line 509) | def set_o1preview_models(self):
    method set_o1_models (line 521) | def set_o1_models(self):
    method set_o1mini_models (line 533) | def set_o1mini_models(self):
    method set_o3mini_models (line 545) | def set_o3mini_models(self):
    method set_model_list (line 557) | def set_model_list(self, model_list):
    method batch_init (line 562) | def batch_init(self, book_name):
    method add_to_batch_translate_queue (line 565) | def add_to_batch_translate_queue(self, book_index, text):
    method sanitize_book_name (line 568) | def sanitize_book_name(self, book_name):
    method batch_metadata_file_path (line 575) | def batch_metadata_file_path(self):
    method batch_dir (line 578) | def batch_dir(self):
    method custom_id (line 581) | def custom_id(self, book_index):
    method is_completed_batch (line 584) | def is_completed_batch(self):
    method batch_translate (line 601) | def batch_translate(self, book_index):
    method create_batch_context_messages (line 638) | def create_batch_context_messages(self, index):
    method make_batch_request (line 669) | def make_batch_request(self, book_index, text):
    method create_batch_files (line 685) | def create_batch_files(self, dest_file_path):
    method batch (line 715) | def batch(self):
    method create_batch_info (line 747) | def create_batch_info(self, file_id, batch, start_index, end_index):
    method upload_batch_file (line 756) | def upload_batch_file(self, file_path):
    method batch_execute (line 762) | def batch_execute(self, file_id):
    method check_batch_status (line 777) | def check_batch_status(self, batch_id):
    method get_batch_result (line 780) | def get_batch_result(self, output_file_id):

FILE: book_maker/translator/claude_translator.py
  class Claude (line 8) | class Claude(Base):
    method __init__ (line 9) | def __init__(
    method rotate_key (line 37) | def rotate_key(self):
    method set_claude_model (line 40) | def set_claude_model(self, model_name):
    method create_messages (line 43) | def create_messages(self, text, intermediate_messages=None):
    method create_context_messages (line 60) | def create_context_messages(self):
    method save_context (line 77) | def save_context(self, text, t_text):
    method translate (line 90) | def translate(self, text):

FILE: book_maker/translator/custom_api_translator.py
  class CustomAPI (line 9) | class CustomAPI(Base):
    method __init__ (line 14) | def __init__(self, custom_api, language, **kwargs) -> None:
    method rotate_key (line 19) | def rotate_key(self):
    method translate (line 22) | def translate(self, text):

FILE: book_maker/translator/deepl_free_translator.py
  class DeepLFree (line 12) | class DeepLFree(Base):
    method __init__ (line 17) | def __init__(self, key, language, **kwargs) -> None:
    method rotate_key (line 57) | def rotate_key(self):
    method translate (line 60) | def translate(self, text):

FILE: book_maker/translator/deepl_translator.py
  class DeepL (line 13) | class DeepL(Base):
    method __init__ (line 18) | def __init__(self, key, language, **kwargs) -> None:
    method rotate_key (line 63) | def rotate_key(self):
    method translate (line 66) | def translate(self, text):

FILE: book_maker/translator/gemini_translator.py
  class Gemini (line 51) | class Gemini(Base):
    method __init__ (line 58) | def __init__(
    method create_convo (line 84) | def create_convo(self):
    method rotate_model (line 94) | def rotate_model(self):
    method rotate_key (line 99) | def rotate_key(self):
    method translate (line 103) | def translate(self, text):
    method set_interval (line 178) | def set_interval(self, interval):
    method set_geminipro_models (line 181) | def set_geminipro_models(self):
    method set_geminiflash_models (line 184) | def set_geminiflash_models(self):
    method set_models (line 187) | def set_models(self, allowed_models):
    method set_model_list (line 199) | def set_model_list(self, model_list):

FILE: book_maker/translator/google_translator.py
  class Google (line 9) | class Google(Base):
    method __init__ (line 14) | def __init__(self, key, language, **kwargs) -> None:
    method rotate_key (line 32) | def rotate_key(self):
    method translate (line 35) | def translate(self, text):
    method _retry_translate (line 51) | def _retry_translate(self, text, timeout=3):

FILE: book_maker/translator/groq_translator.py
  class GroqClient (line 14) | class GroqClient(ChatGPTAPI):
    method rotate_model (line 15) | def rotate_model(self):
    method create_chat_completion (line 22) | def create_chat_completion(self, text):

FILE: book_maker/translator/litellm_translator.py
  class liteLLM (line 13) | class liteLLM(ChatGPTAPI):
    method create_chat_completion (line 14) | def create_chat_completion(self, text):

FILE: book_maker/translator/qwen_translator.py
  class QwenTranslator (line 9) | class QwenTranslator(Base):
    method __init__ (line 63) | def __init__(
    method rotate_key (line 109) | def rotate_key(self):
    method _map_language (line 116) | def _map_language(self, language):
    method _create_translation_options (line 132) | def _create_translation_options(self):
    method save_context (line 154) | def save_context(self, text, t_text):
    method translate (line 167) | def translate(self, text, needprint=True):
    method set_terminology (line 229) | def set_terminology(self, terminology):
    method set_domain_hint (line 239) | def set_domain_hint(self, domain_hint):
    method set_qwen_model (line 249) | def set_qwen_model(self, model_name):

FILE: book_maker/translator/tencent_transmart_translator.py
  class TencentTranSmart (line 10) | class TencentTranSmart(Base):
    method __init__ (line 15) | def __init__(self, key, language, **kwargs) -> None:
    method rotate_key (line 31) | def rotate_key(self):
    method translate (line 34) | def translate(self, text):
    method text_analysis (line 59) | def text_analysis(self, text):
    method get_client_key (line 83) | def get_client_key(self):

FILE: book_maker/translator/xai_translator.py
  class XAIClient (line 9) | class XAIClient(ChatGPTAPI):
    method __init__ (line 10) | def __init__(self, key, language, api_base=None, **kwargs) -> None:
    method rotate_model (line 16) | def rotate_model(self):

FILE: book_maker/utils.py
  function prompt_config_to_kwargs (line 126) | def prompt_config_to_kwargs(prompt_config):
  function num_tokens_from_text (line 135) | def num_tokens_from_text(text, model="gpt-3.5-turbo-0301"):

FILE: tests/test_epub_metadata.py
  function test_epub_loader_handles_custom_metadata (line 7) | def test_epub_loader_handles_custom_metadata(tmp_path):

FILE: tests/test_integration.py
  function test_book_dir (line 11) | def test_book_dir() -> str:
  function test_google_translate_epub (line 17) | def test_google_translate_epub(test_book_dir, tmpdir):
  function test_deepl_free_translate_epub (line 43) | def test_deepl_free_translate_epub(test_book_dir, tmpdir):
  function test_google_translate_epub_cli (line 69) | def test_google_translate_epub_cli():
  function test_google_translate_txt (line 73) | def test_google_translate_txt(test_book_dir, tmpdir):
  function test_google_translate_txt_batch_size (line 98) | def test_google_translate_txt_batch_size(test_book_dir, tmpdir):
  function test_caiyun_translate_txt (line 130) | def test_caiyun_translate_txt(test_book_dir, tmpdir):
  function test_deepl_translate_txt (line 161) | def test_deepl_translate_txt(test_book_dir, tmpdir):
  function test_deepl_translate_srt (line 192) | def test_deepl_translate_srt(test_book_dir, tmpdir):
  function test_openai_translate_epub_zh_hans (line 226) | def test_openai_translate_epub_zh_hans(test_book_dir, tmpdir):
  function test_openai_translate_epub_ja_prompt_txt (line 254) | def test_openai_translate_epub_ja_prompt_txt(test_book_dir, tmpdir):
  function test_openai_translate_epub_ja_prompt_json (line 286) | def test_openai_translate_epub_ja_prompt_json(test_book_dir, tmpdir):
  function test_openai_translate_srt (line 316) | def test_openai_translate_srt(test_book_dir, tmpdir):

FILE: tests/test_pdf_cli.py
  function test_pdf_cli_creates_txt_and_optional_epub (line 10) | def test_pdf_cli_creates_txt_and_optional_epub(tmp_path):

FILE: tests/test_pdf_loader.py
  class DummyModel (line 11) | class DummyModel:
    method __init__ (line 12) | def __init__(
    method translate (line 23) | def translate(self, text):
    method translate_list (line 26) | def translate_list(self, texts):
  function test_pdf_loader_extracts_and_translates (line 30) | def test_pdf_loader_extracts_and_translates(tmp_path):

Condensed preview: 60 files, each showing path, character count, and a content snippet (287K chars in the full structured content).

[
  {
    "path": ".dockerignore",
    "chars": 100,
    "preview": "Dockerfile*\ndocker-compose*\nLICENSE\ntest_books\nREADME*\n.dockerignore\n.git\n.github\n.gitignore\n.vscode"
  },
  {
    "path": ".github/workflows/docs.yaml",
    "chars": 318,
    "preview": "name: Publish docs\non:\n  push:\n    branches:\n      - main\n\njobs:\n  deploy:\n    runs-on: ubuntu-latest\n    steps:\n      -"
  },
  {
    "path": ".github/workflows/make_test_ebook.yaml",
    "chars": 2845,
    "preview": "name: CI\n\non:\n  push:\n    branches: [ main ]\n  pull_request:\n    branches: [ main ]\n  workflow_dispatch:\n  \nenv:\n  ACTIO"
  },
  {
    "path": ".github/workflows/release.yaml",
    "chars": 655,
    "preview": "name: Release and Build Docker Image\n\npermissions:\n  contents: write\n\non:\n  push:\n    tags:\n      - \"*\"\n\njobs:\n  release"
  },
  {
    "path": ".gitignore",
    "chars": 2012,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
  },
  {
    "path": "Dockerfile",
    "chars": 173,
    "preview": "FROM python:3.10-slim\n\nRUN apt-get update\n\nWORKDIR /app\n\nCOPY requirements.txt .\n\nRUN pip install -r /app/requirements.t"
  },
  {
    "path": "LICENSE",
    "chars": 1063,
    "preview": "MIT License\n\nCopyright (c) 2023 yihong\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof "
  },
  {
    "path": "Makefile",
    "chars": 192,
    "preview": "SHELL := /bin/bash\n\nfmt:\n\t@echo \"Running formatter ...\"\n\tvenv/bin/black .\n\n.PHONY:tests\ntests:\n\t@echo \"Running tests ..."
  },
  {
    "path": "README-CN.md",
    "chars": 10389,
    "preview": "# bilingual_book_maker\n\nbilingual_book_maker 是一个 AI 翻译工具，使用 ChatGPT 帮助用户制作多语言版本的 epub/txt/srt 文件和图书。该工具仅适用于翻译进入公共版权领域的 e"
  },
  {
    "path": "README.md",
    "chars": 19375,
    "preview": "**[中文](./README-CN.md) | English**\n[![litellm](https://img.shields.io/badge/%20%F0%9F%9A%85%20liteLLM-OpenAI%7CAzure%7CA"
  },
  {
    "path": "book_maker/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "book_maker/__main__.py",
    "chars": 60,
    "preview": "from cli import main\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "book_maker/cli.py",
    "chars": 22289,
    "preview": "import argparse\nimport json\nimport os\nfrom os import environ as env\n\nfrom book_maker.loader import BOOK_LOADER_DICT\nfrom"
  },
  {
    "path": "book_maker/config.py",
    "chars": 165,
    "preview": "config = {\n    \"translator\": {\n        \"chatgptapi\": {\n            \"context_paragraph_limit\": 3,\n            \"batch_cont"
  },
  {
    "path": "book_maker/loader/__init__.py",
    "chars": 466,
    "preview": "from book_maker.loader.epub_loader import EPUBBookLoader\nfrom book_maker.loader.txt_loader import TXTBookLoader\nfrom boo"
  },
  {
    "path": "book_maker/loader/base_loader.py",
    "chars": 491,
    "preview": "from abc import ABC, abstractmethod\n\n\nclass BaseBookLoader(ABC):\n    @staticmethod\n    def _is_special_text(text):\n     "
  },
  {
    "path": "book_maker/loader/epub_loader.py",
    "chars": 39260,
    "preview": "import os\nimport pickle\nimport string\nimport sys\nimport time\nfrom concurrent.futures import ThreadPoolExecutor, as_compl"
  },
  {
    "path": "book_maker/loader/helper.py",
    "chars": 3688,
    "preview": "import re\nimport backoff\nimport logging\nfrom copy import copy\n\nlogging.basicConfig(level=logging.WARNING)\nlogger = loggi"
  },
  {
    "path": "book_maker/loader/md_loader.py",
    "chars": 6077,
    "preview": "import sys\nfrom pathlib import Path\n\nfrom book_maker.utils import prompt_config_to_kwargs\n\nfrom .base_loader import Base"
  },
  {
    "path": "book_maker/loader/pdf_loader.py",
    "chars": 8341,
    "preview": "import sys\nfrom pathlib import Path\n\nfrom book_maker.utils import prompt_config_to_kwargs\n\nfrom .base_loader import Base"
  },
  {
    "path": "book_maker/loader/srt_loader.py",
    "chars": 10526,
    "preview": "\"\"\"\ninspired by: https://github.com/jesselau76/srt-gpt-translator, MIT License\n\"\"\"\n\nimport re\nimport sys\nfrom pathlib im"
  },
  {
    "path": "book_maker/loader/txt_loader.py",
    "chars": 4787,
    "preview": "import sys\nfrom pathlib import Path\n\nfrom book_maker.utils import prompt_config_to_kwargs\n\nfrom .base_loader import Base"
  },
  {
    "path": "book_maker/obok.py",
    "chars": 30289,
    "preview": "# The original code comes from:\n# https://github.com/apprenticeharper/DeDRM_tools\n\n# Version 4.1.2 March 2023\n# Update l"
  },
  {
    "path": "book_maker/translator/__init__.py",
    "chars": 1742,
    "preview": "from book_maker.translator.caiyun_translator import Caiyun\nfrom book_maker.translator.chatgptapi_translator import ChatG"
  },
  {
    "path": "book_maker/translator/base_translator.py",
    "chars": 391,
    "preview": "import itertools\nfrom abc import ABC, abstractmethod\n\n\nclass Base(ABC):\n    def __init__(self, key, language) -> None:\n "
  },
  {
    "path": "book_maker/translator/caiyun_translator.py",
    "chars": 2115,
    "preview": "import json\nimport re\nimport time\n\nimport requests\nfrom rich import print\n\nfrom .base_translator import Base\n\n\nclass Cai"
  },
  {
    "path": "book_maker/translator/chatgptapi_translator.py",
    "chars": 28841,
    "preview": "import re\nimport time\nimport os\nimport shutil\nfrom copy import copy\nfrom os import environ\nfrom itertools import cycle\ni"
  },
  {
    "path": "book_maker/translator/claude_translator.py",
    "chars": 3470,
    "preview": "import re\nfrom rich import print\nfrom anthropic import Anthropic\n\nfrom .base_translator import Base\n\n\nclass Claude(Base)"
  },
  {
    "path": "book_maker/translator/custom_api_translator.py",
    "chars": 846,
    "preview": "from .base_translator import Base\nimport re\nimport json\nimport requests\nimport time\nfrom rich import print\n\n\nclass Custo"
  },
  {
    "path": "book_maker/translator/deepl_free_translator.py",
    "chars": 1508,
    "preview": "import time\nimport random\nimport re\n\nfrom book_maker.utils import LANGUAGES, TO_LANGUAGE_CODE\n\nfrom .base_translator imp"
  },
  {
    "path": "book_maker/translator/deepl_translator.py",
    "chars": 2210,
    "preview": "import json\nimport time\n\nimport requests\nimport re\n\nfrom book_maker.utils import LANGUAGES, TO_LANGUAGE_CODE\n\nfrom .base"
  },
  {
    "path": "book_maker/translator/gemini_translator.py",
    "chars": 6189,
    "preview": "import re\nimport time\nfrom os import environ\nfrom itertools import cycle\n\nimport google.generativeai as genai\nfrom googl"
  },
  {
    "path": "book_maker/translator/google_translator.py",
    "chars": 2135,
    "preview": "import re\nimport requests\nfrom rich import print\n\nfrom book_maker.utils import TO_LANGUAGE_CODE\nfrom .base_translator im"
  },
  {
    "path": "book_maker/translator/groq_translator.py",
    "chars": 1372,
    "preview": "from groq import Groq\nfrom .chatgptapi_translator import ChatGPTAPI\nfrom os import linesep\nfrom itertools import cycle\n\n"
  },
  {
    "path": "book_maker/translator/litellm_translator.py",
    "chars": 2472,
    "preview": "from os import linesep\n\nfrom litellm import completion\n\nfrom book_maker.translator.chatgptapi_translator import ChatGPTA"
  },
  {
    "path": "book_maker/translator/qwen_translator.py",
    "chars": 8879,
    "preview": "import re\nimport time\nfrom rich import print\nfrom openai import OpenAI\n\nfrom .base_translator import Base\n\n\nclass QwenTr"
  },
  {
    "path": "book_maker/translator/tencent_transmart_translator.py",
    "chars": 2839,
    "preview": "import re\nimport time\nimport uuid\nimport requests\n\nfrom rich import print\nfrom .base_translator import Base\n\n\nclass Tenc"
  },
  {
    "path": "book_maker/translator/xai_translator.py",
    "chars": 512,
    "preview": "from openai import OpenAI\nfrom .chatgptapi_translator import ChatGPTAPI\n\nXAI_MODEL_LIST = [\n    \"grok-beta\",\n]\n\n\nclass X"
  },
  {
    "path": "book_maker/utils.py",
    "chars": 4246,
    "preview": "import tiktoken\n\n# Borrowed from : https://github.com/openai/whisper\nLANGUAGES = {\n    \"en\": \"english\",\n    \"zh-hans\": \""
  },
  {
    "path": "disclaimer.md",
    "chars": 1214,
    "preview": "Disclaimer:\n\n1. The purpose of this project, bilingual_book_maker, is to assist users in creating multilingual versions "
  },
  {
    "path": "docs/book_source.md",
    "chars": 1119,
    "preview": "# Translate from Different Sources\n\n## txt/srt\nTxt files and srt files are plain text files. This program can translate "
  },
  {
    "path": "docs/cmd.md",
    "chars": 4464,
    "preview": "# Command Line Options\n\n## Test translate\n`--test` <br>\n\nUse this option to preview the result if you haven't paid for t"
  },
  {
    "path": "docs/disclaimer.md",
    "chars": 1214,
    "preview": "Disclaimer:\n\n1. The purpose of this project, bilingual_book_maker, is to assist users in creating multilingual versions "
  },
  {
    "path": "docs/env_settings.md",
    "chars": 310,
    "preview": "# Environment Settings\nYou can also write information into env to skip some options.\n\n## Model keys\n```\n# Set env BBM_OP"
  },
  {
    "path": "docs/index.md",
    "chars": 401,
    "preview": "# bilingual book maker\n\nThe `bilingual_book_maker` is an AI translation tool that uses ChatGPT to assist users in creati"
  },
  {
    "path": "docs/installation.md",
    "chars": 391,
    "preview": "# Installation\n## pip\nbilingual_book_maker has been published as a [Python package](https://pypi.org/project/bbook-maker"
  },
  {
    "path": "docs/model_lang.md",
    "chars": 4068,
    "preview": "# Model and Languages\n## Models\n`-m, --model <Model>` <br>\n\nCurrently `bbook_maker` supports these models: `chatgptapi` "
  },
  {
    "path": "docs/prompt.md",
    "chars": 2993,
    "preview": "# Tweak the prompt\n\nTo tweak the prompt, use the `--prompt` parameter. Valid placeholders for the `user` role template i"
  },
  {
    "path": "docs/quickstart.md",
    "chars": 1211,
    "preview": "# QuickStart\nAfter successfully install the package, you can see `bbook-maker` is in the output of `pip list`.\n\n## Prepa"
  },
  {
    "path": "make_book.py",
    "chars": 71,
    "preview": "from book_maker.cli import main\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "mkdocs.yml",
    "chars": 512,
    "preview": "site_name: bilingual book maker\ntheme:\n  name: material\n  features:\n    - navigation.tabs\n    - navigation.tabs.sticky\n "
  },
  {
    "path": "prompt_md.json",
    "chars": 2479,
    "preview": "{\n  \"system\": \"You are a highly skilled translator responsible for translating the content of books in Markdown format f"
  },
  {
    "path": "prompt_md.prompt.md",
    "chars": 478,
    "preview": "# Translation Prompt\n\n## Developer Message\n\nYou are a professional translator who specializes in accurate, natural-sound"
  },
  {
    "path": "prompt_template_sample.json",
    "chars": 2003,
    "preview": "{\n  \"system\": \"You are a highly skilled academic translator. Please complete the translation task according to the follo"
  },
  {
    "path": "pyproject.toml",
    "chars": 1206,
    "preview": "[project]\nname = \"bbook-maker\"\ndescription = \"The bilingual_book_maker is an AI translation tool that uses ChatGPT to as"
  },
  {
    "path": "tests/test_epub_metadata.py",
    "chars": 1359,
    "preview": "import pytest\nfrom ebooklib import epub\n\nfrom book_maker.loader.epub_loader import EPUBBookLoader\n\n\ndef test_epub_loader"
  },
  {
    "path": "tests/test_integration.py",
    "chars": 9507,
    "preview": "import os\nimport shutil\nimport subprocess\nimport sys\nfrom pathlib import Path\n\nimport pytest\n\n\n@pytest.fixture()\ndef tes"
  },
  {
    "path": "tests/test_pdf_cli.py",
    "chars": 1057,
    "preview": "import subprocess\nimport sys\nfrom pathlib import Path\n\nimport pytest\n\nfitz = pytest.importorskip(\"fitz\")\n\n\ndef test_pdf_"
  },
  {
    "path": "tests/test_pdf_loader.py",
    "chars": 1496,
    "preview": "import os\nfrom pathlib import Path\n\nimport pytest\n\nfitz = pytest.importorskip(\"fitz\")\n\nfrom book_maker.loader.pdf_loader"
  },
  {
    "path": "typos.toml",
    "chars": 239,
    "preview": "# See https://github.com/crate-ci/typos/blob/master/docs/reference.md to configure typos\n[default.extend-words]\nsur = \"s"
  }
]

About this extraction

This page contains the full source code of the yihong0618/bilingual_book_maker GitHub repository, extracted and formatted as plain text. The extraction includes 60 files (264.8 KB), approximately 64.4k tokens, and a symbol index with 258 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
