Repository: Jiayi-Pan/TinyZero
Branch: main
Commit: 95df88f2dcb0
Files: 308
Total size: 1.9 MB

Directory structure:
gitextract_d67wvh9g/
├── .github/
│   └── workflows/
│       ├── dataset.yml
│       ├── e2e_digit_completion.yml
│       ├── e2e_gsm8k.yml
│       ├── model.yml
│       ├── ray_test.yml
│       ├── sanity.yml
│       ├── vllm.yml
│       └── yapf_format.yml
├── .gitignore
├── .readthedocs.yaml
├── .style.yapf
├── LICENSE
├── Notice.txt
├── OLD_README.md
├── README.md
├── docker/
│   ├── Dockerfile.ngc.vllm
│   └── Dockerfile.vemlp.vllm.te
├── docs/
│   ├── Makefile
│   ├── README.md
│   ├── advance/
│   │   ├── dpo_extension.rst
│   │   ├── fsdp_extension.rst
│   │   ├── megatron_extension.rst
│   │   └── placement.rst
│   ├── conf.py
│   ├── examples/
│   │   ├── config.rst
│   │   ├── gsm8k_example.rst
│   │   └── ppo_code_architecture.rst
│   ├── experiment/
│   │   └── ppo.rst
│   ├── faq/
│   │   └── faq.rst
│   ├── index.rst
│   ├── preparation/
│   │   ├── prepare_data.rst
│   │   └── reward_function.rst
│   ├── requirements-docs.txt
│   ├── start/
│   │   ├── install.rst
│   │   └── quickstart.rst
│   └── workers/
│       ├── fsdp_workers.rst
│       ├── megatron_workers.rst
│       └── ray_trainer.rst
├── examples/
│   ├── data_preprocess/
│   │   ├── arth.py
│   │   ├── countdown.py
│   │   ├── full_hh_rlhf.py
│   │   ├── gsm8k.py
│   │   ├── hellaswag.py
│   │   ├── math_dataset.py
│   │   └── multiply.py
│   ├── generation/
│   │   └── run_deepseek_v2_lite_math.sh
│   ├── grpo_trainer/
│   │   ├── run_deepseek7b_llm.sh
│   │   ├── run_deepseek7b_llm_seq_balance.sh
│   │   ├── run_qwen2-7b.sh
│   │   └── run_qwen2-7b_seq_balance.sh
│   ├── ppo_trainer/
│   │   ├── run_deepseek7b_llm.sh
│   │   ├── run_deepseek7b_llm_sp2.sh
│   │   ├── run_deepseek_full_hh_rlhf.sh
│   │   ├── run_deepseek_math_gsm8k_megatron.sh
│   │   ├── run_deepseek_megatron.sh
│   │   ├── run_gemma.sh
│   │   ├── run_qwen2-7b.sh
│   │   ├── run_qwen2-7b_rm.sh
│   │   ├── run_qwen2-7b_rm_seq_balance.sh
│   │   ├── run_qwen2-7b_seq_balance.sh
│   │   ├── run_qwen2.5-32b.sh
│   │   └── verl_getting_started.ipynb
│   ├── ray/
│   │   └── tutorial.ipynb
│   ├── sft/
│   │   └── gsm8k/
│   │       ├── run_deepseek_6b7.sh
│   │       ├── run_gemma_2b.sh
│   │       └── run_gemma_7b.sh
│   └── split_placement/
│       ├── README.md
│       ├── config/
│       │   └── ppo_trainer_split.yaml
│       ├── main_ppo_split.py
│       ├── run_deepseek7b_llm.sh
│       └── split_monkey_patch.py
├── patches/
│   └── megatron_v4.patch
├── pyproject.toml
├── requirements.txt
├── scripts/
│   ├── format.sh
│   └── train_tiny_zero.sh
├── setup.py
├── tests/
│   ├── __init__.py
│   ├── e2e/
│   │   ├── __init__.py
│   │   ├── arithmetic_sequence/
│   │   │   ├── data/
│   │   │   │   ├── create_dataset.py
│   │   │   │   ├── test.parquet
│   │   │   │   └── train.parquet
│   │   │   ├── model/
│   │   │   │   ├── config.json
│   │   │   │   ├── create_model_tokenizer.py
│   │   │   │   ├── generation_config.json
│   │   │   │   ├── model.safetensors
│   │   │   │   └── tokenizer_config.json
│   │   │   └── rl/
│   │   │       ├── README.md
│   │   │       ├── config/
│   │   │       │   └── ray_trainer.yaml
│   │   │       └── main_trainer.py
│   │   ├── check_results.py
│   │   ├── envs/
│   │   │   ├── __init__.py
│   │   │   └── digit_completion/
│   │   │       ├── __init__.py
│   │   │       ├── task.py
│   │   │       └── tokenizer.py
│   │   ├── run_qwen_gsm8k_function_rm.sh
│   │   ├── run_qwen_gsm8k_function_rm_no_rmpad.sh
│   │   ├── run_qwen_gsm8k_model_rm.sh
│   │   ├── run_qwen_gsm8k_model_rm_no_rmpad.sh
│   │   ├── run_qwen_gsm8k_model_rm_seq_balance.sh
│   │   ├── run_qwen_gsm8k_model_rm_ulysses.sh
│   │   ├── run_ray_trainer.sh
│   │   └── run_ray_trainer_rmpad.sh
│   ├── gpu_utility/
│   │   ├── test_memory_buffers.py
│   │   ├── test_ops.py
│   │   └── test_torch_functional.py
│   ├── model/
│   │   ├── test_transformer.py
│   │   └── test_transformers_ulysses.py
│   ├── ray/
│   │   ├── check_worker_alive/
│   │   │   └── main.py
│   │   ├── detached_worker/
│   │   │   ├── README.md
│   │   │   ├── client.py
│   │   │   ├── run.sh
│   │   │   └── server.py
│   │   ├── test_check_worker_alive.py
│   │   ├── test_colocated_workers.py
│   │   ├── test_data_transfer.py
│   │   ├── test_driverfunc_to_worker.py
│   │   ├── test_high_level_scheduling_api.py
│   │   ├── test_ray_local_envs.py
│   │   ├── test_rvdz.py
│   │   ├── test_worker_group_basics.py
│   │   └── test_worker_group_torch.py
│   ├── rollout/
│   │   ├── run_fsdp_vllm.py
│   │   └── test_vllm_hf_loader.py
│   ├── sanity/
│   │   ├── check_license.py
│   │   └── test_import.py
│   ├── utility/
│   │   └── test_tensor_dict_utilities.py
│   └── verl/
│       └── utils/
│           └── dataset/
│               ├── test_rl_dataset.py
│               ├── test_rm_dataset.py
│               └── test_sft_dataset.py
└── verl/
    ├── __init__.py
    ├── models/
    │   ├── README.md
    │   ├── __init__.py
    │   ├── llama/
    │   │   ├── __init__.py
    │   │   └── megatron/
    │   │       ├── __init__.py
    │   │       ├── checkpoint_utils/
    │   │       │   ├── __init__.py
    │   │       │   ├── llama_loader.py
    │   │       │   └── llama_saver.py
    │   │       ├── layers/
    │   │       │   ├── __init__.py
    │   │       │   ├── parallel_attention.py
    │   │       │   ├── parallel_decoder.py
    │   │       │   ├── parallel_linear.py
    │   │       │   ├── parallel_mlp.py
    │   │       │   └── parallel_rmsnorm.py
    │   │       └── modeling_llama_megatron.py
    │   ├── registry.py
    │   ├── transformers/
    │   │   ├── __init__.py
    │   │   ├── llama.py
    │   │   ├── monkey_patch.py
    │   │   └── qwen2.py
    │   └── weight_loader_registry.py
    ├── protocol.py
    ├── single_controller/
    │   ├── __init__.py
    │   ├── base/
    │   │   ├── __init__.py
    │   │   ├── decorator.py
    │   │   ├── megatron/
    │   │   │   ├── __init__.py
    │   │   │   ├── worker.py
    │   │   │   └── worker_group.py
    │   │   ├── register_center/
    │   │   │   ├── __init__.py
    │   │   │   └── ray.py
    │   │   ├── worker.py
    │   │   └── worker_group.py
    │   ├── ray/
    │   │   ├── __init__.py
    │   │   ├── base.py
    │   │   └── megatron.py
    │   └── version/
    │       └── version
    ├── third_party/
    │   ├── __init__.py
    │   └── vllm/
    │       ├── __init__.py
    │       ├── vllm_v_0_3_1/
    │       │   ├── __init__.py
    │       │   ├── arg_utils.py
    │       │   ├── config.py
    │       │   ├── llm.py
    │       │   ├── llm_engine_sp.py
    │       │   ├── model_loader.py
    │       │   ├── model_runner.py
    │       │   ├── parallel_state.py
    │       │   ├── tokenizer.py
    │       │   ├── weight_loaders.py
    │       │   └── worker.py
    │       ├── vllm_v_0_4_2/
    │       │   ├── __init__.py
    │       │   ├── arg_utils.py
    │       │   ├── config.py
    │       │   ├── dtensor_weight_loaders.py
    │       │   ├── hf_weight_loader.py
    │       │   ├── llm.py
    │       │   ├── llm_engine_sp.py
    │       │   ├── megatron_weight_loaders.py
    │       │   ├── model_loader.py
    │       │   ├── model_runner.py
    │       │   ├── parallel_state.py
    │       │   ├── spmd_gpu_executor.py
    │       │   ├── tokenizer.py
    │       │   └── worker.py
    │       ├── vllm_v_0_5_4/
    │       │   ├── __init__.py
    │       │   ├── arg_utils.py
    │       │   ├── config.py
    │       │   ├── dtensor_weight_loaders.py
    │       │   ├── hf_weight_loader.py
    │       │   ├── llm.py
    │       │   ├── llm_engine_sp.py
    │       │   ├── megatron_weight_loaders.py
    │       │   ├── model_loader.py
    │       │   ├── model_runner.py
    │       │   ├── parallel_state.py
    │       │   ├── spmd_gpu_executor.py
    │       │   ├── tokenizer.py
    │       │   └── worker.py
    │       └── vllm_v_0_6_3/
    │           ├── __init__.py
    │           ├── arg_utils.py
    │           ├── config.py
    │           ├── dtensor_weight_loaders.py
    │           ├── hf_weight_loader.py
    │           ├── llm.py
    │           ├── llm_engine_sp.py
    │           ├── megatron_weight_loaders.py
    │           ├── model_loader.py
    │           ├── model_runner.py
    │           ├── parallel_state.py
    │           ├── spmd_gpu_executor.py
    │           ├── tokenizer.py
    │           └── worker.py
    ├── trainer/
    │   ├── __init__.py
    │   ├── config/
    │   │   ├── evaluation.yaml
    │   │   ├── generation.yaml
    │   │   ├── ppo_megatron_trainer.yaml
    │   │   ├── ppo_trainer.yaml
    │   │   └── sft_trainer.yaml
    │   ├── fsdp_sft_trainer.py
    │   ├── main_eval.py
    │   ├── main_generation.py
    │   ├── main_ppo.py
    │   ├── ppo/
    │   │   ├── __init__.py
    │   │   ├── core_algos.py
    │   │   └── ray_trainer.py
    │   └── runtime_env.yaml
    ├── utils/
    │   ├── __init__.py
    │   ├── config.py
    │   ├── dataset/
    │   │   ├── README.md
    │   │   ├── __init__.py
    │   │   ├── rl_dataset.py
    │   │   ├── rm_dataset.py
    │   │   └── sft_dataset.py
    │   ├── debug/
    │   │   ├── __init__.py
    │   │   ├── performance.py
    │   │   └── trajectory_tracker.py
    │   ├── distributed.py
    │   ├── flops_counter.py
    │   ├── fs.py
    │   ├── fsdp_utils.py
    │   ├── hdfs_io.py
    │   ├── import_utils.py
    │   ├── logger/
    │   │   ├── __init__.py
    │   │   └── aggregate_logger.py
    │   ├── logging_utils.py
    │   ├── megatron/
    │   │   ├── __init__.py
    │   │   ├── memory.py
    │   │   ├── optimizer.py
    │   │   ├── optimizer_config.py
    │   │   ├── pipeline_parallel.py
    │   │   ├── sequence_parallel.py
    │   │   └── tensor_parallel.py
    │   ├── megatron_utils.py
    │   ├── memory_buffer.py
    │   ├── model.py
    │   ├── py_functional.py
    │   ├── ray_utils.py
    │   ├── rendezvous/
    │   │   ├── __init__.py
    │   │   └── ray_backend.py
    │   ├── reward_score/
    │   │   ├── __init__.py
    │   │   ├── countdown.py
    │   │   ├── gsm8k.py
    │   │   ├── math.py
    │   │   └── multiply.py
    │   ├── seqlen_balancing.py
    │   ├── tokenizer.py
    │   ├── torch_dtypes.py
    │   ├── torch_functional.py
    │   ├── tracking.py
    │   └── ulysses.py
    ├── version/
    │   └── version
    └── workers/
        ├── __init__.py
        ├── actor/
        │   ├── __init__.py
        │   ├── base.py
        │   ├── dp_actor.py
        │   └── megatron_actor.py
        ├── critic/
        │   ├── __init__.py
        │   ├── base.py
        │   ├── dp_critic.py
        │   └── megatron_critic.py
        ├── fsdp_workers.py
        ├── megatron_workers.py
        ├── reward_model/
        │   ├── __init__.py
        │   ├── base.py
        │   └── megatron/
        │       ├── __init__.py
        │       └── reward_model.py
        ├── rollout/
        │   ├── __init__.py
        │   ├── base.py
        │   ├── hf_rollout.py
        │   ├── naive/
        │   │   ├── __init__.py
        │   │   └── naive_rollout.py
        │   ├── tokenizer.py
        │   └── vllm_rollout/
        │       ├── __init__.py
        │       └── vllm_rollout.py
        └── sharding_manager/
            ├── __init__.py
            ├── base.py
            ├── fsdp_ulysses.py
            ├── fsdp_vllm.py
            └── megatron_vllm.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/workflows/dataset.yml
================================================
name: dataset

on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/dataset.yml
  pull_request:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/dataset.yml

jobs:
  ray:
    runs-on: [self-hosted, gpu]
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
            fetch-depth: 0
      - name: Install the current repository
        run: |
          pip install -e .[test] --user
      - name: Running dataset tests
        run: |
          [ ! -d "$HOME/verl-data" ] && git clone --depth 1 https://github.com/eric-haibin-lin/verl-data ~/verl-data
          pytest -s -x tests/verl
      - name: Running ray test using cupy (move it to L20 when dockerfile ready)
        run: |
          cd tests/ray
          pytest -s -x test_rvdz.py

================================================
FILE: .github/workflows/e2e_digit_completion.yml
================================================
name: e2e_digit_completion

on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/e2e_digit_completion.yml
  pull_request:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/e2e_digit_completion.yml
      - "tests/e2e/*.sh"

jobs:
  e2e_digit_completion:
    runs-on: [self-hosted, l20-0]
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1"
      HF_HUB_ENABLE_HF_TRANSFER: 1
    container:
      image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3
      options: --gpus all --shm-size=10g
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
            fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install hf_transfer
          pip3 install -e .[test]
      - name: Running digit completion e2e training tests on 8 L20 GPUs
        run: |
          ray stop --force
          bash tests/e2e/run_ray_trainer.sh


================================================
FILE: .github/workflows/e2e_gsm8k.yml
================================================
name: e2e_gsm8k

on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/e2e_gsm8k.yml
  pull_request:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/e2e_gsm8k.yml
      - "tests/e2e/*.sh"

jobs:
  e2e_gsm8k:
    runs-on: [self-hosted, l20-1]
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1"
      HF_HUB_ENABLE_HF_TRANSFER: 1
    container:
      image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3
      options: --gpus all --shm-size=10g
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
            fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install hf_transfer
          pip3 install -e .[test]
      - name: Prepare gsm8k dataset
        run: |
          ray stop --force
          python3 examples/data_preprocess/gsm8k.py
      - name: Running gsm8k e2e training tests on 8 L20 GPUs with rmpad using function rm
        run: |
          ray stop --force
          bash tests/e2e/run_qwen_gsm8k_function_rm.sh
      - name: Running gsm8k e2e without rmpad using function rm
        run: |
          ray stop --force
          bash tests/e2e/run_qwen_gsm8k_function_rm_no_rmpad.sh
      - name: Running gsm8k e2e with rmpad using model rm
        run: |
          ray stop --force
          bash tests/e2e/run_qwen_gsm8k_model_rm.sh
      - name: Running gsm8k e2e without rmpad using model rm
        run: |
          ray stop --force
          bash tests/e2e/run_qwen_gsm8k_model_rm_no_rmpad.sh
      - name: Running gsm8k e2e with rmpad using model rm and ulysses sp=2
        run: |
          ray stop --force
          bash tests/e2e/run_qwen_gsm8k_model_rm_ulysses.sh
      - name: Running gsm8k e2e with rmpad using model rm and dynamic batch size
        run: |
          ray stop --force
          bash tests/e2e/run_qwen_gsm8k_model_rm_seq_balance.sh


================================================
FILE: .github/workflows/model.yml
================================================
name: model_rmpad

on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/model.yml
  pull_request:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/model.yml

jobs:
  model_rmpad:
    runs-on: [self-hosted, l20-1]
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1"
    container:
      image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3
      options: --gpus all --shm-size=10g
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
            fetch-depth: 0
      - name: Install the current repository and upgrade to latest transformers/flash_attn
        run: |
          pip3 install -e .[test]
          pip3 install --upgrade transformers
      - name: Running digit completion e2e training tests on 8 L20 GPUs + flash_attn 2.5.8
        run: |
          pytest -s tests/model/test_transformer.py
      - name: Running digit completion e2e training tests on 8 L20 GPUs + latest flash_attn
        run: |
          pip3 install --upgrade flash_attn --no-build-isolation
          pytest -s tests/model/test_transformer.py


================================================
FILE: .github/workflows/ray_test.yml
================================================
name: ray

on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/ray_test.yml
  pull_request:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/ray_test.yml

jobs:
  ray:
    runs-on: [self-hosted, l20-0]
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1"
      HF_HUB_ENABLE_HF_TRANSFER: 1
    container:
      image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3
      options: --gpus all --shm-size=10g
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
            fetch-depth: 0
      - name: Install the current repository
        run: |
          pip install hf_transfer
          pip install -e .[test]
          pip install --upgrade "ray>=2.40.0"
      - name: Running ray tests that need 8 GPUs
        run: |
          cd tests/ray
          pytest -s -x --ignore=test_check_worker_alive.py --ignore=test_rvdz.py .


================================================
FILE: .github/workflows/sanity.yml
================================================
name: sanity

on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/sanity.yml
  pull_request:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/sanity.yml

jobs:
  sanity:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10"]
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install the current repository
        run: |
          pip install -e .[test]
      - name: Run sanity test
        run: |
          pytest -s -x tests/sanity
      - name: Run utility test
        run: |
          pytest -s -x tests/utility
      - name: Run license test
        run: |
          python3 tests/sanity/check_license.py --directory .


================================================
FILE: .github/workflows/vllm.yml
================================================
name: vllm

on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/vllm.yml
  pull_request:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/vllm.yml

jobs:
  vllm:
    runs-on: [self-hosted, l20-0]
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1"
      HF_HUB_ENABLE_HF_TRANSFER: 1
    container:
      image: verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3
      options: --gpus all --shm-size=10g
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
            fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install hf_transfer
          pip3 install -e .[test]
          pip3 install vllm==0.5.4
      - name: Running vllm tests on 8 L20 GPUs
        run: |
          cd tests/rollout
          torchrun --standalone --nnodes=1 --nproc_per_node=8 $(which pytest) -s test_vllm_hf_loader.py


================================================
FILE: .github/workflows/yapf_format.yml
================================================
name: yapf

on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/yapf_format.yml
  pull_request:
    branches:
      - main
    paths:
      - "**/*.py"
      - .github/workflows/yapf_format.yml

jobs:
  yapf:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.12"]
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      # - name: checkout
      #   run: |
      #     commits=${{ github.event.pull_request.commits }}
      #     if [[ -n "$commits" ]]; then
      #       # Prepare enough depth for diffs with main
      #       git fetch --depth="$(( commits + 1 ))"
      #     fi
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install --upgrade yapf
          pip install toml==0.10.2
      - name: Running yapf
        run: |
          yapf -r -vv -d --style=./.style.yapf verl tests examples


================================================
FILE: .gitignore
================================================
**/*.pt
**/checkpoints
**/wget-log
**/_build/
**/*.ckpt
**/outputs
**/*.tar.gz
**/playground
**/wandb

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
dataset/*
tensorflow/my_graph/*
.idea/
# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# IPython Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# dotenv
.env

# virtualenv
venv/
ENV/

# Spyder project settings
.spyderproject

# Rope project settings
.ropeproject

# vscode
.vscode

# Mac
.DS_Store

# output logs
tests/e2e/toy_examples/deepspeed/synchronous/output.txt

# vim
*.swp


================================================
FILE: .readthedocs.yaml
================================================
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.8"

sphinx:
  configuration: docs/conf.py

python:
  install:
    - requirements: docs/requirements-docs.txt

================================================
FILE: .style.yapf
================================================
[style]
based_on_style = google
column_limit = 120
indent_width = 4
split_arguments_when_comma_terminated: true

================================================
FILE: LICENSE
================================================

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: Notice.txt
================================================
Copyright 2023-2024 Bytedance Ltd. and/or its affiliates 

================================================
FILE: OLD_README.md
================================================
<h1 style="text-align: center;">veRL: Volcano Engine Reinforcement Learning for LLM</h1>

veRL is a flexible, efficient and production-ready RL training framework designed for large language models (LLMs). 

veRL is the open-source version of **[HybridFlow: A Flexible and Efficient RLHF Framework](https://arxiv.org/abs/2409.19256v2)** paper.

veRL is flexible and easy to use with:

- **Easy extension of diverse RL algorithms**: The Hybrid programming model combines the strengths of single-controller and multi-controller paradigms to enable flexible representation and efficient execution of complex Post-Training dataflows, allowing users to build RL dataflows in a few lines of code.

- **Seamless integration of existing LLM infra with modular APIs**: Decouples computation and data dependencies, enabling seamless integration with existing LLM frameworks, such as PyTorch FSDP, Megatron-LM and vLLM. Moreover, users can easily extend to other LLM training and inference frameworks.

- **Flexible device mapping**: Supports various placement of models onto different sets of GPUs for efficient resource utilization and scalability across different cluster sizes.

- Ready integration with popular HuggingFace models


veRL is fast with:

- **State-of-the-art throughput**: By seamlessly integrating existing SOTA LLM training and inference frameworks, veRL achieves high generation and training throughput.

- **Efficient actor model resharding with 3D-HybridEngine**: Eliminates memory redundancy and significantly reduces communication overhead during transitions between training and generation phases.

<p align="center">
| <a href="https://verl.readthedocs.io/en/latest/index.html"><b>Documentation</b></a> | <a href="https://arxiv.org/abs/2409.19256v2"><b>Paper</b></a> | <a href="https://join.slack.com/t/verlgroup/shared_invite/zt-2w5p9o4c3-yy0x2Q56s_VlGLsJ93A6vA"><b>Slack</b></a> | <a href="https://raw.githubusercontent.com/eric-haibin-lin/verl-community/refs/heads/main/WeChat.JPG"><b>Wechat</b></a> | 

<!-- <a href=""><b>Slides</b></a> | -->
</p>

## News

- [2024/12] The team presented <a href="https://neurips.cc/Expo/Conferences/2024/workshop/100677">Post-training LLMs: From Algorithms to Infrastructure</a> at NeurIPS 2024. [Slides](https://github.com/eric-haibin-lin/verl-data/tree/neurips) and [video](https://neurips.cc/Expo/Conferences/2024/workshop/100677) available.
- [2024/10] veRL was presented at Ray Summit. [Youtube video](https://www.youtube.com/watch?v=MrhMcXkXvJU&list=PLzTswPQNepXntmT8jr9WaNfqQ60QwW7-U&index=37) available.
- [2024/08] HybridFlow (verl) was accepted to EuroSys 2025.

## Key Features

- **FSDP** and **Megatron-LM** for training.
- **vLLM** and **TGI** for rollout generation, **SGLang** support coming soon.
- huggingface models support
- Supervised fine-tuning
- Reward model training
- Reinforcement learning from human feedback with PPO
- flash-attention integration, sequence packing
- scales up to 70B models and hundreds of GPUs
- experiment tracking with wandb and mlflow


## Getting Started

Check out this [Jupyter Notebook](https://github.com/volcengine/verl/tree/main/examples/ppo_trainer/verl_getting_started.ipynb) to get started with PPO training with a single 24GB L4 GPU (**FREE** GPU quota provided by [Lightning Studio](https://lightning.ai/hlin-verl/studios/verl-getting-started))!

**Quickstart:**
- [Installation](https://verl.readthedocs.io/en/latest/start/install.html)
- [Quickstart](https://verl.readthedocs.io/en/latest/start/quickstart.html)

**Running a PPO example step-by-step:**
- Data and Reward Preparation
  - [Prepare Data (Parquet) for Post-Training](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html)
  - [Implement Reward Function for Dataset](https://verl.readthedocs.io/en/latest/preparation/reward_function.html)
- Understanding the PPO Example
  - [PPO Example Architecture](https://verl.readthedocs.io/en/latest/examples/ppo_code_architecture.html)
  - [Config Explanation](https://verl.readthedocs.io/en/latest/examples/config.html)
  - [Run GSM8K Example](https://verl.readthedocs.io/en/latest/examples/gsm8k_example.html)

**Reproducible algorithm baselines:**
- [PPO](https://verl.readthedocs.io/en/latest/experiment/ppo.html)

**For code explanation and advanced usage (extension):**
- PPO Trainer and Workers
  - [PPO Ray Trainer](https://verl.readthedocs.io/en/latest/workers/ray_trainer.html)
  - [PyTorch FSDP Backend](https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html)
  - [Megatron-LM Backend](https://verl.readthedocs.io/en/latest/index.html)
- Advanced Usage and Extension
  - [Ray API Design Tutorial](https://verl.readthedocs.io/en/latest/advance/placement.html)
  - [Extend to other RL(HF) algorithms](https://verl.readthedocs.io/en/latest/advance/dpo_extension.html)
  - [Add models with the FSDP backend](https://verl.readthedocs.io/en/latest/advance/fsdp_extension.html)
  - [Add models with the Megatron-LM backend](https://verl.readthedocs.io/en/latest/advance/megatron_extension.html)


## Citation and acknowledgement

If you find the project helpful, please cite:
- [HybridFlow: A Flexible and Efficient RLHF Framework](https://arxiv.org/abs/2409.19256v2)
- [A Framework for Training Large Language Models for Code Generation via Proximal Policy Optimization](https://i.cs.hku.hk/~cwu/papers/gmsheng-NL2Code24.pdf)

```tex
@article{sheng2024hybridflow,
  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv: 2409.19256}
}
```

verl is inspired by the design of Nemo-Aligner, Deepspeed-chat and OpenRLHF. The project is adopted and supported by Anyscale, Bytedance, LMSys.org, Shanghai AI Lab, Tsinghua University, UC Berkeley, UCLA, UIUC, and University of Hong Kong.

## Publications Using veRL
- [Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization](https://arxiv.org/abs/2410.09302)
- [Flaming-hot Initiation with Regular Execution Sampling for Large Language Models](https://arxiv.org/abs/2410.21236)
- [Process Reinforcement Through Implicit Rewards](https://github.com/PRIME-RL/PRIME/)

We are HIRING! Send us an [email](mailto:haibin.lin@bytedance.com) if you are interested in internship/FTE opportunities in MLSys/LLM reasoning/multimodal alignment.


================================================
FILE: README.md
================================================
# TinyZero

> **⚠️ Deprecation Notice:** This repo is no longer actively maintained. For running RL experiments, please directly use the latest [veRL](https://github.com/volcengine/verl) library.
> For the archived original documentation, see [OLD_README.md](./OLD_README.md).

![image](cover.png)

TinyZero is a reproduction of [DeepSeek R1 Zero](https://github.com/deepseek-ai/DeepSeek-R1) on countdown and multiplication tasks. We built upon [veRL](https://github.com/volcengine/verl).

Through RL, the 3B base LM develops self-verification and search abilities all on its own.

You can experience the Aha moment yourself for < $30.

Twitter thread: https://x.com/jiayi_pirate/status/1882839370505621655

Full experiment log: https://wandb.ai/jiayipan/TinyZero

> 📢: We release [Adaptive Parallel Reasoning](https://github.com/Parallel-Reasoning/APR), where we explore a new dimension in scaling reasoning models.

## Installation

```
conda create -n zero python=3.9
# install torch [or you can skip this step and let vllm install the correct version for you]
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1
pip3 install ray

# verl
pip install -e .

# flash attention 2
pip3 install flash-attn --no-build-isolation
# quality of life
pip install wandb IPython matplotlib
```

## Countdown task

**Data Preparation**
```
conda activate zero
python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
```

### Run Training
```
conda activate zero
```

For the following commands, if you run out of VRAM, try adding `critic.model.enable_gradient_checkpointing=True` to the script, and check out the discussion [here](https://github.com/Jiayi-Pan/TinyZero/issues/5#issuecomment-2624161643).

**Single GPU**


Works for models <= 1.5B parameters. Note that the Qwen2.5-0.5B base model is known to fail to learn reasoning.

```
export N_GPUS=1
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=1
export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh
```

**3B+ model**

In this case, the base model is able to develop sophisticated reasoning skills.
```
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh
```

### Instruct Ablation
We also experiment with Qwen-2.5-3B Instruct.

**Data Preparation**

To follow the chat template, we need to reprocess the data:
```
conda activate zero
python examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}
```

**Training**
```
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh
```

## Acknowledgements
* We run our experiments based on [veRL](https://github.com/volcengine/verl).
* We use Qwen2.5 series base model [Qwen2.5](https://github.com/QwenLM/Qwen2.5).

## Citation
```
@misc{tinyzero,
author       = {Jiayi Pan and Junjie Zhang and Xingyao Wang and Lifan Yuan and Hao Peng and Alane Suhr},
title        = {TinyZero},
howpublished = {https://github.com/Jiayi-Pan/TinyZero},
note         = {Accessed: 2025-01-24},
year         = {2025}
}
```


================================================
FILE: docker/Dockerfile.ngc.vllm
================================================
FROM nvcr.io/nvidia/pytorch:24.05-py3

# uninstall nv-pytorch fork
RUN pip3 uninstall pytorch-quantization \
     pytorch-triton \
     torch \
     torch-tensorrt \
     torchvision \
     xgboost transformer_engine flash_attn \
     apex megatron-core -y

RUN pip3 install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124

# make sure torch version is kept
RUN pip3 install --no-cache-dir \
    "torch==2.4.0" \
    accelerate \
    codetiming \
    datasets \
    dill \
    hydra-core \
    numpy \
    pybind11 \
    tensordict \
    "transformers<=4.46.0"

# ray is installed via vllm
RUN pip3 install --no-cache-dir vllm==0.6.3

# we choose flash-attn v2.7.0 or v2.7.2 which contain pre-built wheels
RUN pip3 install --no-cache-dir --no-build-isolation flash-attn==2.7.0.post2

# install apex, set MAX_JOBS to avoid OOMs
RUN MAX_JOBS=4 pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
    --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
    git+https://github.com/NVIDIA/apex

# install Transformer Engine, which requires FA 2.5.8
RUN MAX_JOBS=4 NINJA_FLAGS="-j4" pip3 install flash-attn==2.5.8 --no-cache-dir --no-build-isolation
RUN MAX_JOBS=4 NINJA_FLAGS="-j4" pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@v1.7

# Pin wandb to v0.18 since v0.19.1 is released with ImportError
RUN pip3 install wandb==0.18.7 py-spy


================================================
FILE: docker/Dockerfile.vemlp.vllm.te
================================================
# docker buildx build --platform linux/x86_64 -t "verlai/verl:$TAG" -f docker/$FILE .

# the one in docker.io is an alias for the one in veturbo
# FROM vemlp-cn-beijing.cr.volces.com/veturbo/pytorch:2.4-cu124
FROM docker.io/haibinlin/verl:v0.0.5-th2.4.0-cu124-base

# only config pip index with https://pypi.tuna.tsinghua.edu.cn/simple if needed
# unset for now
RUN pip3 config unset global.index-url

# transformers 4.47.0 contains the following bug:
# AttributeError: 'Gemma2Attention' object has no attribute '_flash_attn_uses_top_left_mask'
RUN pip3 install --no-cache-dir \
    torch==2.4.0 \
    accelerate \
    codetiming \
    dill \
    hydra-core \
    numpy \
    pybind11 \
    tensordict \
    "transformers <= 4.46.0"

RUN pip3 install --no-cache-dir flash-attn==2.7.0.post2 --no-build-isolation

# vllm depends on ray, and veRL does not support ray > 2.37
RUN pip3 install --no-cache-dir vllm==0.6.3 ray==2.10

# install apex
RUN MAX_JOBS=4 pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
    --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
    git+https://github.com/NVIDIA/apex

# install Transformer Engine
# - flash-attn pinned to 2.5.3 by TransformerEngine, switch to eric-haibin-lin/TransformerEngine.git@v1.7.0 to relax version req
# - install with: MAX_JOBS=1 NINJA_FLAGS="-j1" TE_BUILD_WITH_NINJA=0 to avoid OOM
# - cudnn is required by TransformerEngine
# RUN CUDNN_PATH=/opt/conda/lib/python3.11/site-packages/nvidia/cudnn \
#     pip3 install git+https://github.com/eric-haibin-lin/TransformerEngine.git@v1.7.0
RUN MAX_JOBS=1 NINJA_FLAGS="-j1" pip3 install flash-attn==2.5.3 --no-cache-dir --no-build-isolation
RUN MAX_JOBS=1 NINJA_FLAGS="-j1" pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@v1.7


================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
SPHINXPROJ    = verl
SOURCEDIR     = .
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)


================================================
FILE: docs/README.md
================================================
# veRL documents

## Build the docs

```bash
# Install dependencies.
pip install -r requirements-docs.txt

# Build the docs.
make clean
make html
```

## Open the docs with your browser

```bash
python -m http.server -d _build/html/
```
Launch your browser and open localhost:8000.

================================================
FILE: docs/advance/dpo_extension.rst
================================================
Extend to other RL(HF) algorithms
=================================

We have already implemented the complete training pipeline for the PPO
algorithm. To extend veRL to other algorithms, we outline the high-level
principles of using veRL and provide a tutorial on implementing the DPO
algorithm. Users can follow a similar paradigm to extend to other RL algorithms.

.. note:: **Key idea**: A single process drives multi-process computation and data communication.

Overall Approach
----------------

Step 1: Consider what multi-machine multi-GPU computations are needed
for each model, such as ``generate_sequences``, ``compute_log_prob`` and
``update_policy`` in the actor_rollout model. Implement them as distributed
single-program-multiple-data (SPMD) computations and encapsulate them
into APIs.

Step 2: Based on different distributed scenarios, including FSDP and 3D
parallelism in Megatron-LM, implement single-process control of data
interaction among multi-process computations.

Step 3: Utilize the encapsulated APIs to implement the control flow.

Example: Online DPO
-------------------

We use veRL to implement a simple online DPO algorithm. The algorithm
flow of Online DPO is as follows:

1. There is a prompt (rollout) generator which has the same weights as
   the actor model. After a batch of prompts is fed into the generator,
   it generates N responses for each prompt.
2. Send all the prompts + responses to a verifier for scoring, which can
   be a reward model or a rule-based function. Then sort them into pairs to
   form a training batch.
3. Use this training batch to train the actor model using DPO. During
   the process, a reference policy is needed.
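
The pairing in step 2 can be sketched in plain Python. This is a minimal
illustration, not veRL code: ``build_preference_pairs`` is a hypothetical
helper, and skipping tied scores is an assumption rather than something the
tutorial prescribes.

```python
from itertools import combinations
from typing import List, Tuple


def build_preference_pairs(
    responses: List[str], scores: List[float]
) -> List[Tuple[str, str]]:
    """Return (chosen, rejected) pairs for the N responses of one prompt."""
    # Rank responses from highest to lowest score (stable for equal scores).
    ranked = sorted(zip(scores, responses), key=lambda item: item[0], reverse=True)
    pairs = []
    for (hi_score, chosen), (lo_score, rejected) in combinations(ranked, 2):
        if hi_score > lo_score:  # assumption: a tie carries no preference signal
            pairs.append((chosen, rejected))
    return pairs


pairs = build_preference_pairs(["a", "b", "c"], [0.2, 0.9, 0.2])
print(pairs)
```

With N responses this yields at most N * (N - 1) / 2 pairs per prompt; a real
pipeline may subsample them to keep the training batch size fixed.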

Step 1: What are the multi-machine multi-GPU computations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Sample Generator**

Implementation details:

.. code:: python

   from verl.single_controller.base import Worker
   from verl.single_controller.ray import RayWorkerGroup, RayClassWithInitArgs, RayResourcePool
   import ray

   @ray.remote
   class SampleGenerator(Worker):
       def __init__(self, config):
           super().__init__()
           self.config = config
           
       def generate_sequences(self, data):
           pass

Here, ``SampleGenerator`` can be viewed as a group of processes launched by
``torchrun``, with each process running the same code (SPMD).
``SampleGenerator`` needs to implement a ``generate_sequences`` API for
the control flow to call. The implementation inside can use any
inference engine, including vllm, sglang and huggingface. Users can
largely reuse the code in
verl/verl/trainer/ppo/rollout/vllm_rollout/vllm_rollout.py, so we won't
go into details here.

**ReferencePolicy inference**

API: compute reference log probability

.. code:: python

   from verl.single_controller.base import Worker
   import ray

   @ray.remote
   class ReferencePolicy(Worker):
       def __init__(self):
           super().__init__()
           self.model = Model()
           
       def infer(self, data):
           return self.model(data)

**Actor update**

API: Update actor model parameters

.. code:: python

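   from verl.single_controller.base import Worker
   from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
   import torch.optim as optim
   import ray

   @ray.remote
   class DPOActor(Worker):
       def __init__(self):
           super().__init__()
           self.model = Model()  # placeholder for the user's nn.Module
           self.model = FSDP(self.model)  # or other distributed strategy
           self.optimizer = optim.Adam(self.model.parameters(), lr=1e-3)
           self.loss_fn = xxx  # placeholder for the DPO loss function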
           
       def update(self, data):
           self.optimizer.zero_grad()
           logits = self.model(data)
           loss = self.loss_fn(logits)
           loss.backward()
           self.optimizer.step()
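
The ``loss_fn`` above is left unspecified. As a rough sketch, the standard DPO
objective for a single (chosen, rejected) pair can be written in plain Python
as below. It also shows why a reference policy is needed: the loss compares
how far the actor has moved away from the reference on each response. The
function name, the scalar inputs (summed sequence log-probabilities), and
``beta=0.1`` are illustrative assumptions, not veRL's implementation.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair, given summed sequence log-probs."""
    # How much more likely each response became relative to the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == softplus(-x), computed in a numerically stable form.
    z = -logits
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# The actor equals the reference: the loss is log(2).
print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
# The actor prefers the chosen response more than the reference does: lower loss.
print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

In a real worker the same expression would be applied to batched tensors of
log-probabilities, with ``ref_*`` values supplied by the reference policy.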

**Notes: How to distinguish between control processes and distributed computation processes**
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Control processes are generally functions directly decorated with
  ``@ray.remote``
- Computation processes are all wrapped into a ``RayWorkerGroup``.

Users can reuse most of the distributed computation logic implemented
for the PPO algorithm, including the FSDP and Megatron-LM backends, in
verl/verl/trainer/ppo.

Step 2: Based on different distributed scenarios, implement single-process control of multi-process data interaction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**The core problem to solve here is how a single process sends data to
multiple processes, drives multi-process computation, and how the
control process obtains the results of multi-process computation.**
First, we initialize the multi-process ``WorkerGroup`` in the control
process.

.. code:: python

   @ray.remote(num_cpus=1)
   def main_task(config):
       # construct SampleGenerator
       resource_pool = RayResourcePool(process_on_nodes=[8] * 2)  # 16 GPUs
       ray_cls = RayClassWithInitArgs(SampleGenerator, config=config)
       # put SampleGenerator onto resource pool
       worker_group = RayWorkerGroup(resource_pool, ray_cls)
       
       # construct reference policy

As we can see, in the control process, multiple processes are wrapped
into a ``RayWorkerGroup``. Inside this ``WorkerGroup``, there is a
``self._workers`` member, where each worker is a RayActor
(https://docs.ray.io/en/latest/ray-core/actors.html) of SampleGenerator.
ray_trainer.md also provides an implementation of
``MegatronRayWorkerGroup``.

Assuming the model is distributed using FSDP, and there is a batch of
data on the control process, for data parallelism, the underlying
calling process is:

.. code:: python

   data = xxx
   data_list = data.chunk(dp_size)

   output = []
   for i, d in enumerate(data_list):
       # worker_group._workers[i] is a SampleGenerator
       output.append(worker_group._workers[i].generate_sequences.remote(d))

   output = ray.get(output)
   output = torch.cat(output)

Single process calling multiple processes involves the following 3
steps:

1. Split the data into DP parts on the control process.
2. Send the data to remote, call the remote computation through RPC, and
   utilize multi-process computation.
3. Obtain the computation results of each worker on the control process
   and merge them.

Frequently repeating these 3 steps in the controller process greatly hurts
code readability. **In veRL, we have abstracted and encapsulated these 3
steps, so that the worker's method + dispatch + collect can be
registered into the worker_group.**

.. code:: python

   from verl.single_controller.base.decorator import register

   def dispatch_data(worker_group, data):
       return data.chunk(worker_group.world_size)
       
   def collect_data(worker_group, data):
       return torch.cat(data)

   dispatch_mode = {
       'dispatch_fn': dispatch_data,
       'collect_fn': collect_data
   }

   @register(dispatch_mode=dispatch_mode)
   def generate_sequences(self, data):
       pass

In this way, we can directly call the method inside the worker through
the ``worker_group`` on the control (driver) process (which is a single
process):

.. code:: python

   output = worker_group.generate_sequences(data)

This single line includes data splitting, data distribution and
computation, and data collection.
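
To make the mechanism concrete, here is a toy, Ray-free sketch of how a
decorator can attach dispatch and collect functions to a worker method, and how
a group object can then expose that method as a single call. Everything here
(``ToyWorker``, ``ToyWorkerGroup``, ``split_evenly``, ``concat``, and this
simplified ``register`` signature) is hypothetical and only mimics the idea;
threads stand in for remote worker processes, and veRL's actual implementation
differs.

```python
from concurrent.futures import ThreadPoolExecutor


def register(dispatch_fn, collect_fn):
    """Attach dispatch/collect functions to a worker method (toy version)."""
    def decorator(method):
        method.dispatch_fn = dispatch_fn
        method.collect_fn = collect_fn
        return method
    return decorator


def split_evenly(world_size, data):
    # Step 1: split the batch into one shard per worker.
    chunk = (len(data) + world_size - 1) // world_size
    return [data[i * chunk:(i + 1) * chunk] for i in range(world_size)]


def concat(outputs):
    # Step 3: merge the per-worker results back into one list.
    return [item for part in outputs for item in part]


class ToyWorker:
    @register(dispatch_fn=split_evenly, collect_fn=concat)
    def generate_sequences(self, data):
        return [x * 2 for x in data]  # stand-in for real generation


class ToyWorkerGroup:
    """Exposes every registered worker method as a single driver-side call."""

    def __init__(self, worker_cls, world_size):
        self.workers = [worker_cls() for _ in range(world_size)]
        for name, attr in vars(worker_cls).items():
            if hasattr(attr, "dispatch_fn"):
                setattr(self, name, self._bind(name, attr))

    def _bind(self, name, method):
        def call(data):
            shards = method.dispatch_fn(len(self.workers), data)
            # Step 2: run the method on every worker (threads replace RPC here).
            with ThreadPoolExecutor(len(self.workers)) as pool:
                futures = [pool.submit(getattr(w, name), s)
                           for w, s in zip(self.workers, shards)]
                outputs = [f.result() for f in futures]
            return method.collect_fn(outputs)
        return call


group = ToyWorkerGroup(ToyWorker, world_size=2)
print(group.generate_sequences([1, 2, 3, 4]))
```

The driver-side call looks like an ordinary method call, while the split,
parallel execution, and merge are hidden behind the registered functions.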

Furthermore, the model parallelism sizes of each model are usually fixed,
including dp, tp, pp. So for these common distributed scenarios, we have
pre-implemented specific dispatch and collect methods in `decorator.py <https://github.com/volcengine/verl/blob/main/verl/single_controller/base/decorator.py>`_, which can be directly used to wrap the computations.

.. code:: python

   from verl.single_controller.base.decorator import register, Dispatch

   @register(dispatch_mode=Dispatch.DP_COMPUTE_PROTO)
   def generate_sequences(self, data: DataProto) -> DataProto:
       pass

This requires the data interface to be ``DataProto``. The definition of
``DataProto`` is in `protocol.py <https://github.com/volcengine/verl/blob/main/verl/protocol.py>`_.

Step 3: Main training loop
~~~~~~~~~~~~~~~~~~~~~~~~~~

With the above building blocks, we can implement the algorithm's control
flow. It is recommended that ``main_task`` is also a ray remote process.

.. code:: python

   @ray.remote(num_cpus=1)
   def main_task(config):
       # construct SampleGenerator
       resource_pool = RayResourcePool(process_on_nodes=[8] * 2)  # 16 GPUs
       ray_cls = RayClassWithInitArgs(SampleGenerator, config=config) 
       # put SampleGenerator onto resource pool
       sample_gen = RayWorkerGroup(resource_pool, ray_cls)
       
       # construct reference policy
       ray_cls = RayClassWithInitArgs(ReferencePolicy)
       ref_policy = RayWorkerGroup(resource_pool, ray_cls)
       
       # construct actor
       ray_cls = RayClassWithInitArgs(DPOActor)  
       dpo_policy = RayWorkerGroup(resource_pool, ray_cls)
       
       dataloader = DataLoader()
       
       for data in dataloader:
           # generate data
           data = sample_gen.generate_sequences(data)
           # generate scores for each data 
           data = generate_scores(data)
           # generate pairwise data using scores
           data = generate_pairwise_data(data)
           # generate ref_log_prob
           data.batch['ref_log_prob'] = ref_policy.infer(data)
           # update using dpo
           dpo_policy.update(data)
           # logging

Here, different ``WorkerGroups`` can be placed in the same resource pool or
in different resource pools using ``create_colocated_worker_cls``,
similar to `ray_trainer.py <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/ray_trainer.py>`_.


================================================
FILE: docs/advance/fsdp_extension.rst
================================================

Add models with the FSDP backend
==================================

Model
--------------------------

In principle, our FSDP backend can support any HF model and we can
synchronize the actor model weights with vLLM using `hf_weight_loader.py <https://github.com/volcengine/verl/blob/main/verl/third_party/vllm/vllm_v_0_5_4/hf_weight_loader.py>`_.
However, ``hf_weight_loader`` gathers the full state_dict of a
model during synchronization, which may cause OOM. We suggest using
``dtensor_weight_loader``, which gathers the full model parameters layer by
layer to reduce the peak memory usage. We already support the dtensor weight
loader for the models below in `dtensor_weight_loaders.py <https://github.com/volcengine/verl/blob/main/verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py>`_:

- ``GPT2LMHeadModel``
- ``LlamaForCausalLM``
- ``LLaMAForCausalLM``
- ``MistralForCausalLM``
- ``InternLMForCausalLM``
- ``AquilaModel``
- ``AquilaForCausalLM``
- ``Phi3ForCausalLM``
- ``GemmaForCausalLM``
- ``Gemma2ForCausalLM``
- ``GPTBigCodeForCausalLM``
- ``Starcoder2ForCausalLM``
- ``Qwen2ForCausalLM``
- ``DeepseekV2ForCausalLM``
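
The memory argument above can be illustrated with a deliberately simplified
model: gathering the whole state_dict holds every full layer at once, while
gathering layer by layer holds only one. The function and the layer sizes
below are hypothetical, and the sketch assumes each gathered layer is released
before the next one is gathered.

```python
def peak_gathered_size(layer_sizes, layer_by_layer):
    """Peak size of gathered (unsharded) weights held at one time."""
    if layer_by_layer:
        # Only one full layer is materialized at any moment.
        return max(layer_sizes)
    # The entire state_dict is materialized at once.
    return sum(layer_sizes)


layer_sizes_mb = [400, 400, 400, 100]  # hypothetical per-layer sizes in MB
print(peak_gathered_size(layer_sizes_mb, layer_by_layer=False))
print(peak_gathered_size(layer_sizes_mb, layer_by_layer=True))
```

Under these assumptions the peak drops from the sum of all layers to the size
of the largest single layer.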

To implement a ``dtensor_weight_loader`` for a model that is supported in
vLLM, follow the guide for the gemma model below:

1. Copy the
   ``load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]])`` method from the vllm model class
   to ``dtensor_weight_loaders.py``.
2. Modify the arguments to
   ``(actor_weights: Dict, vllm_model: nn.Module)``.
3. Replace ``self`` with ``vllm_model``.
4. Add
   ``local_loaded_weight = redistribute_dtensor(param_name=name, loaded_weights=loaded_weight)``
   before each ``param = params_dict[name]`` and modify the subsequent
   weight loading to use ``local_loaded_weight``.
5. Register the implemented dtensor weight loader in ``__MODEL_DTENSOR_WEIGHT_LOADER_REGISTRY__``.

.. code-block:: diff

    - def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
    + def gemma_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn.Module) -> nn.Module:
        stacked_params_mapping = [
            # (param_name, shard_name, shard_id)
            ("qkv_proj", "q_proj", "q"),
            ("qkv_proj", "k_proj", "k"),
            ("qkv_proj", "v_proj", "v"),
            ("gate_up_proj", "gate_proj", 0),
            ("gate_up_proj", "up_proj", 1),
        ]
    -   params_dict = dict(self.named_parameters())
    +   params_dict = dict(vllm_model.named_parameters())
        loaded_params = set()
    -   for name, loaded_weight in weights:
    +   for name, loaded_weight in actor_weights.items():
            for (param_name, shard_name, shard_id) in stacked_params_mapping:
                if shard_name not in name:
                    continue
                name = name.replace(shard_name, param_name)
                # Skip loading extra bias for GPTQ models.
                if name.endswith(".bias") and name not in params_dict:
                    continue
    +           local_loaded_weight = redistribute_dtensor(param_name=name, loaded_weights=loaded_weight)
                param = params_dict[name]
                weight_loader = param.weight_loader
    -           weight_loader(param, loaded_weight, shard_id)
    +           weight_loader(param, local_loaded_weight.to(dtype=param.dtype), shard_id)
                break
            else:
                # lm_head is not used in vllm as it is tied with embed_token.
                # To prevent errors, skip loading lm_head.weight.
                if "lm_head.weight" in name:
                    continue
                # Skip loading extra bias for GPTQ models.
                if name.endswith(".bias") and name not in params_dict:
                    continue
    +           local_loaded_weight = redistribute_dtensor(param_name=name, loaded_weights=loaded_weight)
                param = params_dict[name]
                weight_loader = getattr(param, "weight_loader",
                                        default_weight_loader)
    -           weight_loader(param, loaded_weight)
    +           weight_loader(param, local_loaded_weight.to(dtype=param.dtype))
            loaded_params.add(name)
        unloaded_params = params_dict.keys() - loaded_params
        if unloaded_params:
            raise RuntimeError(
                "Some weights are not initialized from checkpoints: "
                f"{unloaded_params}")
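
Step 5 can be sketched as follows. This is a minimal, self-contained
illustration that assumes the registry is a plain dictionary keyed by the vLLM
architecture name. The loader body and the ``get_dtensor_weight_loader`` helper
are hypothetical stand-ins, not the code in ``dtensor_weight_loaders.py``.

.. code:: python

   from typing import Callable, Dict


   def gemma_dtensor_weight_loader(actor_weights: Dict, vllm_model) -> list:
       # Hypothetical stand-in for the real loader: it only reports which
       # weight names it received, in sorted order.
       return sorted(actor_weights)


   # Assumption: the registry maps an architecture name to its loader function.
   __MODEL_DTENSOR_WEIGHT_LOADER_REGISTRY__: Dict[str, Callable] = {
       "GemmaForCausalLM": gemma_dtensor_weight_loader,
   }


   def get_dtensor_weight_loader(arch: str) -> Callable:
       """Look up the loader for ``arch`` and fail loudly if none is registered."""
       try:
           return __MODEL_DTENSOR_WEIGHT_LOADER_REGISTRY__[arch]
       except KeyError:
           supported = sorted(__MODEL_DTENSOR_WEIGHT_LOADER_REGISTRY__)
           raise ValueError(
               f"No dtensor weight loader for {arch}; supported: {supported}"
           ) from None

Failing with an explicit error for unregistered architectures makes a missing
loader visible at synchronization time instead of silently skipping weights.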

================================================
FILE: docs/advance/megatron_extension.rst
================================================
Add models with the Megatron-LM backend
=========================================

Model
-----------

The most challenging aspect of using the Megatron-LM backend is implementing
the models for training. Currently, we implement a Llama model that
supports data parallelism, tensor parallelism, pipeline parallelism (also
vPP) and sequence parallelism. We also implement remove padding (sequence packing) on the Llama
model, which can be found in `modeling_llama_megatron.py <https://github.com/volcengine/verl/blob/main/verl/models/llama/megatron/modeling_llama_megatron.py>`_.

To support other models, users are required to implement:

1. A model similar to ``modeling_llama_megatron.py`` that satisfies the
   parallelism requirements of Megatron-LM. Then register your model in
   `registry.py <https://github.com/volcengine/verl/blob/main/verl/models/registry.py>`_.
2. Checkpoint utils that can load a full checkpoint (e.g. a Huggingface
   checkpoint) into partitioned models at runtime. Then register
   your loader in ``weight_loader_registry`` in `weight_loader_registry.py <https://github.com/volcengine/verl/blob/main/verl/models/weight_loader_registry.py>`_.
3. A weight loader that synchronizes the weights from Megatron to the rollout
   (vLLM) model. Note that both the actor model and the rollout model are
   partitioned at runtime, so it's advisable to align the parameter names
   in the actor model implementation with those of the rollout model. Otherwise, you may need an additional
   name mapping and even weight transformation. The weight loader implementation
   is in `megatron_weight_loaders.py <https://github.com/volcengine/verl/blob/main/verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py>`_.
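
The additional name mapping mentioned in item 3 can be sketched as below. The
parameter names and the prefix table are invented for illustration. A real
loader would also have to handle tensor-parallel sharding and fused weights,
which this sketch ignores.

.. code:: python

   from typing import Dict, List, Tuple

   # Hypothetical prefix pairs:
   # (name used by the actor, name expected by the rollout model).
   NAME_MAPPING: List[Tuple[str, str]] = [
       ("decoder.layers.", "model.layers."),
       ("embedding.word_embeddings.", "model.embed_tokens."),
   ]


   def map_actor_names(actor_weights: Dict[str, object]) -> Dict[str, object]:
       """Return a new dict whose keys follow the rollout model's naming scheme."""
       renamed = {}
       for name, weight in actor_weights.items():
           for actor_prefix, rollout_prefix in NAME_MAPPING:
               if name.startswith(actor_prefix):
                   # Swap only the matching prefix and keep the rest of the name.
                   name = rollout_prefix + name[len(actor_prefix):]
                   break
           renamed[name] = weight
       return renamed

Names that match no prefix pass through unchanged, which is the case the
recommendation above aims for: identical names on both sides and no mapping.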

================================================
FILE: docs/advance/placement.rst
================================================
Ray API Design Tutorial
=======================================

We provide a tutorial for our Ray API design, including:

- Ray basic concepts
- Resource Pool and RayWorkerGroup
- Data Dispatch, Execution and Collection
- Initialize the RayWorkerGroup and execute the distributed computation in the given Resource Pool

See details in `tutorial.ipynb <https://github.com/volcengine/verl/blob/main/examples/ray/tutorial.ipynb>`_.

================================================
FILE: docs/conf.py
================================================
# Copyright 2024 Bytedance Ltd. and/or its affiliates
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))


# -- Project information -----------------------------------------------------

project = u'veRL'
# pylint: disable=W0622
copyright = u'2024 ByteDance Seed Foundation MLSys Team'
author = u'Guangming Sheng, Chi Zhang, Yanghua Peng, Haibin Lin'


# -- General configuration ---------------------------------------------------
# The master toctree document.
master_doc = 'index'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['recommonmark',
  'sphinx.ext.autosectionlabel',
]

# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
source_suffix = ['.rst', '.rest', '.md']

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = u'en'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages.  See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

================================================
FILE: docs/examples/config.rst
================================================
.. _config-explain-page:

Config Explanation
===================

ppo_trainer.yaml for FSDP Backend
---------------------------------

Data
~~~~

.. code:: yaml

   data:
     tokenizer: null
     train_files: ~/data/rlhf/gsm8k/train.parquet
     val_files: ~/data/rlhf/gsm8k/test.parquet
     prompt_key: prompt
     max_prompt_length: 512
     max_response_length: 512
     train_batch_size: 1024
     val_batch_size: 1312
     return_raw_input_ids: False  # This should be set to true when the tokenizer between policy and rm differs
     return_raw_chat: False

- ``data.train_files``: Training set parquet. Can be a list or a single
  file. The program will read all files into memory, so it can't be too
  large (< 100GB). The path can be either local path or HDFS path. For
  HDFS path, we provide utils to download it to DRAM and convert the
  HDFS path to local path.
- ``data.val_files``: Validation parquet. Can be a list or a single
  file.
- ``data.prompt_key``: The field in the dataset where the prompt is
  located. Default is 'prompt'.
- ``data.max_prompt_length``: Maximum prompt length. All prompts will be
  left-padded to this length. An error will be reported if the length is
  too long
- ``data.max_response_length``: Maximum response length. Rollout in RL
  algorithms (e.g. PPO) generates up to this length
- ``data.train_batch_size``: Batch size sampled for one training
  iteration of different RL algorithms.
- ``data.val_batch_size``: Batch size sampled for one validation
  iteration.
- ``data.return_raw_input_ids``: Whether to return the original
  input_ids without adding the chat template. This is mainly used to
  accommodate situations where the reward model's chat template differs
  from the policy's. The input_ids need to be decoded first, and then the
  RM's chat template is applied. If using a model-based RM, and the policy and RM
  chat_templates are different, this flag needs to be set.
- ``data.return_raw_chat``:
- ``data.truncation``: Truncate the input_ids or prompt if they
  exceed max_prompt_length. Default is 'error', which does not allow prompts
  to exceed max_prompt_length. Users should increase max_prompt_length if
  this error is thrown.
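
The interaction between ``max_prompt_length``, left padding and ``truncation``
can be sketched on plain token lists. The ``pad_prompt`` helper is
hypothetical, and the ``'left'`` truncation mode is an assumption added for
contrast. Only the ``'error'`` behaviour is described above.

.. code:: python

   from typing import List


   def pad_prompt(input_ids: List[int], max_prompt_length: int,
                  pad_token_id: int, truncation: str = "error") -> List[int]:
       """Left-pad a tokenized prompt to exactly ``max_prompt_length`` tokens."""
       if len(input_ids) > max_prompt_length:
           if truncation == "error":
               raise ValueError(
                   f"Prompt has {len(input_ids)} tokens, which exceeds "
                   f"max_prompt_length={max_prompt_length}")
           if truncation == "left":
               # Assumed option: drop the oldest tokens and keep the tail.
               input_ids = input_ids[-max_prompt_length:]
           else:
               raise ValueError(f"Unknown truncation mode: {truncation}")
       # Padding goes on the left so that generation starts right after the prompt.
       padding = [pad_token_id] * (max_prompt_length - len(input_ids))
       return padding + input_ids

For example, ``pad_prompt([5, 6, 7], 5, pad_token_id=0)`` pads two tokens on
the left, while a four-token prompt with ``max_prompt_length=3`` raises under
the default ``'error'`` mode.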

Actor/Rollout/Reference Policy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: yaml

   actor_rollout_ref:
     hybrid_engine: True
     model:
       path: ~/models/deepseek-llm-7b-chat
       external_lib: null
       override_config: {}
       enable_gradient_checkpointing: False
     actor:
       strategy: fsdp  # This is for backward-compatibility
       ppo_mini_batch_size: 256
       ppo_micro_batch_size: 64
       grad_clip: 1.0
       clip_ratio: 0.2
       entropy_coeff: 0.001
       ppo_epochs: 1
       shuffle: True
       optim:
         lr: 1e-6
         lr_warmup_steps_ratio: 0.  # the total steps will be injected during runtime
         min_lr_ratio: null   # only useful for warmup with cosine
         warmup_style: constant  # select from constant/cosine
         total_training_steps: -1  # must be override by program
       fsdp_config:
         wrap_policy:
           # transformer_layer_cls_to_wrap: None
           min_num_params: 0
         param_offload: False
         grad_offload: False
         optimizer_offload: False
     ref:
       fsdp_config:
         param_offload: False
         wrap_policy:
           # transformer_layer_cls_to_wrap: None
           min_num_params: 0
       log_prob_micro_batch_size: 128
     rollout:
       name: vllm
       temperature: 1.0
       top_k: -1 # 0 for hf rollout, -1 for vllm rollout
       top_p: 1
       response_length: ${data.max_response_length}
       # for vllm rollout
       dtype: bfloat16 # should align with FSDP
       gpu_memory_utilization: 0.5
       ignore_eos: False
       enforce_eager: True
       free_cache_engine: True
       load_format: dummy_dtensor # or dummy_hf or dummy_megatron
       tensor_model_parallel_size: 2
       max_num_batched_tokens: 8192
       max_num_seqs: 1024
       log_prob_micro_batch_size: 128
       # for vllm and hf rollout
       do_sample: True

**Common config for actor, rollout and reference model**

- ``actor_rollout_ref.hybrid_engine``: Whether it's a hybrid engine,
  currently only supports hybrid engine
- ``actor_rollout_ref.model.path``: Huggingface model path. This can be
  either local path or HDFS path. For HDFS path, we provide utils to
  download it to DRAM and convert the HDFS path to local path.
- ``actor_rollout_ref.model.external_lib``: Additional Python packages
  that need to be imported. Used to register models or tokenizers into
  the Huggingface system.
- ``actor_rollout_ref.model.override_config``: Used to override some of
  the model's original configurations, mainly dropout
- ``actor_rollout_ref.model.enable_gradient_checkpointing``: Whether to
  enable gradient checkpointing for the actor

**Actor model**

- ``actor_rollout_ref.actor.strategy``: fsdp or megatron. In this
  example, we use fsdp backend.

- ``actor_rollout_ref.actor.ppo_mini_batch_size``: The data sampled in one
  iteration is split into multiple sub-batches with
  batch_size=ppo_mini_batch_size for PPO updates

- ``actor_rollout_ref.actor.ppo_micro_batch_size``: Similar to gradient
  accumulation, the micro_batch_size for one forward pass, trading speed
  for GPU memory

- ``actor_rollout_ref.actor.grad_clip``: Gradient clipping for actor
  updates

- ``actor_rollout_ref.actor.clip_ratio``: PPO clip ratio

- ``actor_rollout_ref.actor.entropy_coeff``: The weight of entropy when
  calculating PPO loss

- ``actor_rollout_ref.actor.ppo_epochs``: Number of epochs for PPO
  updates on one set of sampled data

- ``actor_rollout_ref.actor.shuffle``: Whether to shuffle data when
  there are multiple epochs

- ``actor_rollout_ref.actor.optim``: Actor's optimizer parameters

- ``actor_rollout_ref.actor.fsdp_config``: FSDP config for actor
  training

  - ``wrap_policy``: FSDP wrap policy. By default, it uses Huggingface's
    wrap policy, i.e., wrapping by DecoderLayer

    - No need to set transformer_layer_cls_to_wrap, so we comment it out.

  - ``*_offload``: Whether to enable parameter, gradient and optimizer
    offload

    - Trading speed for GPU memory.
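
How the three batch sizes relate can be worked through with the values from
the config above. The ``ppo_update_plan`` helper is a hypothetical sketch that
treats all sizes as global and ignores how batches are sharded across
data-parallel ranks.

.. code:: python

   def ppo_update_plan(train_batch_size: int, ppo_mini_batch_size: int,
                       ppo_micro_batch_size: int, ppo_epochs: int = 1) -> dict:
       """Count optimizer steps and gradient-accumulation steps per iteration."""
       if train_batch_size % ppo_mini_batch_size != 0:
           raise ValueError("train_batch_size must be divisible by ppo_mini_batch_size")
       if ppo_mini_batch_size % ppo_micro_batch_size != 0:
           raise ValueError("ppo_mini_batch_size must be divisible by ppo_micro_batch_size")
       # One optimizer step per mini-batch, repeated for every PPO epoch.
       mini_batches = train_batch_size // ppo_mini_batch_size
       # Each mini-batch is processed as several forward/backward passes.
       accumulation_steps = ppo_mini_batch_size // ppo_micro_batch_size
       return {
           "optimizer_steps": mini_batches * ppo_epochs,
           "grad_accumulation_steps": accumulation_steps,
       }


   plan = ppo_update_plan(train_batch_size=1024, ppo_mini_batch_size=256,
                          ppo_micro_batch_size=64, ppo_epochs=1)

With these values the plan has 4 optimizer steps per iteration, each
accumulated over 4 micro-batches. Lowering ``ppo_micro_batch_size`` raises the
number of accumulation steps and reduces peak memory, without changing the
number of optimizer steps.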

**Reference Model**

- ``actor_rollout_ref.ref``: FSDP config same as actor. **For models
  larger than 7B, it's recommended to turn on offload for ref by
  default**
- ``actor_rollout_ref.ref.log_prob_micro_batch_size``: The batch size
  for one forward pass in the computation of ``ref_log_prob``.

**Rollout Model**

- ``actor_rollout_ref.rollout.name``: hf/vllm. We use vLLM by default
  because it's much more efficient and our hybrid engine is implemented with
  vLLM.

- Rollout (Auto-regressive) parameters. The key should be equal to the
  property name in vLLM's ``SamplingParams``.

  - ``temperature``, ``top_k``, ``top_p`` and others: Sampling
    parameters in ``SamplingParams``.

- ``dtype``: Rollout model parameters type. This should be aligned with
  the actor model parameter type in FSDP/Megatron backend.

- ``gpu_memory_utilization``: The proportion of the remaining GPU memory
  allocated for kv cache after other models have initialized when using
  vllm.

- ``tensor_model_parallel_size``: TP size for rollout. Only effective
  for vllm.

- ``log_prob_micro_batch_size``: Micro_batch_size (The batch size for
  one forward pass) for recalculating log_prob.

- ``do_sample``: Whether to sample. If set to False, the rollout model
  will perform greedy decoding. We disable ``do_sample`` during
  validation.

- ``actor_rollout_ref.rollout.ignore_eos``: Whether to ignore the EOS
  token and continue generating tokens after the EOS token is generated.

- ``actor_rollout_ref.rollout.free_cache_engine``: Offload the KVCache
  after the rollout generation stage. Default is True. When set to True, we
  need to disable the usage of CUDAGraph (set ``enforce_eager`` to
  True).

- ``actor_rollout_ref.rollout.enforce_eager``: Whether to disable CUDAGraph
  in vLLM generation. Default is True, which disables CUDAGraph.

- ``actor_rollout_ref.rollout.load_format``: Which weight loader to use
  to load the actor model weights to the rollout model.

  - ``auto``: Use Megatron weight loader.
  - ``megatron``: Use Megatron weight loader. Deployed with Megatron
    backend. The input model ``state_dict()`` is already partitioned
    along TP dimension and already gathered along PP dimension. This
    weight loader requires that the Rollout model and Actor model's
    parameters shape and name should be identical.
  - ``dtensor``: Default solution when using Huggingface weight loader.
    Deployed with FSDP backend and the state_dict_type is
    ``StateDictType.SHARDED_STATE_DICT``. We recommend using this weight
    loader.
  - ``hf``: Use Huggingface weight loader. Deployed with FSDP backend
    and the state_dict_type is ``StateDictType.FULL_STATE_DICT``. This
    solution doesn't need to rewrite the weight loader for each model
    implemented in vLLM but it results in larger peak memory usage.
  - ``dummy_hf``, ``dummy_megatron``, ``dummy_dtensor``: Random
    initialization.

.. note:: In this config field, users only need to select from ``dummy_megatron``, ``dummy_dtensor``, ``dummy_hf`` for rollout initialization and our hybrid engine will select the corresponding weight loader (i.e., ``megatron``, ``dtensor``, ``hf``) during actor/rollout weight synchronization.
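
The correspondence described in the note can be written out as a small lookup.
This is an illustrative sketch only. The ``select_weight_loader`` function and
the table are hypothetical and do not mirror how the hybrid engine is
implemented.

.. code:: python

   # Rollout initialization format -> weight loader used during synchronization.
   DUMMY_TO_LOADER = {
       "dummy_megatron": "megatron",
       "dummy_dtensor": "dtensor",
       "dummy_hf": "hf",
   }


   def select_weight_loader(load_format: str) -> str:
       """Return the loader name that corresponds to a ``dummy_*`` load format."""
       try:
           return DUMMY_TO_LOADER[load_format]
       except KeyError:
           choices = sorted(DUMMY_TO_LOADER)
           raise ValueError(
               f"load_format must be one of {choices}, got {load_format!r}"
           ) from None

With the config shown above (``load_format: dummy_dtensor``), the lookup
returns ``dtensor``.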

Critic Model
~~~~~~~~~~~~

Most parameters for Critic are similar to Actor Model.

Reward Model
~~~~~~~~~~~~

.. code:: yaml

   reward_model:
     enable: False
     model:
       input_tokenizer: ${actor_rollout_ref.model.path}  # set this to null if the chat template is identical
       path: ~/models/Anomy-RM-v0.1
       external_lib: ${actor_rollout_ref.model.external_lib}
       fsdp_config:
         min_num_params: 0
         param_offload: False
     micro_batch_size: 64
     max_length: null

- ``reward_model.enable``: Whether to enable reward model. If False, we
  compute the reward only with the user-defined reward functions. In
  GSM8K and Math examples, we disable reward model. For RLHF alignment
  example using full_hh_rlhf, we utilize reward model to assess the
  responses. If False, the following parameters are not effective.
- ``reward_model.model``

  - ``input_tokenizer``: Input tokenizer. If the reward model's chat
    template is inconsistent with the policy, we need to first decode to
    plaintext, then apply the rm's chat_template. Then score with RM. If
    chat_templates are consistent, it can be set to null.
  - ``path``: RM's HDFS path or local path. Note that RM only supports
    AutoModelForSequenceClassification. Other model types need to define
    their own RewardModelWorker and pass it from the code.

Algorithm
~~~~~~~~~

.. code:: yaml

   algorithm:
     gamma: 1.0
     lam: 1.0
     adv_estimator: gae
     kl_penalty: kl  # how to estimate kl divergence
     kl_ctrl:
       type: fixed
       kl_coef: 0.005

- ``gamma``: discount factor
- ``lam``: Trade-off between bias and variance in the GAE estimator
- ``adv_estimator``: gae. Currently only supports gae, will support GRPO
  in the future
- ``kl_penalty``: How to calculate the KL divergence between the actor and
  the reference policy. Supports ``kl``, ``abs``, ``mse`` and ``full``. For
  specific options, refer to `core_algos.py <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/core_algos.py#L192>`_ .
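
The roles of ``gamma`` and ``lam`` can be seen in a plain-Python sketch of
generalized advantage estimation for a single response. This is the textbook
recursion, not the batched and masked tensor code in ``core_algos.py``. The
function name and the list-based interface are assumptions made for the
example.

.. code:: python

   from typing import List, Tuple


   def compute_gae(rewards: List[float], values: List[float],
                   gamma: float = 1.0, lam: float = 1.0
                   ) -> Tuple[List[float], List[float]]:
       """Return per-token (advantages, returns) for one response."""
       advantages = [0.0] * len(rewards)
       last_gae = 0.0
       for t in reversed(range(len(rewards))):
           # The value after the final token is taken to be zero.
           next_value = values[t + 1] if t + 1 < len(values) else 0.0
           # One-step TD error.
           delta = rewards[t] + gamma * next_value - values[t]
           # Exponentially weighted sum of future TD errors.
           last_gae = delta + gamma * lam * last_gae
           advantages[t] = last_gae
       returns = [adv + val for adv, val in zip(advantages, values)]
       return advantages, returns


   adv_full, ret_full = compute_gae([0.0, 0.0, 1.0], [0.5, 0.5, 0.5], gamma=1.0, lam=1.0)
   adv_td, _ = compute_gae([0.0, 0.0, 1.0], [0.5, 0.5, 0.5], gamma=1.0, lam=0.0)

With ``gamma = lam = 1.0`` as in the config above, each advantage is the sum of
future rewards minus the value estimate (low bias, high variance). With
``lam = 0.0`` each advantage is only the one-step TD error (higher bias, lower
variance).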

Trainer
~~~~~~~

.. code:: yaml

   trainer:
     total_epochs: 30
     project_name: verl_examples
     experiment_name: gsm8k
     logger: ['console', 'wandb']
     nnodes: 1
     n_gpus_per_node: 8
     save_freq: -1
     test_freq: 2
     critic_warmup: 0
     default_hdfs_dir: ~/experiments/gsm8k/ppo/${trainer.experiment_name} # hdfs checkpoint path
     default_local_dir: checkpoints/${trainer.project_name}/${trainer.experiment_name} # local checkpoint path

- ``trainer.total_epochs``: Number of epochs in training.
- ``trainer.project_name``: For wandb
- ``trainer.experiment_name``: For wandb
- ``trainer.logger``: Support console and wandb
- ``trainer.nnodes``: Number of nodes used in the training.
- ``trainer.n_gpus_per_node``: Number of GPUs per node.
- ``trainer.save_freq``: The frequency (by iteration) to save checkpoint
  of the actor and critic model.
- ``trainer.test_freq``: The validation frequency (by iteration).
- ``trainer.critic_warmup``: The number of iterations to train the critic
  model before actual policy learning.
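
The frequency settings can be read as simple predicates on the iteration
counter. The sketch below is hypothetical. It assumes that a non-positive
frequency (such as ``save_freq: -1`` above) disables the action and that the
actor starts training once the iteration counter reaches ``critic_warmup``.

.. code:: python

   def is_due(step: int, freq: int) -> bool:
       """True when a periodic action with frequency ``freq`` runs at ``step``."""
       # Assumption: freq <= 0 means the action is disabled.
       return freq > 0 and step % freq == 0


   def actor_is_trained(step: int, critic_warmup: int) -> bool:
       """True once the critic-only warmup iterations are over."""
       return step >= critic_warmup

Under these assumptions, ``test_freq: 2`` validates on every second iteration,
``save_freq: -1`` never saves, and ``critic_warmup: 0`` trains the actor from
the first iteration.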


================================================
FILE: docs/examples/gsm8k_example.rst
================================================
GSM8K Example
=============

Introduction
------------

In this example, we train an LLM to tackle the GSM8k task.

Paper: https://arxiv.org/pdf/2110.14168

Dataset: https://huggingface.co/datasets/gsm8k

Note that the original paper mainly focuses on training a verifier (a
reward model) to solve math problems via Best-of-N sampling. In this
example, we train an RLHF agent using a rule-based reward model.

Dataset Introduction
--------------------

GSM8k is a math problem dataset. The prompt is an elementary school
problem. The LLM model is required to answer the math problem.

The training set contains 7473 samples and the test set contains 1319
samples.

**An example**

Prompt

   Katy makes coffee using teaspoons of sugar and cups of water in the
   ratio of 7:13. If she used a total of 120 teaspoons of sugar and cups
   of water, calculate the number of teaspoonfuls of sugar she used.

Solution

   The total ratio representing the ingredients she used to make the
   coffee is 7+13 = <<7+13=20>>20 Since the fraction representing the
   number of teaspoons she used is 7/20, she used 7/20\*120 =
   <<7/20\*120=42>>42 #### 42

Step 1: Prepare dataset
-----------------------

.. code:: bash

   cd examples/data_preprocess
   python3 gsm8k.py --local_dir ~/data/gsm8k

Step 2: Download Model
----------------------

There are three ways to prepare the model checkpoints for post-training:

- Download the required models from Hugging Face

.. code:: bash

   huggingface-cli download deepseek-ai/deepseek-math-7b-instruct --local-dir ~/models/deepseek-math-7b-instruct --local-dir-use-symlinks False

- Use a model that is already stored in a local directory or HDFS path.
- Also, you can directly use the model name in huggingface (e.g.,
  deepseek-ai/deepseek-math-7b-instruct) in
  ``actor_rollout_ref.model.path`` and ``critic.model.path`` field in
  the run script.

Note that users should prepare checkpoints for the actor, critic and reward
models.

[Optional] Step 3: SFT your Model
---------------------------------

We provide an SFT Trainer using PyTorch FSDP in
`fsdp_sft_trainer.py <https://github.com/volcengine/verl/blob/main/verl/trainer/fsdp_sft_trainer.py>`_.
Users can customize their own SFT
script using our FSDP SFT Trainer.

We also provide various training scripts for SFT on GSM8K dataset in `gsm8k sft directory <https://github.com/volcengine/verl/blob/main/examples/gsm8k/sft/>`_.

.. code:: shell

   set -x

   torchrun -m verl.trainer.fsdp_sft_trainer \
       data.train_files=$HOME/data/gsm8k/train.parquet \
       data.val_files=$HOME/data/gsm8k/test.parquet \
       data.prompt_key=question \
       data.response_key=answer \
       data.micro_batch_size=8 \
       model.partial_pretrain=deepseek-ai/deepseek-coder-6.7b-instruct \
       trainer.default_hdfs_dir=hdfs://user/verl/experiments/gsm8k/deepseek-coder-6.7b-instruct/ \
       trainer.project_name=gsm8k-sft \
       trainer.experiment_name=gsm8k-sft-deepseek-coder-6.7b-instruct \
       trainer.total_epochs=4 \
       trainer.logger=['console','wandb']

Step 4: Perform PPO training with your model on GSM8K Dataset
-------------------------------------------------------------

- Prepare your own run.sh script. Here's an example for GSM8k dataset
  and deepseek-llm-7b-chat model.
- Users could replace the ``data.train_files`` ,\ ``data.val_files``,
  ``actor_rollout_ref.model.path`` and ``critic.model.path`` based on
  their environment.
- See :doc:`config` for a detailed explanation of each config field.

**Reward Model/Function**

We use a rule-based reward model. We force the model to produce a final
answer following four "#" characters (``####``), as shown in the solution.
We extract the final answer from both the solution and the model's output
using regular expression matching. We compare them and assign a reward of 1
to a correct answer, 0.1 to an incorrect answer and 0 to no answer.
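
The scoring rule above can be sketched as follows. This is a minimal
illustration, not the code in ``verl/utils/reward_score/gsm8k.py``. The regular
expression, the function names and the comma stripping are assumptions chosen
for the example.

.. code:: python

   import re
   from typing import Optional

   # Assumed pattern: a number (optionally negative) right after "#### ".
   ANSWER_PATTERN = re.compile(r"#### (-?[0-9.,]+)")


   def extract_final_answer(text: str) -> Optional[str]:
       """Return the number following ``####``, or None if there is none."""
       match = ANSWER_PATTERN.search(text)
       if match is None:
           return None
       # Drop thousands separators so that "1,000" and "1000" compare equal.
       return match.group(1).replace(",", "")


   def gsm8k_reward(model_output: str, solution: str) -> float:
       """1.0 for a correct answer, 0.1 for a wrong one, 0.0 for no answer."""
       predicted = extract_final_answer(model_output)
       if predicted is None:
           return 0.0
       return 1.0 if predicted == extract_final_answer(solution) else 0.1

The small non-zero reward for a wrong but well-formatted answer rewards
following the ``####`` output format even before the model's answers become
correct.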

**Training Script**

The training script examples for the FSDP and Megatron-LM backends are stored in the ``examples/ppo_trainer`` directory.

.. code:: bash

   cd ../ppo_trainer
   bash run_deepseek7b_llm.sh

The content of ``run_deepseek7b_llm.sh``:

.. code:: bash

   set -x

   python3 -m verl.trainer.main_ppo \
       data.train_files=~/data/rlhf/gsm8k/train.parquet \
       data.val_files=~/data/rlhf/gsm8k/test.parquet \
       data.train_batch_size=1024 \
       data.val_batch_size=1312 \
       data.max_prompt_length=512 \
       data.max_response_length=512 \
       actor_rollout_ref.model.path=~/models/deepseek-llm-7b-chat \
       actor_rollout_ref.actor.optim.lr=1e-6 \
       actor_rollout_ref.actor.ppo_mini_batch_size=256 \
       actor_rollout_ref.actor.ppo_micro_batch_size=64 \
       actor_rollout_ref.actor.fsdp_config.param_offload=False \
       actor_rollout_ref.actor.fsdp_config.grad_offload=False \
       actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
       actor_rollout_ref.rollout.micro_batch_size=256 \
       actor_rollout_ref.rollout.log_prob_micro_batch_size=128 \
       actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
       actor_rollout_ref.rollout.name=vllm \
       actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
       actor_rollout_ref.ref.log_prob_micro_batch_size=128 \
       actor_rollout_ref.ref.fsdp_config.param_offload=True \
       critic.optim.lr=1e-5 \
       critic.model.path=~/models/deepseek-llm-7b-chat \
       critic.model.enable_gradient_checkpointing=False \
       critic.ppo_micro_batch_size=64 \
       critic.model.fsdp_config.param_offload=False \
       critic.model.fsdp_config.grad_offload=False \
       critic.model.fsdp_config.optimizer_offload=False \
       algorithm.kl_ctrl.kl_coef=0.001 \
       trainer.critic_warmup=0 \
       trainer.logger=['console','wandb'] \
       trainer.project_name='verl_example_gsm8k' \
       trainer.experiment_name='deepseek_llm_7b_function_rm' \
       trainer.n_gpus_per_node=8 \
       trainer.nnodes=1 \
       trainer.save_freq=-1 \
       trainer.total_epochs=15


================================================
FILE: docs/examples/ppo_code_architecture.rst
================================================
PPO Example Architecture
========================

Let's start with the Proximal Policy Optimization algorithm, which is
the most widely used algorithm in LLM post-training.

The main entry point of the PPO algorithm example is:
`main_ppo.py <https://github.com/volcengine/verl/blob/main/verl/trainer/main_ppo.py>`_.
In this tutorial, we will go through the code architecture in `main_ppo.py <https://github.com/volcengine/verl/blob/main/verl/trainer/main_ppo.py>`_.

Define the data
---------------

Users need to preprocess and store the dataset in parquet files.
We implement ``RLHFDataset`` to load and tokenize the parquet files.

For ``RLHFDataset`` (default), at least one field is required:

- ``prompt``: Contains the string prompt

We already provide some examples of processing the datasets to parquet
files in `data_preprocess directory <https://github.com/volcengine/verl/blob/main/examples/data_preprocess>`_. Currently, we support
preprocessing of the GSM8k, MATH, HellaSwag and Full_hh_rlhf datasets. See :doc:`../preparation/prepare_data` for
more information.

Define the reward functions for different datasets
--------------------------------------------------

In this main entry point, the users only need to define their own reward
function based on the datasets (or applications) utilized in PPO
training.

For example, we already provide reward functions for `GSM8k <https://github.com/volcengine/verl/blob/main/verl/utils/reward_score/gsm8k.py>`_ 
and `MATH <https://github.com/volcengine/verl/blob/main/verl/utils/reward_score/math.py>`_
datasets in the ``_select_rm_score_fn``. In the ``RewardManager``, we
select the corresponding reward function based on the data_source and
use it to compute the reward score. For some RLHF datasets (e.g.,
full_hh_rlhf), the reward model is utilized to assess the responses
without any reward functions. In this case, the ``RewardManager`` will
return the ``rm_score`` computed by the reward model directly.

See `reward functions <https://github.com/volcengine/verl/blob/main/verl/utils/reward_score>`_ for detailed implementation.

Define worker classes
---------------------

.. code:: python

   if config.actor_rollout_ref.actor.strategy == 'fsdp': # for FSDP backend
       assert config.actor_rollout_ref.actor.strategy == config.critic.strategy
       from verl.workers.fsdp_workers import ActorRolloutRefWorker, CriticWorker
       from verl.single_controller.ray import RayWorkerGroup
       ray_worker_group_cls = RayWorkerGroup

   elif config.actor_rollout_ref.actor.strategy == 'megatron': # for Megatron backend
       assert config.actor_rollout_ref.actor.strategy == config.critic.strategy
       from verl.workers.megatron_workers import ActorRolloutRefWorker, CriticWorker
       from verl.single_controller.ray.megatron import NVMegatronRayWorkerGroup
       ray_worker_group_cls = NVMegatronRayWorkerGroup # Ray worker class for Megatron-LM

   else:
       raise NotImplementedError

   from verl.trainer.ppo.ray_trainer import ResourcePoolManager, Role

   role_worker_mapping = {
       Role.ActorRollout: ActorRolloutRefWorker,
       Role.Critic: CriticWorker,
       Role.RefPolicy: ActorRolloutRefWorker
   }

   global_pool_id = 'global_pool'
   resource_pool_spec = {
       global_pool_id: [config.trainer.n_gpus_per_node] * config.trainer.nnodes,
   }
   mapping = {
       Role.ActorRollout: global_pool_id,
       Role.Critic: global_pool_id,
       Role.RefPolicy: global_pool_id,
   }

Step 1: Construct the mapping between roles and workers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A role represents a group of workers in the same process. We have
pre-defined several roles in `ray_trainer.py <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/ray_trainer.py#L38>`_.

.. code:: python

   class Role(Enum):
       """
       To create more roles dynamically, you can subclass Role and add new members
       """
       Actor = 0  # This worker only has Actor
       Rollout = 1 # This worker only has Rollout
       ActorRollout = 2 # This worker has both actor and rollout, it's a HybridEngine
       Critic = 3 # This worker only has critic
       RefPolicy = 4 # This worker only has reference policy
       RewardModel = 5 # This worker only has reward model
       ActorRolloutRef = 6 # This worker contains actor, rollout and reference policy simultaneously 

Step 2: Define the worker class corresponding to this role
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- We have pre-implemented the ``ActorRolloutRefWorker``. Through
  different configs, it can be a standalone actor, a standalone rollout,
  an ActorRollout HybridEngine, or an ActorRolloutRef HybridEngine
- We also pre-implemented workers for ``Actor``, ``Rollout``,
  ``Critic``, ``Reward Model`` and ``Reference model`` on two different
  backends: PyTorch FSDP
  and Megatron-LM.
  See `FSDP Workers <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/workers/fsdp_workers.py>`_ 
  and `Megatron-LM Workers <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/workers/megatron_workers.py>`_
  for more information.

Step 3: Define resource pool id and resource pool spec
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Resource pool is a division of global GPU resources,
  ``resource_pool_spec`` is a dict, mapping from id to # of GPUs

  - In the above example, we defined a global resource pool:
    global_pool_id, and then put all roles on this one resource pool
    with all the GPUs in this post-training task. This refers to
    *co-locate* placement where all the models share the same set of
    GPUs.

- See resource pool and placement for advanced usage.
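
As a minimal sketch of the co-locate placement described above, the
snippet below maps every role to a single pool. The ``Role`` enum is a
trimmed stand-in for the one shown in Step 1, and the pool id string, the
node and GPU counts, and the per-node list layout of
``resource_pool_spec`` are illustrative assumptions rather than values
taken from veRL.

.. code:: python

   from enum import Enum


   class Role(Enum):
       # Trimmed stand-in for the Role enum shown in Step 1.
       ActorRollout = 2
       Critic = 3
       RefPolicy = 4


   # Hypothetical cluster size, for illustration only.
   n_gpus_per_node = 8
   nnodes = 1

   global_pool_id = 'global_pool'

   # Assumed layout: pool id -> number of GPUs on each node of the pool.
   resource_pool_spec = {global_pool_id: [n_gpus_per_node] * nnodes}

   # Co-locate placement: every role shares the same pool.
   mapping = {role: global_pool_id for role in Role}

   total_gpus = sum(sum(gpus) for gpus in resource_pool_spec.values())
   print(f'{len(mapping)} roles share {total_gpus} GPUs')

Splitting roles across several pools would only change the ``mapping``
values and add more entries to ``resource_pool_spec``.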

Defining reward model/function
------------------------------

.. code:: python

   # we should adopt a multi-source reward function here
   # - for rule-based rm, we directly call a reward score
   # - for model-based rm, we call a model
   # - for code related prompt, we send to a sandbox if there are test cases
   # - finally, we combine all the rewards together
   # - The reward type depends on the tag of the data
   if config.reward_model.enable:
       from verl.workers.fsdp_workers import RewardModelWorker
       role_worker_mapping[Role.RewardModel] = RewardModelWorker
       mapping[Role.RewardModel] = global_pool_id
    
   reward_fn = RewardManager(tokenizer=tokenizer, num_examine=0)

   # Note that we always use function-based RM for validation
   val_reward_fn = RewardManager(tokenizer=tokenizer, num_examine=1)

   resource_pool_manager = ResourcePoolManager(resource_pool_spec=resource_pool_spec, mapping=mapping)

Since not all tasks use a model-based RM, users need to specify here
whether the task uses a model-based RM or a function-based RM.

- If it's a model-based RM, directly add the ``RewardModel`` role in the
  resource mapping and add it to the resource pool mapping.

  - Note that the pre-defined ``RewardModelWorker`` only supports models
    with the structure of huggingface
    ``AutoModelForSequenceClassification``. If it's not this model, you
    need to define your own RewardModelWorker in `FSDP Workers <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/workers/fsdp_workers.py>`_ 
    and `Megatron-LM Workers <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/workers/megatron_workers.py>`_.

- If it's a function-based RM, users are required to specify the
  reward function for each dataset.

.. code:: python

   def _select_rm_score_fn(data_source):
       if data_source == 'openai/gsm8k':
           return gsm8k.compute_score
       elif data_source == 'lighteval/MATH':
           return math.compute_score
       else:
           raise NotImplementedError

See reward functions implemented in `directory <https://github.com/volcengine/verl/blob/main/verl/utils/reward_score/>`_ 
for more information.
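
To support an additional dataset, the same dispatch idea can be written
as a lookup table. The sketch below is self-contained and is not veRL
code: ``my_org/my_dataset``, ``exact_match_score`` and ``SCORE_FNS`` are
hypothetical names, and the ``(solution_str, ground_truth)`` signature is
an assumption made for illustration.

.. code:: python

   def exact_match_score(solution_str, ground_truth):
       """Hypothetical scorer: 1.0 if the stripped strings match, else 0.0."""
       return 1.0 if solution_str.strip() == ground_truth.strip() else 0.0


   # Hypothetical registry keyed by the ``data_source`` field of each sample.
   SCORE_FNS = {'my_org/my_dataset': exact_match_score}


   def select_rm_score_fn(data_source):
       """Return the scorer registered for ``data_source``."""
       try:
           return SCORE_FNS[data_source]
       except KeyError:
           raise NotImplementedError(
               f'no reward function registered for {data_source!r}') from None


   score_fn = select_rm_score_fn('my_org/my_dataset')
   print(score_fn(' 42 ', '42'))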

Define, init and run the PPO Trainer
------------------------------------

.. code:: python

   trainer = RayPPOTrainer(config=config,
                           tokenizer=tokenizer,
                           role_worker_mapping=role_worker_mapping,
                           resource_pool_manager=resource_pool_manager,
                           ray_worker_group_cls=ray_worker_group_cls,
                           reward_fn=reward_fn,
                           val_reward_fn=val_reward_fn)
   trainer.init_workers()
   trainer.fit()

- We first initialize the ``RayPPOTrainer`` with user config, tokenizer
  and all the above worker mapping, resource pool, worker group and
  reward functions
- We then call ``trainer.init_workers()`` to initialize the models
  on the allocated GPUs (in the resource pool)
- The actual PPO training will be executed in ``trainer.fit()``

veRL can be easily extended to other RL algorithms by reusing the Ray
model workers, resource pool and reward functions. See :doc:`extension<../advance/dpo_extension>` for
more information.

Details of the ``RayPPOTrainer`` are discussed in :doc:`Ray Trainer<../workers/ray_trainer>`.


================================================
FILE: docs/experiment/ppo.rst
================================================
.. _algo-baseline-page:

Algorithm Baselines
===================

GSM8k 
------------------

We assume the GSM8k dataset has been preprocessed via ``python3 examples/data_preprocess/gsm8k.py``.

Refer to the table below to reproduce PPO training from different pre-trained models.

.. _Huggingface: https://huggingface.co/google/gemma-2-2b-it#benchmark-results
.. _SFT Command and logs: https://github.com/eric-haibin-lin/verl-data/blob/experiments/gsm8k/gemma-2-2b-it-sft-0.411.log
.. _SFT+PPO Command and logs: https://github.com/eric-haibin-lin/verl-data/blob/experiments/gsm8k/gemma-2-2b-it-ppo-bsz512_4-prompt1024-resp-512-0.640.log
.. _wandb: https://api.wandb.ai/links/verl-team/h7ux8602
.. _Qwen Blog: https://qwenlm.github.io/blog/qwen2.5-llm/
.. _PPO Command and logs: https://github.com/eric-haibin-lin/verl-data/blob/experiments/gsm8k/Qwen2.5-0.5B-bsz256_2-prompt1024-resp512-0.567.log

+----------------------------+------------------------+------------+-----------------------------------------------------------------------------------------------+
| Model                      | Method                 | Test score |  Details                                                                                      |
+============================+========================+============+===============================================================================================+
| google/gemma-2-2b-it       | pretrained checkpoint  | 23.9       |   `Huggingface`_                                                                              |
+----------------------------+------------------------+------------+-----------------------------------------------------------------------------------------------+
| google/gemma-2-2b-it       | SFT                    | 52.06      |   `SFT Command and logs`_                                                                     |
+----------------------------+------------------------+------------+-----------------------------------------------------------------------------------------------+
| google/gemma-2-2b-it       | SFT + PPO              | 64.02      |   `SFT+PPO Command and logs`_, `wandb`_                                                       |
+----------------------------+------------------------+------------+-----------------------------------------------------------------------------------------------+
| Qwen/Qwen2.5-0.5B-Instruct | pretrained checkpoint  | 36.4       |   `Qwen Blog`_                                                                                |
+----------------------------+------------------------+------------+-----------------------------------------------------------------------------------------------+
| Qwen/Qwen2.5-0.5B-Instruct | PPO                    | 56.7       |   `PPO Command and logs`_                                                                     |
+----------------------------+------------------------+------------+-----------------------------------------------------------------------------------------------+

================================================
FILE: docs/faq/faq.rst
================================================
Frequently Asked Questions
====================================

Ray related
------------

How to add breakpoint for debugging with distributed Ray?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Please check out the official debugging guide from Ray: https://docs.ray.io/en/latest/ray-observability/ray-distributed-debugger.html


Distributed training
------------------------

How to run multi-node post-training with Ray?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can start a ray cluster and submit a ray job, following the official guide from Ray: https://docs.ray.io/en/latest/ray-core/starting-ray.html


================================================
FILE: docs/index.rst
================================================
Welcome to veRL's documentation!
================================================

.. _hf_arxiv: https://arxiv.org/pdf/2409.19256

veRL is a flexible, efficient and production-ready RL training framework designed for post-training large language models (LLMs). It is an open source implementation of the `HybridFlow <hf_arxiv_>`_ paper.

veRL is flexible and easy to use with:

- **Easy extension of diverse RL algorithms**: The Hybrid programming model combines the strengths of single-controller and multi-controller paradigms to enable flexible representation and efficient execution of complex post-training dataflows, allowing users to build RL dataflows in a few lines of code.

- **Seamless integration of existing LLM infra with modular APIs**: Decouples computation and data dependencies, enabling seamless integration with existing LLM frameworks, such as PyTorch FSDP, Megatron-LM and vLLM. Moreover, users can easily extend to other LLM training and inference frameworks.

- **Flexible device mapping and parallelism**: Supports various placement of models onto different sets of GPUs for efficient resource utilization and scalability across different cluster sizes.

- Ready integration with popular HuggingFace models


veRL is fast with:

- **State-of-the-art throughput**: By seamlessly integrating existing SOTA LLM training and inference frameworks, veRL achieves high generation and training throughput.

- **Efficient actor model resharding with 3D-HybridEngine**: Eliminates memory redundancy and significantly reduces communication overhead during transitions between training and generation phases.

--------------------------------------------

.. _Contents:

.. toctree::
   :maxdepth: 5
   :caption: Quickstart
   :titlesonly:
   :numbered:

   start/install
   start/quickstart

.. toctree::
   :maxdepth: 5
   :caption: Data Preparation
   :titlesonly:
   :numbered:

   preparation/prepare_data
   preparation/reward_function

.. toctree::
   :maxdepth: 2
   :caption: PPO Example
   :titlesonly:
   :numbered:

   examples/ppo_code_architecture
   examples/config
   examples/gsm8k_example

.. toctree:: 
   :maxdepth: 1
   :caption: PPO Trainer and Workers

   workers/ray_trainer
   workers/fsdp_workers
   workers/megatron_workers

.. toctree::
   :maxdepth: 1
   :caption: Experimental Results

   experiment/ppo

.. toctree::
   :maxdepth: 1
   :caption: Advanced Usage and Extension

   advance/placement
   advance/dpo_extension
   advance/fsdp_extension
   advance/megatron_extension

.. toctree::
   :maxdepth: 1
   :caption: FAQ

   faq/faq

Contribution
-------------

veRL is free software; you can redistribute it and/or modify it under the terms
of the Apache License 2.0. We welcome contributions.
Join us on `GitHub <https://github.com/volcengine/verl>`_, `Slack <https://join.slack.com/t/verlgroup/shared_invite/zt-2w5p9o4c3-yy0x2Q56s_VlGLsJ93A6vA>`_ and `Wechat <https://raw.githubusercontent.com/eric-haibin-lin/verl-community/refs/heads/main/WeChat.JPG>`_ for discussions.

Code formatting
^^^^^^^^^^^^^^^^^^^^^^^^
We use yapf (Google style) to enforce strict code formatting when reviewing MRs. Run yapf at the top level of verl repo:

.. code-block:: bash

   pip3 install yapf
   yapf -ir -vv --style ./.style.yapf verl examples tests


================================================
FILE: docs/preparation/prepare_data.rst
================================================
Prepare Data (Parquet) for Post-Training
========================================

Before starting the post-training job, we need to prepare the data for
the policy training. The data should be stored in the parquet format.

We provide several data preprocessing scripts for different datasets,
including GSM8K, MATH, HellaSwag and Full_hh_rlhf. To prepare other
datasets, follow the same structure. A data preprocessing script can be
divided into two parts:

1. The first part is the common part, which loads the dataset from
   huggingface's ``datasets`` package, preprocesses it with
   ``make_map_fn`` and then stores it in the parquet format.

.. code:: python

   import re
   import os
   import datasets

   from verl.utils.hdfs_io import copy, makedirs
   import argparse

   # To extract the solution for each prompt in the dataset
   # def extract_solution(solution_str): 
   # ...


   if __name__ == '__main__':
       parser = argparse.ArgumentParser()
       parser.add_argument('--local_dir', default='/opt/tiger/gsm8k')
       parser.add_argument('--hdfs_dir', default=None)

       args = parser.parse_args()

       num_few_shot = 5
       data_source = 'openai/gsm8k'

       dataset = datasets.load_dataset(data_source, 'main')

       train_dataset = dataset['train']
       test_dataset = dataset['test']

       # Construct a `def make_map_fn(split)` for the corresponding dataset.
       # ...
           
       train_dataset = train_dataset.map(function=make_map_fn('train'), with_indices=True)
       test_dataset = test_dataset.map(function=make_map_fn('test'), with_indices=True)

       local_dir = args.local_dir
       hdfs_dir = args.hdfs_dir

       train_dataset.to_parquet(os.path.join(local_dir, 'train.parquet'))
       test_dataset.to_parquet(os.path.join(local_dir, 'test.parquet'))

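       # Copy to HDFS only when a destination is given (--hdfs_dir defaults to None)
       if hdfs_dir is not None:
           makedirs(hdfs_dir)
           copy(src=local_dir, dst=hdfs_dir)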

2. The users are required to implement the ``make_map_fn()`` function
   (as well as the ``extract_solution``) on their own to support
   different datasets or tasks.

We have already implemented the data preprocessing for the GSM8k, MATH,
HellaSwag and Full_hh_rlhf datasets. We take the GSM8k dataset as an example:

**GSM8K**

In the ``make_map_fn``, each data item should consist of the following
5 fields:

1. ``data_source``: The name of the dataset. It is used to look up the
   corresponding reward function in the ``RewardManager``.
2. ``prompt``: This field should be constructed in the format of
   huggingface chat_template. The tokenizer in ``RLHFDataset`` will
   apply the chat template and tokenize the prompt.
3. ``ability``: Defines the task category.
4. ``reward_model``: Currently, we only utilize the ``ground_truth``
   field during evaluation. The ``ground_truth`` is computed by the
   ``extract_solution`` function. **Note** that the implementation of
   the corresponding reward function should align with this extracted
   ``ground_truth``.
5. ``extra_info``: Records some information about the current prompt.
   Not used for now.

.. code:: python

   def extract_solution(solution_str):
       solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str) # extract the solution after ####
       assert solution is not None
       final_solution = solution.group(0)
       final_solution = final_solution.split('#### ')[1].replace(',', '')
       return final_solution

   instruction_following = "Let's think step by step and output the final answer after \"####\"."

   # add a row to each data item that represents a unique id
   def make_map_fn(split):

       def process_fn(example, idx):
           question = example.pop('question')

           question = question + ' ' + instruction_following

           answer = example.pop('answer')
           solution = extract_solution(answer)
           data = {
               "data_source": data_source,
               "prompt": [{
                   "role": "user",
                   "content": question
               }],
               "ability": "math",
               "reward_model": {
                   "style": "rule",
                   "ground_truth": solution
               },
               "extra_info": {
                   'split': split,
                   'index': idx
               }
           }
           return data

       return process_fn
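
Before writing the parquet files, it can help to check that each record
follows the five-field layout. The helper below is a hypothetical sanity
check and is not part of veRL; it assumes only the field names listed
above and the ``ground_truth`` key used in the GSM8K example.

.. code:: python

   REQUIRED_FIELDS = ('data_source', 'prompt', 'ability', 'reward_model', 'extra_info')


   def validate_record(record):
       """Raise ValueError if ``record`` does not follow the five-field layout."""
       missing = [name for name in REQUIRED_FIELDS if name not in record]
       if missing:
           raise ValueError(f'missing fields: {missing}')
       # The prompt is a list of chat messages, each with a role and content.
       for message in record['prompt']:
           if not {'role', 'content'} <= set(message):
               raise ValueError(f'malformed chat message: {message}')
       if 'ground_truth' not in record['reward_model']:
           raise ValueError('reward_model must carry a ground_truth entry')
       return True


   sample = {
       'data_source': 'openai/gsm8k',
       'prompt': [{'role': 'user', 'content': 'What is 6 * 7?'}],
       'ability': 'math',
       'reward_model': {'style': 'rule', 'ground_truth': '42'},
       'extra_info': {'split': 'train', 'index': 0},
   }
   validate_record(sample)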


================================================
FILE: docs/preparation/reward_function.rst
================================================
Implement Reward Function for Dataset
======================================

For each dataset, we need to implement a reward function or utilize a reward model to compute the rewards for the generated responses.
We already pre-implemented some reward functions in `reward_score directory <https://github.com/volcengine/verl/blob/main/verl/utils/reward_score>`_.

Currently, we support reward functions for the GSM8k and MATH datasets. For RLHF datasets (e.g.,
full_hh_rlhf) and Code Generation (e.g., APPS), we use a reward model
and a sandbox (to be open-sourced soon), respectively, for evaluation.

RewardManager
-------------

In the entrypoint of the PPO Post-Training script `main_ppo.py <https://github.com/volcengine/verl/blob/main/verl/trainer/main_ppo.py#L33>`_,
we implement a ``RewardManager`` that uses pre-implemented reward functions to compute the score for each response.

In the ``RewardManager``, we implemented a ``__call__`` function to
compute the score for each response. 
All the reward functions are executed by ``compute_score_fn``.
The input is a ``DataProto``, which includes:

- ``input_ids``, ``attention_mask``: ``input_ids`` and ``attention_mask`` after applying
  chat_template, including prompt and response
- ``responses``: response tokens
- ``ground_truth``: The ground truth string of the current prompt.
  Stored in ``non_tensor_batch`` in the ``DataProto``, which should be
  preprocessed in the parquet files.
- ``data_source``: The dataset name of the current prompt. Stored in
  ``non_tensor_batch`` in the ``DataProto``, which should be
  preprocessed in the parquet files.

After detokenizing the responses, the response string and the ground
truth string are passed to ``compute_score_fn`` to compute the
score for each response.

Reward Functions
----------------
We already pre-implemented some reward functions in `reward_score directory <https://github.com/volcengine/verl/blob/main/verl/utils/reward_score>`_.

- In the `GSM8k example <https://github.com/volcengine/verl/blob/main/verl/utils/reward_score/gsm8k.py>`_, we
  require the response to give the final answer after ``####``, then
  use string matching to compare it with the ground truth. If the answer
  is correct, the score is 1; if only the format is correct, the score is
  0.1; if the format is incorrect, the score is 0.
- In the `MATH example <https://github.com/volcengine/verl/blob/main/verl/utils/reward_score/math.py>`_, we follow
  the implementation in `lm-evaluation-harness repository <https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/hendrycks_math/utils.py>`_.
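
The GSM8k scoring rule described above can be sketched as follows. This
is a simplified illustration, not the code in ``gsm8k.py``: the regular
expression mirrors the one used by ``extract_solution`` in the data
preparation guide, and the function name and default score values are
assumptions.

.. code:: python

   import re


   def compute_score_sketch(solution_str, ground_truth, format_score=0.1, score=1.0):
       """Return ``score`` for a correct answer, ``format_score`` for a
       well-formed but wrong answer, and 0.0 when no ``#### <number>`` is found."""
       match = re.search(r'#### (-?[0-9.,]+)', solution_str)
       if match is None:
           return 0.0
       # Drop thousands separators before comparing with the ground truth.
       answer = match.group(1).replace(',', '')
       return score if answer == ground_truth else format_score


   print(compute_score_sketch('7/20*120 = 42 #### 42', '42'))     # correct answer
   print(compute_score_sketch('so the result is #### 40', '42'))  # right format, wrong answer
   print(compute_score_sketch('the answer is 42', '42'))          # no "####" marker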


================================================
FILE: docs/requirements-docs.txt
================================================
# markdown support
recommonmark
# markdown table support
sphinx-markdown-tables

# theme default rtd

# crate-docs-theme
sphinx-rtd-theme

================================================
FILE: docs/start/install.rst
================================================
Installation
============

Requirements
------------

- **Python**: Version >= 3.9
- **CUDA**: Version >= 12.1

veRL supports various backends. Currently, the following configurations are available:

- **FSDP** and **Megatron-LM** (optional) for training.
- **vLLM** and **TGI** for rollout generation; **SGLang** support is coming soon.

Training backends
------------------

We recommend using **FSDP** backend to investigate, research and prototype different models, datasets and RL algorithms. The guide for using FSDP backend can be found in `PyTorch FSDP Backend <https://verl.readthedocs.io/en/latest/workers/fsdp_workers.html>`_.

For users who pursue better scalability, we recommend using the **Megatron-LM** backend. Currently, we support Megatron-LM@core_v0.4.0 with some internal patches (this will soon be updated to the latest version, relying directly on upstream Megatron-LM). The guide for using the Megatron-LM backend can be found in `Megatron-LM Backend <https://verl.readthedocs.io/en/latest/workers/megatron_workers.html>`_.


Install from docker image
-------------------------

We provide pre-built Docker images for quick setup.

Image and tag: ``verlai/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te1.7-v0.0.3``. See files under ``docker/`` if you want to build your own image.

1. Launch the desired Docker image:

.. code:: bash

    docker run --runtime=nvidia -it --rm --shm-size="10g" --cap-add=SYS_ADMIN <image:tag>


2. Inside the container, install veRL:

.. code:: bash

    # install the nightly version (recommended)
    git clone https://github.com/volcengine/verl && cd verl && pip3 install -e .
    # or install from pypi via `pip3 install verl`


3. Set up Megatron (optional)

If you want to enable training with Megatron, Megatron code must be added to PYTHONPATH:

.. code:: bash

    cd ..
    git clone -b core_v0.4.0 https://github.com/NVIDIA/Megatron-LM.git
    cp verl/patches/megatron_v4.patch Megatron-LM/
    cd Megatron-LM && git apply megatron_v4.patch
    pip3 install -e .
    export PYTHONPATH=$PYTHONPATH:$(pwd)


You can also get the Megatron code after verl's patch via

.. code:: bash

    git clone -b core_v0.4.0_verl https://github.com/eric-haibin-lin/Megatron-LM

Install from custom environment
---------------------------------

To manage the environment, we recommend using conda:

.. code:: bash

   conda create -n verl python==3.9
   conda activate verl

For installing the latest version of veRL, the best way is to clone and
install it from source. Then you can modify our code to customize your
own post-training jobs.

.. code:: bash

   # install verl together with some lightweight dependencies in setup.py
   git clone https://github.com/volcengine/verl.git
   cd verl
   pip3 install -e .

You can also install veRL using ``pip3 install``

.. code:: bash

   # directly install from pypi
   pip3 install verl

Dependencies
------------

veRL requires Python >= 3.9 and CUDA >= 12.1.
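
A quick way to compare an environment against these minimums is sketched
below. The helper names are hypothetical. The CUDA check relies on
``torch.version.cuda``, which reports the CUDA version PyTorch was built
against (``None`` for CPU-only builds), not the driver installed on the
machine, and it is skipped when PyTorch is not installed yet.

.. code:: python

   import sys

   MIN_PYTHON = (3, 9)
   MIN_CUDA = (12, 1)


   def version_ok(found, required):
       """Compare two version tuples, e.g. (3, 10) >= (3, 9)."""
       return tuple(found) >= tuple(required)


   def parse_version(text):
       """Turn a string such as '12.1' into a tuple of ints, here (12, 1)."""
       return tuple(int(part) for part in text.split('.')[:2])


   print('python ok:', version_ok(sys.version_info[:2], MIN_PYTHON))

   try:
       import torch  # optional: only available once PyTorch is installed
       cuda = torch.version.cuda  # None for CPU-only builds
       print('cuda ok:', cuda is not None and version_ok(parse_version(cuda), MIN_CUDA))
   except ImportError:
       print('PyTorch not installed yet; skipping the CUDA check')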

veRL supports various backends. We currently release FSDP and Megatron-LM
for actor training and vLLM for rollout generation.

The following dependencies are required for all backends, PyTorch FSDP and Megatron-LM.

The pros, cons and extension guide for using PyTorch FSDP backend can be
found in :doc:`FSDP Workers<../workers/fsdp_workers>`.

.. code:: bash

   # install torch [or you can skip this step and let vllm install the correct version for you]
   pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121

   # install vllm
   pip3 install ray vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1

   # flash attention 2
   pip3 install flash-attn --no-build-isolation

For users who pursue better scalability, we recommend using Megatron-LM
backend. Please install the above dependencies first.

Currently, we support Megatron-LM\@core_v0.4.0 and we fix some internal
issues of Megatron-LM. Here's the additional installation guide (optional).

The pros, cons and extension guide for using Megatron-LM backend can be
found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`.

.. code:: bash

   # Megatron-LM Backend (optional)
   # apex
   pip3 install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
            --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" \
            git+https://github.com/NVIDIA/apex

   # transformer engine
   pip3 install git+https://github.com/NVIDIA/TransformerEngine.git@v1.7

   # megatron core v0.4.0: clone and apply the patch
   # You can also get the patched Megatron code via
   # git clone -b core_v0.4.0_verl https://github.com/eric-haibin-lin/Megatron-LM
   cd ..
   git clone -b core_v0.4.0 https://github.com/NVIDIA/Megatron-LM.git
   cd Megatron-LM
   cp ../verl/patches/megatron_v4.patch .
   git apply megatron_v4.patch
   pip3 install -e .
   export PYTHONPATH=$PYTHONPATH:$(pwd)

================================================
FILE: docs/start/quickstart.rst
================================================
.. _quickstart:

============================================================
Quickstart: Post-train an LLM using PPO with GSM8K dataset
============================================================

Post-train an LLM using GSM8K dataset
===================================================================

Introduction
------------

.. _hf_dataset_gsm8k: https://huggingface.co/datasets/gsm8k

In this example, we train an LLM to tackle the `GSM8k <hf_dataset_gsm8k_>`_ task with function-based rewards. [1]_

Prerequisite:

- the latest version of ``verl`` and its dependencies installed following the installation guide. Using the docker image is recommended.

- a GPU with at least 24 GB HBM


Dataset Introduction
--------------------

GSM8k is a math problem dataset. Each prompt is an elementary school
math problem that the LLM is asked to solve. Below is an example:

Prompt

   Katy makes coffee using teaspoons of sugar and cups of water in the
   ratio of 7:13. If she used a total of 120 teaspoons of sugar and cups
   of water, calculate the number of teaspoonfuls of sugar she used.

Solution

   The total ratio representing the ingredients she used to make the
   coffee is 7+13 = <<7+13=20>>20 Since the fraction representing the
   number of teaspoons she used is 7/20, she used 7/20\*120 =
   <<7/20\*120=42>>42 #### 42

Step 1: Prepare the dataset
----------------------------

We preprocess the dataset in parquet format so that (1) it contains the necessary fields for computing RL rewards and (2) it is faster to read.

.. code-block:: bash

   python3 examples/data_preprocess/gsm8k.py --local_dir ~/data/gsm8k

Step 2: Download a model for post-training
-------------------------------------------

Usually we recommend starting with an "instruct" model variant so that the model follows instructions. In this example, we start with the ``Qwen2.5-0.5B-Instruct`` model.

If you start from a "base" model variant, doing SFT before RL is recommended. Refer to the `sft directory <https://github.com/volcengine/verl/blob/main/examples/gsm8k/sft/>`_ and `SFT Trainer <https://github.com/volcengine/verl/blob/main/verl/trainer/fsdp_sft_trainer.py>`_ for further details.

.. code-block:: bash

   python3 -c "import transformers; transformers.pipeline('text-generation', model='Qwen/Qwen2.5-0.5B-Instruct')"

Step 3: Perform PPO training with the instruct model
----------------------------------------------------------------------

**Reward Model/Function**

We use a pre-defined rule-based reward model. We force the model to produce a final
answer following ``####``, as shown in the solution. We extract the final
answer from both the solution and the model's output using regular
expression matching. We assign a reward of 1 to a correct
answer, 0.1 to an incorrect answer and 0 to no answer.

For more details, please refer to `verl/utils/reward_score/gsm8k.py <https://github.com/volcengine/verl/blob/v0.1/verl/utils/reward_score/gsm8k.py>`_.

**Training Script**

Now let's run PPO training with the dataset and model above. [2]_


Set ``data.train_files``, ``data.val_files``, ``actor_rollout_ref.model.path`` and ``critic.model.path`` based on your dataset and model names or paths.

.. code-block:: bash

   PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=256 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=256 \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.ppo_mini_batch_size=64 \
    actor_rollout_ref.actor.ppo_micro_batch_size=4 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=8 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=4 \
    critic.optim.lr=1e-5 \
    critic.model.path=Qwen/Qwen2.5-0.5B-Instruct \
    critic.ppo_micro_batch_size=4 \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.logger=['console'] \
    +trainer.val_before_train=False \
    trainer.default_hdfs_dir=null \
    trainer.n_gpus_per_node=1 \
    trainer.nnodes=1 \
    trainer.save_freq=10 \
    trainer.test_freq=10 \
    trainer.total_epochs=15 2>&1 | tee verl_demo.log

You should see logs like the following, indicating that training is in progress. The key metric ``val/test_score/openai/gsm8k`` is computed every ``trainer.test_freq`` steps:

.. code-block:: bash

    step:0 - timing/gen:21.470 - timing/ref:4.360 - timing/values:5.800 - critic/kl:0.000 - critic/kl_coeff:0.001 - timing/adv:0.109 - timing/update_critic:15.664 - critic/vf_loss:14.947 - critic/vf_clipfrac:0.000 - critic/vpred_mean:-2.056 - critic/grad_norm:1023.278 - critic/lr(1e-4):0.100 - timing/update_actor:20.314 - actor/entropy_loss:0.433 - actor/pg_loss:-0.005 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/grad_norm:1.992 - actor/lr(1e-4):0.010 - critic/score/mean:0.004 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.004 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.000 - critic/advantages/max:2.360 - critic/advantages/min:-2.280 - critic/returns/mean:0.003 - critic/returns/max:0.000 - critic/returns/min:0.000 - critic/values/mean:-2.045 - critic/values/max:9.500 - critic/values/min:-14.000 - response_length/mean:239.133 - response_length/max:256.000 - response_length/min:77.000 - prompt_length/mean:104.883 - prompt_length/max:175.000 - prompt_length/min:68.000
    step:1 - timing/gen:23.020 - timing/ref:4.322 - timing/values:5.953 - critic/kl:0.000 - critic/kl_coeff:0.001 - timing/adv:0.118 - timing/update_critic:15.646 - critic/vf_loss:18.472 - critic/vf_clipfrac:0.384 - critic/vpred_mean:1.038 - critic/grad_norm:942.924 - critic/lr(1e-4):0.100 - timing/update_actor:20.526 - actor/entropy_loss:0.440 - actor/pg_loss:0.000 - actor/pg_clipfrac:0.002 - actor/ppo_kl:0.000 - actor/grad_norm:2.060 - actor/lr(1e-4):0.010 - critic/score/mean:0.000 - critic/score/max:0.000 - critic/score/min:0.000 - critic/rewards/mean:0.000 - critic/rewards/max:0.000 - critic/rewards/min:0.000 - critic/advantages/mean:0.000 - critic/advantages/max:2.702 - critic/advantages/min:-2.616 - critic/returns/mean:0.000 - critic/returns/max:0.000 - critic/returns/min:0.000 - critic/values/mean:-2.280 - critic/values/max:11.000 - critic/values/min:-16.000 - response_length/mean:232.242 - response_length/max:256.000 - response_length/min:91.000 - prompt_length/mean:102.398 - prompt_length/max:185.000 - prompt_length/min:70.000
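
To track individual metrics over time, the console lines can be turned
into dictionaries. The sketch below assumes the ``key:value - key:value``
layout shown above stays unchanged; ``parse_metrics`` is a hypothetical
helper, and the sample line is abbreviated from the first log line.

.. code-block:: python

   def parse_metrics(line):
       """Parse one ``key:value - key:value`` console log line into a dict of floats."""
       metrics = {}
       for field in line.strip().split(' - '):
           # The value is everything after the last colon of the field.
           key, _, value = field.rpartition(':')
           metrics[key] = float(value)
       return metrics


   sample = 'step:0 - timing/gen:21.470 - critic/vpred_mean:-2.056 - critic/score/mean:0.004'
   metrics = parse_metrics(sample)
   print(metrics['critic/score/mean'])

When reading ``verl_demo.log``, keep only the lines that start with
``step:``, since the file also contains other output.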

Check out :ref:`algo-baseline-page` for full training and validation logs for reference.

The checkpoint is saved at the following dir by default: ``checkpoints/${trainer.project_name}/${trainer.experiment_name}``

To enable ``wandb`` for experiment tracking, set the following configs:

.. code-block:: bash

    trainer.logger=['console','wandb'] \
    trainer.project_name=$YOUR_PROJECT_NAME \
    trainer.experiment_name=$YOUR_RUN_NAME \

If you encounter out-of-memory issues with less than 32 GB of HBM, setting the following configs may help:

.. code-block:: bash

    actor_rollout_ref.actor.ppo_micro_batch_size=1 \
    critic.ppo_micro_batch_size=1 \
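
The effect of a smaller micro-batch can be estimated with simple
arithmetic. The sketch below assumes the common gradient-accumulation
scheme, in which each mini-batch is processed as a sequence of
micro-batches before one optimizer step; the helper name is hypothetical
and the exact semantics of the veRL options should be checked against the
config documentation.

.. code-block:: python

   def accumulation_steps(mini_batch_size, micro_batch_size):
       """Number of micro-batches accumulated per optimizer step."""
       if mini_batch_size % micro_batch_size != 0:
           raise ValueError('mini-batch size must be a multiple of micro-batch size')
       return mini_batch_size // micro_batch_size


   # Values from the training script above: mini-batch 64, micro-batch 4.
   print(accumulation_steps(64, 4))
   # With the memory-saving override: micro-batch 1.
   print(accumulation_steps(64, 1))

Under this assumption the effective batch size per update is unchanged.
Only the peak activation memory drops, at the cost of more forward and
backward passes per update.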

For the full set of configs, please refer to :ref:`config-explain-page` for a detailed explanation and performance tuning.


.. [1] The original paper (https://arxiv.org/pdf/2110.14168) mainly focuses on training a verifier (a reward model) to solve math problems via Best-of-N sampling. In this example, we train an RL agent using a rule-based reward model.
.. [2] More training script examples for FSDP and Megatron-LM backend are stored in `examples/ppo_trainer <https://github.com/volcengine/verl/tree/main/examples/ppo_trainer>`_ directory.


================================================
FILE: docs/workers/fsdp_workers.rst
================================================
PyTorch FSDP Backend
======================

We support PyTorch FSDP Backend by implementing various workers for
actor, critic, reference, rollout and reward models. We also implement
the ``FSDPVLLMShardingManager`` that reshards weights between FSDP and
vLLM in `fsdp_vllm.py <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/hybrid_engine/fsdp_vllm.py>`_.

**Pros**

- Readily support various models.

  - Users only need to implement the corresponding
    ``dtensor_weight_loader`` for weight synchronization between FSDP
    and vLLM. With ``hf_weight_loader``, users can directly use any
    model supported by both HF and vLLM without any code change.

- Easy to organize the forward and backward computation for each model.

**Cons**

- Poor scalability when it comes to large-scale models (e.g. Llama 70B
  and 405B).
- The resharding overhead between actor and rollout could be larger than
  with the Megatron-LM backend.

Because of its simplicity, we recommend using the FSDP backend for
algorithm research and prototyping.

FSDP Workers
--------------

ActorRolloutRefWorker
^^^^^^^^^^^^^^^^^^^^^

Actor/Rollout HybridEngine
''''''''''''''''''''''''''

1. HybridEngine, Actor and Rollout initialization API.

.. code:: python

   @register(dispatch_mode=Dispatch.ONE_TO_ALL)
   def init_model(self):

``ONE_TO_ALL``: when calling the ``init_model`` function from the driver
process, each worker (on a GPU) will execute the following model
initialization process.

The initialization details of HybridEngine, Actor and Rollout are
highlighted below:

1. ``DataParallelPPOActor`` implements the basic PPO computation logic
   when the model is built with FSDP, including log-prob computation
   and model updates.
2. ``vLLMRollout`` supports generation with vLLM. We modify the vLLM
   Engine so that it runs under SPMD to fit into our
   ``WorkerGroup`` design.
3. ``FSDPVLLMShardingManager`` is a context manager that performs the
   actual resharding between actor and rollout.

See `source code <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/workers/fsdp_workers.py#L42>`_ for more information.

2. Generate sequence and recompute log prob

.. code:: python

   @register(dispatch_mode=Dispatch.DP_COMPUTE_PROTO)
   def generate_sequences(self, prompts: DataProto):

- ``Dispatch.DP_COMPUTE_PROTO``: The data will be dispatched and
  collected along the DP dimension

- In this function, the rollout model will perform auto-regressive
  generation and the actor model will recompute the old log prob for the
  generated response.
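
The dispatch-and-collect pattern described above can be pictured with a small, framework-free sketch. It is only an illustration of the idea and not the library's implementation: ``dispatch_dp``, ``collect_dp`` and ``run_on_workers`` are hypothetical helper names, the batch is a plain list rather than a ``DataProto``, and the workers are simulated by calling a function in a loop.

```python
from typing import Callable, List, Sequence


def dispatch_dp(batch: Sequence[int], dp_size: int) -> List[List[int]]:
    """Split a batch into dp_size equal, contiguous chunks (one per DP rank)."""
    if len(batch) % dp_size != 0:
        raise ValueError("batch size must be divisible by dp_size")
    chunk = len(batch) // dp_size
    return [list(batch[i * chunk:(i + 1) * chunk]) for i in range(dp_size)]


def collect_dp(outputs: List[List[int]]) -> List[int]:
    """Concatenate the per-rank outputs back into one batch, in rank order."""
    return [item for rank_output in outputs for item in rank_output]


def run_on_workers(batch: Sequence[int], dp_size: int,
                   worker_fn: Callable[[List[int]], List[int]]) -> List[int]:
    """Simulate a DP-dispatched call: split, run on every rank, then merge."""
    chunks = dispatch_dp(batch, dp_size)
    outputs = [worker_fn(chunk) for chunk in chunks]  # one call per simulated rank
    return collect_dp(outputs)


# Hypothetical "worker" that multiplies every element by 10.
result = run_on_workers(list(range(8)), dp_size=4,
                        worker_fn=lambda xs: [x * 10 for x in xs])
print(result)
```

The driver sees a single call on the whole batch, while each simulated rank only processes its own chunk; the merged output keeps the original sample order.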

3. Update actor model

.. code:: python

   @register(dispatch_mode=Dispatch.DP_COMPUTE_PROTO)
   def update_actor(self, data: DataProto):

- Update the actor model weight using PPO & entropy loss.

ReferenceModel
''''''''''''''

1. Reference model initialization

The reference model is initialized using the same function as the actor
model, but without initializing the HybridEngine and Optimizer. The
reference model is then also wrapped by ``DataParallelPPOActor``.

2. Compute reference log prob

.. code:: python

   @register(dispatch_mode=Dispatch.DP_COMPUTE_PROTO)
   def compute_ref_log_prob(self, data: DataProto):

- In this function, the reference model will call the compute log prob
  function in ``DataParallelPPOActor`` to compute the reference log
  prob.

CriticWorker and RewardWorker
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. Model initialization

Model initialization is quite similar to that of the reference model.
The CriticWorker additionally initializes the Optimizer.

2. Compute Values for CriticWorker

.. code:: python

   @register(dispatch_mode=Dispatch.DP_COMPUTE_PROTO)
   def compute_values(self, data: DataProto):

3. Update Critic

.. code:: python

   @register(dispatch_mode=Dispatch.DP_COMPUTE_PROTO)
   def update_critic(self, data: DataProto):

4. Compute Reward

.. code:: python

   @register(dispatch_mode=Dispatch.DP_COMPUTE_PROTO)
   def compute_rm_score(self, data: DataProto):


HybridShard
------------

We do not yet support FSDP ``HybridShard``. To support it, we may need to
construct a 2D device mesh and test the corresponding
``dtensor_weight_loader`` and ``hf_weight_loader`` for each model.


================================================
FILE: docs/workers/megatron_workers.rst
================================================
Megatron-LM Backend
=====================

We support Megatron Backend by implementing various workers for actor,
critic, reference, rollout and reward models. We also implement the
``3DHybridEngine`` using Megatron-LM and vLLM in `megatron_vllm.py <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/hybrid_engine/megatron_vllm.py>`_.

**Pros**

- Supports 3D parallelism and sequence parallelism for the best
  scalability and throughput.
- The 3D HybridEngine can significantly reduce peak memory usage and the
  weight synchronization overhead between actor and rollout.

**Cons**

- Users must implement their own models for Megatron-LM.
- Users must implement the corresponding ``weight_loader`` to

  - synchronize the model weights between the actor (in Megatron) and
    the rollout (in vLLM).
  - load weights from checkpoints into the corresponding model in
    Megatron-LM.

Megatron Workers
----------------

MegatronWorker
^^^^^^^^^^^^^^

``MegatronWorker`` is the base class of the different Megatron worker
classes. In this class, the ``get_megatron_global_info`` and
``get_megatron_rank_info`` functions retrieve the 3D parallel world
size and rank of each ``Worker`` running on a specific GPU. This
information is used in the transfer protocol for the Megatron backend.

The following ``Worker`` class for different models will be utilized to
construct the ``WorkerGroup`` .

We implement various APIs for each ``Worker`` class, decorated with
``@register(dispatch_mode=)``. These APIs can be called by the Ray
driver process. The data is dispatched and collected according to
the ``dispatch_mode`` of each function. The supported dispatch modes
(i.e., transfer protocols) can be found in `decorator.py <https://github.com/volcengine/verl/blob/main/verl/single_controller/base/decorator.py>`_.

ActorRolloutRefWorker
^^^^^^^^^^^^^^^^^^^^^

This class is implemented for Actor/Rollout HybridEngine or for the
reference model to initialize their model and perform computation.

Actor/Rollout HybridEngine
''''''''''''''''''''''''''

1. HybridEngine, Actor and Rollout initialization API.

.. code:: python

   @register(dispatch_mode=Dispatch.ONE_TO_ALL)
   def init_model(self):

``ONE_TO_ALL``: when calling the ``init_model`` function from the driver
process, each worker (on a GPU) will execute the following model
initialization process.

The initialization details of HybridEngine, Actor and Rollout are
highlighted below:

1. ``AllGatherPPModel`` holds the memory buffer for both Actor and
   Rollout and supports weight resharding between actor and rollout.
2. ``MegatronPPOActor`` implements the basic PPO computation logic
   when the model is built with Megatron, including log-prob
   computation and model updates.
3. ``vLLMRollout`` supports generation with vLLM. We modify the vLLM
   Engine so that it runs under SPMD to fit into our
   ``WorkerGroup`` design.
4. ``MegatronVLLMShardingManager`` is a context manager that performs
   the actual resharding between actor and rollout.

See `source code <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/workers/megatron_workers.py#L63>`_ for more information.

.. code:: python

   # Initialize the 3D HybridEngine
   hybrid_engine = AllGatherPPModel(model_provider=megatron_actor_model_provider)
   # Fetch the model at current rank
   actor_module = hybrid_engine.this_rank_models
   ...

   # build actor model
   self.actor = MegatronPPOActor(config=self.config.actor,
                                 model_config=self.actor_model_config,
                                 megatron_config=megatron_config,
                                 actor_module=self.actor_module,
                                 actor_optimizer=self.actor_optimizer,
                                 actor_optimizer_config=self.actor_optim_config)

   # build rollout
   # rollout initialization
   rollout = vLLMRollout(actor_module=params,
                        config=self.config.rollout,
                        tokenizer=self.tokenizer,
                        model_hf_config=self.actor_model_config,
                        train_tp=mpu.get_tensor_model_parallel_world_size())
   # perform weight resharding between actor and rollout
   sharding_manager = MegatronVLLMShardingManager(module=self.hybrid_engine,
                                                  inference_engine=rollout.inference_engine,
                                                  model_config=self.actor_model_config,
                                                  layer_name_mapping=layer_name_mapping)
   ...

2. Generate sequence and recompute log prob

.. code:: python

   @register(dispatch_mode=Dispatch.MEGATRON_PP_AS_DP_PROTO)
   def generate_sequences(self, prompts: DataProto):

- ``Dispatch.MEGATRON_PP_AS_DP_PROTO``: The PP dimension of the actor
  model is treated as a DP dimension, and the driver process
  dispatches and collects the data according to this reorganization.
  This is because, in HybridEngine, the actor weights, which usually
  use larger 3D parallel sizes, are gathered along the PP and TP
  dimensions. Therefore, the corresponding data should be dispatched
  and collected through the 3D parallel group of the rollout model,
  rather than that of the actor model. However, the world_size and
  rank information can only be retrieved from
  ``get_megatron_global_info`` and ``get_megatron_rank_info``, which
  record the 3D information for the actor model. Moreover, the data
  resharding inside the TP dimension is handled within the
  HybridEngine.

- In this function, the rollout model will perform auto-regressive
  generation and the actor model will recompute the old log prob for the
  generated response.

3. Update actor model

.. code:: python

   @register(dispatch_mode=Dispatch.MEGATRON_COMPUTE_PROTO)
   def update_actor(self, data: DataProto):

- ``Dispatch.MEGATRON_COMPUTE_PROTO``: The user passes data partitioned
  along the DP dimension. The data is dispatched to all TP/PP ranks
  within the same DP group, and the output is ultimately collected only
  from tp=0 and the last PP stage.
- Update the actor model weight using PPO & entropy loss.
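
To make the ``MEGATRON_COMPUTE_PROTO`` description concrete, here is a minimal simulation of the rule "send each DP chunk to every TP/PP rank of that DP group, then collect only from tp=0 on the last PP stage". It is a sketch under stated assumptions, not the code in ``decorator.py``: ranks are modelled as hypothetical ``(dp_rank, pp_rank, tp_rank)`` tuples, and ``dispatch`` and ``collect`` are illustrative names.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical rank identifier: (dp_rank, pp_rank, tp_rank).
Rank = Tuple[int, int, int]


def dispatch(chunks: List[List[int]], pp_size: int, tp_size: int) -> Dict[Rank, List[int]]:
    """Give every TP/PP rank of a DP group the same DP chunk."""
    dp_size = len(chunks)
    return {
        (dp, pp, tp): chunks[dp]
        for dp, pp, tp in product(range(dp_size), range(pp_size), range(tp_size))
    }


def collect(outputs: Dict[Rank, List[int]], dp_size: int, pp_size: int) -> List[int]:
    """Keep only the output of tp=0 on the last PP stage, ordered by DP rank."""
    merged: List[int] = []
    for dp in range(dp_size):
        merged.extend(outputs[(dp, pp_size - 1, 0)])
    return merged


chunks = [[0, 1], [2, 3]]  # data already partitioned along DP (dp_size=2)
per_rank_input = dispatch(chunks, pp_size=2, tp_size=2)
# Simulated computation: every rank adds 100 to its inputs.
per_rank_output = {rank: [x + 100 for x in data] for rank, data in per_rank_input.items()}
collected = collect(per_rank_output, dp_size=2, pp_size=2)
print(len(per_rank_input), collected)
```

With ``dp=2, pp=2, tp=2`` there are eight simulated ranks, but only two of them contribute to the collected result, so the driver receives each sample exactly once.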

ReferenceModel
''''''''''''''

1. Reference model initialization

The reference model is initialized using the same function as the actor
model, but without initializing the HybridEngine and Optimizer. The
reference model is then also wrapped by ``MegatronPPOActor``.

2. Compute reference log prob

.. code:: python

   @register(dispatch_mode=Dispatch.MEGATRON_COMPUTE_PROTO)
   def compute_ref_log_prob(self, data: DataProto):

- In this function, the reference model will call the compute log prob
  function in ``MegatronPPOActor`` to compute the reference log prob.

CriticWorker and RewardWorker
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. Model initialization

Model initialization is quite similar to that of the reference model.
The CriticWorker additionally initializes the Optimizer.

2. Compute Values for CriticWorker

.. code:: python

   @register(dispatch_mode=Dispatch.MEGATRON_COMPUTE_PROTO)
   def compute_values(self, data: DataProto):

3. Update Critic

.. code:: python

   @register(dispatch_mode=Dispatch.MEGATRON_COMPUTE_PROTO)
   def update_critic(self, data: DataProto):

4. Compute Reward

.. code:: python

   @register(dispatch_mode=Dispatch.MEGATRON_COMPUTE_PROTO)
   def compute_rm_score(self, data: DataProto):

Context Parallel
----------------

This requires the developer/contributor to implement context parallelism
both in Megatron-LM and in the models.


================================================
FILE: docs/workers/ray_trainer.rst
================================================
PPO Ray Trainer
===============

We implement the RayPPOTrainer, a trainer that runs on the driver
process on a single CPU/GPU node (CPU by default).

The PPORayTrainer includes 3 core functions for data preparation,
WorkerGroup initialization and the PPO training loop.

Data Preparation
----------------

The ``PPORayTrainer``, as a single process, is responsible for loading a
complete batch of samples (prompts) from the dataset and then dispatching
it to different worker_groups running on different GPUs.

To generalize the data loading, we implement the ``RLHFDataset`` class
to load the preprocessed parquet files, apply chat templates to the
prompts, add padding, truncate prompts that exceed max prompt length and
then tokenize.

.. code:: python

   self.train_dataset = RLHFDataset(parquet_files=self.config.data.train_files,
                                       tokenizer=self.tokenizer,
                                       prompt_key=self.config.data.prompt_key,
                                       max_prompt_length=self.config.data.max_prompt_length,
                                       filter_prompts=True,
                                       return_raw_chat=self.config.data.get('return_raw_chat', False),
                                       truncation='error')

Then, the dataloader iterates over the dataset using the PPO mini batch size.
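
The padding and truncation step mentioned above can be illustrated on a list of token ids. This is a simplified sketch, not the ``RLHFDataset`` code: ``fit_prompt`` is a hypothetical helper, left-padding is an assumption (it is a common choice for decoder-only generation), and only the ``'error'`` truncation value appears in the snippet above, so the ``'left'`` and ``'right'`` modes are illustrative.

```python
from typing import List


def fit_prompt(token_ids: List[int], max_prompt_length: int, pad_token_id: int,
               truncation: str = "error") -> List[int]:
    """Return token_ids padded or truncated to exactly max_prompt_length."""
    if len(token_ids) > max_prompt_length:
        if truncation == "left":
            # Drop the oldest tokens and keep the end of the prompt.
            return token_ids[-max_prompt_length:]
        if truncation == "right":
            # Keep the beginning of the prompt.
            return token_ids[:max_prompt_length]
        if truncation == "error":
            raise ValueError(
                f"prompt length {len(token_ids)} exceeds max_prompt_length {max_prompt_length}")
        raise ValueError(f"unknown truncation mode: {truncation}")
    # Assumption: pad on the left so the prompt ends right before generation starts.
    padding = [pad_token_id] * (max_prompt_length - len(token_ids))
    return padding + token_ids


print(fit_prompt([11, 12, 13], max_prompt_length=5, pad_token_id=0))
print(fit_prompt([11, 12, 13, 14], max_prompt_length=2, pad_token_id=0, truncation="left"))
```

Every prompt in a batch then has the same length, which allows the batch to be stacked into one tensor before it is dispatched to the workers.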

WorkerGroup Initialization
--------------------------

We first introduce a basic implementation of initializing the
``WorkerGroup`` of the actor model on a given set of GPUs.

.. code:: python

   # max_colocate_count is the number of WorkerGroups (i.e. processes) in each RayResourcePool
   # For the FSDP backend, we recommend max_colocate_count=1, which merges all WorkerGroups into one.
   # For the Megatron backend, we recommend max_colocate_count>1, which allows a different WorkerGroup for each model
   resource_pool = RayResourcePool(process_on_nodes=[config.trainer.n_gpus_per_node] * config.trainer.nnodes,
                                   use_gpu=True,
                                   max_colocate_count=1)
   # define actor rollout cls to be init on remote
   actor_rollout_cls = RayClassWithInitArgs(cls=ActorRolloutWorker)
   # define actor_rollout worker group
   actor_rollout_worker_group = MegatronRayWorkerGroup(resource_pool=resource_pool,
                                                       ray_cls_with_init=actor_rollout_cls,
                                                       default_megatron_kwargs=config.actor_rollout.megatron)

Different WorkerGroups, such as ``actor_rollout_worker_group``,
``critic_worker_group`` and ``ref_worker_group``, each run in a separate
process in the above implementation.

The driver process can then call the distributed compute function within
the ``actor_rollout_worker_group`` and other roles to construct the RL
training loop.

For models colocated on the same set of GPUs, we further provide a
fine-grained optimization that merges the ``worker_group`` of different
roles into the same process. This optimization avoids the redundant
CUDA/distributed contexts that separate processes would create.

.. code:: python

   # initialize WorkerGroup
   # NOTE: if you want to use a different resource pool for each role, which can support different parallel size,
   # you should not use `create_colocated_worker_cls`. Instead, directly pass different resource pool to different worker groups.
   # See TODO(url) for more information.
   all_wg = {}
   for resource_pool, class_dict in self.resource_pool_to_cls.items():
       worker_dict_cls = create_colocated_worker_cls(class_dict=class_dict)
       wg_dict = self.ray_worker_group_cls(resource_pool=resource_pool, ray_cls_with_init=worker_dict_cls)
       spawn_wg = wg_dict.spawn(prefix_set=class_dict.keys())
       all_wg.update(spawn_wg)

   if self.use_critic:
       self.critic_wg = all_wg['critic']
       self.critic_wg.init_model()

   if self.use_reference_policy:
       self.ref_policy_wg = all_wg['ref']
       self.ref_policy_wg.init_model()

   if self.use_rm:
       self.rm_wg = all_wg['rm']
       self.rm_wg.init_model()

   # we should create rollout at the end so that vllm can have a better estimation of kv cache memory
   self.actor_rollout_wg = all_wg['actor_rollout']
   self.actor_rollout_wg.init_model()

.. note:: For the Megatron backend, if we merge the ``worker_groups`` into the same processes, all the roles will use the same 3D parallel size. To optimize this, we may need to maintain several 3D process groups for each role in the same distributed context. If you want to use different 3D parallel sizes for different roles, please follow the architecture of the first code block to initialize each role's ``worker_group``.


PPO Training Loop
-----------------

We implement the PPO training loop by calling the functions in the
worker_group of each role. The input and output data of each function is
a ``DataProto`` object implemented in `protocol.py <https://github.com/volcengine/verl/blob/main/verl/protocol.py>`_. In the training
loop, the trainer dispatches/collects the data to/from different GPUs
following the transfer protocols wrapped in the workers' functions. The
computation of PPO micro batches is processed in the ``update_actor``
and ``update_critic`` functions.

To extend to other RLHF algorithms, such as DPO, GRPO, please refer to
:doc:`../advance/dpo_extension`.

.. code:: python

   def fit(self):
       """
       The training loop of PPO.
       The driver process only needs to call the compute functions of the worker group through RPC to construct the PPO dataflow.
       The light-weight advantage computation is done on the driver process.
       """
       from verl.utils.tracking import Tracking
       from omegaconf import OmegaConf

       logger = Tracking(project_name=self.config.trainer.project_name,
                           experiment_name=self.config.trainer.experiment_name,
                           default_backend=self.config.trainer.logger,
                           config=OmegaConf.to_container(self.config, resolve=True))

       global_steps = 0

       # perform validation before training
       # currently, we only support validation using the reward_function.
       if self.val_reward_fn is not None:
           val_metrics = self._validate()
           pprint(f'Initial validation metrics: {val_metrics}')

       for epoch in range(self.config.trainer.total_epochs):
           for batch_dict in self.train_dataloader:
               metrics = {}

               batch: DataProto = DataProto.from_single_dict(batch_dict)
               # batch = batch.to('cuda')

               # pop those keys for generation
               gen_batch = batch.pop(batch_keys=['input_ids', 'attention_mask', 'position_ids'])

               # generate a batch
               with Timer(name='gen', logger=None) as timer:
                   gen_batch_output = self.actor_rollout_wg.generate_sequences(gen_batch)
               metrics['timing/gen'] = timer.last

               batch = batch.union(gen_batch_output)

               if self.use_reference_policy:
                   # compute reference log_prob
                   with Timer(name='ref', logger=None) as timer:
                       ref_log_prob = self.ref_policy_wg.compute_ref_log_prob(batch)
                       batch = batch.union(ref_log_prob)
                   metrics['timing/ref'] = timer.last

               # compute values
               with Timer(name='values', logger=None) as timer:
                   values = self.critic_wg.compute_values(batch)
                   batch = batch.union(values)
               metrics['timing/values'] = timer.last

               with Timer(name='adv', logger=None) as timer:
                   # compute scores. Support both model and function-based.
                   # We first compute the scores using reward model. Then, we call reward_fn to combine
                   # the results from reward model and rule-based results.
                   if self.use_rm:
                       # we first compute reward model score
                       reward_tensor = self.rm_wg.compute_rm_score(batch)
                       batch = batch.union(reward_tensor)

                   # we combine with rule-based rm
                   reward_tensor = self.reward_fn(batch)
                   batch.batch['token_level_scores'] = reward_tensor

                   # compute rewards. apply_kl_penalty if available
                   batch, kl_metrics = apply_kl_penalty(batch,
                                                           kl_ctrl=self.kl_ctrl,
                                                           kl_penalty=self.config.algorithm.kl_penalty)
                   metrics.update(kl_metrics)

                   # compute advantages, executed on the driver process
                   batch = compute_advantage(batch,
                                               self.config.algorithm.gamma,
                                               self.config.algorithm.lam,
                                               adv_estimator=self.config.algorithm.adv_estimator)
               metrics['timing/adv'] = timer.last

               # update critic
               if self.use_critic:
                   with Timer(name='update_critic', logger=None) as timer:
                       critic_output = self.critic_wg.update_critic(batch)
                   metrics['timing/update_critic'] = timer.last
                   critic_output_metrics = reduce_metrics(critic_output.meta_info['metrics'])
                   metrics.update(critic_output_metrics)

               # implement critic warmup
               if self.config.trainer.critic_warmup <= global_steps:
                   # update actor
                   with Timer(name='update_actor', logger=None) as timer:
                       actor_output = self.actor_rollout_wg.update_actor(batch)
                   metrics['timing/update_actor'] = timer.last
                   actor_output_metrics = reduce_metrics(actor_output.meta_info['metrics'])
                   metrics.update(actor_output_metrics)

               # validate
               if self.val_reward_fn is not None and (global_steps + 1) % self.config.trainer.test_freq == 0:
                   with Timer(name='testing', logger=None) as timer:
                       val_metrics: dict = self._validate()
                       val_metrics = {f'val/{key}': val for key, val in val_metrics.items()}
                   metrics['timing/testing'] = timer.last
                   metrics.update(val_metrics)

               # collect metrics
               data_metrics = compute_data_metrics(batch=batch)
               metrics.update(data_metrics)

               # TODO: make a canonical logger that supports various backend
               logger.log(data=metrics, step=global_steps)

               if self.config.trainer.save_freq > 0 and (global_steps + 1) % self.config.trainer.save_freq == 0:
                   actor_local_path = os.path.join(self.config.trainer.default_local_dir, 'actor',
                                                   f'global_step_{global_steps}')
                   actor_remote_path = os.path.join(self.config.trainer.default_hdfs_dir, 'actor')
                   self.actor_rollout_wg.save_checkpoint(actor_local_path, actor_remote_path)

                   if self.use_critic:
                       critic_local_path = os.path.join(self.config.trainer.default_local_dir, 'critic',
                                                           f'global_step_{global_steps}')
                       critic_remote_path = os.path.join(self.config.trainer.default_hdfs_dir, 'critic')
                       self.critic_wg.save_checkpoint(critic_local_path, critic_remote_path)

               global_steps += 1

       # perform validation after training
       if self.val_reward_fn is not None:
           val_metrics = self._validate()
           pprint(f'Final validation metrics: {val_metrics}')
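
The ``gamma`` and ``lam`` settings passed to ``compute_advantage`` in the loop above are the usual parameters of generalized advantage estimation (GAE). The following is a minimal sketch of the textbook GAE recursion for a single response, written in plain Python. It is not the repository's ``compute_advantage``: ``compute_gae`` is a hypothetical name, and the sketch omits response masking, batching and any advantage normalization.

```python
from typing import List, Tuple


def compute_gae(rewards: List[float], values: List[float],
                gamma: float, lam: float) -> Tuple[List[float], List[float]]:
    """Textbook GAE over one sequence; the value after the last token is taken as 0."""
    advantages = [0.0] * len(rewards)
    next_value = 0.0
    next_advantage = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual: r_t + gamma * V(t+1) - V(t)
        delta = rewards[t] + gamma * next_value - values[t]
        # A_t = delta_t + gamma * lam * A_(t+1)
        next_advantage = delta + gamma * lam * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    # Returns are the regression targets for the critic.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


# A sparse reward on the final token, as with a rule-based score.
adv, ret = compute_gae(rewards=[0.0, 0.0, 1.0], values=[0.5, 0.5, 0.5], gamma=1.0, lam=0.0)
print(adv, ret)
```

Because this recursion only needs the per-token rewards and values that the workers have already returned, it is cheap enough to run on the driver process, which matches the docstring of ``fit``.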


================================================
FILE: examples/data_preprocess/arth.py
================================================
# Copyright 2024 Bytedance Ltd. and/or its affiliates
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Generate a synthetic arithmetic dataset and save it in parquet format
"""

import re
import os
from datasets import Dataset

from verl.utils.hdfs_io import copy, makedirs
import argparse


from random import randint, seed, choice
from tqdm import tqdm

def gen_dataset(
    N,
    DIGIT,
    LESS_OR_EQUAL=True,
):
    """
    any score <0.4 is ok, since +- is easy
    the model have 
    """
    seed(1)
    # Generate N pairs of numbers and their results for different operations
    equations = []
    # operations = ['*', '+', '-', '*', '*']
    operations = ['*']
    for _ in tqdm(range(N)):
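        # Helper: draw a number with up to DIGIT-1 digits (r == 1, 1/3 chance) or up to
        # DIGIT digits (r in {2, 3}, 2/3 chance). NOTE: randint(1, 3) never returns 0,
        # so the "2 digits less" branch below is unreachable as written.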
        def get_random_num():
            r = randint(1,3)
            if r == 0:
                # 2 digits less than original
                max_num = 10**(DIGIT-2)
                return randint(0 if LESS_OR_EQUAL else max_num//10, max_num-1)
            elif r == 1:
                # 1 digit less than original
                max_num = 10**(DIGIT-1)
                return randint(0 if LESS_OR_EQUAL else max_num//10, max_num-1)
            else:
                # N-digit number
                max_num = 10**DIGIT
                return randint(0 if LESS_OR_EQUAL else max_num//10, max_num-1)
        # Generate two numbers independently
        num1 = get_random_num()
        num2 = get_random_num()
        # Randomly choose operation
        op = choice(operations)
        # Calculate result based on operation
        if op == '*':
            result = num1 * num2
        elif op == '+':
            result = num1 + num2
        else:  # op == '-'
            assert op == '-'
            # For subtraction, ensure num1 >= num2
            if num1 < num2:
                num1, num2 = num2, num1
            result = num1 - num2
        equations.append((num1, num2, result, op))
    return equations
    
    
def make_prefix(dp):
    num1 = dp['num1']
    num2 = dp['num2']
    op = dp['operation']
    prefix = f"""A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> RESULT_NUMBER </answer>. \nUser: Give me the answer of the following equation: {num1} {op} {num2}.\nAssistant: Let me solve this step by step.\n<think>"""
    return prefix

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--local_dir', default='~/data/arithmetic-3_digit')
    parser.add_argument('--hdfs_dir', default=None)

    args = parser.parse_args()

    data_source = 'yolo/arithmetic-3_digit'
    DIGIT = 3
    # N = 1000000
    N = 100000
    LESS_OR_EQUAL = True
    TRAIN_SIZE = 32768
    TEST_SIZE = 4096

    dataset = gen_dataset(N=N, DIGIT=DIGIT, LESS_OR_EQUAL=LESS_OR_EQUAL)
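    # Deduplicate while preserving generation order. set() ordering depends on string
    # hashing (the operator field), so it can differ between runs despite seed(1).
    dataset = list(dict.fromkeys(dataset))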
    assert len(dataset) > TRAIN_SIZE + TEST_SIZE
    train_dataset = dataset[:TRAIN_SIZE]
    test_dataset = dataset[-TEST_SIZE:]


    # add a row to each data item that represents a unique id
    def make_map_fn(split):

        def process_fn(example, idx):
            question = make_prefix(example)
            solution = example['result']
            data = {
                "data_source": data_source,
                "prompt": [{
                    "role": "user",
                    "content": question,
                }],
                "ability": "math",
                "reward_model": {
                    "style": "rule",
                    "ground_truth": solution
                },
                "extra_info": {
                    'split': split,
                    'index': idx,
                }
            }
            return data

        return process_fn
    
    def to_dataset(dataset_list):
        dataset_dict = {
            "num1": [],
            "num2": [],
            "result": [],
            "operation": []
        }
        for dp in dataset_list:
            dataset_dict["num1"].append(dp[0])
            dataset_dict["num2"].append(dp[1])
            dataset_dict["result"].append(dp[2])
            dataset_dict["operation"].append(dp[3])
        return Dataset.from_dict(dataset_dict)

    train_dataset = to_dataset(train_dataset)
    test_dataset = to_dataset(test_dataset)

    train_dataset = train_dataset.map(function=make_map_fn('train'), with_indices=True)
    test_dataset = test_dataset.map(function=make_map_fn('test'), with_indices=True)

    local_dir = args.local_dir
    hdfs_dir = args.hdfs_dir

    train_dataset.to_parquet(os.path.join(local_dir, 'train.parquet'))
    test_dataset.to_parquet(os.path.join(local_dir, 'test.parquet'))

    if hdfs_dir is not None:
        makedirs(hdfs_dir)

        copy(src=local_dir, dst=hdfs_dir)


================================================
FILE: examples/data_preprocess/countdown.py
================================================
"""
Preprocess dataset for countdown task - given a target number and N numbers, generate equations to reach target
"""

import re
import os
from datasets import Dataset, load_dataset
from random import randint, seed, choice
from typing import List, Tuple
from tqdm import tqdm
from verl.utils.hdfs_io import copy, makedirs
import argparse


def gen_dataset(
    num_samples: int,
    num_operands: int = 6,
    max_target: int = 1000,
    min_number: int = 1,
    max_number: int = 100,
    operations: List[str] = ['+', '-', '*', '/'],
    seed_value: int = 42,
) -> List[Tuple]:
    """Generate dataset for countdown task.
    
    Args:
        num_samples: Number of samples to generate
        num_operands: Number of numbers provided in each sample
        max_target: Maximum value for target number
        min_number: Minimum value for provided numbers
        max_number: Maximum value for provided numbers
        operations: List of allowed operations
        seed_value: Random seed for reproducibility
        
    Returns:
        List of (target, numbers) tuples; no solution is generated
    """
    seed(seed_value)
    samples = []
    
    for _ in tqdm(range(num_samples)):
        # Generate random target
        target = randint(1, max_target)
        
        # Generate random numbers
        numbers = [randint(min_number, max_number) for _ in range(num_operands)]
        
        
        samples.append((target, numbers))
    
    return samples

def make_prefix(dp, template_type):
    target = dp['target']
    numbers = dp['nums']
    # NOTE: also need to change reward_score/countdown.py
    if template_type == 'base':
        # This works for any base model
        prefix = f"""A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer.
User: Using the numbers {numbers}, create an equation that equals {target}. You can use basic arithmetic operations (+, -, *, /) and each number can only be used once. Show your work in <think> </think> tags. And return the final answer in <answer> </answer> tags, for example <answer> (1 + 2) / 3 </answer>.
Assistant: Let me solve this step by step.
<think>"""
    elif template_type == 'qwen-instruct':
        # This works for Qwen Instruct models
        prefix = f"""<|im_start|>system\nYou are a helpful assistant. You first thinks about the reasoning process in the mind and then provides the user with the answer.<|im_end|>\n<|im_start|>user\n Using the numbers {numbers}, create an equation that equals {target}. You can use basic arithmetic operations (+, -, *, /) and each number can only be used once. Show your work in <think> </think> tags. And return the final answer in <answer> </answer> tags, for example <answer> (1 + 2) / 3 </answer>.<|im_end|>\n<|im_start|>assistant\nLet me solve this step by step.\n<think>"""
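    else:
        # Fail loudly instead of hitting an unbound `prefix` below
        raise ValueError(f"Unknown template_type: {template_type!r}")
    return prefix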
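

# Illustrative sketch, not part of the original script: one way the equation
# inside <answer> </answer> could be checked against the prompt built by
# make_prefix. The function name and the exact rules (integers only, no unary
# minus, 1e-5 tolerance) are assumptions for illustration; the repository's
# actual scoring lives in reward_score/countdown.py and may differ.
def _sketch_check_countdown_answer(completion, numbers, target):
    import ast
    import operator
    import re

    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return False
    expr = match.group(1).strip()

    # Every provided number must appear exactly once in the equation.
    used = sorted(int(tok) for tok in re.findall(r"\d+", expr))
    if used != sorted(numbers):
        return False

    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def evaluate(node):
        # Walk the parsed expression; anything besides + - * / on plain numbers is rejected.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return ops[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")

    try:
        value = evaluate(ast.parse(expr, mode="eval").body)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return abs(value - target) < 1e-5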


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--local_dir', default='~/data/countdown')
    parser.add_argument('--hdfs_dir', default=None)
    parser.add_argument('--num_samples', type=int, default=100000)
    parser.add_argument('--num_operands', type=int, default=6)
    parser.add_argument('--max_target', type=int, default=1000)
    parser.add_argument('--min_number', type=int, default=1)
    parser.add_argument('--max_number', type=int, default=100)
    parser.add_argument('--train_size', type=int, default=327680)
    parser.add_argument('--test_size', type=int, default=1024)
    parser.add_argument('--template_type', type=str, default='base')

    args = parser.parse_args()

    data_source = 'countdown'
    TRAIN_SIZE = args.train_size
    TEST_SIZE = args.test_size

    raw_dataset = load_dataset('Jiayi-Pan/Countdown-Tasks-3to4', split='train')

    assert len(raw_dataset) > TRAIN_SIZE + TEST_SIZE
    train_dataset = raw_dataset.select(range(TRAIN_SIZE))
    test_dataset = raw_dataset.select(range(TRAIN_SIZE, TRAIN_SIZE + TEST_SIZE))

    def make_map_fn(split):
        def process_fn(example, idx):
            question = make_prefix(example, template_type=args.template_type)
            solution = {
                "target": example['target'],
                "numbers": example['nums']
            }
            data = {
                "data_source": data_source,
                "prompt": [{
                    "role": "user",
                    "content": question,
                }],
                "ability": "math",
                "reward_model": {
                    "style": "rule",
                    "ground_truth": solution
                },
                "extra_info": {
                    'split': split,
                    'index': idx,
                }
            }
            return data
        return process_fn
    
    train_dataset = train_dataset.map(function=make_map_fn('train'), with_indices=True)
    test_dataset = test_dataset.map(function=make_map_fn('test'), with_indices=True)

    local_dir = args.local_dir
    hdfs_dir = args.hdfs_dir

    train_dataset.to_parquet(os.path.join(local_dir, 'train.parquet'))
    test_dataset.to_parquet(os.path.join(local_dir, 'test.parquet'))

    if hdfs_dir is not None:
        makedirs(hdfs_dir)
        copy(src=local_dir, dst=hdfs_dir) 


================================================
FILE: examples/data_preprocess/full_hh_rlhf.py
================================================
# Copyright 2024 Bytedance Ltd. and/or its affiliates
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
- Preprocess data and split the training set into 75% for training RM and 25% for validating RM.
- All the training data is used to train SFT and RL.
- Both chosen and rejected responses are used to train SFT.
"""
import argparse
import os

import pandas as pd
from datasets import load_dataset

from tqdm.auto import tqdm

from verl.utils.fs import copy, makedirs


def generate_sft_dataset(target_hdfs_path_dir, local_dir='~/data/full_hh_rlhf/sft'):
    dataset = load_dataset('Dahoas/full-hh-rlhf')
    output = {'prompt': [], 'response': []}
    for data in tqdm(dataset['train']):
        # add chosen
        output['prompt'].append(data['prompt'])
        output['response'].append(data['chosen'])

        # add rejection
        output['prompt'].append(data['prompt'])
        output['response'].append(data['rejected'])

    df = pd.DataFrame(output)

    local_dir = os.path.expanduser(local_dir)
    os.makedirs(local_dir, exist_ok=True)

    local_path = os.path.join(local_dir, 'train.parquet')

    df.to_parquet(path=local_path)

    if target_hdfs_path_dir is not None:
        hdfs_dir = target_hdfs_path_dir + '/' + 'train.parquet'
        makedirs(hdfs_dir)

        copy(local_path, hdfs_dir)


def generate_rm_dataset(target_hdfs_path_dir, local_dir='~/data/full_hh_rlhf/rm'):
    train_dataset = load_dataset('Dahoas/full-hh-rlhf', split='train[:75%]')
    test_dataset = load_dataset('Dahoas/full-hh-rlhf', split='train[-25%:]')

    local_dir = os.path.expanduser(local_dir)
    os.makedirs(local_dir, exist_ok=True)

    for dataset, name in zip([train_dataset, test_dataset], ['train', 'test']):
        output = {'prompt': [], 'chosen': [], 'rejected': []}
        for data in tqdm(dataset):
            # add prompt with its chosen and rejected responses
            output['prompt'].append(data['prompt'])
            output['chosen'].append(data['chosen'])
            output['rejected'].append(data['rejected'])

        df = pd.DataFrame(output)

        local_path = os.path.join(local_dir, name + '.parquet')

        df.to_parquet(path=local_path)

        if target_hdfs_path_dir is not None:
            hdfs_dir = target_hdfs_path_dir + '/' + name + '.parquet'
            makedirs(hdfs_dir)

            copy(local_path, hdfs_dir)


def generate_rl_dataset(target_hdfs_path_dir, local_dir='~/data/full_hh_rlhf/rl'):
    dataset = load_dataset('Dahoas/full-hh-rlhf')
    train_dataset = dataset['train']

    data_source = 'Dahoas/full-hh-rlhf'

    # add a row to each data item that represents a unique id
    def make_map_fn(split):

        def process_fn(example, idx):
            prompt = example.pop('prompt')
            response = example.pop('response')

            data = {
                "data_source": data_source,
                "prompt": [{
                    "role": "user",
                    "content": prompt
                }],
                "ability": "alignment",
                "reward_model": {
                    "style": "model",
                    "ground_truth": response  # should not be used
                },
                "extra_info": {
                    'split': split,
                    'index': idx
                }
            }
            return data

        return process_fn

    train_dataset = train_dataset.map(function=make_map_fn('train'), with_indices=True)
    local_dir = os.path.expanduser(local_dir)
    local_path = os.path.join(local_dir, 'train.parquet')
    train_dataset.to_parquet(local_path)

    if target_hdfs_path_dir is not None:
        hdfs_dir = target_hdfs_path_dir + '/' + 'train.parquet'
        makedirs(hdfs_dir)

        copy(local_path, hdfs_dir)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--split', type=str, choices=['sft', 'rm', 'rl'], required=True)
    parser.add_argument('--local_dir', type=str, default='~/data/full_hh_rlhf')
    parser.add_argument('--hdfs_dir', type=str, required=False, default=None)

    args = parser.parse_args()

    if args.split == 'sft':
        generate_sft_dataset(args.hdfs_dir, os.path.join(args.local_dir, args.split))
    elif args.split == 'rm':
        generate_rm_dataset(args.hdfs_dir, os.path.join(args.local_dir, args.split))
    elif args.split == 'rl':
        generate_rl_dataset(args.hdfs_dir, os.path.join(args.local_dir, args.split))
    else:
        raise NotImplementedError


================================================
FILE: examples/data_preprocess/gsm8k.py
================================================
# Copyright 2024 Bytedance Ltd. and/or its affiliates
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Preprocess the GSM8k dataset to parquet format
"""

import re
import os
import datasets

from verl.utils.hdfs_io import copy, makedirs
import argparse


def extract_solution(solution_str):
    solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str)
    assert solution is not None
    final_solution = solution.group(0)
    final_solution = final_solution.split('#### ')[1].replace(',', '')
    return final_solution
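

# Illustrative sketch, not part of the original script: the same "#### <number>"
# pattern applied to a made-up GSM8k-style answer string, to show the shape of
# the ground truth that ends up in reward_model. The helper name and the sample
# text are hypothetical.
def _sketch_extract_solution_example():
    import re

    sample_answer = "She sold 1,000 + 234 = 1,234 clips in total.\n#### 1,234"
    match = re.search("#### (\\-?[0-9\\.\\,]+)", sample_answer)
    assert match is not None
    # Thousands separators are stripped, so the stored value is "1234".
    return match.group(1).replace(',', '')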


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--local_dir', default='~/data/gsm8k')
    parser.add_argument('--hdfs_dir', default=None)

    args = parser.parse_args()

    num_few_shot = 5
    data_source = 'openai/gsm8k'

    dataset = datasets.load_dataset(data_source, 'main')

    train_dataset = dataset['train']
    test_dataset = dataset['test']

    instruction_following = "Let's think step by step and output the final answer after \"####\"."

    # add a row to each data item that represents a unique id
    def make_map_fn(split):

        def process_fn(example, idx):
            question_raw = example.pop('question')

            question = question_raw + ' ' + instruction_following

            answer_raw = example.pop('answer')
            solution = extract_solution(answer_raw)
            data = {
                "data_source": data_source,
                "prompt": [{
                    "role": "user",
                    "content": question,
                }],
                "ability": "math",
                "reward_model": {
                    "style": "rule",
                    "ground_truth": solution
                },
                "extra_info": {
                    'split': split,
                    'index': idx,
                    'answer': answer_raw,
                    "question": question_raw,
                }
            }
            return data

        return process_fn

    train_dataset = train_dataset.map(function=make_map_fn('train'), with_indices=True)
    test_dataset = test_dataset.map(function=make_map_fn('test'), with_indices=True)

    local_dir = args.local_dir
    hdfs_dir = args.hdfs_dir

    train_dataset.to_parquet(os.path.join(local_dir, 'train.parquet'))
    test_dataset.to_parquet(os.path.join(local_dir, 'test.parquet'))

    if hdfs_dir is not None:
        makedirs(hdfs_dir)

        copy(src=local_dir, dst=hdfs_dir)


================================================
FILE: examples/data_preprocess/hellaswag.py
================================================
# Copyright 2024 Bytedance Ltd. and/or its affiliates
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Preprocess Hellaswag dataset.

"""

import re
import os
import datasets

from verl.utils.hdfs_io import copy, makedirs
import argparse


def preprocess(text):
    text = text.strip()
    # NOTE: Brackets are artifacts of the WikiHow dataset portion of HellaSwag.
    text = text.replace(" [title]", ". ")
    text = re.sub("\\[.*?\\]", "", text)
    text = text.replace("  ", " ")
    return text


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--local_dir', default='/opt/tiger/hellaswag')
    parser.add_argument('--hdfs_dir', default=None)

    args = parser.parse_args()

    data_source = 'Rowan/hellaswag'

    dataset = datasets.load_dataset(data_source, trust_remote_code=True)

    train_dataset = dataset['train']
    val_dataset = dataset['validation']
    test_dataset = dataset['test']

    instruction = 'Please complete the following sentence.\n'

    def make_map_fn(split):

        def process_fn(doc, idx):
            ctx = doc["ctx_a"] + " " + doc["ctx_b"].capitalize()
            query = preprocess(doc["activity_label"] + ": " + ctx)
            choices = [preprocess(ending) for ending in doc["endings"]]
            gold = int(doc["label"])

            data = {
                "data_source": data_source,
                "prompt": [{
                    "role": "user",
                    "content": query
                }],
                "ability": "nlp",
                "reward_model": {
                    "style": "model",
                    "eval": "multiple_choice",  # using loglikelihood
                    "ground_truth": gold,
                    "choices": choices
                },
                "extra_info": {
                    'split': split,
                    'index': idx
                }
            }
            return data

        return process_fn

    # filter data that doesn't have a label
    train_dataset = train_dataset.filter(lambda x: len(x['label']) > 0)
    val_dataset = val_dataset.filter(lambda x: len(x['label']) > 0)
    test_dataset = test_dataset.filter(lambda x: len(x['label']) > 0)

    train_dataset = train_dataset.map(function=make_map_fn('train'), with_indices=True)
    val_dataset = val_dataset.map(function=make_map_fn('validation'), with_indices=True)
    test_dataset = test_dataset.map(function=make_map_fn('test'), with_indices=True)

    local_dir = args.local_dir
    hdfs_dir = args.hdfs_dir

    train_dataset.to_parquet(os.path.join(local_dir, 'train.parquet'))
    val_dataset.to_parquet(os.path.join(local_dir, 'validation.parquet'))
    test_dataset.to_parquet(os.path.join(local_dir, 'test.parquet'))

    if hdfs_dir is not None:
        makedirs(hdfs_dir)

        copy(src=local_dir, dst=hdfs_dir)


================================================
FILE: examples/data_preprocess/math_dataset.py
================================================
# Copyright 2024 Bytedance Ltd. and/or its affiliates
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Preprocess the MATH dataset to parquet format
"""

import os
import datasets

from verl.utils.hdfs_io import copy, makedirs
import argparse

from verl.utils.reward_score.math import remove_boxed, last_boxed_only_string


def extract_solution(solution_str):
    return remove_boxed(last_boxed_only_string(solution_str))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--local_dir', default='~/data/math')
    parser.add_argument('--hdfs_dir', default=None)

    args = parser.parse_args()

    data_source = 'lighteval/MATH'

    dataset = datasets.load_dataset(data_source, trust_remote_code=True)

    train_dataset = dataset['train']
    test_dataset = dataset['test']

    instruction_following = "Let's think step by step and output the final answer within \\boxed{}."

    # add a row to each data item that represents a unique id
    def make_map_fn(split):

        def process_fn(example, idx):
            question = example.pop('problem')

            question = question + ' ' + instruction_following

            answer = example.pop('solution')
            solution = extract_solution(answer)
            data = {
                "data_source": data_source,
                "prompt": [{
                    "role": "user",
                    "content": question
                }],
                "ability": "math",
                "reward_model": {
                    "style": "rule",
                    "ground_truth": solution
                },
                "extra_info": {
                    'split': split,
                    'index': idx
                }
            }
            return data

        return process_fn

    train_dataset = train_dataset.map(function=make_map_fn('train'), with_indices=True)
    test_dataset = test_dataset.map(function=make_map_fn('test'), with_indices=True)

    local_dir = args.local_dir
    hdfs_dir = args.hdfs_dir

    train_dataset.to_parquet(os.path.join(local_dir, 'train.parquet'))
    test_dataset.to_parquet(os.path.join(local_dir, 'test.parquet'))

    if hdfs_dir is not None:
        makedirs(hdfs_dir)

        copy(src=local_dir, dst=hdfs_dir)
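

# Illustrative sketch, not part of the original script: a small check that a
# processed record carries the fields the process_fn helpers in these examples
# emit (data_source, prompt, ability, reward_model, extra_info). The function
# name is hypothetical, and the trainer may rely on more or fewer fields than
# this checks.
def _sketch_missing_record_fields(record):
    """Return the names of expected fields that are absent from a processed record."""
    expected_top_level = ("data_source", "prompt", "ability", "reward_model", "extra_info")
    missing = [key for key in expected_top_level if key not in record]

    # Nested fields that every process_fn in these examples sets.
    nested = {
        "reward_model": ("style", "ground_truth"),
        "extra_info": ("split", "index"),
    }
    for parent, children in nested.items():
        section = record.get(parent)
        if isinstance(section, dict):
            missing.extend(f"{parent}.{child}" for child in children if child not in section)
    return missing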


================================================
FILE: examples/data_preprocess/multiply.py
================================================
# Copyright 2024 Bytedance Ltd. and/or its affiliates
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Generate a synthetic multiplication dataset and save it in parquet format
"""

import re
import os
from datasets import Dataset

from verl.utils.hdfs_io import copy, makedirs
import argparse


from random import randint, seed
from tqdm import tqdm

def gen_dataset(
    N,
    DIGIT,
    LESS_OR_EQUAL=True,
):
    seed(1)
    # Generate N pairs of numbers with at most DIGIT digits each, together with their products
    equations = []
    for _ in tqdm(range(N)):
        # Helper: r is uniform over {0, 1, 2, 3}, so the upper bound is 10**(DIGIT-2) or
        # 10**(DIGIT-1) with 25% chance each, and 10**DIGIT with 50% chance
        def get_random_num():
            r = randint(0, 3)
            if r == 0:
                # 2 digits less than original
                max_num = 10**(DIGIT-2)
                return randint(0 if LESS_OR_EQUAL else max_num//10, max_num-1)
            elif r == 1:
                # 1 digit less than original
                max_num = 10**(DIGIT-1)
                return randint(0 if LESS_OR_EQUAL else max_num//10, max_num-1)
            else:
                # N-digit number
                max_num = 10**DIGIT
                return randint(0 if LESS_OR_EQUAL else max_num//10, max_num-1)
        # Generate two numbers independently
        num1 = get_random_num()
        num2 = get_random_num()
        # Calculate their product
        result = num1 * num2
        equations.append((num1, num2, result))
    return equations
    
    
def extract_solution(solution_str, *args):
    solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str)
    assert solution is not None
    final_solution = solution.group(0)
    final_solution = final_solution.split('#### ')[1].replace(',', '')
    return final_solution

def make_prefix(dp):
    num1 = dp['num1']
    num2 = dp['num2']
    prefix = f"""A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> RESULT_NUMBER </answer>. User: Give me the answer of the following equation: {num1} * {num2} = Assistant: Ok let me think about it.\n<think>"""
    return prefix

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--local_dir', default='~/data/multiply-3_digit')
    parser.add_argument('--hdfs_dir', default=None)

    args = parser.parse_args()

    data_source = 'yolo/multiply-3_digit'
    DIGIT = 3
    # N = 1000000
    N = 100000
    LESS_OR_EQUAL = True
    TRAIN_SIZE = 32768
    TEST_SIZE = 4096

    dataset = gen_dataset(N=N, DIGIT=DIGIT, LESS_OR_EQUAL=LESS_OR_EQUAL)
    dataset = list(set(dataset))
    assert len(dataset) > TRAIN_SIZE + TEST_SIZE
    train_dataset = dataset[:TRAIN_SIZE]
    test_dataset = dataset[-TEST_SIZE:]


    # add a row to each data item that represents a unique id
    def make_map_fn(split):

        def process_fn(example, idx):
            question = make_prefix(example)
            solution = example['result']
            data = {
                "data_source": data_source,
                "prompt": [{
                    "role": "user",
                    "content": question,
                }],
                "ability": "math",
                "reward_model": {
                    "style": "rule",
                    "ground_truth": solution
                },
                "extra_info": {
                    'split': split,
                    'index': idx,
                }
            }
            return data

        return process_fn
    
    def to_dataset(dataset_list):
        dataset_dict = {
            "num1": [],
            "num2": [],
            "result": [],
        }
        for dp in dataset_list:
            dataset_dict["num1"].append(dp[0])
            dataset_dict["num2"].append(dp[1])
            dataset_dict["result"].append(dp[2])
        return Dataset.from_dict(dataset_dict)

    train_dataset = to_dataset(train_dataset)
    test_dataset = to_dataset(test_dataset)

    train_dataset = train_dataset.map(function=make_map_fn('train'), with_indices=True)
    test_dataset = test_dataset.map(function=make_map_fn('test'), with_indices=True)

    local_dir = args.local_dir
    hdfs_dir = args.hdfs_dir

    train_dataset.to_parquet(os.path.join(local_dir, 'train.parquet'))
    test_dataset.to_parquet(os.path.join(local_dir, 'test.parquet'))

    if hdfs_dir is not None:
        makedirs(hdfs_dir)

        copy(src=local_dir, dst=hdfs_dir)


================================================
FILE: examples/generation/run_deepseek_v2_lite_math.sh
================================================
python3 -m verl.trainer.main_generation \
    trainer.nnodes=1 \
    trainer.n_gpus_per_node=8 \
    data.path=~/data/rlhf/gsm8k/test.parquet \
    data.prompt_key=prompt \
    data.n_samples=1 \
    data.output_path=~/data/rlhf/math/deepseek_v2_lite_gen_test.parquet \
    model.path=deepseek-ai/deepseek-llm-7b-chat \
    +model.trust_remote_code=True \
    rollout.temperature=1.0 \
    rollout.top_k=50 \
    rollout.top_p=0.7 \
    rollout.prompt_length=2048 \
    rollout.response_length=1024 \
    rollout.tensor_model_parallel_size=2 \
    rollout.gpu_memory_utilization=0.8


================================================
FILE: examples/grpo_trainer/run_deepseek7b_llm.sh
================================================
set -x

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=1024 \
    actor_rollout_ref.model.path=deepseek-ai/deepseek-llm-7b-chat \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=128 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=256 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=256 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_grpo_example_gsm8k' \
    trainer.experiment_name='deepseek_llm_7b_function_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 "$@"

================================================
FILE: examples/grpo_trainer/run_deepseek7b_llm_seq_balance.sh
================================================
set -x

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=512 \
    actor_rollout_ref.model.path=deepseek-ai/deepseek-llm-7b-chat \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.use_dynamic_bsz=True \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=24000 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_grpo_example_gsm8k' \
    trainer.experiment_name='deepseek_llm_7b_function_rm_seq_packing' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 "$@"

================================================
FILE: examples/grpo_trainer/run_qwen2-7b.sh
================================================
set -x

export VLLM_ATTENTION_BACKEND=XFORMERS

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=1024 \
    actor_rollout_ref.model.path=Qwen/Qwen2-7B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=128 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=256 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=256 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_grpo_example_gsm8k' \
    trainer.experiment_name='qwen2_7b_function_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 "$@"

================================================
FILE: examples/grpo_trainer/run_qwen2-7b_seq_balance.sh
================================================
set -x

export VLLM_ATTENTION_BACKEND=XFORMERS

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=1024 \
    actor_rollout_ref.model.path=Qwen/Qwen2-7B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.use_dynamic_bsz=True \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=24000 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_grpo_example_gsm8k' \
    trainer.experiment_name='qwen2_7b_function_rm_kl1e-3' \
    +trainer.val_before_train=False \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 "$@"

================================================
FILE: examples/ppo_trainer/run_deepseek7b_llm.sh
================================================
set -x

python3 -m verl.trainer.main_ppo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=512 \
    actor_rollout_ref.model.path=deepseek-ai/deepseek-llm-7b-chat \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=32 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=128 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=128 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    critic.optim.lr=1e-5 \
    critic.model.use_remove_padding=True \
    critic.model.path=deepseek-ai/deepseek-llm-7b-chat \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size=32 \
    critic.model.fsdp_config.param_offload=False \
    critic.model.fsdp_config.grad_offload=False \
    critic.model.fsdp_config.optimizer_offload=False \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_example_gsm8k' \
    trainer.experiment_name='deepseek_llm_7b_function_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.total_epochs=15 $@
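
The overrides above nest three batch sizes: `data.train_batch_size=1024`, `actor_rollout_ref.actor.ppo_mini_batch_size=256`, and `actor_rollout_ref.actor.ppo_micro_batch_size=32`. The following is a minimal sketch of a pre-flight check, assuming the common PPO convention that each level should divide the one above it. `check_batch_sizes` is a hypothetical helper for illustration and is not part of verl; the trainer's own validation may differ.

```shell
# Hypothetical helper (not part of verl): verifies that each batch size
# divides the one above it and prints the resulting step counts.
check_batch_sizes() {
    train_bsz=$1
    mini_bsz=$2
    micro_bsz=$3
    if [ $(( train_bsz % mini_bsz )) -ne 0 ]; then
        echo "train_batch_size ($train_bsz) is not a multiple of ppo_mini_batch_size ($mini_bsz)" >&2
        return 1
    fi
    if [ $(( mini_bsz % micro_bsz )) -ne 0 ]; then
        echo "ppo_mini_batch_size ($mini_bsz) is not a multiple of ppo_micro_batch_size ($micro_bsz)" >&2
        return 1
    fi
    # Assumption: one optimizer update per mini-batch, with micro-batches
    # processed sequentially inside each mini-batch.
    echo "updates_per_rollout=$(( train_bsz / mini_bsz )) micro_batches_per_mini=$(( mini_bsz / micro_bsz ))"
}

# Values taken from run_deepseek7b_llm.sh above.
check_batch_sizes 1024 256 32
```

Under these assumptions, the values in this script give 4 updates per rollout batch and 8 micro-batches per mini-batch.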


================================================
FILE: examples/ppo_trainer/run_deepseek7b_llm_sp2.sh
================================================
set -x

python3 -m verl.trainer.main_ppo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=512 \
    actor_rollout_ref.model.path=deepseek-ai/deepseek-llm-7b-chat \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=128 \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=256 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=256 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    critic.optim.lr=1e-5 \
    critic.ulysses_sequence_parallel_size=2 \
    critic.model.use_remove_padding=True \
    critic.model.path=deepseek-ai/deepseek-llm-7b-chat \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size=64 \
    critic.model.fsdp_config.param_offload=False \
    critic.model.fsdp_config.grad_offload=False \
    critic.model.fsdp_config.optimizer_offload=False \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_example_gsm8k' \
    trainer.experiment_name='deepseek_llm_7b_function_rm_sp2' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 $@


================================================
FILE: examples/ppo_trainer/run_deepseek_full_hh_rlhf.sh
================================================
set -x

train_files=$HOME/data/full_hh_rlhf/rl/train.parquet
test_files=$HOME/data/full_hh_rlhf/rl/train.parquet # unused placeholder (points at the training set)

python3 -m verl.trainer.main_ppo --config-path=./config --config-name='ppo_megatron_trainer' \
    data.train_files="$train_files" \
    data.val_files="$test_files" \
    data.train_batch_size=512 \
    data.val_batch_size=128 \
    data.max_prompt_length=128 \
    data.max_response_length=128 \
    actor_rollout_ref.model.path=deepseek-ai/deepseek-llm-7b-chat \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.ppo_mini_batch_size=128 \
    actor_rollout_ref.actor.ppo_micro_batch_size=16 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=16 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=16 \
    actor_rollout_ref.ref.param_offload=False \
    critic.optim.lr=1e-5 \
    critic.model.path=deepseek-ai/deepseek-llm-7b-chat \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size=16 \
    reward_model.enable=True \
    reward_model.megatron.tensor_model_parallel_size=4 \
    reward_model.model.path=deepseek-ai/deepseek-llm-7b-chat \
    reward_model.micro_batch_size=16 \
    reward_model.param_offload=False \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_megatron_full_hh_rlhf_examples' \
    trainer.experiment_name='deepseek_llm_7b_model_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=100 $@


================================================
FILE: examples/ppo_trainer/run_deepseek_math_gsm8k_megatron.sh
================================================
set -x

gsm8k_train_path=$HOME/data/gsm8k/train.parquet
gsm8k_test_path=$HOME/data/gsm8k/test.parquet
math_train_path=$HOME/data/math/train.parquet
math_test_path=$HOME/data/math/test.parquet

train_files="['$gsm8k_train_path', '$math_train_path']"
test_files="['$gsm8k_test_path', '$math_test_path']"

python3 -m verl.trainer.main_ppo --config-path=./config --config-name='ppo_megatron_trainer' \
    data.train_files="$train_files" \
    data.val_files="$test_files" \
    data.train_batch_size=1024 \
    data.val_batch_size=6312 \
    data.max_prompt_length=1024 \
    data.max_response_length=512 \
    actor_rollout_ref.model.path=deepseek-ai/deepseek-coder-6.7b-instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=32 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=32 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=32 \
    critic.optim.lr=1e-5 \
    critic.model.path=deepseek-ai/deepseek-coder-6.7b-instruct \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size=32 \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_megatron_math_gsm8k_examples' \
    trainer.experiment_name='deepseek_llm_7b_function_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=100 $@


================================================
FILE: examples/ppo_trainer/run_deepseek_megatron.sh
================================================
set -x

python3 -m verl.trainer.main_ppo --config-path=./config --config-name='ppo_megatron_trainer' \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=512 \
    actor_rollout_ref.model.path=deepseek-ai/deepseek-coder-6.7b-instruct \
    actor_rollout_ref.actor.optim.lr=2e-6 \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=64 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=64 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=128 \
    critic.optim.lr=2e-5 \
    critic.model.path=deepseek-ai/deepseek-coder-6.7b-instruct \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size=64 \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_megatron_gsm8k_examples' \
    trainer.experiment_name='deepseek_llm_7b_function_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.total_epochs=15 \
    +trainer.val_before_train=False $@


================================================
FILE: examples/ppo_trainer/run_gemma.sh
================================================
set -x

python3 -m verl.trainer.main_ppo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=512 \
    data.val_batch_size=1312 \
    data.max_prompt_length=1024 \
    data.max_response_length=512 \
    actor_rollout_ref.model.path=google/gemma-2-2b-it \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=128 \
    actor_rollout_ref.actor.ppo_micro_batch_size=4 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=4 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=4 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    critic.optim.lr=1e-5 \
    critic.model.use_remove_padding=True \
    critic.model.path=google/gemma-2-2b-it \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size=4 \
    critic.model.fsdp_config.param_offload=False \
    critic.model.fsdp_config.grad_offload=False \
    critic.model.fsdp_config.optimizer_offload=False \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_example' \
    trainer.experiment_name='gemma2b_function_rm' \
    trainer.n_gpus_per_node=2 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=10 \
    trainer.total_epochs=15 $@


================================================
FILE: examples/ppo_trainer/run_qwen2-7b.sh
================================================
set -x

gsm8k_train_path=$HOME/data/gsm8k/train.parquet
gsm8k_test_path=$HOME/data/gsm8k/test.parquet
math_train_path=$HOME/data/math/train.parquet
math_test_path=$HOME/data/math/test.parquet

train_files="['$gsm8k_train_path', '$math_train_path']"
test_files="['$gsm8k_test_path', '$math_test_path']"

python3 -m verl.trainer.main_ppo \
    data.train_files="$train_files" \
    data.val_files="$test_files" \
    data.train_batch_size=1024 \
    data.val_batch_size=6312 \
    data.max_prompt_length=1024 \
    data.max_response_length=512 \
    actor_rollout_ref.model.path=Qwen/Qwen2-7B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=16 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=16 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=16 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    critic.optim.lr=1e-5 \
    critic.model.use_remove_padding=True \
    critic.model.path=Qwen/Qwen2-7B-Instruct \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size=16 \
    critic.model.fsdp_config.param_offload=False \
    critic.model.fsdp_config.grad_offload=False \
    critic.model.fsdp_config.optimizer_offload=False \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_example' \
    trainer.experiment_name='Qwen2-7B-Instruct_function_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=10 \
    trainer.total_epochs=15 $@
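
The script above forwards extra command-line overrides through an unquoted `$@`. The following is a minimal sketch of how unquoted and quoted forwarding differ when an override contains spaces, as a list-valued `data.train_files` does. `count_args`, `forward_unquoted`, and `forward_quoted` are hypothetical helpers for illustration only.

```shell
# Hypothetical helpers: report how many arguments reach the callee.
count_args() { echo "$#"; }

# Unquoted $@ is subject to word splitting, so an argument containing a
# space is broken into several words.
forward_unquoted() { count_args $@; }

# Quoted "$@" preserves each original argument as a single word.
forward_quoted() { count_args "$@"; }

forward_unquoted "data.train_files=['a.parquet', 'b.parquet']" trainer.total_epochs=1  # prints 3
forward_quoted   "data.train_files=['a.parquet', 'b.parquet']" trainer.total_epochs=1  # prints 2
```

With the quoted form, a list-valued override reaches the Python entry point as a single argument.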


================================================
FILE: examples/ppo_trainer/run_qwen2-7b_rm.sh
================================================
set -x
# Disclaimer: the model used in the script is only an academic example.
gsm8k_train_path=$HOME/data/gsm8k/train.parquet
gsm8k_test_path=$HOME/data/gsm8k/test.parquet
math_train_path=$HOME/data/math/train.parquet
math_test_path=$HOME/data/math/test.parquet

train_files="['$gsm8k_train_path', '$math_train_path']"
test_files="['$gsm8k_test_path', '$math_test_path']"

export VLLM_ATTENTION_BACKEND=XFORMERS # vllm + qwen2-7b with flash_attn has some issues

python3 -m verl.trainer.main_ppo \
    data.train_files="$train_files" \
    data.val_files="$test_files" \
    data.train_batch_size=1024 \
    data.val_batch_size=6312 \
    data.max_prompt_length=1024 \
    data.max_response_length=512 \
    data.return_raw_chat=True \
    actor_rollout_ref.model.path=Qwen/Qwen2-7B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.optim.lr_warmup_steps_ratio=0.1 \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=16 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=16 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=16 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    critic.optim.lr=1e-5 \
    critic.model.use_remove_padding=True \
    critic.optim.lr_warmup_steps_ratio=0.05 \
    critic.model.path=Qwen/Qwen2-7B-Instruct \
    critic.model.enable_gradient_checkpointing=False \
    critic.ppo_micro_batch_size=16 \
    critic.model.fsdp_config.param_offload=False \
    critic.model.fsdp_config.grad_offload=False \
    critic.model.fsdp_config.optimizer_offload=False \
    reward_model.enable=True \
    reward_model.model.path=sfairXC/FsfairX-LLaMA3-RM-v0.1 \
    reward_model.model.use_remove_padding=True \
    reward_model.model.fsdp_config.param_offload=True \
    reward_model.micro_batch_size=16 \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_example' \
    trainer.experiment_name='Qwen2-7B-Instruct_hybrid_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 $@


================================================
FILE: examples/ppo_trainer/run_qwen2-7b_rm_seq_balance.sh
================================================
set -x

gsm8k_train_path=$HOME/data/gsm8k/train.parquet
gsm8k_test_path=$HOME/data/gsm8k/test.parquet
math_train_path=$HOME/data/math/train.parquet
math_test_path=$HOME/data/math/test.parquet

train_files="['$gsm8k_train_path', '$math_train_path']"
test_files="['$gsm8k_test_path', '$math_test_path']"

python3 -m verl.trainer.main_ppo \
    data.train_files="$train_files" \
    data.val_files="$test_files" \
    data.train_batch_size=4096 \
    data.val_batch_size=1312 \
    data.max_prompt_length=4096 \
    data.max_response_length=4096 \
    data.return_raw_chat=True \
    actor_rollout_ref.model.path=Qwen/Qwen2-7B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=512 \
    actor_rollout_ref.actor.use_dynamic_bsz=True \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=24000 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=24000 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=24000 \
    critic.optim.lr=1e-5 \
    critic.model.use_remove_padding=True \
    critic.model.path=Qwen/Qwen2-7B-Instruct \
    critic.model.enable_gradient_checkpointing=True \
    critic.use_dynamic_bsz=True \
    critic.ppo_max_token_len_per_gpu=98304 \
    critic.model.fsdp_config.param_offload=False \
    critic.model.fsdp_config.grad_offload=False \
    critic.model.fsdp_config.optimizer_offload=False \
    reward_model.enable=True \
    reward_model.model.path=sfairXC/FsfairX-LLaMA3-RM-v0.1 \
    reward_model.model.use_remove_padding=True \
    reward_model.model.fsdp_config.param_offload=True \
    reward_model.micro_batch_size=16 \
    reward_model.use_dynamic_bsz=True \
    reward_model.forward_max_token_len_per_gpu=98304 \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_example_gsm8k' \
    trainer.experiment_name='qwen2-7b_hybrid_rm_bsz8k_p4k_r4k_seq_packing' \
    trainer.n_gpus_per_node=8 \
    +trainer.val_before_train=False \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 $@
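
With `use_dynamic_bsz=True`, the script above sets per-GPU token budgets instead of micro-batch sizes. The following is a rough sketch of how those budgets relate to the configured sequence lengths, assuming each sequence costs at most `max_prompt_length + max_response_length` tokens. `max_len_seqs_per_gpu` is a hypothetical helper, not part of verl, and the actual packing logic may count tokens differently.

```shell
# Hypothetical helper (not part of verl): how many maximum-length sequences
# fit into a per-GPU token budget, assuming each sequence costs
# max_prompt_length + max_response_length tokens.
max_len_seqs_per_gpu() {
    token_budget=$1
    max_prompt=$2
    max_response=$3
    echo $(( token_budget / (max_prompt + max_response) ))
}

# Values taken from run_qwen2-7b_rm_seq_balance.sh above.
max_len_seqs_per_gpu 24000 4096 4096   # actor budget: prints 2
max_len_seqs_per_gpu 98304 4096 4096   # critic / reward-model budget: prints 12
```

Shorter sequences let more samples share the same budget, which is the point of token-based batching.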


================================================
FILE: examples/ppo_trainer/run_qwen2-7b_seq_balance.sh
================================================
set -x

gsm8k_train_path=$HOME/data/gsm8k/train.parquet
gsm8k_test_path=$HOME/data/gsm8k/test.parquet
math_train_path=$HOME/data/math/train.parquet
math_test_path=$HOME/data/math/test.parquet

train_files="['$gsm8k_train_path', '$math_train_path']"
test_files="['$gsm8k_test_path', '$math_test_path']"

python3 -m verl.trainer.main_ppo \
    data.train_files="$train_files" \
    data.val_files="$test_files" \
    data.train_batch_size=4096 \
    data.val_batch_size=1312 \
    data.max_prompt_length=4096 \
    data.max_response_length=4096 \
    actor_rollout_ref.model.path=Qwen/Qwen2-7B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=512 \
    actor_rollout_ref.actor.use_dynamic_bsz=True \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=24000 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    actor_rollout_ref.roll
SYMBOL INDEX (1406 symbols across 170 files)

FILE: examples/data_preprocess/arth.py
  function gen_dataset (line 29) | def gen_dataset(
  function make_prefix (line 79) | def make_prefix(dp):
  function make_map_fn (line 109) | def make_map_fn(split):
  function to_dataset (line 134) | def to_dataset(dataset_list):

FILE: examples/data_preprocess/countdown.py
  function gen_dataset (line 15) | def gen_dataset(
  function make_prefix (line 53) | def make_prefix(dp, template_type):
  function make_map_fn (line 94) | def make_map_fn(split):

FILE: examples/data_preprocess/full_hh_rlhf.py
  function generate_sft_dataset (line 30) | def generate_sft_dataset(target_hdfs_path_dir, local_dir='~/data/full_hh...
  function generate_rm_dataset (line 58) | def generate_rm_dataset(target_hdfs_path_dir, local_dir='~/data/full_hh_...
  function generate_rl_dataset (line 86) | def generate_rl_dataset(target_hdfs_path_dir, local_dir='~/data/full_hh_...

FILE: examples/data_preprocess/gsm8k.py
  function extract_solution (line 26) | def extract_solution(solution_str):
  function make_map_fn (line 52) | def make_map_fn(split):

FILE: examples/data_preprocess/hellaswag.py
  function preprocess (line 27) | def preprocess(text):
  function make_map_fn (line 53) | def make_map_fn(split):

FILE: examples/data_preprocess/math_dataset.py
  function extract_solution (line 27) | def extract_solution(solution_str):
  function make_map_fn (line 48) | def make_map_fn(split):

FILE: examples/data_preprocess/multiply.py
  function gen_dataset (line 29) | def gen_dataset(
  function extract_solution (line 62) | def extract_solution(solution_str, *args):
  function make_prefix (line 69) | def make_prefix(dp):
  function make_map_fn (line 98) | def make_map_fn(split):
  function to_dataset (line 123) | def to_dataset(dataset_list):

FILE: examples/split_placement/main_ppo_split.py
  function _select_rm_score_fn (line 24) | def _select_rm_score_fn(data_source):
  class RewardManager (line 33) | class RewardManager():
    method __init__ (line 35) | def __init__(self, tokenizer, num_examine) -> None:
    method __call__ (line 39) | def __call__(self, data: DataProto):
  function main (line 93) | def main(config):
  function main_task (line 102) | def main_task(config):

FILE: examples/split_placement/split_monkey_patch.py
  function fit (line 25) | def fit(self):

FILE: tests/e2e/arithmetic_sequence/rl/main_trainer.py
  function make_reward_function (line 35) | def make_reward_function(tokenizer, num_examine):
  function main (line 92) | def main(config):

FILE: tests/e2e/check_results.py
  function extract_reward_from_line (line 20) | def extract_reward_from_line(line):

FILE: tests/e2e/envs/digit_completion/task.py
  class DigitCompletion (line 19) | class DigitCompletion(object):
    method __init__ (line 35) | def __init__(self, max_number: int, max_diff: int, max_num_in_response...
    method __str__ (line 56) | def __str__(self):
    method get_state (line 61) | def get_state(self):
    method set_state (line 64) | def set_state(self, state):
    method prompt_length (line 69) | def prompt_length(self):
    method response_length (line 73) | def response_length(self):
    method add (line 78) | def add(self, a, b):
    method get_all_prompts (line 81) | def get_all_prompts(self):
    method sample_str_prompts (line 91) | def sample_str_prompts(self):
    method sample_batch_str_prompts (line 100) | def sample_batch_str_prompts(self, batch_size):
  function compute_attention_mask (line 107) | def compute_attention_mask(prompts, pad_token_id):
  function compute_position_id_with_mask (line 113) | def compute_position_id_with_mask(mask):
  function generate_ground_truth_response (line 117) | def generate_ground_truth_response(prompt: str):
  function compute_reward (line 137) | def compute_reward(prompt: str, response: str, sequence_reward=1.):

FILE: tests/e2e/envs/digit_completion/tokenizer.py
  class CharTokenizer (line 29) | class CharTokenizer(PreTrainedTokenizer):
    method __init__ (line 31) | def __init__(self, characters: Sequence[str], model_max_length: int, c...
    method vocab_size (line 86) | def vocab_size(self) -> int:
    method get_vocab (line 89) | def get_vocab(self):
    method _tokenize (line 92) | def _tokenize(self, text: str) -> List[str]:
    method _convert_token_to_id (line 95) | def _convert_token_to_id(self, token: str) -> int:
    method _convert_id_to_token (line 98) | def _convert_id_to_token(self, index: int) -> str:
    method convert_tokens_to_string (line 101) | def convert_tokens_to_string(self, tokens):
    method build_inputs_with_special_tokens (line 104) | def build_inputs_with_special_tokens(self,
    method get_special_tokens_mask (line 114) | def get_special_tokens_mask(
    method get_config (line 132) | def get_config(self) -> Dict:
    method from_config (line 140) | def from_config(cls, config: Dict) -> "DigitCompletionTokenizer":
    method save_pretrained (line 147) | def save_pretrained(self, save_directory: Union[str, os.PathLike], **k...
    method from_pretrained (line 154) | def from_pretrained(cls, save_directory: Union[str, os.PathLike], **kw...

FILE: tests/gpu_utility/test_memory_buffers.py
  function test_memory_buffers (line 27) | def test_memory_buffers():

FILE: tests/gpu_utility/test_ops.py
  function test_flash_attn_cross_entropy (line 16) | def test_flash_attn_cross_entropy():

FILE: tests/gpu_utility/test_torch_functional.py
  function test_log_probs_from_logits_response_rmpad (line 20) | def test_log_probs_from_logits_response_rmpad():
  function test_lr_scheduler (line 52) | def test_lr_scheduler():

FILE: tests/model/test_transformer.py
  function test_hf_casual_models (line 32) | def test_hf_casual_models():
  function test_hf_value_models (line 91) | def test_hf_value_models():

FILE: tests/model/test_transformers_ulysses.py
  function sync_model_parameters_global (line 44) | def sync_model_parameters_global(layer):
  function test_hf_casual_fwd (line 50) | def test_hf_casual_fwd():
  function test_hf_casual_fwd_bwd (line 128) | def test_hf_casual_fwd_bwd():

FILE: tests/ray/check_worker_alive/main.py
  class TestActor (line 27) | class TestActor(Worker):
    method __init__ (line 29) | def __init__(self) -> None:
    method foo (line 33) | def foo(self, wait_time):

FILE: tests/ray/detached_worker/client.py
  function compute_position_id_with_mask (line 30) | def compute_position_id_with_mask(mask):

FILE: tests/ray/detached_worker/server.py
  class Trainer (line 49) | class Trainer(MegatronWorker):
    method __init__ (line 51) | def __init__(self):
    method init_model (line 73) | def init_model(self):
    method train_model (line 120) | def train_model(self, data: DataProto) -> DataProto:

FILE: tests/ray/test_check_worker_alive.py
  function test (line 20) | def test():

FILE: tests/ray/test_colocated_workers.py
  class Actor (line 25) | class Actor(Worker):
    method __init__ (line 27) | def __init__(self) -> None:
    method add (line 31) | def add(self, data: DataProto):
  class Critic (line 37) | class Critic(Worker):
    method __init__ (line 39) | def __init__(self, config) -> None:
    method sub (line 44) | def sub(self, data: DataProto):
  function test_colocated_workers (line 49) | def test_colocated_workers():

FILE: tests/ray/test_data_transfer.py
  class DummyWorker (line 37) | class DummyWorker(Worker):
    method __init__ (line 39) | def __init__(self):
    method do_nothing (line 44) | def do_nothing(self, data):
  function test_data_transfer (line 52) | def test_data_transfer():

FILE: tests/ray/test_driverfunc_to_worker.py
  class ModelActor (line 30) | class ModelActor(Worker):
    method __init__ (line 32) | def __init__(self):
  class HackSelf (line 36) | class HackSelf():
    method __init__ (line 38) | def __init__(self):
  function get_aux_metrics (line 42) | def get_aux_metrics(self, test_proto):
  function test (line 55) | def test():

FILE: tests/ray/test_high_level_scheduling_api.py
  class TestActor (line 24) | class TestActor(Worker):
    method __init__ (line 26) | def __init__(self, cuda_visible_devices=None) -> None:
    method get_node_id (line 29) | def get_node_id(self):
  function test (line 33) | def test():

FILE: tests/ray/test_ray_local_envs.py
  class TestActor (line 26) | class TestActor(Worker):
    method __init__ (line 28) | def __init__(self) -> None:
    method getenv (line 31) | def getenv(self, key):
  function test_basics (line 36) | def test_basics():

FILE: tests/ray/test_rvdz.py
  class TestWorker (line 19) | class TestWorker:
    method __init__ (line 21) | def __init__(self, rank, world_size, group_name):
    method init (line 27) | def init(self):
    method test (line 31) | def test(self):
  function test_rvdz (line 37) | def test_rvdz():

FILE: tests/ray/test_worker_group_basics.py
  function two_to_all_dispatch_fn (line 26) | def two_to_all_dispatch_fn(worker_group, *args, **kwargs):
  class TestActor (line 42) | class TestActor(Worker):
    method __init__ (line 44) | def __init__(self, x) -> None:
    method foo (line 48) | def foo(self, y):
    method foo_rank_zero (line 52) | def foo_rank_zero(self, x, y):
    method foo_one_to_all (line 56) | def foo_one_to_all(self, x, y):
    method foo_all_to_all (line 60) | def foo_all_to_all(self, x, y):
    method foo_custom (line 64) | def foo_custom(self, x, y):
  function remote_call_wg (line 69) | def remote_call_wg(worker_names):
  function add_one (line 83) | def add_one(data):
  function test_basics (line 90) | def test_basics():

FILE: tests/ray/test_worker_group_torch.py
  class TestAllGatherActor (line 29) | class TestAllGatherActor(Worker):
    method __init__ (line 31) | def __init__(self, size) -> None:
    method init (line 35) | def init(self):
    method all_gather (line 40) | def all_gather(self):
  class TestAllGatherActorV2 (line 50) | class TestAllGatherActorV2(Worker):
    method __init__ (line 52) | def __init__(self, size) -> None:
    method all_gather (line 60) | def all_gather(self):
  function test_all_gather_torch (line 69) | def test_all_gather_torch():
  function test_all_gather_torch_v2 (line 93) | def test_all_gather_torch_v2():

FILE: tests/rollout/run_fsdp_vllm.py
  function main (line 27) | def main():

FILE: tests/rollout/test_vllm_hf_loader.py
  function levenshtein (line 30) | def levenshtein(s1, s2):
  function are_lists_similar (line 51) | def are_lists_similar(a, b):
  function test_vllm_with_hf (line 72) | def test_vllm_with_hf():

FILE: tests/sanity/test_import.py
  function test_import (line 16) | def test_import():
  function test_single_controller_import (line 21) | def test_single_controller_import():

FILE: tests/utility/test_tensor_dict_utilities.py
  function test_union_tensor_dict (line 26) | def test_union_tensor_dict():
  function test_tensor_dict_constructor (line 52) | def test_tensor_dict_constructor():
  function test_tensor_dict_make_iterator (line 66) | def test_tensor_dict_make_iterator():
  function test_reorder (line 95) | def test_reorder():
  function test_chunk_concat (line 106) | def test_chunk_concat():
  function test_pop (line 130) | def test_pop():
  function test_repeat (line 143) | def test_repeat():
  function test_dataproto_pad_unpad (line 168) | def test_dataproto_pad_unpad():
  function test_dataproto_fold_unfold (line 206) | def test_dataproto_fold_unfold():
  function test_torch_save_data_proto (line 229) | def test_torch_save_data_proto():
  function test_len (line 245) | def test_len():
  function test_seqlen_balancing (line 265) | def test_seqlen_balancing():

FILE: tests/verl/utils/dataset/test_rl_dataset.py
  function get_gsm8k_data (line 20) | def get_gsm8k_data():
  function test_rl_dataset (line 29) | def test_rl_dataset():

FILE: tests/verl/utils/dataset/test_rm_dataset.py
  function get_rm_data (line 21) | def get_rm_data():
  function test_rm_dataset (line 30) | def test_rm_dataset():

FILE: tests/verl/utils/dataset/test_sft_dataset.py
  function get_gsm8k_data (line 21) | def get_gsm8k_data():
  function test_sft_cot_dataset (line 29) | def test_sft_cot_dataset():
  function test_sft_dataset (line 46) | def test_sft_dataset():

FILE: verl/models/llama/megatron/checkpoint_utils/llama_loader.py
  function _megatron_calc_layer_map (line 21) | def _megatron_calc_layer_map(config):
  function load_state_dict_to_megatron_llama (line 51) | def load_state_dict_to_megatron_llama(state_dict, wrapped_models, config...

FILE: verl/models/llama/megatron/checkpoint_utils/llama_saver.py
  function _megatron_calc_global_rank (line 28) | def _megatron_calc_global_rank(tp_rank: int = 0, dp_rank: int = 0, pp_ra...
  function _megatron_calc_layer_map (line 45) | def _megatron_calc_layer_map(config):
  function merge_megatron_ckpt_llama (line 76) | def merge_megatron_ckpt_llama(wrapped_models, config, is_value_model=Fal...

FILE: verl/models/llama/megatron/layers/parallel_attention.py
  class LlamaRotaryEmbedding (line 35) | class LlamaRotaryEmbedding(nn.Module):
    method __init__ (line 37) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
    method _set_cos_sin_cache (line 51) | def _set_cos_sin_cache(self, seq_len, device, dtype):
    method forward (line 61) | def forward(self, x, seq_len=None):
  class LlamaLinearScalingRotaryEmbedding (line 72) | class LlamaLinearScalingRotaryEmbedding(LlamaRotaryEmbedding):
    method __init__ (line 75) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
    method _set_cos_sin_cache (line 79) | def _set_cos_sin_cache(self, seq_len, device, dtype):
  class LlamaDynamicNTKScalingRotaryEmbedding (line 91) | class LlamaDynamicNTKScalingRotaryEmbedding(LlamaRotaryEmbedding):
    method __init__ (line 94) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
    method _set_cos_sin_cache (line 98) | def _set_cos_sin_cache(self, seq_len, device, dtype):
  function rotate_half (line 116) | def rotate_half(x):
  function apply_rotary_pos_emb (line 123) | def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
  function repeat_kv (line 131) | def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
  class ParallelLlamaAttention (line 143) | class ParallelLlamaAttention(nn.Module):
    method __init__ (line 146) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
    method _init_rope (line 204) | def _init_rope(self):
    method _shape (line 231) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
    method forward (line 234) | def forward(
  function apply_rotary_pos_emb_rmpad (line 299) | def apply_rotary_pos_emb_rmpad(q, k, cos, sin, position_ids, indices, se...
  function apply_rotary_pos_emb_rmpad_flash (line 320) | def apply_rotary_pos_emb_rmpad_flash(q, k, cos, sin, cu_seqlens, max_seq...
  class ParallelLlamaAttentionRmPad (line 338) | class ParallelLlamaAttentionRmPad(ParallelLlamaAttention):
    method forward (line 340) | def forward(self,

FILE: verl/models/llama/megatron/layers/parallel_decoder.py
  class ParallelLlamaDecoderLayer (line 33) | class ParallelLlamaDecoderLayer(nn.Module):
    method __init__ (line 35) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
    method forward (line 44) | def forward(
  class ParallelLlamaDecoderLayerRmPad (line 99) | class ParallelLlamaDecoderLayerRmPad(nn.Module):
    method __init__ (line 101) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
    method forward (line 112) | def forward(

FILE: verl/models/llama/megatron/layers/parallel_linear.py
  class QKVParallelLinear (line 21) | class QKVParallelLinear(tensor_parallel.ColumnParallelLinear):
    method __init__ (line 23) | def __init__(self,
  class MergedColumnParallelLinear (line 52) | class MergedColumnParallelLinear(tensor_parallel.ColumnParallelLinear):
    method __init__ (line 54) | def __init__(self,

FILE: verl/models/llama/megatron/layers/parallel_mlp.py
  class ParallelLlamaMLP (line 31) | class ParallelLlamaMLP(nn.Module):
    method __init__ (line 33) | def __init__(self, config, megatron_config: ModelParallelConfig = None...
    method forward (line 71) | def forward(self, x):

FILE: verl/models/llama/megatron/layers/parallel_rmsnorm.py
  class ParallelLlamaRMSNorm (line 25) | class ParallelLlamaRMSNorm(nn.Module):
    method __init__ (line 27) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
    method forward (line 41) | def forward(self, hidden_states):

FILE: verl/models/llama/megatron/modeling_llama_megatron.py
  function _make_causal_mask (line 45) | def _make_causal_mask(input_ids_shape: torch.Size, dtype: torch.dtype, d...
  function _expand_mask (line 58) | def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Option...
  class ParallelLlamaModel (line 72) | class ParallelLlamaModel(nn.Module):
    method __init__ (line 80) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
    method _prepare_decoder_attention_mask (line 97) | def _prepare_decoder_attention_mask(self, attention_mask, input_shape,...
    method forward (line 117) | def forward(
  class ParallelLlamaForCausalLM (line 155) | class ParallelLlamaForCausalLM(nn.Module):
    method __init__ (line 157) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
    method forward (line 174) | def forward(
  class ParallelLlamaModelRmPad (line 215) | class ParallelLlamaModelRmPad(nn.Module):
    method __init__ (line 223) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
    method forward (line 240) | def forward(self,
  class ParallelLlamaForCausalLMRmPad (line 279) | class ParallelLlamaForCausalLMRmPad(nn.Module):
    method __init__ (line 281) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
    method _init_head (line 289) | def _init_head(self):
    method _forward_head (line 301) | def _forward_head(self, hidden_states):
    method forward (line 308) | def forward(
  class ParallelLlamaForValueRmPad (line 366) | class ParallelLlamaForValueRmPad(ParallelLlamaForCausalLMRmPad):
    method _init_head (line 368) | def _init_head(self):
    method _forward_head (line 377) | def _forward_head(self, hidden_states):
    method forward (line 384) | def forward(
  class ParallelLlamaModelRmPadPP (line 400) | class ParallelLlamaModelRmPadPP(nn.Module):
    method __init__ (line 410) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
    method set_input_tensor (line 457) | def set_input_tensor(self, input_tensor):
    method forward (line 467) | def forward(self,
  class ParallelLlamaForCausalLMRmPadPP (line 514) | class ParallelLlamaForCausalLMRmPadPP(nn.Module):
    method __init__ (line 516) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
    method set_input_tensor (line 531) | def set_input_tensor(self, input_tensor):
    method _init_head (line 542) | def _init_head(self):
    method _forward_head (line 554) | def _forward_head(self, hidden_states):
    method forward (line 562) | def forward(
  class ParallelLlamaForValueRmPadPP (line 626) | class ParallelLlamaForValueRmPadPP(ParallelLlamaForCausalLMRmPadPP):
    method _init_head (line 628) | def _init_head(self):
    method _forward_head (line 637) | def _forward_head(self, hidden_states):
    method forward (line 644) | def forward(

FILE: verl/models/registry.py
  function check_model_support_rmpad (line 27) | def check_model_support_rmpad(model_type: str):
  class ModelRegistry (line 46) | class ModelRegistry:
    method load_model_cls (line 49) | def load_model_cls(model_arch: str, value=False) -> Optional[Type[nn.M...
    method get_supported_archs (line 65) | def get_supported_archs() -> List[str]:

FILE: verl/models/transformers/llama.py
  function llama_flash_attn_forward (line 27) | def llama_flash_attn_forward(

FILE: verl/models/transformers/monkey_patch.py
  function apply_monkey_patch_to_llama (line 22) | def apply_monkey_patch_to_llama():
  function apply_monkey_patch_to_qwen2 (line 28) | def apply_monkey_patch_to_qwen2():
  function apply_monkey_patch (line 42) | def apply_monkey_patch(config: PretrainedConfig, verbose=True):
  function is_transformers_version_in_range (line 66) | def is_transformers_version_in_range(min_version: str, max_version: str)...

FILE: verl/models/transformers/qwen2.py
  function qwen2_flash_attn_forward (line 28) | def qwen2_flash_attn_forward(

FILE: verl/models/weight_loader_registry.py
  function get_weight_loader (line 16) | def get_weight_loader(arch: str):

FILE: verl/protocol.py
  function pad_dataproto_to_divisor (line 40) | def pad_dataproto_to_divisor(data: 'DataProto', size_divisor: int):
  function unpad_dataproto (line 60) | def unpad_dataproto(data: 'DataProto', pad_size):
  function union_tensor_dict (line 66) | def union_tensor_dict(tensor_dict1: TensorDict, tensor_dict2: TensorDict...
  function union_numpy_dict (line 80) | def union_numpy_dict(tensor_dict1: dict[np.ndarray], tensor_dict2: dict[...
  function list_of_dict_to_dict_of_list (line 92) | def list_of_dict_to_dict_of_list(list_of_dict: list[dict]):
  function fold_batch_dim (line 104) | def fold_batch_dim(data: 'DataProto', new_batch_size):
  function unfold_batch_dim (line 124) | def unfold_batch_dim(data: 'DataProto', batch_dims=2):
  function collate_fn (line 143) | def collate_fn(x: list['DataProtoItem']):
  class DataProtoItem (line 157) | class DataProtoItem:
  class DataProto (line 165) | class DataProto:
    method __post_init__ (line 176) | def __post_init__(self):
    method __len__ (line 180) | def __len__(self):
    method __getitem__ (line 189) | def __getitem__(self, item):
    method __getstate__ (line 194) | def __getstate__(self):
    method __setstate__ (line 204) | def __setstate__(self, data):
    method save_to_disk (line 215) | def save_to_disk(self, filepath):
    method load_from_disk (line 220) | def load_from_disk(filepath) -> 'DataProto':
    method print_size (line 225) | def print_size(self, prefix=""):
    method check_consistency (line 242) | def check_consistency(self):
    method from_single_dict (line 266) | def from_single_dict(cls, data: Dict[str, Union[torch.Tensor, np.ndarr...
    method from_dict (line 281) | def from_dict(cls, tensors: Dict[str, torch.Tensor], non_tensors=None,...
    method to (line 316) | def to(self, device) -> 'DataProto':
    method select (line 330) | def select(self, batch_keys=None, non_tensor_batch_keys=None, meta_inf...
    method pop (line 365) | def pop(self, batch_keys=None, non_tensor_batch_keys=None, meta_info_k...
    method rename (line 397) | def rename(self, old_keys=None, new_keys=None) -> 'DataProto':
    method union (line 423) | def union(self, other: 'DataProto') -> 'DataProto':
    method make_iterator (line 441) | def make_iterator(self, mini_batch_size, epochs, seed=None, dataloader...
    method chunk (line 482) | def chunk(self, chunks: int) -> List['DataProto']:
    method concat (line 515) | def concat(data: List['DataProto']) -> 'DataProto':
    method reorder (line 539) | def reorder(self, indices):
    method repeat (line 547) | def repeat(self, repeat_times=2, interleave=True):
  class DataProtoFuture (line 596) | class DataProtoFuture:
    method concat (line 613) | def concat(data: List[ray.ObjectRef]) -> 'DataProtoFuture':
    method chunk (line 617) | def chunk(self, chunks: int) -> List['DataProtoFuture']:
    method get (line 632) | def get(self):

FILE: verl/single_controller/base/decorator.py
  class Dispatch (line 25) | class Dispatch(Enum):
  class Execute (line 40) | class Execute(Enum):
  function _split_args_kwargs_data_proto (line 45) | def _split_args_kwargs_data_proto(chunks, *args, **kwargs):
  function dispatch_one_to_all (line 60) | def dispatch_one_to_all(worker_group, *args, **kwargs):
  function dispatch_all_to_all (line 66) | def dispatch_all_to_all(worker_group, *args, **kwargs):
  function collect_all_to_all (line 70) | def collect_all_to_all(worker_group, output):
  function dispatch_megatron_compute (line 74) | def dispatch_megatron_compute(worker_group, *args, **kwargs):
  function collect_megatron_compute (line 103) | def collect_megatron_compute(worker_group, output):
  function dispatch_megatron_compute_data_proto (line 118) | def dispatch_megatron_compute_data_proto(worker_group, *args, **kwargs):
  function _concat_data_proto_or_future (line 129) | def _concat_data_proto_or_future(output: List):
  function collect_megatron_compute_data_proto (line 147) | def collect_megatron_compute_data_proto(worker_group, output):
  function dispatch_megatron_pp_as_dp (line 161) | def dispatch_megatron_pp_as_dp(worker_group, *args, **kwargs):
  function collect_megatron_pp_as_dp (line 209) | def collect_megatron_pp_as_dp(worker_group, output):
  function collect_megatron_pp_only (line 223) | def collect_megatron_pp_only(worker_group, output):
  function dispatch_megatron_pp_as_dp_data_proto (line 237) | def dispatch_megatron_pp_as_dp_data_proto(worker_group, *args, **kwargs):
  function collect_megatron_pp_as_dp_data_proto (line 246) | def collect_megatron_pp_as_dp_data_proto(worker_group, output):
  function dispatch_dp_compute (line 255) | def dispatch_dp_compute(worker_group, *args, **kwargs):
  function collect_dp_compute (line 265) | def collect_dp_compute(worker_group, output):
  function dispatch_dp_compute_data_proto (line 272) | def dispatch_dp_compute_data_proto(worker_group, *args, **kwargs):
  function dispatch_dp_compute_data_proto_with_func (line 279) | def dispatch_dp_compute_data_proto_with_func(worker_group, *args, **kwar...
  function collect_dp_compute_data_proto (line 289) | def collect_dp_compute_data_proto(worker_group, output):
  function get_predefined_dispatch_fn (line 300) | def get_predefined_dispatch_fn(dispatch_mode):
  function get_predefined_execute_fn (line 350) | def get_predefined_execute_fn(execute_mode):
  function _check_dispatch_mode (line 366) | def _check_dispatch_mode(dispatch_mode):
  function _check_execute_mode (line 375) | def _check_execute_mode(execute_mode):
  function _materialize_futures (line 379) | def _materialize_futures(*args, **kwargs):
  function register (line 394) | def register(dispatch_mode=Dispatch.ALL_TO_ALL, execute_mode=Execute.ALL...

FILE: verl/single_controller/base/megatron/worker.py
  class MegatronWorker (line 20) | class MegatronWorker(Worker):
    method __init__ (line 22) | def __init__(self, cuda_visible_devices=None) -> None:
    method get_megatron_global_info (line 25) | def get_megatron_global_info(self):
    method get_megatron_rank_info (line 33) | def get_megatron_rank_info(self):

FILE: verl/single_controller/base/megatron/worker_group.py
  class MegatronWorkerGroup (line 21) | class MegatronWorkerGroup(WorkerGroup):
    method __init__ (line 23) | def __init__(self, resource_pool: ResourcePool, **kwargs):
    method init_megatron (line 28) | def init_megatron(self, default_megatron_kwargs: Dict = None):
    method get_megatron_rank_info (line 31) | def get_megatron_rank_info(self, rank: int) -> DistRankInfo:
    method tp_size (line 36) | def tp_size(self):
    method dp_size (line 41) | def dp_size(self):
    method pp_size (line 46) | def pp_size(self):
    method get_megatron_global_info (line 50) | def get_megatron_global_info(self):

FILE: verl/single_controller/base/register_center/ray.py
  class WorkerGroupRegisterCenter (line 19) | class WorkerGroupRegisterCenter:
    method __init__ (line 21) | def __init__(self, rank_zero_info):
    method get_rank_zero_info (line 24) | def get_rank_zero_info(self):
  function create_worker_group_register_center (line 28) | def create_worker_group_register_center(name, info):

FILE: verl/single_controller/base/worker.py
  class DistRankInfo (line 24) | class DistRankInfo:
  class DistGlobalInfo (line 31) | class DistGlobalInfo:
  class WorkerHelper (line 37) | class WorkerHelper:
    method _get_node_ip (line 39) | def _get_node_ip(self):
    method _get_free_port (line 58) | def _get_free_port(self):
    method get_availale_master_addr_port (line 63) | def get_availale_master_addr_port(self):
    method _get_pid (line 66) | def _get_pid(self):
  class WorkerMeta (line 70) | class WorkerMeta:
    method __init__ (line 75) | def __init__(self, store) -> None:
    method to_dict (line 78) | def to_dict(self):
  class Worker (line 83) | class Worker(WorkerHelper):
    method __new__ (line 85) | def __new__(cls, *args, **kwargs):
    method _configure_before_init (line 102) | def _configure_before_init(self, register_center_name: str, rank: int):
    method __init__ (line 119) | def __init__(self, cuda_visible_devices=None) -> None:
    method _configure_with_meta (line 147) | def _configure_with_meta(self, meta: WorkerMeta):
    method get_master_addr_port (line 162) | def get_master_addr_port(self):
    method get_cuda_visible_devices (line 165) | def get_cuda_visible_devices(self):
    method world_size (line 171) | def world_size(self):
    method rank (line 175) | def rank(self):
    method execute_with_func_generator (line 179) | def execute_with_func_generator(self, func, *args, **kwargs):
    method execute_func_rank_zero (line 184) | def execute_func_rank_zero(self, func, *args, **kwargs):

FILE: verl/single_controller/base/worker_group.py
  class ResourcePool (line 26) | class ResourcePool:
    method __init__ (line 28) | def __init__(self, process_on_nodes=None, max_collocate_count: int = 1...
    method add_node (line 35) | def add_node(self, process_count):
    method world_size (line 39) | def world_size(self):
    method __call__ (line 42) | def __call__(self) -> Any:
    method store (line 46) | def store(self):
    method local_world_size_list (line 49) | def local_world_size_list(self) -> List[int]:
    method local_rank_list (line 55) | def local_rank_list(self) -> List[int]:
  class ClassWithInitArgs (line 60) | class ClassWithInitArgs:
    method __init__ (line 66) | def __init__(self, cls, *args, **kwargs) -> None:
    method __call__ (line 77) | def __call__(self) -> Any:
  function check_workers_alive (line 81) | def check_workers_alive(workers: List, is_alive: Callable, gap_time: flo...
  class WorkerGroup (line 91) | class WorkerGroup:
    method __init__ (line 93) | def __init__(self, resource_pool: ResourcePool, **kwargs) -> None:
    method _is_worker_alive (line 110) | def _is_worker_alive(self, worker):
    method _block_until_all_workers_alive (line 113) | def _block_until_all_workers_alive(self) -> None:
    method start_worker_aliveness_check (line 121) | def start_worker_aliveness_check(self, every_n_seconds=1) -> None:
    method world_size (line 130) | def world_size(self):
    method _bind_worker_method (line 136) | def _bind_worker_method(self, user_defined_cls, func_generator):

FILE: verl/single_controller/ray/base.py
  function get_random_string (line 29) | def get_random_string(length: int) -> str:
  function func_generator (line 36) | def func_generator(self, method_name, dispatch_fn, collect_fn, execute_f...
  class RayResourcePool (line 49) | class RayResourcePool(ResourcePool):
    method __init__ (line 51) | def __init__(self,
    method get_placement_groups (line 64) | def get_placement_groups(self, strategy="STRICT_PACK", name=None):
  function extract_pg_from_exist (line 91) | def extract_pg_from_exist(resource_pools: Dict[str, RayResourcePool], sr...
  function merge_resource_pool (line 114) | def merge_resource_pool(rp1: RayResourcePool, rp2: RayResourcePool) -> R...
  class RayClassWithInitArgs (line 128) | class RayClassWithInitArgs(ClassWithInitArgs):
    method __init__ (line 130) | def __init__(self, cls, *args, **kwargs) -> None:
    method set_additional_resource (line 136) | def set_additional_resource(self, additional_resource):
    method update_options (line 139) | def update_options(self, options: Dict):
    method __call__ (line 142) | def __call__(self,
  class RayWorkerGroup (line 176) | class RayWorkerGroup(WorkerGroup):
    method __init__ (line 178) | def __init__(self,
    method _is_worker_alive (line 205) | def _is_worker_alive(self, worker: ray.actor.ActorHandle):
    method _init_with_detached_workers (line 209) | def _init_with_detached_workers(self, worker_names):
    method _init_with_resource_pool (line 214) | def _init_with_resource_pool(self, resource_pool, ray_cls_with_init, b...
    method worker_names (line 281) | def worker_names(self):
    method from_detached (line 285) | def from_detached(cls, worker_names=None, ray_cls_with_init=None):
    method spawn (line 292) | def spawn(self, prefix_set):
    method execute_rank_zero_sync (line 319) | def execute_rank_zero_sync(self, method_name: str, *args, **kwargs):
    method execute_rank_zero_async (line 322) | def execute_rank_zero_async(self, method_name: str, *args, **kwargs):
    method execute_rank_zero (line 326) | def execute_rank_zero(self, method_name: str, *args, **kwargs):
    method execute_all (line 329) | def execute_all(self, method_name: str, *args, **kwargs):
    method execute_all_sync (line 332) | def execute_all_sync(self, method_name: str, *args, **kwargs):
    method execute_all_async (line 335) | def execute_all_async(self, method_name: str, *args, **kwargs):
    method master_address (line 354) | def master_address(self):
    method master_port (line 358) | def master_port(self):
    method workers (line 362) | def workers(self):
    method world_size (line 366) | def world_size(self):
  function _bind_workers_method_to_parent (line 380) | def _bind_workers_method_to_parent(cls, key, user_defined_cls):
  function _unwrap_ray_remote (line 414) | def _unwrap_ray_remote(cls):
  function create_colocated_worker_cls (line 420) | def create_colocated_worker_cls(class_dict: dict[str, RayClassWithInitAr...

FILE: verl/single_controller/ray/megatron.py
  class NVMegatronRayWorkerGroup (line 25) | class NVMegatronRayWorkerGroup(RayWorkerGroup, MegatronWorkerGroup):
    method __init__ (line 31) | def __init__(self, resource_pool: RayResourcePool, ray_cls_with_init: ...
  class MegatronRayWorkerGroup (line 38) | class MegatronRayWorkerGroup(RayWorkerGroup, MegatronWorkerGroup):
    method __init__ (line 44) | def __init__(self,
    method init_megatron (line 58) | def init_megatron(self, default_megatron_kwargs: Optional[Dict] = None):

FILE: verl/third_party/vllm/__init__.py
  function get_version (line 18) | def get_version(pkg):

FILE: verl/third_party/vllm/vllm_v_0_3_1/arg_utils.py
  class EngineArgs (line 27) | class EngineArgs:
    method add_cli_args (line 61) | def add_cli_args(parser: argparse.ArgumentParser) -> argparse.Argument...
    method from_cli_args (line 178) | def from_cli_args(cls, args: argparse.Namespace) -> 'EngineArgs':
    method create_engine_configs (line 185) | def create_engine_configs(
  class AsyncEngineArgs (line 208) | class AsyncEngineArgs(EngineArgs):
    method add_cli_args (line 215) | def add_cli_args(parser: argparse.ArgumentParser) -> argparse.Argument...

FILE: verl/third_party/vllm/vllm_v_0_3_1/config.py
  class ModelConfig (line 31) | class ModelConfig:
    method __init__ (line 75) | def __init__(
    method _verify_load_format (line 109) | def _verify_load_format(self) -> None:
    method _verify_quantization (line 124) | def _verify_quantization(self) -> None:
    method _verify_cuda_graph (line 153) | def _verify_cuda_graph(self) -> None:
    method verify_with_parallel_config (line 163) | def verify_with_parallel_config(
    method get_sliding_window (line 181) | def get_sliding_window(self) -> Optional[int]:
    method get_vocab_size (line 184) | def get_vocab_size(self) -> int:
    method get_hidden_size (line 187) | def get_hidden_size(self) -> int:
    method get_head_size (line 190) | def get_head_size(self) -> int:
    method get_total_num_kv_heads (line 194) | def get_total_num_kv_heads(self) -> int:
    method get_num_kv_heads (line 226) | def get_num_kv_heads(self, parallel_config: "ParallelConfig") -> int:
    method get_num_layers (line 235) | def get_num_layers(self, parallel_config: "ParallelConfig") -> int:
  class CacheConfig (line 240) | class CacheConfig:
    method __init__ (line 251) | def __init__(
    method _verify_args (line 271) | def _verify_args(self) -> None:
    method _verify_cache_dtype (line 276) | def _verify_cache_dtype(self) -> None:
    method verify_with_parallel_config (line 294) | def verify_with_parallel_config(
  class ParallelConfig (line 313) | class ParallelConfig:
    method __init__ (line 329) | def __init__(
    method _verify_args (line 348) | def _verify_args(self) -> None:
  class SchedulerConfig (line 370) | class SchedulerConfig:
    method __init__ (line 383) | def __init__(
    method _verify_args (line 401) | def _verify_args(self) -> None:
  class DeviceConfig (line 415) | class DeviceConfig:
    method __init__ (line 417) | def __init__(self, device: str = "cuda") -> None:
  class LoRAConfig (line 422) | class LoRAConfig:
    method __post_init__ (line 431) | def __post_init__(self):
    method verify_with_model_config (line 449) | def verify_with_model_config(self, model_config: ModelConfig):
    method verify_with_scheduler_config (line 457) | def verify_with_scheduler_config(self, scheduler_config: SchedulerConf...
  function _get_and_verify_dtype (line 475) | def _get_and_verify_dtype(
  function _get_and_verify_max_len (line 525) | def _get_and_verify_max_len(

FILE: verl/third_party/vllm/vllm_v_0_3_1/llm.py
  class LLM (line 33) | class LLM:
    method __init__ (line 87) | def __init__(
    method init_cache_engine (line 133) | def init_cache_engine(self):
    method free_cache_engine (line 136) | def free_cache_engine(self):
    method get_tokenizer (line 139) | def get_tokenizer(self) -> Union[PreTrainedTokenizer, PreTrainedTokeni...
    method set_tokenizer (line 142) | def set_tokenizer(
    method generate (line 148) | def generate(
    method _add_request (line 201) | def _add_request(
    method _run_engine (line 217) | def _run_engine(self, use_tqdm: bool) -> List[RequestOutput]:
    method _pre_process_inputs (line 242) | def _pre_process_inputs(self, prompt_token_ids: torch.Tensor) -> List[...
    method _post_process_outputs (line 250) | def _post_process_outputs(self, outputs: List[RequestOutput]) -> Tuple...
    method sync_model_weights (line 271) | def sync_model_weights(self, actor_weights: Dict[str, torch.Tensor]) -...
    method offload_model_weights (line 274) | def offload_model_weights(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_3_1/llm_engine_sp.py
  class LLMEngine (line 41) | class LLMEngine:
    method __init__ (line 70) | def __init__(
    method _init_tokenizer (line 138) | def _init_tokenizer(self, tokenizer, **tokenizer_init_kwargs):
    method get_tokenizer_for_seq (line 146) | def get_tokenizer_for_seq(self, sequence: Sequence):
    method _init_workers_sp (line 149) | def _init_workers_sp(self, model, distributed_init_method: str):
    method _verify_args (line 172) | def _verify_args(self) -> None:
    method _init_cache_sp (line 176) | def _init_cache_sp(self) -> None:
    method init_cache_engine (line 215) | def init_cache_engine(self):
    method free_cache_engine (line 218) | def free_cache_engine(self):
    method from_engine_args (line 222) | def from_engine_args(cls, model, tokenizer, engine_args: EngineArgs) -...
    method add_request (line 238) | def add_request(
    method abort_request (line 317) | def abort_request(self, request_id: Union[str, Iterable[str]]) -> None:
    method get_model_config (line 336) | def get_model_config(self) -> ModelConfig:
    method get_num_unfinished_requests (line 340) | def get_num_unfinished_requests(self) -> int:
    method has_unfinished_requests (line 344) | def has_unfinished_requests(self) -> bool:
    method _check_beam_search_early_stopping (line 348) | def _check_beam_search_early_stopping(
    method _process_sequence_group_outputs (line 385) | def _process_sequence_group_outputs(self, seq_group: SequenceGroup, ou...
    method _process_model_outputs (line 545) | def _process_model_outputs(self, output: SamplerOutput, scheduler_outp...
    method step (line 574) | def step(self) -> List[RequestOutput]:
    method do_log_stats (line 595) | def do_log_stats(self) -> None:
    method _get_stats (line 600) | def _get_stats(self, scheduler_outputs: Optional[SchedulerOutputs]) ->...
    method _decode_sequence (line 662) | def _decode_sequence(self, seq: Sequence, prms: SamplingParams) -> None:
    method _check_stop (line 681) | def _check_stop(self, seq: Sequence, sampling_params: SamplingParams) ...
    method _finalize_sequence (line 710) | def _finalize_sequence(self, seq: Sequence, sampling_params: SamplingP...
    method add_lora (line 716) | def add_lora(self, lora_request: LoRARequest) -> bool:
    method remove_lora (line 720) | def remove_lora(self, lora_id: int) -> bool:
    method list_loras (line 724) | def list_loras(self) -> List[int]:
    method sync_model_weights (line 727) | def sync_model_weights(self, actor_weights: Dict[str, torch.Tensor]) -...
    method offload_model_weights (line 730) | def offload_model_weights(self) -> None:
  function initialize_cluster (line 734) | def initialize_cluster(
  function get_open_port (line 762) | def get_open_port():

FILE: verl/third_party/vllm/vllm_v_0_3_1/model_loader.py
  function _set_default_torch_dtype (line 38) | def _set_default_torch_dtype(dtype: torch.dtype):
  function _get_model_architecture (line 46) | def _get_model_architecture(config: PretrainedConfig) -> Type[nn.Module]:
  function vocab_init (line 87) | def vocab_init(self,
  function _get_model_weight_loader (line 123) | def _get_model_weight_loader(arch: str):
  function get_model (line 130) | def get_model(actor_model: Union[PreTrainedModel, Dict],
  function load_weights (line 181) | def load_weights(actor_weights: Dict, vllm_model: nn.Module):
  function _get_logits (line 193) | def _get_logits(self, hidden_states: torch.Tensor, embedding: torch.Tensor,
  function forward (line 206) | def forward(

FILE: verl/third_party/vllm/vllm_v_0_3_1/model_runner.py
  class ModelRunner (line 46) | class ModelRunner(ModelRunner):
    method __init__ (line 48) | def __init__(
    method load_model (line 90) | def load_model(self) -> None:
    method _prepare_sample (line 109) | def _prepare_sample(
    method prepare_input_tensors (line 174) | def prepare_input_tensors(
    method execute_model (line 203) | def execute_model(
    method profile_run (line 235) | def profile_run(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_3_1/parallel_state.py
  function initialize_model_parallel_from_megatron (line 26) | def initialize_model_parallel_from_megatron(
  function get_tensor_model_parallel_group (line 108) | def get_tensor_model_parallel_group():
  function get_tensor_model_parallel_world_size (line 114) | def get_tensor_model_parallel_world_size():
  function get_tensor_model_parallel_rank (line 119) | def get_tensor_model_parallel_rank():
  function get_tensor_model_parallel_src_rank (line 124) | def get_tensor_model_parallel_src_rank():
  function get_micro_data_parallel_group (line 137) | def get_micro_data_parallel_group():
  function get_micro_data_parallel_world_size (line 142) | def get_micro_data_parallel_world_size():
  function get_micro_data_parallel_rank (line 146) | def get_micro_data_parallel_rank():

FILE: verl/third_party/vllm/vllm_v_0_3_1/tokenizer.py
  class TokenizerGroup (line 25) | class TokenizerGroup:
    method __init__ (line 28) | def __init__(self, tokenizer: PreTrainedTokenizer, enable_lora: bool, ...
    method encode (line 38) | def encode(self,
    method encode_async (line 45) | async def encode_async(self,
    method get_lora_tokenizer (line 52) | def get_lora_tokenizer(self, lora_request: Optional[LoRARequest]) -> "...
    method pad_token_id (line 67) | def pad_token_id(self):
    method eos_token_id (line 71) | def eos_token_id(self):

FILE: verl/third_party/vllm/vllm_v_0_3_1/weight_loaders.py
  function parallel_weight_loader (line 22) | def parallel_weight_loader(self, param: torch.Tensor, loaded_weight: tor...
  function default_weight_loader (line 32) | def default_weight_loader(param: torch.Tensor, loaded_weight: torch.Tens...
  function gpt2_weight_loader (line 40) | def gpt2_weight_loader(actor_weights: Dict, vllm_model: nn.Module) -> nn...
  function llama_weight_loader (line 68) | def llama_weight_loader(actor_weights: Dict, vllm_model: nn.Module) -> n...
  function mistral_weight_loader (line 83) | def mistral_weight_loader(actor_weights: Dict, vllm_model: nn.Module) ->...

FILE: verl/third_party/vllm/vllm_v_0_3_1/worker.py
  class Worker (line 39) | class Worker:
    method __init__ (line 47) | def __init__(
    method init_model (line 89) | def init_model(self, cupy_port: Optional[int] = None):
    method load_model (line 117) | def load_model(self):
    method profile_num_available_blocks (line 121) | def profile_num_available_blocks(
    method init_cache_engine (line 168) | def init_cache_engine(self, cache_config: CacheConfig) -> None:
    method free_cache_engine (line 176) | def free_cache_engine(self):
    method warm_up_model (line 181) | def warm_up_model(self) -> None:
    method cache_swap (line 188) | def cache_swap(
    method execute_model (line 215) | def execute_model(
    method sync_model_weights (line 247) | def sync_model_weights(self, actor_weights: Dict):
    method offload_model_weights (line 250) | def offload_model_weights(self) -> None:
    method add_lora (line 260) | def add_lora(self, lora_request: LoRARequest) -> bool:
    method remove_lora (line 263) | def remove_lora(self, lora_id: int) -> bool:
    method list_loras (line 266) | def list_loras(self) -> Set[int]:
  function _init_distributed_environment (line 270) | def _init_distributed_environment(
  function _pad_to_alignment (line 298) | def _pad_to_alignment(x: List[int], multiple_of: int, pad: int) -> List[...
  function _pad_to_max (line 302) | def _pad_to_max(x: List[int], max_len: int, pad: int) -> List[int]:
  function _check_if_gpu_supports_dtype (line 306) | def _check_if_gpu_supports_dtype(torch_dtype: torch.dtype):

FILE: verl/third_party/vllm/vllm_v_0_4_2/arg_utils.py
  function nullable_str (line 33) | def nullable_str(val: str):
  class EngineArgs (line 40) | class EngineArgs:
    method add_cli_args (line 106) | def add_cli_args(parser: argparse.ArgumentParser) -> argparse.Argument...
    method from_cli_args (line 223) | def from_cli_args(cls, args: argparse.Namespace) -> 'EngineArgs':
    method create_engine_config (line 230) | def create_engine_config(

FILE: verl/third_party/vllm/vllm_v_0_4_2/config.py
  class ModelConfig (line 37) | class ModelConfig(ModelConfig):
    method __init__ (line 98) | def __init__(
  class LoadFormat (line 147) | class LoadFormat(str, enum.Enum):
  class LoadConfig (line 158) | class LoadConfig:
    method __post_init__ (line 180) | def __post_init__(self):
    method _verify_load_format (line 186) | def _verify_load_format(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_4_2/dtensor_weight_loaders.py
  function gemma_dtensor_weight_loader (line 26) | def gemma_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn.Modu...
  function gptbigcode_dtensor_load_weights (line 74) | def gptbigcode_dtensor_load_weights(actor_weights: Dict, vllm_model: nn....
  function starcoder2_dtensor_load_weights (line 89) | def starcoder2_dtensor_load_weights(actor_weights: Dict, vllm_model: nn....
  function llama_dtensor_weight_loader (line 120) | def llama_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn.Modu...
  function qwen2_dtensor_weight_loader (line 164) | def qwen2_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn.Modu...
  function gpt2_dtensor_weight_loader (line 201) | def gpt2_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn.Modul...
  function redistribute_dtensor (line 205) | def redistribute_dtensor(param_name: str, loaded_weights: DTensor, paral...
  function _process_parameter_names (line 218) | def _process_parameter_names(name):
  function load_dtensor_weights (line 252) | def load_dtensor_weights(actor_weights: Dict, vllm_model: nn.Module):
  function _get_model_weight_loader (line 260) | def _get_model_weight_loader(arch: str):
  function update_dtensor_weight_loader (line 268) | def update_dtensor_weight_loader():

FILE: verl/third_party/vllm/vllm_v_0_4_2/hf_weight_loader.py
  function update_hf_weight_loader (line 25) | def update_hf_weight_loader():
  function gemma_load_weights (line 30) | def gemma_load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
  function load_hf_weights (line 79) | def load_hf_weights(actor_weights: Dict, vllm_model: nn.Module):

FILE: verl/third_party/vllm/vllm_v_0_4_2/llm.py
  class LLM (line 35) | class LLM:
    method __init__ (line 89) | def __init__(
    method init_cache_engine (line 137) | def init_cache_engine(self):
    method free_cache_engine (line 140) | def free_cache_engine(self):
    method get_tokenizer (line 143) | def get_tokenizer(self) -> Union[PreTrainedTokenizer, PreTrainedTokeni...
    method set_tokenizer (line 146) | def set_tokenizer(
    method generate (line 152) | def generate(
    method _add_request (line 232) | def _add_request(
    method _run_engine (line 248) | def _run_engine(self, use_tqdm: bool) -> List[RequestOutput]:
    method _pre_process_inputs (line 273) | def _pre_process_inputs(self, prompt_token_ids: torch.Tensor) -> List[...
    method _post_process_outputs (line 281) | def _post_process_outputs(self, request_outputs: List[RequestOutput]) ...
    method sync_model_weights (line 302) | def sync_model_weights(self, actor_weights: Dict[str, torch.Tensor], l...
    method offload_model_weights (line 305) | def offload_model_weights(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_4_2/llm_engine_sp.py
  class LLMEngine (line 43) | class LLMEngine(LLMEngine):
    method __init__ (line 74) | def __init__(
    method _init_tokenizer (line 229) | def _init_tokenizer(self, tokenizer, **tokenizer_init_kwargs):
    method init_cache_engine (line 236) | def init_cache_engine(self):
    method free_cache_engine (line 241) | def free_cache_engine(self):
    method from_engine_args (line 247) | def from_engine_args(
    method sync_model_weights (line 279) | def sync_model_weights(self, actor_weights: Dict[str, torch.Tensor], l...
    method offload_model_weights (line 282) | def offload_model_weights(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_4_2/megatron_weight_loaders.py
  function parallel_weight_loader (line 27) | def parallel_weight_loader(self, param: torch.Tensor, loaded_weight: tor...
  function default_weight_loader (line 37) | def default_weight_loader(param: torch.Tensor, loaded_weight: torch.Tens...
  function gpt2_weight_loader (line 45) | def gpt2_weight_loader(actor_weights: Dict, vllm_model: nn.Module) -> nn...
  function llama_megatron_weight_loader (line 73) | def llama_megatron_weight_loader(actor_weights: Dict, vllm_model: nn.Mod...
  function llama_megatron_core_te_weight_loader (line 85) | def llama_megatron_core_te_weight_loader(actor_weights: Dict, vllm_model...
  function llama_megatron_core_weight_loader (line 116) | def llama_megatron_core_weight_loader(actor_weights: Dict, vllm_model: n...
  function _replace_name (line 146) | def _replace_name(megatron_name, name_mapping):
  function llama_megatron_core_te_weight_loader (line 169) | def llama_megatron_core_te_weight_loader(actor_weights: Dict, vllm_model...
  function llama_megatron_core_weight_loader (line 200) | def llama_megatron_core_weight_loader(actor_weights: Dict, vllm_model: n...
  function _replace_name (line 230) | def _replace_name(megatron_name, name_mapping):
  function mistral_megatron_weight_loader (line 253) | def mistral_megatron_weight_loader(actor_weights: Dict, vllm_model: nn.M...
  function load_megatron_weights (line 290) | def load_megatron_weights(actor_weights: Dict, vllm_model: nn.Module):
  function _get_model_weight_loader (line 298) | def _get_model_weight_loader(arch: str):
  function update_megatron_weight_loader (line 305) | def update_megatron_weight_loader():
  function vocab_init (line 316) | def vocab_init(self,

FILE: verl/third_party/vllm/vllm_v_0_4_2/model_loader.py
  function get_model (line 34) | def get_model(actor_model: Union[PreTrainedModel, Dict], model_config: M...
  function get_model_loader (line 55) | def get_model_loader(load_config: LoadConfig) -> BaseModelLoader:
  class DummyModelLoader (line 94) | class DummyModelLoader(BaseModelLoader):
    method __init__ (line 97) | def __init__(self, load_config: LoadConfig):
    method load_model (line 103) | def load_model(self, *, model_config: ModelConfig, device_config: Devi...
  class MegatronLoader (line 115) | class MegatronLoader(BaseModelLoader):
    method __init__ (line 118) | def __init__(self, load_config: LoadConfig):
    method _get_weights_iterator (line 124) | def _get_weights_iterator(actor_model: Union[PreTrainedModel, Dict]):
    method load_model (line 133) | def load_model(self, actor_model: Union[PreTrainedModel,
  class HFLoader (line 161) | class HFLoader(BaseModelLoader):
    method __init__ (line 164) | def __init__(self, load_config: LoadConfig):
    method _get_weights_iterator (line 170) | def _get_weights_iterator(self, actor_model: Union[PreTrainedModel, Di...
    method load_model (line 178) | def load_model(self, actor_model: Union[PreTrainedModel,
  class DTensorLoader (line 200) | class DTensorLoader(BaseModelLoader):
    method __init__ (line 203) | def __init__(self, load_config: LoadConfig):
    method _get_weights_iterator (line 209) | def _get_weights_iterator(actor_model: Union[PreTrainedModel, Dict]):
    method load_model (line 218) | def load_model(self, actor_model: Union[PreTrainedModel,
  function _get_logits (line 250) | def _get_logits(self, hidden_states: torch.Tensor, embedding: torch.Tensor,

FILE: verl/third_party/vllm/vllm_v_0_4_2/model_runner.py
  class BatchType (line 39) | class BatchType(IntEnum):
  class ModelRunner (line 48) | class ModelRunner(ModelRunner):
    method __init__ (line 50) | def __init__(
    method load_model (line 105) | def load_model(self) -> None:
    method prepare_input_tensors (line 147) | def prepare_input_tensors(
    method execute_model (line 238) | def execute_model(

FILE: verl/third_party/vllm/vllm_v_0_4_2/parallel_state.py
  function initialize_parallel_state (line 35) | def initialize_parallel_state(
  function ensure_model_parallel_initialized (line 66) | def ensure_model_parallel_initialized(
  function model_parallel_is_initialized (line 92) | def model_parallel_is_initialized():
  function initialize_model_parallel_for_vllm (line 98) | def initialize_model_parallel_for_vllm(tensor_model_parallel_size: int,
  function initialize_model_parallel (line 172) | def initialize_model_parallel(
  function get_device_mesh (line 263) | def get_device_mesh():
  function get_tensor_model_parallel_group (line 273) | def get_tensor_model_parallel_group():
  function get_tensor_model_parallel_world_size (line 279) | def get_tensor_model_parallel_world_size():
  function get_tensor_model_parallel_rank (line 284) | def get_tensor_model_parallel_rank():
  function get_tensor_model_parallel_src_rank (line 289) | def get_tensor_model_parallel_src_rank():

FILE: verl/third_party/vllm/vllm_v_0_4_2/spmd_gpu_executor.py
  class SPMDGPUExecutor (line 33) | class SPMDGPUExecutor(ExecutorBase):
    method __init__ (line 36) | def __init__(
    method _init_executor (line 63) | def _init_executor(self, model, distributed_init_method) -> None:
    method _init_workers_sp (line 69) | def _init_workers_sp(self, model, distributed_init_method: str):
    method determine_num_available_blocks (line 97) | def determine_num_available_blocks(self) -> Tuple[int, int]:
    method initialize_cache (line 117) | def initialize_cache(self, num_gpu_blocks: int, num_cpu_blocks: int) -...
    method init_cache_engine (line 140) | def init_cache_engine(self) -> None:
    method free_cache_engine (line 143) | def free_cache_engine(self) -> None:
    method execute_model (line 146) | def execute_model(self, execute_model_req) -> List[SamplerOutput]:
    method add_lora (line 154) | def add_lora(self, lora_request: LoRARequest) -> bool:
    method remove_lora (line 158) | def remove_lora(self, lora_id: int) -> bool:
    method list_loras (line 162) | def list_loras(self) -> Set[int]:
    method check_health (line 165) | def check_health(self) -> None:
    method offload_model_weights (line 171) | def offload_model_weights(self) -> None:
    method sync_model_weights (line 174) | def sync_model_weights(self, actor_weights: Dict[str, torch.Tensor], l...
  function initialize_cluster (line 178) | def initialize_cluster(
  function get_open_port (line 202) | def get_open_port():
  class SPMDGPUExecutorAsync (line 209) | class SPMDGPUExecutorAsync(SPMDGPUExecutor, ExecutorAsyncBase):
    method execute_model_async (line 211) | async def execute_model_async(self, execute_model_req: ExecuteModelReq...
    method check_health_async (line 215) | async def check_health_async(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_4_2/tokenizer.py
  class TokenizerGroup (line 25) | class TokenizerGroup:
    method __init__ (line 28) | def __init__(self, tokenizer: PreTrainedTokenizer, enable_lora: bool, ...
    method ping (line 35) | def ping(self) -> bool:
    method get_max_input_len (line 39) | def get_max_input_len(self, lora_request: Optional[LoRARequest] = None...
    method encode (line 43) | def encode(self,
    method encode_async (line 50) | async def encode_async(self,
    method get_lora_tokenizer (line 57) | def get_lora_tokenizer(self, lora_request: Optional[LoRARequest]) -> "...
    method pad_token_id (line 72) | def pad_token_id(self):
    method eos_token_id (line 76) | def eos_token_id(self):

FILE: verl/third_party/vllm/vllm_v_0_4_2/worker.py
  class Worker (line 42) | class Worker(Worker):
    method __init__ (line 50) | def __init__(
    method init_device (line 105) | def init_device(self) -> None:
    method determine_num_available_blocks (line 142) | def determine_num_available_blocks(self) -> Tuple[int, int]:
    method _init_cache_engine (line 199) | def _init_cache_engine(self):
    method free_cache_engine (line 203) | def free_cache_engine(self):
    method execute_model (line 209) | def execute_model(self, execute_model_req: Optional[ExecuteModelReques...
    method sync_model_weights (line 237) | def sync_model_weights(self, actor_weights: Dict, load_format: str):
    method offload_model_weights (line 246) | def offload_model_weights(self) -> None:
  function init_worker_distributed_environment (line 257) | def init_worker_distributed_environment(

FILE: verl/third_party/vllm/vllm_v_0_5_4/arg_utils.py
  function nullable_str (line 43) | def nullable_str(val: str):
  class EngineArgs (line 50) | class EngineArgs:
    method add_cli_args (line 140) | def add_cli_args(parser: argparse.ArgumentParser) -> argparse.Argument...
    method from_cli_args (line 257) | def from_cli_args(cls, args: argparse.Namespace) -> 'EngineArgs':
    method create_engine_config (line 264) | def create_engine_config(

FILE: verl/third_party/vllm/vllm_v_0_5_4/config.py
  class ModelConfig (line 38) | class ModelConfig(ModelConfig):
    method __init__ (line 99) | def __init__(
  class LoadFormat (line 181) | class LoadFormat(str, enum.Enum):
  class LoadConfig (line 193) | class LoadConfig:
    method __post_init__ (line 221) | def __post_init__(self):
    method _verify_load_format (line 232) | def _verify_load_format(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py
  function gemma_dtensor_weight_loader (line 27) | def gemma_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn.Modu...
  function gptbigcode_dtensor_load_weights (line 64) | def gptbigcode_dtensor_load_weights(actor_weights: Dict, vllm_model: nn....
  function starcoder2_dtensor_load_weights (line 79) | def starcoder2_dtensor_load_weights(actor_weights: Dict, vllm_model: nn....
  function llama_dtensor_weight_loader (line 110) | def llama_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn.Modu...
  function qwen2_dtensor_weight_loader (line 154) | def qwen2_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn.Modu...
  function deepseekv2_dtensor_weight_loader (line 194) | def deepseekv2_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn...
  function gpt2_dtensor_weight_loader (line 270) | def gpt2_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn.Modul...
  function redistribute_dtensor (line 274) | def redistribute_dtensor(param_name: str, loaded_weights: DTensor, paral...
  function _process_parameter_names (line 287) | def _process_parameter_names(name):
  function load_dtensor_weights (line 323) | def load_dtensor_weights(actor_weights: Dict, vllm_model: nn.Module):
  function _get_model_weight_loader (line 331) | def _get_model_weight_loader(arch: str):
  function update_dtensor_weight_loader (line 339) | def update_dtensor_weight_loader():

FILE: verl/third_party/vllm/vllm_v_0_5_4/hf_weight_loader.py
  function update_hf_weight_loader (line 25) | def update_hf_weight_loader():
  function load_hf_weights (line 30) | def load_hf_weights(actor_weights: Dict, vllm_model: nn.Module):

FILE: verl/third_party/vllm/vllm_v_0_5_4/llm.py
  class LLM (line 43) | class LLM(LLM):
    method __init__ (line 97) | def __init__(
    method init_cache_engine (line 151) | def init_cache_engine(self):
    method free_cache_engine (line 154) | def free_cache_engine(self):
    method get_tokenizer (line 157) | def get_tokenizer(self) -> Union[PreTrainedTokenizer, PreTrainedTokeni...
    method set_tokenizer (line 160) | def set_tokenizer(
    method _run_engine (line 166) | def _run_engine(self, *, use_tqdm: bool) -> List[Union[RequestOutput, ...
    method _post_process_outputs (line 214) | def _post_process_outputs(self, request_outputs: List[RequestOutput]) ...
    method sync_model_weights (line 235) | def sync_model_weights(self, actor_weights: Dict[str, torch.Tensor], l...
    method offload_model_weights (line 238) | def offload_model_weights(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_5_4/llm_engine_sp.py
  class LLMEngine (line 46) | class LLMEngine(LLMEngine):
    method __init__ (line 77) | def __init__(
    method _init_tokenizer (line 263) | def _init_tokenizer(self, tokenizer, **tokenizer_init_kwargs):
    method init_cache_engine (line 270) | def init_cache_engine(self):
    method free_cache_engine (line 275) | def free_cache_engine(self):
    method _get_executor_cls (line 281) | def _get_executor_cls(cls, engine_config: EngineConfig) -> Type[Execut...
    method from_engine_args (line 293) | def from_engine_args(
    method sync_model_weights (line 324) | def sync_model_weights(self, actor_weights: Dict[str, torch.Tensor], l...
    method offload_model_weights (line 327) | def offload_model_weights(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py
  function parallel_weight_loader (line 27) | def parallel_weight_loader(self, param: torch.Tensor, loaded_weight: tor...
  function default_weight_loader (line 37) | def default_weight_loader(param: torch.Tensor, loaded_weight: torch.Tens...
  function gpt2_weight_loader (line 45) | def gpt2_weight_loader(actor_weights: Dict, vllm_model: nn.Module) -> nn...
  function llama_megatron_weight_loader (line 73) | def llama_megatron_weight_loader(actor_weights: Dict, vllm_model: nn.Mod...
  function llama_megatron_core_te_weight_loader (line 85) | def llama_megatron_core_te_weight_loader(actor_weights: Dict, vllm_model...
  function llama_megatron_core_weight_loader (line 116) | def llama_megatron_core_weight_loader(actor_weights: Dict, vllm_model: n...
  function _replace_name (line 146) | def _replace_name(megatron_name, name_mapping):
  function llama_megatron_core_te_weight_loader (line 169) | def llama_megatron_core_te_weight_loader(actor_weights: Dict, vllm_model...
  function llama_megatron_core_weight_loader (line 200) | def llama_megatron_core_weight_loader(actor_weights: Dict, vllm_model: n...
  function _replace_name (line 230) | def _replace_name(megatron_name, name_mapping):
  function mistral_megatron_weight_loader (line 253) | def mistral_megatron_weight_loader(actor_weights: Dict, vllm_model: nn.M...
  function load_megatron_weights (line 290) | def load_megatron_weights(actor_weights: Dict, vllm_model: nn.Module):
  function _get_model_weight_loader (line 298) | def _get_model_weight_loader(arch: str):
  function update_megatron_weight_loader (line 305) | def update_megatron_weight_loader():

FILE: verl/third_party/vllm/vllm_v_0_5_4/model_loader.py
  function get_model (line 35) | def get_model(actor_model: Union[PreTrainedModel, Dict],
  function get_model_loader (line 64) | def get_model_loader(load_config: LoadConfig) -> BaseModelLoader:
  class DummyModelLoader (line 103) | class DummyModelLoader(BaseModelLoader):
    method __init__ (line 106) | def __init__(self, load_config: LoadConfig):
    method load_model (line 112) | def load_model(self, *, model_config: ModelConfig, device_config: Devi...
  class MegatronLoader (line 125) | class MegatronLoader(BaseModelLoader):
    method __init__ (line 128) | def __init__(self, load_config: LoadConfig):
    method _get_weights_iterator (line 134) | def _get_weights_iterator(actor_model: Union[PreTrainedModel, Dict]):
    method load_model (line 143) | def load_model(self, actor_model: Union[PreTrainedModel, Dict], model_...
  class HFLoader (line 172) | class HFLoader(BaseModelLoader):
    method __init__ (line 175) | def __init__(self, load_config: LoadConfig):
    method _get_weights_iterator (line 181) | def _get_weights_iterator(self, actor_model: Union[PreTrainedModel, Di...
    method load_model (line 189) | def load_model(self, actor_model: Union[PreTrainedModel, Dict], model_...
  class DTensorLoader (line 212) | class DTensorLoader(BaseModelLoader):
    method __init__ (line 215) | def __init__(self, load_config: LoadConfig):
    method _get_weights_iterator (line 221) | def _get_weights_iterator(actor_model: Union[PreTrainedModel, Dict]):
    method load_model (line 230) | def load_model(self, actor_model: Union[PreTrainedModel, Dict], model_...
  function _get_logits (line 263) | def _get_logits(self, hidden_states: torch.Tensor, embedding: torch.Tensor,
  function logitsprocessor_init (line 279) | def logitsprocessor_init(self,

FILE: verl/third_party/vllm/vllm_v_0_5_4/model_runner.py
  class BatchType (line 43) | class BatchType(IntEnum):
  class ModelRunner (line 52) | class ModelRunner(ModelRunner):
    method __init__ (line 54) | def __init__(
    method load_model (line 89) | def load_model(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py
  function initialize_parallel_state (line 37) | def initialize_parallel_state(
  function ensure_model_parallel_initialized (line 68) | def ensure_model_parallel_initialized(
  function model_parallel_is_initialized (line 95) | def model_parallel_is_initialized():
  function initialize_model_parallel_for_vllm (line 101) | def initialize_model_parallel_for_vllm(tensor_model_parallel_size: int,
  function initialize_model_parallel (line 191) | def initialize_model_parallel(
  function get_device_mesh (line 272) | def get_device_mesh():
  function get_tensor_model_parallel_group (line 282) | def get_tensor_model_parallel_group():
  function get_tensor_model_parallel_world_size (line 288) | def get_tensor_model_parallel_world_size():
  function get_tensor_model_parallel_rank (line 293) | def get_tensor_model_parallel_rank():
  function get_tensor_model_parallel_src_rank (line 298) | def get_tensor_model_parallel_src_rank():

FILE: verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py
  class SPMDGPUExecutor (line 34) | class SPMDGPUExecutor(ExecutorBase):
    method __init__ (line 37) | def __init__(
    method _init_executor (line 66) | def _init_executor(self, model, distributed_init_method) -> None:
    method _init_workers_sp (line 72) | def _init_workers_sp(self, model, distributed_init_method: str):
    method determine_num_available_blocks (line 107) | def determine_num_available_blocks(self) -> Tuple[int, int]:
    method initialize_cache (line 127) | def initialize_cache(self, num_gpu_blocks: int, num_cpu_blocks: int) -...
    method init_cache_engine (line 150) | def init_cache_engine(self) -> None:
    method free_cache_engine (line 153) | def free_cache_engine(self) -> None:
    method execute_model (line 156) | def execute_model(self, execute_model_req) -> List[SamplerOutput]:
    method add_lora (line 164) | def add_lora(self, lora_request: LoRARequest) -> bool:
    method remove_lora (line 168) | def remove_lora(self, lora_id: int) -> bool:
    method list_loras (line 172) | def list_loras(self) -> Set[int]:
    method check_health (line 175) | def check_health(self) -> None:
    method add_prompt_adapter (line 183) | def add_prompt_adapter(self, prompt_adapter_request: PromptAdapterRequ...
    method list_prompt_adapters (line 188) | def list_prompt_adapters(self) -> Set[int]:
    method pin_lora (line 191) | def pin_lora(self, lora_id: int) -> bool:
    method pin_prompt_adapter (line 195) | def pin_prompt_adapter(self, prompt_adapter_id: int) -> bool:
    method remove_prompt_adapter (line 200) | def remove_prompt_adapter(self, prompt_adapter_id: int) -> bool:
    method offload_model_weights (line 206) | def offload_model_weights(self) -> None:
    method sync_model_weights (line 209) | def sync_model_weights(self, actor_weights: Dict[str, torch.Tensor], l...
  function initialize_cluster (line 213) | def initialize_cluster(
  function get_open_port (line 237) | def get_open_port():
  class SPMDGPUExecutorAsync (line 244) | class SPMDGPUExecutorAsync(SPMDGPUExecutor, ExecutorAsyncBase):
    method execute_model_async (line 246) | async def execute_model_async(self, execute_model_req: ExecuteModelReq...
    method check_health_async (line 250) | async def check_health_async(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_5_4/tokenizer.py
  class TokenizerGroup (line 25) | class TokenizerGroup:
    method __init__ (line 28) | def __init__(self, tokenizer: PreTrainedTokenizer, enable_lora: bool, ...
    method ping (line 35) | def ping(self) -> bool:
    method get_max_input_len (line 39) | def get_max_input_len(self, lora_request: Optional[LoRARequest] = None...
    method encode (line 43) | def encode(self,
    method encode_async (line 50) | async def encode_async(self,
    method get_lora_tokenizer (line 57) | def get_lora_tokenizer(self, lora_request: Optional[LoRARequest]) -> "...
    method pad_token_id (line 72) | def pad_token_id(self):
    method eos_token_id (line 76) | def eos_token_id(self):

FILE: verl/third_party/vllm/vllm_v_0_5_4/worker.py
  class Worker (line 44) | class Worker(Worker):
    method __init__ (line 52) | def __init__(
    method init_device (line 134) | def init_device(self) -> None:
    method determine_num_available_blocks (line 171) | def determine_num_available_blocks(self) -> Tuple[int, int]:
    method _init_cache_engine (line 229) | def _init_cache_engine(self):
    method free_cache_engine (line 233) | def free_cache_engine(self):
    method execute_model (line 239) | def execute_model(self,
    method sync_model_weights (line 266) | def sync_model_weights(self, actor_weights: Dict, load_format: str):
    method offload_model_weights (line 275) | def offload_model_weights(self) -> None:
  function init_worker_distributed_environment (line 286) | def init_worker_distributed_environment(

FILE: verl/third_party/vllm/vllm_v_0_6_3/arg_utils.py
  class EngineArgs (line 27) | class EngineArgs(EngineArgs):
    method __post_init__ (line 30) | def __post_init__(self):
    method create_model_config (line 33) | def create_model_config(self) -> ModelConfig:
    method create_load_config (line 62) | def create_load_config(self) -> LoadConfig:
    method create_engine_config (line 70) | def create_engine_config(self) -> EngineConfig:

FILE: verl/third_party/vllm/vllm_v_0_6_3/config.py
  class LoadFormat (line 34) | class LoadFormat(str, enum.Enum):
  class ModelConfig (line 44) | class ModelConfig(ModelConfig):
    method __init__ (line 46) | def __init__(self, hf_config: PretrainedConfig, *args, **kwargs) -> None:
  class LoadConfig (line 52) | class LoadConfig:
    method __post_init__ (line 80) | def __post_init__(self):
    method _verify_load_format (line 91) | def _verify_load_format(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py
  function gemma_dtensor_weight_loader (line 24) | def gemma_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn.Modu...
  function gptbigcode_dtensor_load_weights (line 61) | def gptbigcode_dtensor_load_weights(actor_weights: Dict, vllm_model: nn....
  function starcoder2_dtensor_load_weights (line 76) | def starcoder2_dtensor_load_weights(actor_weights: Dict, vllm_model: nn....
  function llama_dtensor_weight_loader (line 107) | def llama_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn.Modu...
  function qwen2_dtensor_weight_loader (line 151) | def qwen2_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn.Modu...
  function qwen2vl_dtensor_weight_loader (line 188) | def qwen2vl_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn.Mo...
  function deepseekv2_dtensor_weight_loader (line 228) | def deepseekv2_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn...
  function gpt2_dtensor_weight_loader (line 308) | def gpt2_dtensor_weight_loader(actor_weights: Dict, vllm_model: nn.Modul...
  function redistribute_dtensor (line 312) | def redistribute_dtensor(param_name: str, loaded_weights: DTensor, paral...
  function _process_parameter_names (line 326) | def _process_parameter_names(name):
  function load_dtensor_weights (line 363) | def load_dtensor_weights(actor_weights: Dict, vllm_model: nn.Module):
  function _get_model_weight_loader (line 371) | def _get_model_weight_loader(arch: str):
  function update_dtensor_weight_loader (line 379) | def update_dtensor_weight_loader():

FILE: verl/third_party/vllm/vllm_v_0_6_3/hf_weight_loader.py
  function update_hf_weight_loader (line 22) | def update_hf_weight_loader():
  function load_hf_weights (line 27) | def load_hf_weights(actor_weights: Dict, vllm_model: nn.Module):

FILE: verl/third_party/vllm/vllm_v_0_6_3/llm.py
  class LLM (line 31) | class LLM(LLM):
    method __init__ (line 85) | def __init__(
    method init_cache_engine (line 145) | def init_cache_engine(self):
    method free_cache_engine (line 148) | def free_cache_engine(self):
    method get_tokenizer (line 151) | def get_tokenizer(self) -> Union[PreTrainedTokenizer, PreTrainedTokeni...
    method set_tokenizer (line 154) | def set_tokenizer(
    method _run_engine (line 160) | def _run_engine(self, *, use_tqdm: bool) -> List[Union[RequestOutput, ...
    method _post_process_outputs (line 174) | def _post_process_outputs(self, request_outputs: List[RequestOutput]) ...
    method sync_model_weights (line 196) | def sync_model_weights(self, actor_weights: Dict[str, torch.Tensor], l...
    method offload_model_weights (line 199) | def offload_model_weights(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py
  class LLMEngine (line 61) | class LLMEngine(LLMEngine):
    method __init__ (line 95) | def __init__(
    method _init_tokenizer (line 337) | def _init_tokenizer(self, tokenizer, **tokenizer_init_kwargs):
    method init_cache_engine (line 344) | def init_cache_engine(self):
    method free_cache_engine (line 349) | def free_cache_engine(self):
    method _get_executor_cls (line 355) | def _get_executor_cls(cls, engine_config: EngineConfig) -> Type[Execut...
    method from_engine_args (line 372) | def from_engine_args(
    method sync_model_weights (line 404) | def sync_model_weights(self, actor_weights: Dict[str, torch.Tensor], l...
    method offload_model_weights (line 407) | def offload_model_weights(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py
  function parallel_weight_loader (line 26) | def parallel_weight_loader(self, param: torch.Tensor, loaded_weight: tor...
  function default_weight_loader (line 37) | def default_weight_loader(param: torch.Tensor, loaded_weight: torch.Tens...
  function gpt2_weight_loader (line 46) | def gpt2_weight_loader(actor_weights: Dict, vllm_model: nn.Module) -> nn...
  function llama_megatron_weight_loader (line 74) | def llama_megatron_weight_loader(actor_weights: Dict, vllm_model: nn.Mod...
  function llama_megatron_core_te_weight_loader (line 86) | def llama_megatron_core_te_weight_loader(actor_weights: Dict, vllm_model...
  function llama_megatron_core_weight_loader (line 117) | def llama_megatron_core_weight_loader(actor_weights: Dict, vllm_model: n...
  function _replace_name (line 147) | def _replace_name(megatron_name, name_mapping):
  function llama_megatron_core_te_weight_loader (line 170) | def llama_megatron_core_te_weight_loader(actor_weights: Dict, vllm_model...
  function llama_megatron_core_weight_loader (line 201) | def llama_megatron_core_weight_loader(actor_weights: Dict, vllm_model: n...
  function _replace_name (line 231) | def _replace_name(megatron_name, name_mapping):
  function mistral_megatron_weight_loader (line 254) | def mistral_megatron_weight_loader(actor_weights: Dict, vllm_model: nn.M...
  function load_megatron_weights (line 291) | def load_megatron_weights(actor_weights: Dict, vllm_model: nn.Module):
  function _get_model_weight_loader (line 299) | def _get_model_weight_loader(arch: str):
  function update_megatron_weight_loader (line 306) | def update_megatron_weight_loader():

FILE: verl/third_party/vllm/vllm_v_0_6_3/model_loader.py
  function get_model (line 33) | def get_model(
  function get_model_loader (line 65) | def get_model_loader(load_config: LoadConfig) -> BaseModelLoader:
  class DummyModelLoader (line 104) | class DummyModelLoader(BaseModelLoader):
    method __init__ (line 107) | def __init__(self, load_config: LoadConfig):
    method download_model (line 113) | def download_model(self, model_config: ModelConfig) -> None:
    method load_model (line 116) | def load_model(
  class MegatronLoader (line 135) | class MegatronLoader(BaseModelLoader):
    method __init__ (line 138) | def __init__(self, load_config: LoadConfig):
    method download_model (line 144) | def download_model(self, model_config: ModelConfig) -> None:
    method _get_weights_iterator (line 147) | def _get_weights_iterator(actor_model: Union[PreTrainedModel, Dict]):
    method load_model (line 156) | def load_model(
  class HFLoader (line 190) | class HFLoader(BaseModelLoader):
    method __init__ (line 193) | def __init__(self, load_config: LoadConfig):
    method download_model (line 199) | def download_model(self, model_config: ModelConfig) -> None:
    method _get_weights_iterator (line 202) | def _get_weights_iterator(self, actor_model: Union[PreTrainedModel, Di...
    method load_model (line 210) | def load_model(
  class DTensorLoader (line 238) | class DTensorLoader(BaseModelLoader):
    method __init__ (line 241) | def __init__(self, load_config: LoadConfig):
    method download_model (line 247) | def download_model(self, model_config: ModelConfig) -> None:
    method _get_weights_iterator (line 250) | def _get_weights_iterator(actor_model: Union[PreTrainedModel, Dict]):
    method load_model (line 259) | def load_model(
  function _get_logits (line 297) | def _get_logits(self, hidden_states: torch.Tensor, embedding: torch.Tensor,
  function logitsprocessor_init (line 313) | def logitsprocessor_init(

FILE: verl/third_party/vllm/vllm_v_0_6_3/model_runner.py
  class BatchType (line 51) | class BatchType(IntEnum):
  class ModelRunner (line 60) | class ModelRunner(ModelRunner):
    method __init__ (line 62) | def __init__(
    method load_model (line 101) | def load_model(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py
  function initialize_parallel_state (line 38) | def initialize_parallel_state(
  function ensure_model_parallel_initialized (line 71) | def ensure_model_parallel_initialized(
  function model_parallel_is_initialized (line 98) | def model_parallel_is_initialized():
  function initialize_model_parallel_for_vllm (line 104) | def initialize_model_parallel_for_vllm(
  function initialize_model_parallel (line 199) | def initialize_model_parallel(
  function get_device_mesh (line 281) | def get_device_mesh():
  function get_tensor_model_parallel_group (line 291) | def get_tensor_model_parallel_group():
  function get_tensor_model_parallel_world_size (line 297) | def get_tensor_model_parallel_world_size():
  function get_tensor_model_parallel_rank (line 302) | def get_tensor_model_parallel_rank():
  function get_tensor_model_parallel_src_rank (line 307) | def get_tensor_model_parallel_src_rank():

FILE: verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py
  class SPMDGPUExecutor (line 42) | class SPMDGPUExecutor(ExecutorBase):
    method __init__ (line 45) | def __init__(
    method _init_executor (line 74) | def _init_executor(self, model, distributed_init_method) -> None:
    method _init_workers_sp (line 80) | def _init_workers_sp(self, model, distributed_init_method: str):
    method determine_num_available_blocks (line 114) | def determine_num_available_blocks(self) -> Tuple[int, int]:
    method initialize_cache (line 134) | def initialize_cache(self, num_gpu_blocks: int, num_cpu_blocks: int) -...
    method init_cache_engine (line 156) | def init_cache_engine(self) -> None:
    method free_cache_engine (line 159) | def free_cache_engine(self) -> None:
    method execute_model (line 162) | def execute_model(self, execute_model_req) -> List[SamplerOutput]:
    method add_lora (line 170) | def add_lora(self, lora_request: LoRARequest) -> bool:
    method remove_lora (line 174) | def remove_lora(self, lora_id: int) -> bool:
    method list_loras (line 178) | def list_loras(self) -> Set[int]:
    method check_health (line 181) | def check_health(self) -> None:
    method add_prompt_adapter (line 189) | def add_prompt_adapter(self, prompt_adapter_request: PromptAdapterRequ...
    method list_prompt_adapters (line 193) | def list_prompt_adapters(self) -> Set[int]:
    method pin_lora (line 196) | def pin_lora(self, lora_id: int) -> bool:
    method pin_prompt_adapter (line 200) | def pin_prompt_adapter(self, prompt_adapter_id: int) -> bool:
    method remove_prompt_adapter (line 204) | def remove_prompt_adapter(self, prompt_adapter_id: int) -> bool:
    method offload_model_weights (line 209) | def offload_model_weights(self) -> None:
    method sync_model_weights (line 212) | def sync_model_weights(self, actor_weights: Dict[str, torch.Tensor], l...
  function initialize_cluster (line 216) | def initialize_cluster(
  function get_open_port (line 240) | def get_open_port():
  class SPMDGPUExecutorAsync (line 247) | class SPMDGPUExecutorAsync(SPMDGPUExecutor, ExecutorAsyncBase):
    method execute_model_async (line 249) | async def execute_model_async(self, execute_model_req: ExecuteModelReq...
    method check_health_async (line 253) | async def check_health_async(self) -> None:

FILE: verl/third_party/vllm/vllm_v_0_6_3/tokenizer.py
  class TokenizerGroup (line 23) | class TokenizerGroup(TokenizerGroup):
    method __init__ (line 26) | def __init__(self, tokenizer: PreTrainedTokenizer, enable_lora: bool, ...
    method pad_token_id (line 35) | def pad_token_id(self):
    method eos_token_id (line 39) | def eos_token_id(self):

FILE: verl/third_party/vllm/vllm_v_0_6_3/worker.py
  class Worker (line 53) | class Worker(Worker):
    method __init__ (line 61) | def __init__(
    method init_device (line 140) | def init_device(self) -> None:
    method determine_num_available_blocks (line 177) | def determine_num_available_blocks(self) -> Tuple[int, int]:
    method _init_cache_engine (line 235) | def _init_cache_engine(self):
    method free_cache_engine (line 239) | def free_cache_engine(self):
    method execute_model (line 245) | def execute_model(self,
    method sync_model_weights (line 274) | def sync_model_weights(self, actor_weights: Dict, load_format: str):
    method offload_model_weights (line 283) | def offload_model_weights(self) -> None:
  function init_worker_distributed_environment (line 294) | def init_worker_distributed_environment(

FILE: verl/trainer/fsdp_sft_trainer.py
  function extract_step (line 51) | def extract_step(path):
  class FSDPSFTTrainer (line 58) | class FSDPSFTTrainer(object):
    method __init__ (line 60) | def __init__(self, config, device_mesh: DeviceMesh):
    method _normalize_config_bsz (line 81) | def _normalize_config_bsz(self):
    method _build_dataloader (line 92) | def _build_dataloader(self):
    method _build_model_optimizer (line 139) | def _build_model_optimizer(self):
    method _compute_loss (line 218) | def _compute_loss(self, batch):
    method training_step (line 252) | def training_step(self, batch: TensorDict):
    method validation_step (line 288) | def validation_step(self, batch: TensorDict):
    method save_checkpoint (line 295) | def save_checkpoint(self, step):
    method fit (line 313) | def fit(self):
  function main (line 360) | def main(config):

FILE: verl/trainer/main_eval.py
  function select_reward_fn (line 27) | def select_reward_fn(data_source):
  function main (line 35) | def main(config):

FILE: verl/trainer/main_generation.py
  function main (line 40) | def main(config):

FILE: verl/trainer/main_ppo.py
  function _select_rm_score_fn (line 24) | def _select_rm_score_fn(data_source):
  class RewardManager (line 37) | class RewardManager():
    method __init__ (line 41) | def __init__(self, tokenizer, num_examine) -> None:
    method __call__ (line 45) | def __call__(self, data: DataProto):
  function main (line 98) | def main(config):
  function main_task (line 107) | def main_task(config):

FILE: verl/trainer/ppo/core_algos.py
  class AdaptiveKLController (line 28) | class AdaptiveKLController:
    method __init__ (line 34) | def __init__(self, init_kl_coef, target_kl, horizon):
    method update (line 39) | def update(self, current_kl, n_steps):
  class FixedKLController (line 46) | class FixedKLController:
    method __init__ (line 49) | def __init__(self, kl_coef):
    method update (line 52) | def update(self, current_kl, n_steps):
  function get_kl_controller (line 56) | def get_kl_controller(config):
  function compute_gae_advantage_return (line 70) | def compute_gae_advantage_return(token_level_rewards: torch.Tensor, valu...
  function compute_grpo_outcome_advantage (line 111) | def compute_grpo_outcome_advantage(token_level_rewards: torch.Tensor,
  function compute_rewards (line 158) | def compute_rewards(token_level_scores, old_log_prob, ref_log_prob, kl_r...
  function compute_policy_loss (line 163) | def compute_policy_loss(old_log_prob, log_prob, advantages, eos_mask, cl...
  function compute_entropy_loss (line 197) | def compute_entropy_loss(logits, eos_mask):
  function compute_value_loss (line 216) | def compute_value_loss(vpreds, returns, values, eos_mask, cliprange_value):
  function kl_penalty (line 242) | def kl_penalty(logprob: torch.FloatTensor, ref_logprob: torch.FloatTenso...

FILE: verl/trainer/ppo/ray_trainer.py
  class Role (line 41) | class Role(Enum):
  class ResourcePoolManager (line 55) | class ResourcePoolManager:
    method create_resource_pool (line 64) | def create_resource_pool(self):
    method get_resource_pool (line 75) | def get_resource_pool(self, role: Role) -> RayResourcePool:
  function apply_kl_penalty (line 84) | def apply_kl_penalty(data: DataProto, kl_ctrl: core_algos.AdaptiveKLCont...
  function compute_advantage (line 116) | def compute_advantage(data: DataProto, adv_estimator, gamma=1.0, lam=1.0...
  function reduce_metrics (line 150) | def reduce_metrics(metrics: dict):
  function _compute_response_info (line 156) | def _compute_response_info(batch):
  function compute_data_metrics (line 172) | def compute_data_metrics(batch, use_critic=True):
  function compute_timing_metrics (line 260) | def compute_timing_metrics(batch, timing_raw):
  function _timer (line 285) | def _timer(name: str, timing_raw: Dict[str, float]):
  class RayPPOTrainer (line 291) | class RayPPOTrainer(object):
    method __init__ (line 298) | def __init__(self,
    method _create_dataloader (line 342) | def _create_dataloader(self):
    method _validate (line 392) | def _validate(self):
    method init_workers (line 444) | def init_workers(self):
    method _save_checkpoint (line 516) | def _save_checkpoint(self):
    method _balance_batch (line 530) | def _balance_batch(self, batch: DataProto, metrics, logging_prefix='gl...
    method fit (line 547) | def fit(self):

FILE: verl/utils/config.py
  function update_dict_with_config (line 20) | def update_dict_with_config(dictionary: Dict, config: DictConfig):

FILE: verl/utils/dataset/rl_dataset.py
  function collate_fn (line 31) | def collate_fn(data_list: list[dict]) -> dict:
  class RLHFDataset (line 58) | class RLHFDataset(Dataset):
    method __init__ (line 63) | def __init__(self,
    method _download (line 91) | def _download(self):
    method _read_files_and_tokenize (line 96) | def _read_files_and_tokenize(self):
    method __len__ (line 117) | def __len__(self):
    method __getitem__ (line 120) | def __getitem__(self, item):

FILE: verl/utils/dataset/rm_dataset.py
  function download_files_distributed (line 27) | def download_files_distributed(download_fn):
  class RMDataset (line 40) | class RMDataset(Dataset):
    method __init__ (line 42) | def __init__(self,
    method _download (line 70) | def _download(self):
    method _read_files_and_tokenize (line 85) | def _read_files_and_tokenize(self):
    method __len__ (line 96) | def __len__(self):
    method _pad_to_length (line 99) | def _pad_to_length(self, input_ids, attention_mask):
    method __getitem__ (line 114) | def __getitem__(self, item):

FILE: verl/utils/dataset/sft_dataset.py
  class SFTDataset (line 34) | class SFTDataset(Dataset):
    method __init__ (line 39) | def __init__(self,
    method _download (line 69) | def _download(self):
    method _read_files_and_tokenize (line 73) | def _read_files_and_tokenize(self):
    method __len__ (line 107) | def __len__(self):
    method __getitem__ (line 110) | def __getitem__(self, item):

FILE: verl/utils/debug/performance.py
  function log_gpu_memory_usage (line 20) | def log_gpu_memory_usage(head: str, logger: logging.Logger = None, level...

FILE: verl/utils/debug/trajectory_tracker.py
  function save_to_hdfs (line 33) | def save_to_hdfs(data: io.BytesIO, name, hdfs_dir, verbose):
  class TrajectoryTracker (line 50) | class TrajectoryTracker():
    method __init__ (line 52) | def __init__(self, hdfs_dir, verbose) -> None:
    method dump (line 59) | def dump(self, data: io.BytesIO, name):
    method wait_for_hdfs (line 63) | def wait_for_hdfs(self):
  function dump_data (line 69) | def dump_data(data, name):
  function get_trajectory_tracker (line 79) | def get_trajectory_tracker():
  function process (line 94) | def process(iter):

FILE: verl/utils/distributed.py
  function initialize_global_process_group (line 18) | def initialize_global_process_group(timeout_second=36000):

FILE: verl/utils/flops_counter.py
  function get_device_flops (line 21) | def get_device_flops(unit="T"):
  class FlopsCounter (line 51) | class FlopsCounter:
    method __init__ (line 61) | def __init__(self, config: PretrainedConfig):
    method _estimate_unknown_flops (line 69) | def _estimate_unknown_flops(self, tokens_sum, batch_seqlens, delta_time):
    method _estimate_qwen2_flops (line 72) | def _estimate_qwen2_flops(self, tokens_sum, batch_seqlens, delta_time):
    method estimate_flops (line 107) | def estimate_flops(self, batch_seqlens, delta_time):

FILE: verl/utils/fs.py
  function _is_non_local (line 29) | def _is_non_local(path):
  function md5_encode (line 33) | def md5_encode(path: str) -> str:
  function get_local_temp_path (line 37) | def get_local_temp_path(hdfs_path: str, cache_dir: str) -> str:
  function copy_local_path_from_hdfs (line 55) | def copy_local_path_from_hdfs(src: str, cache_dir=None, filelock='.file....

FILE: verl/utils/fsdp_utils.py
  function init_fn (line 29) | def init_fn(x: torch.nn.Module):
  function get_init_weight_context_manager (line 36) | def get_init_weight_context_manager(use_meta_tensor=True):
  function get_fsdp_wrap_policy (line 48) | def get_fsdp_wrap_policy(module, config=None):
  function offload_fsdp_grad (line 79) | def offload_fsdp_grad(module):
  function load_fsdp_grad (line 86) | def load_fsdp_grad(module, device_id):
  function offload_fsdp_param_and_grad (line 93) | def offload_fsdp_param_and_grad(module, offload_grad=False):
  function load_fsdp_param_and_grad (line 103) | def load_fsdp_param_and_grad(module, device_id, load_grad=False):
  function offload_fsdp_optimizer (line 113) | def offload_fsdp_optimizer(optimizer):
  function load_fsdp_optimizer (line 123) | def load_fsdp_optimizer(optimizer, device_id):
  function meta_device_init (line 134) | def meta_device_init():
  function parallel_load_safetensors (line 165) | def parallel_load_safetensors(filepath):
  function parallel_init_module_fn (line 221) | def parallel_init_module_fn(module: torch.nn.Module, shard_states: Dict[...

FILE: verl/utils/hdfs_io.py
  function exists (line 27) | def exists(path: str, **kwargs) -> bool:
  function _exists (line 43) | def _exists(file_path: str):
  function makedirs (line 50) | def makedirs(name, mode=0o777, exist_ok=False, **kwargs) -> None:
  function _mkdir (line 75) | def _mkdir(file_path: str) -> bool:
  function copy (line 84) | def copy(src: str, dst: str, **kwargs) -> bool:
  function _copy (line 113) | def _copy(from_path: str, to_path: str, timeout: int = None) -> bool:
  function _run_cmd (line 135) | def _run_cmd(cmd: str, timeout=None):
  function _hdfs_cmd (line 139) | def _hdfs_cmd(cmd: str) -> str:
  function _is_non_local (line 143) | def _is_non_local(path: str):

FILE: verl/utils/import_utils.py
  function is_megatron_core_available (line 24) | def is_megatron_core_available():
  function is_vllm_available (line 33) | def is_vllm_available():
  function import_external_libs (line 41) | def import_external_libs(external_libs=None):

FILE: verl/utils/logger/aggregate_logger.py
  function concat_dict_to_str (line 21) | def concat_dict_to_str(dict: Dict, step):
  class LocalLogger (line 30) | class LocalLogger:
    method __init__ (line 32) | def __init__(self, remote_logger=None, enable_wandb=False, print_to_co...
    method flush (line 37) | def flush(self):
    method log (line 40) | def log(self, data, step):

FILE: verl/utils/logging_utils.py
  function set_basic_config (line 18) | def set_basic_config(level):

FILE: verl/utils/megatron/memory.py
  class MemoryBuffer (line 18) | class MemoryBuffer:
    method __init__ (line 20) | def __init__(self, numel, numel_padded, dtype):
    method zero (line 29) | def zero(self):
    method get (line 33) | def get(self, shape, start_index):

FILE: verl/utils/megatron/optimizer.py
  function get_megatron_optimizer (line 26) | def get_megatron_optimizer(

FILE: verl/utils/megatron/optimizer_config.py
  class OptimizerConfig (line 23) | class OptimizerConfig:

FILE: verl/utils/megatron/pipeline_parallel.py
  function compute_transformers_input_shapes (line 22) | def compute_transformers_input_shapes(batches, meta_info):
  function make_batch_generator (line 43) | def make_batch_generator(batches, vpp_size):

FILE: verl/utils/megatron/sequence_parallel.py
  function mark_parameter_as_sequence_parallel (line 21) | def mark_parameter_as_sequence_parallel(parameter):
  function is_sequence_parallel_param (line 25) | def is_sequence_parallel_param(param):
  function pad_to_sequence_parallel (line 29) | def pad_to_sequence_parallel(unpad_tokens: torch.Tensor):

FILE: verl/utils/megatron/tensor_parallel.py
  function update_kwargs_with_config (line 27) | def update_kwargs_with_config(dictionary: Dict, config: ModelParallelCon...
  function get_default_kwargs_for_model_parallel_config (line 32) | def get_default_kwargs_for_model_parallel_config():
  function get_default_model_parallel_config (line 43) | def get_default_model_parallel_config():
  function get_common_default_kwargs_for_parallel_linear (line 47) | def get_common_default_kwargs_for_parallel_linear():
  function get_default_kwargs_for_column_parallel_linear (line 58) | def get_default_kwargs_for_column_parallel_linear():
  function get_default_kwargs_for_row_parallel_linear (line 72) | def get_default_kwargs_for_row_parallel_linear():
  function get_default_kwargs_for_parallel_embedding (line 77) | def get_default_kwargs_for_parallel_embedding():
  function is_tensor_parallel_param (line 86) | def is_tensor_parallel_param(param):
  function get_tensor_parallel_partition_dim (line 90) | def get_tensor_parallel_partition_dim(param):
  function get_tensor_parallel_partition_stride (line 95) | def get_tensor_parallel_partition_stride(param):
  class _VocabParallelEntropy (line 100) | class _VocabParallelEntropy(torch.autograd.Function):
    method forward (line 103) | def forward(ctx, vocab_parallel_logits: torch.Tensor) -> torch.Tensor:
    method backward (line 118) | def backward(ctx, grad_output: torch.Tensor) -> torch.Tensor:
  function vocab_parallel_entropy (line 124) | def vocab_parallel_entropy(vocab_parallel_logits: torch.Tensor) -> torch...
  function vocab_parallel_log_probs_from_logits (line 136) | def vocab_parallel_log_probs_from_logits(logits, labels):
  function vocab_parallel_log_probs_from_logits_response_rmpad (line 141) | def vocab_parallel_log_probs_from_logits_response_rmpad(input_ids, atten...
  function vocab_parallel_compute_entropy_loss (line 168) | def vocab_parallel_compute_entropy_loss(logits, eos_mask):

FILE: verl/utils/megatron_utils.py
  function get_model (line 34) | def get_model(model_provider_func, model_type=ModelType.encoder_or_decod...
  function unwrap_model (line 122) | def unwrap_model(model, module_instances=ALL_MODULE_WRAPPER_CLASSNAMES):
  function convert_config (line 140) | def convert_config(hf_config: PretrainedConfig, megatron_config) -> Tran...
  function init_megatron_optim_config (line 185) | def init_megatron_optim_config(optim_config: Dict) -> OptimizerConfig:
  function init_model_parallel_config (line 201) | def init_model_parallel_config(config: DictConfig) -> ModelParallelConfig:
  class FakeTimers (line 215) | class FakeTimers:
    method __init__ (line 218) | def __init__(self):
    method __call__ (line 222) | def __call__(self, *args: Any, **kwds: Any) -> Any:
  function offload_megatron_param_and_grad (line 226) | def offload_megatron_param_and_grad(module_list: nn.ModuleList, offload_...
  function load_megatron_param_and_grad (line 241) | def load_megatron_param_and_grad(module_list: nn.ModuleList, device_id, ...

FILE: verl/utils/memory_buffer.py
  class MemoryBuffer (line 24) | class MemoryBuffer:
    method __init__ (line 30) | def __init__(self, numel: int, numel_padded: int, dtype: torch.dtype):
    method zero (line 36) | def zero(self):
    method get (line 40) | def get(self, shape, start_index):
  function calc_padded_numel (line 51) | def calc_padded_numel(shape: torch.Size, dtype: torch.dtype):
  function get_weight_buffer_meta_from_module (line 58) | def get_weight_buffer_meta_from_module(module: nn.Module) -> Dict[str, D...
  function build_memory_buffer (line 68) | def build_memory_buffer(weight_buffer_meta: Dict[str, Dict]) -> Dict[tor...
  function build_memory_reference_from_module (line 97) | def build_memory_reference_from_module(module: torch.nn.Module,
  function build_memory_reference (line 113) | def build_memory_reference(weight_buffer_meta: Dict[str, Dict], memory_b...
  class MemoryBufferModuleWrapper (line 140) | class MemoryBufferModuleWrapper:
    method __init__ (line 146) | def __init__(self, module: nn.Module):
    method get_memory_buffers (line 153) | def get_memory_buffers(self):
    method get_weight_buffer_meta (line 156) | def get_weight_buffer_meta(self):
  class MegatronMemoryBufferForRollout (line 160) | class MegatronMemoryBufferForRollout(object):
    method __init__ (line 175) | def __init__(self, transform_memory_param_fn):
    method initialize_weight_buffer (line 181) | def initialize_weight_buffer(self, weight_buffer_meta_pp: List[Dict[st...
    method build_memory_reference (line 199) | def build_memory_reference(self):
    method named_parameters (line 205) | def named_parameters(self):
    method weight_buffers (line 209) | def weight_buffers(self):
    method memory_buffers (line 213) | def memory_buffers(self):

FILE: verl/utils/model.py
  class LambdaLayer (line 28) | class LambdaLayer(nn.Module):
    method __init__ (line 30) | def __init__(self, fn):
    method forward (line 34) | def forward(self, *args, **kwargs):
  function squeeze (line 38) | def squeeze(x):
  function update_model_config (line 42) | def update_model_config(module_config, override_config_kwargs):
  function get_huggingface_actor_config (line 47) | def get_huggingface_actor_config(model_name: str, override_config_kwargs...
  function create_huggingface_actor (line 58) | def create_huggingface_actor(model_name: str, override_config_kwargs=Non...
  function create_huggingface_critic (line 81) | def create_huggingface_critic(model_name: str, override_config_kwargs=No...
  function get_model_size (line 102) | def get_model_size(model: nn.Module, scale='auto'):
  function print_model_size (line 129) | def print_model_size(model: nn.Module, name: str = None):
  function create_random_mask (line 136) | def create_random_mask(input_ids: torch.Tensor,
  function compute_position_id_with_mask (line 177) | def compute_position_id_with_mask(mask):
  function normalize_pp_vpp_params (line 181) | def normalize_pp_vpp_params(params, num_hidden_layers, layer_name='layer...
  function get_parallel_model_from_config (line 234) | def get_parallel_model_from_config(config, megatron_config, pre_process=...
  function _get_parallel_model_architecture_from_config (line 243) | def _get_parallel_model_architecture_from_config(config: PretrainedConfi...
  function load_megatron_model_weights (line 253) | def load_megatron_model_weights(config,
  function pad_packed_inputs (line 299) | def pad_packed_inputs(unpad_tokens: torch.Tensor, cu_seqlens, max_seqlen...

FILE: verl/utils/py_functional.py
  function union_two_dict (line 22) | def union_two_dict(dict1: Dict, dict2: Dict):
  function append_to_dict (line 41) | def append_to_dict(data: Dict, new_data: Dict):
  class NestedNamespace (line 48) | class NestedNamespace(SimpleNamespace):
    method __init__ (line 50) | def __init__(self, dictionary, **kwargs):

FILE: verl/utils/ray_utils.py
  function parallel_put (line 23) | def parallel_put(data_list, max_workers=None):

FILE: verl/utils/rendezvous/ray_backend.py
  class NCCLIDStore (line 25) | class NCCLIDStore:
    method __init__ (line 27) | def __init__(self, nccl_id):
    method get (line 30) | def get(self):
  function get_nccl_id_store_by_name (line 34) | def get_nccl_id_store_by_name(name):
  function create_nccl_communicator_in_ray (line 47) | def create_nccl_communicator_in_ray(rank: int,

FILE: verl/utils/reward_score/countdown.py
  function extract_solution (line 7) | def extract_solution(solution_str):
  function validate_equation (line 28) | def validate_equation(equation_str, available_numbers):
  function evaluate_equation (line 44) | def evaluate_equation(equation_str):
  function compute_score (line 59) | def compute_score(solution_str, ground_truth, method='strict', format_sc...

FILE: verl/utils/reward_score/gsm8k.py
  function extract_solution (line 18) | def extract_solution(solution_str, method='strict'):
  function compute_score (line 44) | def compute_score(solution_str, ground_truth, method='strict', format_sc...

FILE: verl/utils/reward_score/math.py
  function compute_score (line 17) | def compute_score(solution_str, ground_truth) -> float:
  function is_equiv (line 32) | def is_equiv(str1, str2, verbose=False):
  function remove_boxed (line 49) | def remove_boxed(s):
  function last_boxed_only_string (line 63) | def last_boxed_only_string(string):
  function fix_fracs (line 93) | def fix_fracs(string):
  function fix_a_slash_b (line 125) | def fix_a_slash_b(string):
  function remove_right_units (line 140) | def remove_right_units(string):
  function fix_sqrt (line 150) | def fix_sqrt(string):
  function strip_string (line 165) | def strip_string(string):

FILE: verl/utils/reward_score/multiply.py
  function extract_solution (line 5) | def extract_solution(solution_str):
  function compute_score (line 27) | def compute_score(solution_str, ground_truth, method='strict', format_sc...

FILE: verl/utils/seqlen_balancing.py
  function karmarkar_karp (line 25) | def karmarkar_karp(seqlen_list: List[int], k_partitions: int, equal_size...
  function greedy_partition (line 133) | def greedy_partition(seqlen_list: List[int], k_partitions: int, equal_si...
  function get_seqlen_balanced_partitions (line 152) | def get_seqlen_balanced_partitions(seqlen_list: List[int], k_partitions:...
  function log_seqlen_unbalance (line 186) | def log_seqlen_unbalance(seqlen_list: List[int], partitions: List[List[i...
  function ceildiv (line 220) | def ceildiv(a, b):
  function rearrange_micro_batches (line 224) | def rearrange_micro_batches(batch: TensorDict, max_token_len, dp_group=N...
  function get_reverse_idx (line 259) | def get_reverse_idx(idx_map):

FILE: verl/utils/tokenizer.py
  function set_pad_token_id (line 20) | def set_pad_token_id(tokenizer):
  function hf_tokenizer (line 35) | def hf_tokenizer(name_or_path, correct_pad_token=True, correct_gemma2=Tr...

FILE: verl/utils/torch_dtypes.py
  class PrecisionType (line 27) | class PrecisionType(object):
    method supported_type (line 43) | def supported_type(precision: Union[str, int]) -> bool:
    method supported_types (line 47) | def supported_types() -> list[str]:
    method is_fp16 (line 51) | def is_fp16(precision):
    method is_fp32 (line 55) | def is_fp32(precision):
    method is_bf16 (line 59) | def is_bf16(precision):
    method to_dtype (line 63) | def to_dtype(precision):
    method to_str (line 74) | def to_str(precision):

FILE: verl/utils/torch_functional.py
  function gather_from_labels (line 34) | def gather_from_labels(data, label):
  function logprobs_from_logits (line 49) | def logprobs_from_logits(logits, labels):
  function logprobs_from_logits_flash_attn (line 65) | def logprobs_from_logits_flash_attn(logits, labels):
  function logprobs_from_logits_naive (line 70) | def logprobs_from_logits_naive(logits, labels):
  function logprobs_of_labels_v2 (line 76) | def logprobs_of_labels_v2(logits: torch.FloatTensor, labels):
  function clip_by_value (line 86) | def clip_by_value(x, tensor_min, tensor_max):
  function entropy_from_logits (line 95) | def entropy_from_logits(logits: torch.Tensor):
  function masked_sum (line 102) | def masked_sum(values, mask, axis=None):
  function masked_mean (line 107) | def masked_mean(values, mask, axis=None):
  function masked_var (line 112) | def masked_var(values, mask, unbiased=True):
  function masked_whiten (line 130) | def masked_whiten(values, mask, shift_mean=True):
  function get_eos_mask (line 139) | def get_eos_mask(response_id: torch.Tensor, eos_token: int = 2, dtype=to...
  function compute_grad_norm (line 151) | def compute_grad_norm(model: nn.Module):
  function broadcast_dict_tensor (line 160) | def broadcast_dict_tensor(tensors: Union[Dict[str, torch.Tensor], Tensor...
  function allgather_dict_tensors (line 169) | def allgather_dict_tensors(tensors: Union[Dict[str, torch.Tensor], Tenso...
  function split_dict_tensor_into_batches (line 203) | def split_dict_tensor_into_batches(tensors: TensorDict, batch_size) -> L...
  function pad_sequence_to_length (line 209) | def pad_sequence_to_length(tensors, max_seq_len, pad_token_id, left_pad=...
  function tokenize_and_postprocess_data (line 225) | def tokenize_and_postprocess_data(prompt: str,
  function remove_pad_token (line 269) | def remove_pad_token(input_ids: torch.Tensor, attention_mask: torch.Tens...
  function log_probs_from_logits_response (line 284) | def log_probs_from_logits_response(input_ids, logits, response_length):
  function log_probs_from_logits_response_rmpad (line 300) | def log_probs_from_logits_response_rmpad(input_ids, attention_mask, logi...
  function log_probs_from_logits_all_rmpad (line 328) | def log_probs_from_logits_all_rmpad(input_ids_rmpad, logits_rmpad, indic...
  function post_process_logits (line 359) | def post_process_logits(input_ids, logits, temperature, top_k, top_p):
  function get_cosine_schedule_with_warmup (line 379) | def get_cosine_schedule_with_warmup(
  function get_constant_schedule_with_warmup (line 422) | def get_constant_schedule_with_warmup(
  function prepare_decoder_attention_mask (line 434) | def prepare_decoder_attention_mask(attention_mask, input_shape, inputs_e...
  function _make_causal_mask (line 456) | def _make_causal_mask(input_ids_shape: torch.Size, dtype: torch.dtype, d...
  function _expand_mask (line 469) | def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Option...
  function get_unpad_data (line 483) | def get_unpad_data(attention_mask):

FILE: verl/utils/tracking.py
  class Tracking (line 24) | class Tracking(object):
    method __init__ (line 27) | def __init__(self, project_name, experiment_name, default_backend: Uni...
    method log (line 59) | def log(self, data, step, backend=None):
  class _MlflowLoggingAdapter (line 65) | class _MlflowLoggingAdapter:
    method log (line 67) | def log(self, data, step):
  function _compute_mlflow_params_from_objects (line 72) | def _compute_mlflow_params_from_objects(params) -> Dict[str, Any]:
  function _transform_params_to_json_serializable (line 79) | def _transform_params_to_json_serializable(x, convert_list_to_dict: bool):
  function _flatten_dict (line 99) | def _flatten_dict(raw: Dict[str, Any], *, sep: str) -> Dict[str, Any]:

FILE: verl/utils/ulysses.py
  function set_ulysses_sequence_parallel_group (line 29) | def set_ulysses_sequence_parallel_group(group: dist.ProcessGroup):
  function get_ulysses_sequence_parallel_group (line 37) | def get_ulysses_sequence_parallel_group() -> Optional[dist.ProcessGroup]:
  function get_ulysses_sequence_parallel_world_size (line 45) | def get_ulysses_sequence_parallel_world_size(group: ProcessGroup = None)...
  function get_ulysses_sequence_parallel_rank (line 53) | def get_ulysses_sequence_parallel_rank(group: ProcessGroup = None) -> int:
  function gather_seq_scatter_heads (line 61) | def gather_seq_scatter_heads(
  function gather_heads_scatter_seq (line 85) | def gather_heads_scatter_seq(x: Tensor, head_dim: int, seq_dim: int, gro...
  function _pad_tensor (line 103) | def _pad_tensor(x: Tensor, dim: int, padding_size: int) -> Tensor:
  function _unpad_tensor (line 110) | def _unpad_tensor(x: Tensor, dim: int, padding_size: int) -> Tensor:
  function slice_input_tensor (line 116) | def slice_input_tensor(x: Tensor, dim: int, padding: bool = True, group:...
  function all_to_all_tensor (line 132) | def all_to_all_tensor(
  function all_gather_tensor (line 154) | def all_gather_tensor(local_tensor: Tensor, group: Optional[dist.Process...
  class SeqAllToAll (line 164) | class SeqAllToAll(torch.autograd.Function):
    method forward (line 167) | def forward(
    method backward (line 182) | def backward(ctx: Any, *grad_output: Tensor) -> Tuple[None, Tensor, No...
  class Gather (line 197) | class Gather(torch.autograd.Function):
    method forward (line 200) | def forward(ctx: Any,
    method backward (line 226) | def backward(ctx: Any, grad_output: Tensor) -> Any:
  function gather_outpus_and_unpad (line 233) | def gather_outpus_and_unpad(x: Tensor,
  function ulysses_pad_and_slice_inputs (line 252) | def ulysses_pad_and_slice_inputs(input_ids_rmpad: torch.Tensor,

FILE: verl/workers/actor/base.py
  class BasePPOActor (line 26) | class BasePPOActor(ABC):
    method __init__ (line 28) | def __init__(self, config):
    method compute_log_prob (line 39) | def compute_log_prob(self, data: DataProto) -> torch.Tensor:
    method update_policy (line 54) | def update_policy(self, data: DataProto) -> Dict:

FILE: verl/workers/actor/dp_actor.py
  class DataParallelPPOActor (line 39) | class DataParallelPPOActor(BasePPOActor):
    method __init__ (line 41) | def __init__(
    method _forward_micro_batch (line 58) | def _forward_micro_batch(self, micro_batch, temperature) -> Tuple[torc...
    method _optimizer_step (line 143) | def _optimizer_step(self):
    method compute_log_prob (line 153) | def compute_log_prob(self, data: DataProto) -> torch.Tensor:
    method update_policy (line 203) | def update_policy(self, data: DataProto):

FILE: verl/workers/actor/megatron_actor.py
  class MegatronPPOActor (line 48) | class MegatronPPOActor(BasePPOActor):
    method __init__ (line 50) | def __init__(self, config, model_config, megatron_config: ModelParalle...
    method compute_log_prob (line 129) | def compute_log_prob(self, data: DataProto) -> torch.Tensor:
    method make_minibatch_iterator (line 190) | def make_minibatch_iterator(self, data: DataProto) -> Iterable[DataPro...
    method forward_backward_batch (line 218) | def forward_backward_batch(self, data: DataProto, forward_only=False, ...
    method update_policy (line 329) | def update_policy(self, dataloader: Iterable[DataProto]) -> Dict:

FILE: verl/workers/critic/base.py
  class BasePPOCritic (line 26) | class BasePPOCritic(ABC):
    method __init__ (line 28) | def __init__(self, config):
    method compute_values (line 33) | def compute_values(self, data: DataProto) -> torch.Tensor:
    method update_critic (line 38) | def update_critic(self, data: DataProto):

FILE: verl/workers/critic/dp_critic.py
  class DataParallelPPOCritic (line 39) | class DataParallelPPOCritic(BasePPOCritic):
    method __init__ (line 41) | def __init__(self, config, critic_module: nn.Module, critic_optimizer:...
    method _forward_micro_batch (line 53) | def _forward_micro_batch(self, micro_batch):
    method _optimizer_step (line 103) | def _optimizer_step(self):
    method compute_values (line 113) | def compute_values(self, data: DataProto) -> torch.Tensor:
    method update_critic (line 146) | def update_critic(self, data: DataProto):

FILE: verl/workers/critic/megatron_critic.py
  class MegatronPPOCritic (line 41) | class MegatronPPOCritic(BasePPOCritic):
    method __init__ (line 43) | def __init__(self, config, model_config, megatron_config, critic_modul...
    method compute_values (line 77) | def compute_values(self, data: DataProto) -> DataProto:
    method make_minibatch_iterator (line 106) | def make_minibatch_iterator(self, data: DataProto) -> Iterable[DataPro...
    method forward_backward_batch (line 113) | def forward_backward_batch(self, data: DataProto, forward_only=False):
    method update_critic (line 204) | def update_critic(self, dataloader: Iterable[DataProto]):

FILE: verl/workers/fsdp_workers.py
  class ActorRolloutRefWorker (line 47) | class ActorRolloutRefWorker(Worker):
    method __init__ (line 53) | def __init__(self, config: DictConfig, role: str):
    method _build_model_optimizer (line 111) | def _build_model_optimizer(self,
    method _build_rollout (line 250) | def _build_rollout(self):
    method init_model (line 285) | def init_model(self):
    method update_actor (line 356) | def update_actor(self, data: DataProto):
    method generate_sequences (line 401) | def generate_sequences(self, prompts: DataProto):
    method compute_ref_log_prob (line 449) | def compute_ref_log_prob(self, data: DataProto):
    method save_checkpoint (line 478) | def save_checkpoint(self, local_path, hdfs_path=None):
  class CriticWorker (line 507) | class CriticWorker(Worker):
    method __init__ (line 509) | def __init__(self, config):
    method _build_critic_model_optimizer (line 540) | def _build_critic_model_optimizer(self, config):
    method init_model (line 652) | def init_model(self):
    method compute_values (line 674) | def compute_values(self, data: DataProto):
    method update_critic (line 699) | def update_critic(self, data: DataProto):
    method save_checkpoint (line 736) | def save_checkpoint(self, local_path, hdfs_path=None):
  class RewardModelWorker (line 765) | class RewardModelWorker(Worker):
    method __init__ (line 770) | def __init__(self, config):
    method _build_model (line 793) | def _build_model(self, config):
    method init_model (line 851) | def init_model(self):
    method _forward_micro_batch (line 857) | def _forward_micro_batch(self, micro_batch):
    method _expand_to_token_level (line 911) | def _expand_to_token_level(self, data: DataProto, scores: torch.Tensor):
    method _switch_chat_template (line 926) | def _switch_chat_template(self, data: DataProto):
    method compute_rm_score (line 984) | def compute_rm_score(self, data: DataProto):

FILE: verl/workers/megatron_workers.py
  function set_random_seed (line 47) | def set_random_seed(seed):
  class ActorRolloutRefWorker (line 63) | class ActorRolloutRefWorker(MegatronWorker):
    method __init__ (line 69) | def __init__(self, config: DictConfig, role: str):
    method _build_model_optimizer (line 124) | def _build_model_optimizer(self,
    method _build_rollout (line 216) | def _build_rollout(self):
    method init_model (line 261) | def init_model(self):
    method update_actor (line 326) | def update_actor(self, data: DataProto):
    method generate_sequences (line 345) | def generate_sequences(self, prompts: DataProto):
    method compute_ref_log_prob (line 376) | def compute_ref_log_prob(self, data: DataProto):
    method load_checkpoint (line 395) | def load_checkpoint(self, checkpoint_path):
    method load_pretrained_model (line 399) | def load_pretrained_model(self, checkpoint_path):
    method save_checkpoint (line 403) | def save_checkpoint(self, checkpoint_path):
  class CriticWorker (line 408) | class CriticWorker(MegatronWorker):
    method __init__ (line 410) | def __init__(self, config):
    method _build_critic_model_optimizer (line 446) | def _build_critic_model_optimizer(self,
    method init_model (line 514) | def init_model(self):
    method compute_values (line 551) | def compute_values(self, data: DataProto):
    method update_critic (line 559) | def update_critic(self, data: DataProto):
    method load_checkpoint (line 568) | def load_checkpoint(self, checkpoint_path):
    method save_checkpoint (line 572) | def save_checkpoint(self, checkpoint_path):
  class RewardModelWorker (line 576) | class RewardModelWorker(MegatronWorker):
    method __init__ (line 581) | def __init__(self, config):
    method _build_rm_model (line 614) | def _build_rm_model(self, model_path, megatron_config: ModelParallelCo...
    method init_model (line 672) | def init_model(self):
    method compute_rm_score (line 723) | def compute_rm_score(self, data: DataProto):

FILE: verl/workers/reward_model/base.py
  class BasePPORewardModel (line 23) | class BasePPORewardModel(ABC):
    method __init__ (line 25) | def __init__(self, config):
    method compute_reward (line 29) | def compute_reward(self, data: DataProto) -> DataProto:

FILE: verl/workers/reward_model/megatron/reward_model.py
  class MegatronRewardModel (line 37) | class MegatronRewardModel(BasePPORewardModel):
    method __init__ (line 39) | def __init__(self,
    method re_encode_by_rm_tokenizer (line 58) | def re_encode_by_rm_tokenizer(self, data: DataProto) -> DataProto:
    method compute_reward (line 123) | def compute_reward(self, data: DataProto) -> DataProto:
    method forward_batch (line 185) | def forward_batch(self, data: DataProto):
    method offload_params_to_cpu (line 262) | def offload_params_to_cpu(self):
    method load_params_to_cuda (line 270) | def load_params_to_cuda(self):

FILE: verl/workers/rollout/base.py
  class BaseRollout (line 23) | class BaseRollout(ABC):
    method __init__ (line 25) | def __init__(self):
    method generate_sequences (line 35) | def generate_sequences(self, prompts: DataProto) -> DataProto:

FILE: verl/workers/rollout/hf_rollout.py
  class HFRollout (line 35) | class HFRollout(BaseRollout):
    method __init__ (line 37) | def __init__(self, module: nn.Module, config):
    method generate_sequences (line 42) | def generate_sequences(self, prompts: DataProto) -> DataProto:
    method _generate_minibatch (line 51) | def _generate_minibatch(self, prompts: DataProto) -> DataProto:

FILE: verl/workers/rollout/naive/naive_rollout.py
  class NaiveRollout (line 36) | class NaiveRollout(BaseRollout):
    method __init__ (line 38) | def __init__(self, module: nn.Module, config):
    method generate_sequences (line 52) | def generate_sequences(self, prompts: DataProto) -> DataProto:

FILE: verl/workers/rollout/tokenizer.py
  class HybridEngineBaseTokenizer (line 23) | class HybridEngineBaseTokenizer(ABC):
    method vocab_size (line 28) | def vocab_size(self):
    method pad_token_id (line 36) | def pad_token_id(self):
    method eos_token_id (line 44) | def eos_token_id(self):
    method all_special_ids (line 53) | def all_special_ids(self) -> List[int]:
    method all_special_tokens (line 61) | def all_special_tokens(self) -> List[str]:
    method encode (line 70) | def encode(self, text):
    method decode (line 86) | def decode(
    method convert_ids_to_tokens (line 116) | def convert_ids_to_tokens(self,
    method get_added_vocab (line 135) | def get_added_vocab(self) -> Dict[str, int]:
    method convert_tokens_to_string (line 147) | def convert_tokens_to_string(self, tokens: List[str]) -> str:
    method is_fast (line 161) | def is_fast(self):

FILE: verl/workers/rollout/vllm_rollout/vllm_rollout.py
  function _pre_process_inputs (line 49) | def _pre_process_inputs(pad_token_id, prompt_token_ids: torch.Tensor) ->...
  class vLLMRollout (line 57) | class vLLMRollout(BaseRollout):
    method __init__ (line 59) | def __init__(self, actor_module: nn.Module, config: DictConfig, tokeni...
    method update_sampling_params (line 126) | def update_sampling_params(self, **kwargs):
    method generate_sequences (line 142) | def generate_sequences(self, prompts: DataProto, **kwargs) -> DataProto:

FILE: verl/workers/sharding_manager/base.py
  class BaseShardingManager (line 21) | class BaseShardingManager:
    method __enter__ (line 23) | def __enter__(self):
    method __exit__ (line 26) | def __exit__(self, exc_type, exc_value, traceback):
    method preprocess_data (line 29) | def preprocess_data(self, data: DataProto) -> DataProto:
    method postprocess_data (line 32) | def postprocess_data(self, data: DataProto) -> DataProto:

FILE: verl/workers/sharding_manager/fsdp_ulysses.py
  class FSDPUlyssesShardingManager (line 33) | class FSDPUlyssesShardingManager(BaseShardingManager):
    method __init__ (line 38) | def __init__(self, device_mesh: DeviceMesh):
    method __enter__ (line 43) | def __enter__(self):
    method __exit__ (line 51) | def __exit__(self, exc_type, exc_value, traceback):
    method preprocess_data (line 58) | def preprocess_data(self, data: DataProto) -> DataProto:
    method postprocess_data (line 80) | def postprocess_data(self, data: DataProto) -> DataProto:

FILE: verl/workers/sharding_manager/fsdp_vllm.py
  class FSDPVLLMShardingManager (line 34) | class FSDPVLLMShardingManager(BaseShardingManager):
    method __init__ (line 36) | def __init__(self,
    method __enter__ (line 69) | def __enter__(self):
    method __exit__ (line 93) | def __exit__(self, exc_type, exc_value, traceback):
    method preprocess_data (line 112) | def preprocess_data(self, data: DataProto) -> DataProto:
    method postprocess_data (line 121) | def postprocess_data(self, data: DataProto) -> DataProto:

FILE: verl/workers/sharding_manager/megatron_vllm.py
  class AllGatherPPModel (line 35) | class AllGatherPPModel:
    method __init__ (line 37) | def __init__(self, model_provider) -> None:
    method _build_param_buffer (line 82) | def _build_param_buffer(self, pp_rank):
    method _build_param_references (line 88) | def _build_param_references(self, pp_rank, maintain_weight=False):
    method _load_params_to_cuda (line 92) | def _load_params_to_cuda(self, pp_rank, to_empty=False):
    method _offload_params_to_cpu (line 102) | def _offload_params_to_cpu(self, pp_rank, to_empty=False):
    method load_params_to_cuda (line 112) | def load_params_to_cuda(self, to_empty=False):
    method allgather_params (line 118) | def allgather_params(self):
    method forward (line 127) | def forward(self, *inputs, **kwargs):
    method __call__ (line 146) | def __call__(self, *inputs, **kwargs):
    method eval (line 149) | def eval(self):
    method train (line 153) | def train(self):
    method offload_params_to_cpu (line 157) | def offload_params_to_cpu(self, to_empty=False):
    method get_all_params (line 163) | def get_all_params(self):
    method update_this_rank_models (line 186) | def update_this_rank_models(self, new_models):
    method this_rank_models (line 191) | def this_rank_models(self):
    method pp_size (line 195) | def pp_size(self):
    method pp_rank (line 199) | def pp_rank(self):
    method pp_group (line 203) | def pp_group(self):
    method pp_models (line 207) | def pp_models(self):
  class MegatronVLLMShardingManager (line 238) | class MegatronVLLMShardingManager(BaseShardingManager):
    method __init__ (line 240) | def __init__(self, module: AllGatherPPModel, inference_engine: LLM, mo...
    method default_tp_concat_fn (line 267) | def default_tp_concat_fn(self, name, param, infer_params, model_config):
    method _post_process_params (line 318) | def _post_process_params(self, params):
    method __enter__ (line 345) | def __enter__(self):
    method __exit__ (line 360) | def __exit__(self, exc_type, exc_value, traceback):
    method preprocess_data (line 378) | def preprocess_data(self, data: DataProto) -> DataProto:
    method postprocess_data (line 394) | def postprocess_data(self, data: DataProto) -> DataProto:
  function get_micro_data_parallel_group (line 418) | def get_micro_data_parallel_group():
  function get_micro_data_parallel_world_size (line 423) | def get_micro_data_parallel_world_size():
  function get_micro_data_parallel_rank (line 427) | def get_micro_data_parallel_rank():

Condensed preview: 308 files, each showing path, character count, and a content snippet (2,049K chars in full).
[
  {
    "path": ".github/workflows/dataset.yml",
    "chars": 949,
    "preview": "name: dataset\n\non:\n  # Trigger the workflow on push or pull request,\n  # but only for the main branch\n  push:\n    branch"
  },
  {
    "path": ".github/workflows/e2e_digit_completion.yml",
    "chars": 1166,
    "preview": "name: e2e_digit_completion\n\non:\n  # Trigger the workflow on push or pull request,\n  # but only for the main branch\n  pus"
  },
  {
    "path": ".github/workflows/e2e_gsm8k.yml",
    "chars": 2132,
    "preview": "name: e2e_gsm8k\n\non:\n  # Trigger the workflow on push or pull request,\n  # but only for the main branch\n  push:\n    bran"
  },
  {
    "path": ".github/workflows/model.yml",
    "chars": 1338,
    "preview": "name: model_rmpad\n\non:\n  # Trigger the workflow on push or pull request,\n  # but only for the main branch\n  push:\n    br"
  },
  {
    "path": ".github/workflows/ray_test.yml",
    "chars": 1140,
    "preview": "name: ray\n\non:\n  # Trigger the workflow on push or pull request,\n  # but only for the main branch\n  push:\n    branches:\n"
  },
  {
    "path": ".github/workflows/sanity.yml",
    "chars": 1085,
    "preview": "name: sanity\n\non:\n  # Trigger the workflow on push or pull request,\n  # but only for the main branch\n  push:\n    branche"
  },
  {
    "path": ".github/workflows/vllm.yml",
    "chars": 1148,
    "preview": "name: vllm\n\non:\n  # Trigger the workflow on push or pull request,\n  # but only for the main branch\n  push:\n    branches:"
  },
  {
    "path": ".github/workflows/yapf_format.yml",
    "chars": 1271,
    "preview": "name: yapf\n\non:\n  # Trigger the workflow on push or pull request,\n  # but only for the main branch\n  push:\n    branches:"
  },
  {
    "path": ".gitignore",
    "chars": 1305,
    "preview": "**/*.pt\n**/checkpoints\n**/wget-log\n**/_build/\n**/*.ckpt\n**/outputs\n**/*.tar.gz\n**/playground\n**/wandb\n\n# Byte-compiled /"
  },
  {
    "path": ".readthedocs.yaml",
    "chars": 282,
    "preview": "# Read the Docs configuration file\n# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details\n\nversion:"
  },
  {
    "path": ".style.yapf",
    "chars": 111,
    "preview": "[style]\nbased_on_style = google\ncolumn_limit = 120\nindent_width = 4\nsplit_arguments_when_comma_terminated: true"
  },
  {
    "path": "LICENSE",
    "chars": 11358,
    "preview": "\n                                 Apache License\n                           Version 2.0, January 2004\n                  "
  },
  {
    "path": "Notice.txt",
    "chars": 57,
    "preview": "Copyright 2023-2024 Bytedance Ltd. and/or its affiliates "
  },
  {
    "path": "OLD_README.md",
    "chars": 6481,
    "preview": "<h1 style=\"text-align: center;\">veRL: Volcano Engine Reinforcement Learning for LLM</h1>\n\nveRL is a flexible, efficient "
  },
  {
    "path": "README.md",
    "chars": 3532,
    "preview": "# TinyZero\n\n> **⚠️ Deprecation Notice:** This repo is no longer actively maintained. For running RL experiments, please "
  },
  {
    "path": "docker/Dockerfile.ngc.vllm",
    "chars": 1475,
    "preview": "FROM nvcr.io/nvidia/pytorch:24.05-py3\n\n# uninstall nv-pytorch fork\nRUN pip3 uninstall pytorch-quantization \\\n     pytorc"
  },
  {
    "path": "docker/Dockerfile.vemlp.vllm.te",
    "chars": 1818,
    "preview": "# docker buildx build --platform linux/x86_64 -t \"verlai/verl:$TAG\" -f docker/$FILE .\n\n# the one in docker.io is an alia"
  },
  {
    "path": "docs/Makefile",
    "chars": 602,
    "preview": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line.\nSPHINXOPTS    =\nSPHI"
  },
  {
    "path": "docs/README.md",
    "chars": 281,
    "preview": "# veRL documents\n\n## Build the docs\n\n```bash\n# Install dependencies.\npip install -r requirements-docs.txt\n\n# Build the d"
  },
  {
    "path": "docs/advance/dpo_extension.rst",
    "chars": 9681,
    "preview": "Extend to other RL(HF) algorithms\n=================================\n\nWe already implemented the complete training pipeli"
  },
  {
    "path": "docs/advance/fsdp_extension.rst",
    "chars": 4400,
    "preview": "\nAdd models with the FSDP backend\n==================================\n\nModel\n--------------------------\n\nIn principle, ou"
  },
  {
    "path": "docs/advance/megatron_extension.rst",
    "chars": 1688,
    "preview": "Add models with the Megatron-LM backend\n=========================================\n\nModel\n-----------\n\nThe most challengi"
  },
  {
    "path": "docs/advance/placement.rst",
    "chars": 429,
    "preview": "Ray API Design Tutorial\n=======================================\n\nWe provide a tutorial for our Ray API design, including"
  },
  {
    "path": "docs/conf.py",
    "chars": 3137,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\r\n#\r\n# Licensed under the Apache License, Version 2.0 (the \"License"
  },
  {
    "path": "docs/examples/config.rst",
    "chars": 12465,
    "preview": ".. _config-explain-page:\n\nConfig Explaination\n===================\n\nppo_trainer.yaml for FSDP Backend\n-------------------"
  },
  {
    "path": "docs/examples/gsm8k_example.rst",
    "chars": 5987,
    "preview": "GSM8K Example\n=============\n\nIntroduction\n------------\n\nIn this example, we train an LLM to tackle the GSM8k task.\n\nPape"
  },
  {
    "path": "docs/examples/ppo_code_architecture.rst",
    "chars": 9045,
    "preview": "PPO Example Architecture\n========================\n\nLet's start with the Proximal Policy Optimization algorithm, which is"
  },
  {
    "path": "docs/experiment/ppo.rst",
    "chars": 3029,
    "preview": ".. _algo-baseline-page:\n\nAlgorithm Baselines\n===================\n\nGSM8k \n------------------\n\nAssuming GSM8k dataset is p"
  },
  {
    "path": "docs/faq/faq.rst",
    "chars": 799,
    "preview": "Frequently Asked Questions\n====================================\n\nRay related\n------------\n\nHow to add breakpoint for deb"
  },
  {
    "path": "docs/index.rst",
    "chars": 3289,
    "preview": "Welcome to veRL's documentation!\n================================================\n\n.. _hf_arxiv: https://arxiv.org/pdf/2"
  },
  {
    "path": "docs/preparation/prepare_data.rst",
    "chars": 4336,
    "preview": "Prepare Data (Parquet) for Post-Training\n========================================\n\nBefore starting the post-training job"
  },
  {
    "path": "docs/preparation/reward_function.rst",
    "chars": 2606,
    "preview": "Implement Reward Function for Dataset\n======================================\n\nFor each dataset, we need to implement a r"
  },
  {
    "path": "docs/requirements-docs.txt",
    "chars": 143,
    "preview": "# markdown suport\r\nrecommonmark\r\n# markdown table suport\r\nsphinx-markdown-tables\r\n\r\n# theme default rtd\r\n\r\n# crate-docs-"
  },
  {
    "path": "docs/start/install.rst",
    "chars": 4914,
    "preview": "Installation\n============\n\nRequirements\n------------\n\n- **Python**: Version >= 3.9\n- **CUDA**: Version >= 12.1\n\nveRL sup"
  },
  {
    "path": "docs/start/quickstart.rst",
    "chars": 7900,
    "preview": ".. _quickstart:\n\n=========================================================\nQuickstart: Post-train a LLM using PPO with G"
  },
  {
    "path": "docs/workers/fsdp_workers.rst",
    "chars": 4167,
    "preview": "PyTorch FSDP Backend\n======================\n\nWe support PyTorch FSDP Backend by implementing various workers for\nactor, "
  },
  {
    "path": "docs/workers/megatron_workers.rst",
    "chars": 7478,
    "preview": "Megatron-LM Backend\n=====================\n\nWe support Megatron Backend by implementing various workers for actor,\ncritic"
  },
  {
    "path": "docs/workers/ray_trainer.rst",
    "chars": 12037,
    "preview": "PPO Ray Trainer\n===============\n\nWe implement the RayPPOTrainer, which is a trainer runs on the driver\nprocess on a sing"
  },
  {
    "path": "examples/data_preprocess/arth.py",
    "chars": 5561,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "examples/data_preprocess/countdown.py",
    "chars": 5381,
    "preview": "\"\"\"\nPreprocess dataset for countdown task - given a target number and N numbers, generate equations to reach target\n\"\"\"\n"
  },
  {
    "path": "examples/data_preprocess/full_hh_rlhf.py",
    "chars": 4962,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "examples/data_preprocess/gsm8k.py",
    "chars": 2986,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "examples/data_preprocess/hellaswag.py",
    "chars": 3358,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "examples/data_preprocess/math_dataset.py",
    "chars": 2776,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "examples/data_preprocess/multiply.py",
    "chars": 5172,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "examples/generation/run_deepseek_v2_lite_math.sh",
    "chars": 583,
    "preview": "python3 -m verl.trainer.main_generation \\\n    trainer.nnodes=1 \\\n    trainer.n_gpus_per_node=8 \\\n    data.path=~/data/rl"
  },
  {
    "path": "examples/grpo_trainer/run_deepseek7b_llm.sh",
    "chars": 1732,
    "preview": "set -x\n\npython3 -m verl.trainer.main_ppo \\\n    algorithm.adv_estimator=grpo \\\n    data.train_files=$HOME/data/gsm8k/trai"
  },
  {
    "path": "examples/grpo_trainer/run_deepseek7b_llm_seq_balance.sh",
    "chars": 1681,
    "preview": "set -x\n\npython3 -m verl.trainer.main_ppo \\\n    algorithm.adv_estimator=grpo \\\n    data.train_files=$HOME/data/gsm8k/trai"
  },
  {
    "path": "examples/grpo_trainer/run_qwen2-7b.sh",
    "chars": 1755,
    "preview": "set -x\n\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\npython3 -m verl.trainer.main_ppo \\\n    algorithm.adv_estimator=grpo \\\n  "
  },
  {
    "path": "examples/grpo_trainer/run_qwen2-7b_seq_balance.sh",
    "chars": 1738,
    "preview": "set -x\n\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\npython3 -m verl.trainer.main_ppo \\\n    algorithm.adv_estimator=grpo \\\n  "
  },
  {
    "path": "examples/ppo_trainer/run_deepseek7b_llm.sh",
    "chars": 1789,
    "preview": "set -x\n\npython3 -m verl.trainer.main_ppo \\\n    data.train_files=$HOME/data/gsm8k/train.parquet \\\n    data.val_files=$HOM"
  },
  {
    "path": "examples/ppo_trainer/run_deepseek7b_llm_sp2.sh",
    "chars": 1994,
    "preview": "set -x\n\npython3 -m verl.trainer.main_ppo \\\n    data.train_files=$HOME/data/gsm8k/train.parquet \\\n    data.val_files=$HOM"
  },
  {
    "path": "examples/ppo_trainer/run_deepseek_full_hh_rlhf.sh",
    "chars": 1744,
    "preview": "set -x\n\ntrain_files=$HOME/data/full_hh_rlhf/rl/train.parquet\ntest_files=$HOME/data/full_hh_rlhf/rl/train.parquet # no us"
  },
  {
    "path": "examples/ppo_trainer/run_deepseek_math_gsm8k_megatron.sh",
    "chars": 1667,
    "preview": "set -x\n\ngsm8k_train_path=$HOME/data/gsm8k/train.parquet\ngsm8k_test_path=$HOME/data/gsm8k/test.parquet\nmath_train_path=$H"
  },
  {
    "path": "examples/ppo_trainer/run_deepseek_megatron.sh",
    "chars": 1410,
    "preview": "set -x\n\npython3 -m verl.trainer.main_ppo --config-path=./config --config-name='ppo_megatron_trainer'\\\n    data.train_fil"
  },
  {
    "path": "examples/ppo_trainer/run_gemma.sh",
    "chars": 1772,
    "preview": "set -x\n\npython3 -m verl.trainer.main_ppo \\\n    data.train_files=$HOME/data/gsm8k/train.parquet \\\n    data.val_files=$HOM"
  },
  {
    "path": "examples/ppo_trainer/run_qwen2-7b.sh",
    "chars": 2054,
    "preview": "set -x\n\ngsm8k_train_path=$HOME/data/gsm8k/train.parquet\ngsm8k_test_path=$HOME/data/gsm8k/test.parquet\nmath_train_path=$H"
  },
  {
    "path": "examples/ppo_trainer/run_qwen2-7b_rm.sh",
    "chars": 2588,
    "preview": "set -x\n# Discliamer: the model used in the script is only for academic example,\ngsm8k_train_path=$HOME/data/gsm8k/train."
  },
  {
    "path": "examples/ppo_trainer/run_qwen2-7b_rm_seq_balance.sh",
    "chars": 2656,
    "preview": "set -x\n\ngsm8k_train_path=$HOME/data/gsm8k/train.parquet\ngsm8k_test_path=$HOME/data/gsm8k/test.parquet\nmath_train_path=$H"
  },
  {
    "path": "examples/ppo_trainer/run_qwen2-7b_seq_balance.sh",
    "chars": 2262,
    "preview": "set -x\n\ngsm8k_train_path=$HOME/data/gsm8k/train.parquet\ngsm8k_test_path=$HOME/data/gsm8k/test.parquet\nmath_train_path=$H"
  },
  {
    "path": "examples/ppo_trainer/run_qwen2.5-32b.sh",
    "chars": 2133,
    "preview": "set -x\n\ngsm8k_train_path=$HOME/data/gsm8k/train.parquet\ngsm8k_test_path=$HOME/data/gsm8k/test.parquet\nmath_train_path=$H"
  },
  {
    "path": "examples/ppo_trainer/verl_getting_started.ipynb",
    "chars": 253713,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"eXkR4NjYhezg\"\n   },\n   \"source\": [\n    \"# Run "
  },
  {
    "path": "examples/ray/tutorial.ipynb",
    "chars": 31455,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"0ddc582b\",\n   \"metadata\": {},\n   \"source\": [\n    \"# VeRL Ray API"
  },
  {
    "path": "examples/sft/gsm8k/run_deepseek_6b7.sh",
    "chars": 729,
    "preview": "set -x\n\nhdfs_path=hdfs://user/verl/experiments/gsm8k/deepseek-coder-6.7b-instruct/ # replace to your own hdfs/local path"
  },
  {
    "path": "examples/sft/gsm8k/run_gemma_2b.sh",
    "chars": 924,
    "preview": "# Tested with 2 & 4 GPUs\n\nset -x\n\nif [ \"$#\" -lt 2 ]; then\n    echo \"Usage: run_gemma_2b.sh <nproc_per_node> <save_path> "
  },
  {
    "path": "examples/sft/gsm8k/run_gemma_7b.sh",
    "chars": 685,
    "preview": "set -x\n\nhdfs_path=hdfs://user/verl/experiments/gsm8k/gemma-1.1-7b-it/ # replace to your own hdfs/local path\n\nnproc_per_n"
  },
  {
    "path": "examples/split_placement/README.md",
    "chars": 2686,
    "preview": "# Split Placement Example\nHere we introduce how to run the naive implementation of the split placement of PPO algorithm."
  },
  {
    "path": "examples/split_placement/config/ppo_trainer_split.yaml",
    "chars": 3956,
    "preview": "data:\n  tokenizer: null\n  train_files: ~/data/rlhf/gsm8k/train.parquet\n  val_files: ~/data/rlhf/gsm8k/test.parquet\n  pro"
  },
  {
    "path": "examples/split_placement/main_ppo_split.py",
    "chars": 7775,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "examples/split_placement/run_deepseek7b_llm.sh",
    "chars": 1683,
    "preview": "set -x\n\npython3 main_ppo_split.py \\\n    data.train_files=$HOME/data/gsm8k/train.parquet \\\n    data.val_files=$HOME/data/"
  },
  {
    "path": "examples/split_placement/split_monkey_patch.py",
    "chars": 7737,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "patches/megatron_v4.patch",
    "chars": 23947,
    "preview": "diff --git a/.gitignore b/.gitignore\nindex 5955b349..ade0cd51 100644\n--- a/.gitignore\n+++ b/.gitignore\n@@ -7,3 +7,5 @@ b"
  },
  {
    "path": "pyproject.toml",
    "chars": 2219,
    "preview": "# -------------------------------\n# build-system\n# -------------------------------\n[build-system]\nrequires = [\n    \"setu"
  },
  {
    "path": "requirements.txt",
    "chars": 135,
    "preview": "accelerate\ncodetiming\ndatasets\ndill\nflash-attn\nhydra-core\nnumpy\npandas\npybind11\nray\ntensordict<0.6\ntransformers<4.48\nvll"
  },
  {
    "path": "scripts/format.sh",
    "chars": 112,
    "preview": "#!/bin/bash\npip3 install --upgrade yapf\nyapf -ir -vv --style ./.style.yapf verl tests single_controller examples"
  },
  {
    "path": "scripts/train_tiny_zero.sh",
    "chars": 1186,
    "preview": "python3 -m verl.trainer.main_ppo \\\ndata.train_files=$DATA_DIR/train.parquet \\\ndata.val_files=$DATA_DIR/test.parquet \\\nda"
  },
  {
    "path": "setup.py",
    "chars": 1917,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/__init__.py",
    "chars": 599,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/e2e/__init__.py",
    "chars": 600,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/e2e/arithmetic_sequence/data/create_dataset.py",
    "chars": 1774,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/e2e/arithmetic_sequence/model/config.json",
    "chars": 674,
    "preview": "{\n  \"architectures\": [\n    \"LlamaForCausalLM\"\n  ],\n  \"attention_bias\": false,\n  \"attention_dropout\": 0.0,\n  \"bos_token_i"
  },
  {
    "path": "tests/e2e/arithmetic_sequence/model/create_model_tokenizer.py",
    "chars": 2608,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/e2e/arithmetic_sequence/model/generation_config.json",
    "chars": 111,
    "preview": "{\n  \"_from_model_config\": true,\n  \"eos_token_id\": 1,\n  \"pad_token_id\": 2,\n  \"transformers_version\": \"4.43.3\"\n}\n"
  },
  {
    "path": "tests/e2e/arithmetic_sequence/model/tokenizer_config.json",
    "chars": 645,
    "preview": "{\n    \"char_ords\": [\n        48,\n        49,\n        50,\n        51,\n        52,\n        53,\n        54,\n        55,\n   "
  },
  {
    "path": "tests/e2e/arithmetic_sequence/rl/README.md",
    "chars": 1299,
    "preview": "# Digit completion\n\nThis is an example of solving a digit completion problem. The problem is defined as below:\n\nThe prom"
  },
  {
    "path": "tests/e2e/arithmetic_sequence/rl/config/ray_trainer.yaml",
    "chars": 4922,
    "preview": "data:\n  tokenizer: null \n  train_files: ~/verl/tests/e2e/arithmetic_sequence/data/train.parquet\n  val_files: ~/verl/test"
  },
  {
    "path": "tests/e2e/arithmetic_sequence/rl/main_trainer.py",
    "chars": 6040,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/e2e/check_results.py",
    "chars": 1651,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/e2e/envs/__init__.py",
    "chars": 677,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/e2e/envs/digit_completion/__init__.py",
    "chars": 905,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/e2e/envs/digit_completion/task.py",
    "chars": 6354,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/e2e/envs/digit_completion/tokenizer.py",
    "chars": 5702,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/e2e/run_qwen_gsm8k_function_rm.sh",
    "chars": 1794,
    "preview": "set -x\n\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\npython3 -m verl.trainer.main_ppo \\\n    data.train_files=$HOME/data/gsm8k"
  },
  {
    "path": "tests/e2e/run_qwen_gsm8k_function_rm_no_rmpad.sh",
    "chars": 1834,
    "preview": "set -x\n\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\npython3 -m verl.trainer.main_ppo \\\n    data.train_files=$HOME/data/gsm8k"
  },
  {
    "path": "tests/e2e/run_qwen_gsm8k_model_rm.sh",
    "chars": 2190,
    "preview": "set -x\n\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\npython3 -m verl.trainer.main_ppo \\\n    data.train_files=$HOME/data/gsm8k"
  },
  {
    "path": "tests/e2e/run_qwen_gsm8k_model_rm_no_rmpad.sh",
    "chars": 2193,
    "preview": "set -x\n\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\npython3 -m verl.trainer.main_ppo \\\n    data.train_files=$HOME/data/gsm8k"
  },
  {
    "path": "tests/e2e/run_qwen_gsm8k_model_rm_seq_balance.sh",
    "chars": 2565,
    "preview": "set -x\n\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\npython3 -m verl.trainer.main_ppo \\\n    data.train_files=$HOME/data/gsm8k"
  },
  {
    "path": "tests/e2e/run_qwen_gsm8k_model_rm_ulysses.sh",
    "chars": 2402,
    "preview": "set -x\n\nexport VLLM_ATTENTION_BACKEND=XFORMERS # vllm + qwen2 with flash_attn has some issues\n\npython3 -m verl.trainer.m"
  },
  {
    "path": "tests/e2e/run_ray_trainer.sh",
    "chars": 559,
    "preview": "#!/usr/bin/env bash\n\nset -e -x\n\nOUTPUT_FILE=\"/tmp/output_ray_trainer.txt\"\n\nexport PATH=$PATH:~/.local/bin\n\nrm -rf $OUTPU"
  },
  {
    "path": "tests/e2e/run_ray_trainer_rmpad.sh",
    "chars": 598,
    "preview": "#!/usr/bin/env bash\n\nset -e -x\n\npython3 tests/e2e/arithmetic_sequence/rl/main_trainer.py \\\n    data.train_files=tests/e2"
  },
  {
    "path": "tests/gpu_utility/test_memory_buffers.py",
    "chars": 2641,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/gpu_utility/test_ops.py",
    "chars": 1765,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/gpu_utility/test_torch_functional.py",
    "chars": 3521,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/model/test_transformer.py",
    "chars": 7485,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/model/test_transformers_ulysses.py",
    "chars": 10699,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/ray/check_worker_alive/main.py",
    "chars": 1852,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/ray/detached_worker/README.md",
    "chars": 242,
    "preview": "# Detached Worker\n## How to run (Only on a single node)\n- Start a local ray cluster: \n```bash\nray start --head --port=63"
  },
  {
    "path": "tests/ray/detached_worker/client.py",
    "chars": 2077,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/ray/detached_worker/run.sh",
    "chars": 93,
    "preview": "#!/bin/bash\nray start --head --port=6379\npython3 server.py\npython3 client.py\nray stop --force"
  },
  {
    "path": "tests/ray/detached_worker/server.py",
    "chars": 6590,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/ray/test_check_worker_alive.py",
    "chars": 1550,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/ray/test_colocated_workers.py",
    "chars": 2750,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/ray/test_data_transfer.py",
    "chars": 3365,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/ray/test_driverfunc_to_worker.py",
    "chars": 2433,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/ray/test_high_level_scheduling_api.py",
    "chars": 3724,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/ray/test_ray_local_envs.py",
    "chars": 1819,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/ray/test_rvdz.py",
    "chars": 1576,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/ray/test_worker_group_basics.py",
    "chars": 4051,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/ray/test_worker_group_torch.py",
    "chars": 3697,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/rollout/run_fsdp_vllm.py",
    "chars": 5983,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/rollout/test_vllm_hf_loader.py",
    "chars": 6078,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/sanity/check_license.py",
    "chars": 1300,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/sanity/test_import.py",
    "chars": 784,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/utility/test_tensor_dict_utilities.py",
    "chars": 11529,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/verl/utils/dataset/test_rl_dataset.py",
    "chars": 1999,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/verl/utils/dataset/test_rm_dataset.py",
    "chars": 1420,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "tests/verl/utils/dataset/test_sft_dataset.py",
    "chars": 2287,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/__init__.py",
    "chars": 927,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/models/README.md",
    "chars": 1744,
    "preview": "# Models\nCommon modelzoo such as huggingface/transformers stuggles when using Pytorch native model parallelism. Followin"
  },
  {
    "path": "verl/models/__init__.py",
    "chars": 600,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/models/llama/__init__.py",
    "chars": 600,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/models/llama/megatron/__init__.py",
    "chars": 944,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/models/llama/megatron/checkpoint_utils/__init__.py",
    "chars": 600,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/models/llama/megatron/checkpoint_utils/llama_loader.py",
    "chars": 19665,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/models/llama/megatron/checkpoint_utils/llama_saver.py",
    "chars": 18189,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/models/llama/megatron/layers/__init__.py",
    "chars": 838,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/models/llama/megatron/layers/parallel_attention.py",
    "chars": 20129,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rig"
  },
  {
    "path": "verl/models/llama/megatron/layers/parallel_decoder.py",
    "chars": 6036,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rig"
  },
  {
    "path": "verl/models/llama/megatron/layers/parallel_linear.py",
    "chars": 2787,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/models/llama/megatron/layers/parallel_mlp.py",
    "chars": 3394,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rig"
  },
  {
    "path": "verl/models/llama/megatron/layers/parallel_rmsnorm.py",
    "chars": 1860,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/models/llama/megatron/modeling_llama_megatron.py",
    "chars": 29656,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rig"
  },
  {
    "path": "verl/models/registry.py",
    "chars": 2535,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/models/transformers/__init__.py",
    "chars": 600,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/models/transformers/llama.py",
    "chars": 6759,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/models/transformers/monkey_patch.py",
    "chars": 2884,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/models/transformers/qwen2.py",
    "chars": 6144,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/models/weight_loader_registry.py",
    "chars": 1137,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/protocol.py",
    "chars": 24417,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/single_controller/__init__.py",
    "chars": 787,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/single_controller/base/__init__.py",
    "chars": 699,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/single_controller/base/decorator.py",
    "chars": 15531,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/single_controller/base/megatron/__init__.py",
    "chars": 600,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/single_controller/base/megatron/worker.py",
    "chars": 1595,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/single_controller/base/megatron/worker_group.py",
    "chars": 2080,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/single_controller/base/register_center/__init__.py",
    "chars": 600,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/single_controller/base/register_center/ray.py",
    "chars": 939,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/single_controller/base/worker.py",
    "chars": 6416,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/single_controller/base/worker_group.py",
    "chars": 7494,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/single_controller/ray/__init__.py",
    "chars": 778,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/single_controller/ray/base.py",
    "chars": 19136,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/single_controller/ray/megatron.py",
    "chars": 3035,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/single_controller/version/version",
    "chars": 5,
    "preview": "0.0.2"
  },
  {
    "path": "verl/third_party/__init__.py",
    "chars": 600,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/third_party/vllm/__init__.py",
    "chars": 1746,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_3_1/__init__.py",
    "chars": 600,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_3_1/arg_utils.py",
    "chars": 11957,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_3_1/config.py",
    "chars": 25666,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_3_1/llm.py",
    "chars": 13604,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_3_1/llm_engine_sp.py",
    "chars": 35525,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_3_1/model_loader.py",
    "chars": 12265,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_3_1/model_runner.py",
    "chars": 13381,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_3_1/parallel_state.py",
    "chars": 6075,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Adapted from\n# https://github.co"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_3_1/tokenizer.py",
    "chars": 3047,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_3_1/weight_loaders.py",
    "chars": 4215,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_3_1/worker.py",
    "chars": 13214,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_4_2/__init__.py",
    "chars": 600,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_4_2/arg_utils.py",
    "chars": 16002,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_4_2/config.py",
    "chars": 9527,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_4_2/dtensor_weight_loaders.py",
    "chars": 12378,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_4_2/hf_weight_loader.py",
    "chars": 4032,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_4_2/llm.py",
    "chars": 15147,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_4_2/llm_engine_sp.py",
    "chars": 12403,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_4_2/megatron_weight_loaders.py",
    "chars": 15726,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_4_2/model_loader.py",
    "chars": 13600,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_4_2/model_runner.py",
    "chars": 12581,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_4_2/parallel_state.py",
    "chars": 12798,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Adapted from\n# https://github.co"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_4_2/spmd_gpu_executor.py",
    "chars": 8647,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_4_2/tokenizer.py",
    "chars": 3310,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_4_2/worker.py",
    "chars": 13310,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_5_4/__init__.py",
    "chars": 600,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_5_4/arg_utils.py",
    "chars": 23746,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_5_4/config.py",
    "chars": 11948,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py",
    "chars": 15827,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_5_4/hf_weight_loader.py",
    "chars": 1895,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_5_4/llm.py",
    "chars": 12271,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_5_4/llm_engine_sp.py",
    "chars": 14689,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py",
    "chars": 13939,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_5_4/model_loader.py",
    "chars": 14974,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  },
  {
    "path": "verl/third_party/vllm/vllm_v_0_5_4/model_runner.py",
    "chars": 6857,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023 The vLLM team.\n# Licensed under the Apache Licens"
  }
]

// ... and 108 more files (download for full content)

About this extraction

This page contains the source code of the Jiayi-Pan/TinyZero GitHub repository, extracted and formatted as plain text. The extraction includes 308 files (1.9 MB), approximately 475.5k tokens, and a symbol index with 1406 extracted functions, classes, methods, constants, and types.

