Full Code of verl-project/verl for AI

main 8e24127f4234 cached

1128 files

8.0 MB

2.2M tokens

4887 symbols

Repository: verl-project/verl
Branch: main
Commit: 8e24127f4234
Files: 1128
Total size: 8.0 MB

Directory structure:
gitextract_5e2u4bw9/
├── .gemini/
│   └── config.yaml
├── .git-blame-ignore-revs
├── .github/
│   ├── CODEOWNERS
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug-report.yml
│   │   ├── config.yml
│   │   └── feature-request.yml
│   ├── PULL_REQUEST_TEMPLATE.md
│   ├── dependabot.yml
│   └── workflows/
│       ├── README.md
│       ├── check-pr-title.yml
│       ├── cpu_unit_tests.yml
│       ├── doc.yml
│       ├── docker-build-ascend-a2.yml
│       ├── docker-build-ascend-a3.yml
│       ├── e2e_ascend.yml
│       ├── e2e_fully_async_policy.yml
│       ├── e2e_fully_async_policy_ascend.yml
│       ├── e2e_one_step_off_policy.yml
│       ├── e2e_one_step_off_policy_ascend.yml
│       ├── e2e_ppo_grpo_trainer_trtllm.yml
│       ├── e2e_ppo_trainer.yml
│       ├── e2e_ppo_trainer_megatron_sglang.yml
│       ├── e2e_ppo_trainer_megatron_sglang_2.yml
│       ├── e2e_ppo_trainer_megatron_vllm.yml
│       ├── e2e_ppo_trainer_megatron_vllm_2.yml
│       ├── e2e_ppo_trainer_megatron_vllm_2_ascend.yml
│       ├── e2e_ppo_trainer_veomni_vllm.yml
│       ├── e2e_sft_llm.yml
│       ├── e2e_sft_llm_ascend.yml
│       ├── e2e_sft_vlm.yml
│       ├── gpu_unit_tests.yml
│       ├── model.yml
│       ├── model_ascend.yml
│       ├── nightly_ascend.yml
│       ├── npu_unit_tests.yml
│       ├── pre-commit.yml
│       ├── precommit-autofix.yml
│       ├── reward_model_sglang.yml
│       ├── reward_model_vllm.yml
│       ├── reward_model_vllm_ascend.yml
│       ├── sanity.yml
│       ├── scorecard.yml
│       ├── secrets_scan.yml
│       ├── sgl.yml
│       ├── type-coverage-check.yml
│       └── vllm.yml
├── .gitignore
├── .gitmodules
├── .pre-commit-config.yaml
├── .readthedocs.yaml
├── CONTRIBUTING.md
├── LICENSE
├── Notice.txt
├── README.md
├── docker/
│   ├── Dockerfile.isaaclab230
│   ├── Dockerfile.stable.sglang
│   ├── Dockerfile.stable.trtllm
│   ├── Dockerfile.stable.vllm
│   ├── README.md
│   ├── ascend/
│   │   ├── Dockerfile.ascend.sglang_8.3.rc1_a2
│   │   ├── Dockerfile.ascend.sglang_8.3.rc1_a3
│   │   ├── Dockerfile.ascend_8.2.rc1_a2
│   │   ├── Dockerfile.ascend_8.2.rc1_a3
│   │   ├── Dockerfile.ascend_8.3.rc1_a2
│   │   ├── Dockerfile.ascend_8.3.rc1_a3
│   │   ├── Dockerfile.ascend_8.5.0_a2
│   │   └── Dockerfile.ascend_8.5.0_a3
│   ├── aws/
│   │   ├── Dockerfile.extention.awsefa
│   │   └── Dockerfile.ngc.vllm0.8.sagemaker
│   ├── rocm/
│   │   ├── Apptainerfile.rocm
│   │   ├── Dockerfile.rocm
│   │   ├── Dockerfile.rocm7
│   │   ├── Dockerfile.rocm_verl-0.3.0.post1
│   │   └── Dockerfile.rocm_verl-0.4.1
│   ├── verl0.4-cu124-torch2.6-fa2.7.4/
│   │   ├── Dockerfile.app.sglang.vllm.mcore0.12
│   │   ├── Dockerfile.app.sglang.vllm.mcore0.12.deepep
│   │   ├── Dockerfile.app.sglang.vllm.mcore0.13.preview
│   │   ├── Dockerfile.app.vllm.mcore0.12
│   │   ├── Dockerfile.app.vllm.mcore0.12.deepep
│   │   ├── Dockerfile.app.vllm.mcore0.13.preview
│   │   ├── Dockerfile.base
│   │   └── README.md
│   ├── verl0.5-cu126-torch2.7-fa2.7.4/
│   │   ├── Dockerfile.app.sglang0.4.10.post2.mcore0.13
│   │   ├── Dockerfile.app.sglang0.4.9.post6.mcore0.13
│   │   ├── Dockerfile.app.vllm.mcore0.13
│   │   ├── Dockerfile.app.vllm.mcore0.15
│   │   ├── Dockerfile.base.torch2.7.1
│   │   └── README.md
│   ├── verl0.5-cu126-torch2.7.1-fa2.8.0/
│   │   ├── Dockerfile.app.sglang.mcore0.12
│   │   ├── Dockerfile.app.sglang.mcore0.13.preview
│   │   ├── Dockerfile.base
│   │   └── README.md
│   ├── verl0.5-preview-cu128-torch2.7.1-fa2.8.0/
│   │   ├── Dockerfile.app.sglang.megatron
│   │   ├── Dockerfile.base
│   │   └── README.md
│   ├── verl0.6-cu128-torch2.8.0-fa2.7.4/
│   │   ├── Dockerfile.app.sglang
│   │   ├── Dockerfile.base
│   │   └── Dockerfile.vllm011.mcore_gpt-oss
│   └── verl0.6.1-experimental/
│       ├── Dockerfile.sglang056exp
│       └── Dockerfile.vllm012exp
├── docs/
│   ├── Makefile
│   ├── README.md
│   ├── README_vllm0.7.md
│   ├── README_vllm0.8.md
│   ├── _static/
│   │   ├── custom.css
│   │   └── js/
│   │       ├── resizable-sidebar.js
│   │       └── runllm-widget.js
│   ├── advance/
│   │   ├── agent_loop.rst
│   │   ├── async-on-policy-distill.md
│   │   ├── attention_implementation.rst
│   │   ├── checkpoint.rst
│   │   ├── dpo_extension.rst
│   │   ├── fp8.md
│   │   ├── fsdp_extension.rst
│   │   ├── fully_async.md
│   │   ├── grafana_prometheus.md
│   │   ├── megatron_extension.rst
│   │   ├── mtp.md
│   │   ├── one_step_off.md
│   │   ├── placement.rst
│   │   ├── ppo_lora.rst
│   │   ├── reward_loop.rst
│   │   ├── rollout_skip.rst
│   │   ├── rollout_trace.rst
│   │   └── rope.rst
│   ├── algo/
│   │   ├── baseline.md
│   │   ├── collabllm.md
│   │   ├── dapo.md
│   │   ├── dppo.md
│   │   ├── entropy.md
│   │   ├── gpg.md
│   │   ├── grpo.md
│   │   ├── opo.md
│   │   ├── otb.md
│   │   ├── ppo.md
│   │   ├── rollout_corr.md
│   │   ├── rollout_corr_math.md
│   │   ├── spin.md
│   │   └── sppo.md
│   ├── amd_tutorial/
│   │   ├── amd_build_dockerfile_page.rst
│   │   └── amd_vllm_page.rst
│   ├── api/
│   │   ├── data.rst
│   │   ├── single_controller.rst
│   │   ├── trainer.rst
│   │   └── utils.rst
│   ├── ascend_tutorial/
│   │   ├── contribution_guide/
│   │   │   └── ascend_ci_guide_zh.rst
│   │   ├── examples/
│   │   │   ├── ascend_performance_analysis_guide.md
│   │   │   ├── ascend_retool_best_pratice.rst
│   │   │   ├── ascend_sglang_best_practices.rst
│   │   │   ├── dapo_multi_model_optimization_practice.md
│   │   │   ├── gspo_optimization_practice.md
│   │   │   └── run_qwen3_32B_megatron_1k_256k_npu.md
│   │   ├── faq/
│   │   │   └── faq.rst
│   │   ├── features/
│   │   │   ├── ascend_backend_features.md
│   │   │   └── ascend_consistency.rst
│   │   ├── profiling/
│   │   │   ├── ascend_profiling_en.rst
│   │   │   └── ascend_profiling_zh.rst
│   │   └── quick_start/
│   │       ├── ascend_quick_start.rst
│   │       ├── ascend_sglang_quick_start.rst
│   │       └── dockerfile_build_guidance.rst
│   ├── blog/
│   │   └── v0.7.md
│   ├── conf.py
│   ├── data/
│   │   └── transfer_queue.md
│   ├── examples/
│   │   ├── config.rst
│   │   ├── gsm8k_example.rst
│   │   ├── multi_modal_example.rst
│   │   ├── ppo_code_architecture.rst
│   │   ├── sandbox_fusion_example.rst
│   │   └── skypilot_examples.rst
│   ├── faq/
│   │   └── faq.rst
│   ├── hybrid_flow.rst
│   ├── index.rst
│   ├── perf/
│   │   ├── best_practices.rst
│   │   ├── device_tuning.rst
│   │   ├── dpsk.md
│   │   ├── nsight_profiling.md
│   │   ├── perf_tuning.rst
│   │   ├── perf_tuning_on_ascend.rst
│   │   ├── torch_profiling.md
│   │   └── verl_profiler_system.md
│   ├── preparation/
│   │   ├── prepare_data.rst
│   │   └── reward_function.rst
│   ├── requirements-docs.txt
│   ├── sglang_multiturn/
│   │   ├── interaction_system.rst
│   │   ├── multiturn.rst
│   │   ├── sandbox_fusion.rst
│   │   └── search_tool_example.rst
│   ├── single_controller.rst
│   ├── start/
│   │   ├── agentic_rl.rst
│   │   ├── install.rst
│   │   ├── more_resources.rst
│   │   ├── multinode.rst
│   │   ├── quickstart.rst
│   │   └── ray_debug_tutorial.rst
│   └── workers/
│       ├── automodel_workers.rst
│       ├── fsdp_workers.rst
│       ├── megatron_workers.rst
│       ├── model_engine.rst
│       ├── ray_trainer.rst
│       ├── sglang_worker.rst
│       └── trtllm_worker.rst
├── examples/
│   ├── cispo_trainer/
│   │   └── run_cispo_qwen2_5_0_5b_gsm8k.sh
│   ├── data_preprocess/
│   │   ├── aime2024_multiturn_w_tool.py
│   │   ├── dapo_multiturn_w_tool.py
│   │   ├── full_hh_rlhf.py
│   │   ├── geo3k.py
│   │   ├── geo3k_multiturn_w_tool.py
│   │   ├── gsm8k.py
│   │   ├── gsm8k_multiturn_sft.py
│   │   ├── gsm8k_multiturn_w_interaction.py
│   │   ├── gsm8k_multiturn_w_tool.py
│   │   ├── gsm8k_tool_agent_loop.py
│   │   ├── hellaswag.py
│   │   ├── math_dataset.py
│   │   ├── multiturn.py
│   │   ├── pokemon.py
│   │   └── preprocess_search_r1_dataset.py
│   ├── dppo_trainer/
│   │   ├── dppo.md
│   │   └── run_qwen30b_dppo.sh
│   ├── fapo_trainer/
│   │   ├── README.md
│   │   ├── prepare_data.py
│   │   ├── reward_fn.py
│   │   ├── run_qwen_7b_rm_colocate.sh
│   │   └── run_qwen_7b_rm_standalone.sh
│   ├── gdpo_trainer/
│   │   └── run_qwen1_5b_gdpo.sh
│   ├── generation/
│   │   ├── run_deepseek7b_mutli_node.sh
│   │   └── run_deepseek_v2_lite_math.sh
│   ├── gmpo_trainer/
│   │   ├── README.md
│   │   ├── run_qwen2_5-7b_math.sh
│   │   ├── test_dapo_7b_math.sh
│   │   └── test_dapo_qwen3_30b_math.sh
│   ├── gpg_trainer/
│   │   ├── gpg.md
│   │   ├── run_qwen2-7b_math.sh
│   │   └── run_qwen2-7b_math_megatron.sh
│   ├── grpo_trainer/
│   │   ├── README.md
│   │   ├── run_deepseek671b_math_megatron_80gb.sh
│   │   ├── run_deepseek671b_math_megatron_96gb.sh
│   │   ├── run_deepseek7b_llm.sh
│   │   ├── run_deepseek7b_llm_math.sh
│   │   ├── run_deepseek7b_llm_math_megatron.sh
│   │   ├── run_deepseek7b_llm_seq_balance.sh
│   │   ├── run_glm41v_9b.sh
│   │   ├── run_gptoss_20b.sh
│   │   ├── run_minicpmo2_6.sh
│   │   ├── run_mistral13b_skyworkrm_hhrlhf.sh
│   │   ├── run_moonlight16b_math_megatron.sh
│   │   ├── run_nemotron_nano_v3_megatron.sh
│   │   ├── run_qwen2-32b_sglang_fsdp_npu.sh
│   │   ├── run_qwen2-7b.sh
│   │   ├── run_qwen2-7b_math.sh
│   │   ├── run_qwen2-7b_math_megatron.sh
│   │   ├── run_qwen2-7b_math_megatron_lora.sh
│   │   ├── run_qwen2-7b_math_megatron_trtllm.sh
│   │   ├── run_qwen2-7b_math_trtllm.sh
│   │   ├── run_qwen2-7b_seq_balance.sh
│   │   ├── run_qwen2-7b_seq_balance_math_megatron.sh
│   │   ├── run_qwen2-7b_sgl_megatron.sh
│   │   ├── run_qwen2_5-32b_grpo_megatron_vllm_npu.sh
│   │   ├── run_qwen2_5-3b_gsm8k_grpo_lora.sh
│   │   ├── run_qwen2_5-3b_gsm8k_grpo_lora_from_adapter.sh
│   │   ├── run_qwen2_5-7b_math_megatron_diff_tp.sh
│   │   ├── run_qwen2_5_32b_grpo_npu.sh
│   │   ├── run_qwen2_5_7b_grpo_discrete_prof_npu.sh
│   │   ├── run_qwen2_5_7b_grpo_e2e_prof_npu.sh
│   │   ├── run_qwen2_5_7b_grpo_npu.sh
│   │   ├── run_qwen2_5_vl-7b-megatron.sh
│   │   ├── run_qwen2_5_vl-7b-sglang.sh
│   │   ├── run_qwen2_5_vl-7b-trtllm.sh
│   │   ├── run_qwen2_5_vl-7b.sh
│   │   ├── run_qwen2_5_vl-7b_freeze_vision.sh
│   │   ├── run_qwen2_5_vl-7b_lora.sh
│   │   ├── run_qwen2_5_vl-7b_seq_balance.sh
│   │   ├── run_qwen2_5_vl_32b_npu.sh
│   │   ├── run_qwen2_5_vl_3b_npu.sh
│   │   ├── run_qwen2_5_vl_3b_trtllm.sh
│   │   ├── run_qwen2_5_vl_7b_npu.sh
│   │   ├── run_qwen3-235b_megatron_96gb.sh
│   │   ├── run_qwen3-30b_dapo_megatron_fp8_trtllm.sh
│   │   ├── run_qwen3-32b_npu.sh
│   │   ├── run_qwen3-4b_gsm8k_grpo_lora_merge.sh
│   │   ├── run_qwen3-8b.sh
│   │   ├── run_qwen3-8b_npu.sh
│   │   ├── run_qwen3_235b_megatron_npu.sh
│   │   ├── run_qwen3_4b_grpo_vllm_1k_npu.sh
│   │   ├── run_qwen3_5-35b-megatron.sh
│   │   ├── run_qwen3_8b_grpo_sglang_1k_spmd_npu.sh
│   │   ├── run_qwen3_8b_grpo_sglang_32k_spmd_npu.sh
│   │   ├── run_qwen3_vl-235b-megatron.sh
│   │   ├── run_qwen3_vl-30b-megatron.sh
│   │   ├── run_qwen3_vl-8b-megatron.sh
│   │   ├── run_qwen3_vl-8b_npu.sh
│   │   ├── run_qwen3_vl_30b_vllm_fsdp_npu.sh
│   │   ├── run_qwen3moe-30b_grpo_megatron_vllm_npu.sh
│   │   ├── run_qwen3moe-30b_megatron_96gb.sh
│   │   ├── run_qwen3moe-30b_megatron_lora.sh
│   │   ├── run_qwen3moe-30b_megatron_lora_fp16.sh
│   │   ├── run_qwen3moe-30b_sglang_megatron_npu.sh
│   │   ├── run_qwen3next_80b_fsdp_npu.sh
│   │   └── run_seed_oss_36b.sh
│   ├── gspo_trainer/
│   │   ├── run_qwen30b_gspo.sh
│   │   ├── run_qwen3_32b_gspo_npu.sh
│   │   ├── test_gspo_3b_math.sh
│   │   ├── test_gspo_3b_math_slurm.sh
│   │   └── test_gspo_qwen30b_a3b_ep.sh
│   ├── mtp_trainer/
│   │   ├── runtime_env.yaml
│   │   ├── test_dapo_mimo_7b_with_mtp_math_megatron.sh
│   │   └── test_dapo_mimo_7b_with_mtp_math_megatron_4_4.sh
│   ├── otb_trainer/
│   │   └── run_qwen2_5-7b.sh
│   ├── ppo_trainer/
│   │   ├── README.md
│   │   ├── run_deepseek7b_llm.sh
│   │   ├── run_deepseek7b_llm_modelscope.sh
│   │   ├── run_deepseek7b_llm_pfppo.sh
│   │   ├── run_deepseek7b_llm_sandbox_fusion.sh
│   │   ├── run_deepseek7b_llm_sp2.sh
│   │   ├── run_deepseek_full_hh_rlhf.sh
│   │   ├── run_deepseek_math_gsm8k_megatron.sh
│   │   ├── run_deepseek_math_gsm8k_megatron_nsys.sh
│   │   ├── run_gemma.sh
│   │   ├── run_moonlight16b_a3b_gsm8k_megatron.sh
│   │   ├── run_qwen1.5_moe_a2.7b-gsm8k_megatron.sh
│   │   ├── run_qwen2-7b_math_gsm8k_megatron.sh
│   │   ├── run_qwen2-7b_rm.sh
│   │   ├── run_qwen2-7b_rm_reward_loop_colocate.sh
│   │   ├── run_qwen2-7b_rm_seq_balance.sh
│   │   ├── run_qwen2-7b_rm_seq_balance_fused_kernels.sh
│   │   ├── run_qwen2-7b_rm_seq_balance_nsys.sh
│   │   ├── run_qwen2-7b_seq_balance.sh
│   │   ├── run_qwen2-7b_sglang_seq_balance.sh
│   │   ├── run_qwen2.5-32b.sh
│   │   ├── run_qwen2.5-3b_rm_reward_loop_colocate.sh
│   │   └── run_qwen3-8b_npu.sh
│   ├── prefix_grouper/
│   │   ├── README.md
│   │   └── run_qwen3_prefix_grouper.sh
│   ├── ray/
│   │   └── tutorial.ipynb
│   ├── reinforce_plus_plus_trainer/
│   │   ├── run_qwen2-7b_math_rf.sh
│   │   └── run_qwen2-7b_math_rf_baseline.sh
│   ├── remax_trainer/
│   │   ├── run_qwen2.5-3b_seq_balance.sh
│   │   └── run_qwen2.5-7b_seq_balance.sh
│   ├── rloo_trainer/
│   │   └── run_qwen2-7b.sh
│   ├── rollout_correction/
│   │   ├── run_with_rollout_corr.sh
│   │   └── run_with_rollout_corr_multi_rs.sh
│   ├── router_replay/
│   │   ├── README.md
│   │   ├── run_qwen30_a3b_megatron_sglang.sh
│   │   └── run_qwen30_a3b_megatron_vllm.sh
│   ├── sapo_trainer/
│   │   ├── run_qwen30b_sapo.sh
│   │   └── run_qwen3_8b_sapo_npu.sh
│   ├── sft/
│   │   ├── gsm8k/
│   │   │   ├── run_deepseek_6b7.sh
│   │   │   ├── run_gemma_2b.sh
│   │   │   ├── run_gemma_7b.sh
│   │   │   ├── run_mimo_megatron_mtp.sh
│   │   │   ├── run_nemotron_nano_v3.sh
│   │   │   ├── run_qwen3_30b_automodel.sh
│   │   │   ├── run_qwen3_5_megatron.sh
│   │   │   ├── run_qwen3_8b_sft_peft_sp2_npu.sh
│   │   │   ├── run_qwen_05_automodel.sh
│   │   │   ├── run_qwen_05_peft.sh
│   │   │   ├── run_qwen_05_sp2.sh
│   │   │   ├── run_qwen_05_sp2_liger.sh
│   │   │   └── run_seed_oss_36b_sft.sh
│   │   ├── multiturn/
│   │   │   └── run_qwen_05_sp2.sh
│   │   └── vlm/
│   │       └── run_qwen3_vl_2b.sh
│   ├── sglang_multiturn/
│   │   ├── README.md
│   │   ├── config/
│   │   │   ├── geo3k_multiturn_grpo.yaml
│   │   │   ├── geo3k_multiturn_megatron_grpo.yaml
│   │   │   ├── gsm8k_multiturn_grpo.yaml
│   │   │   ├── gsm8k_multiturn_grpo_server.yaml
│   │   │   ├── gsm8k_multiturn_grpo_w_interaction.yaml
│   │   │   ├── gsm8k_multiturn_megatron_grpo.yaml
│   │   │   ├── interaction_config/
│   │   │   │   └── gsm8k_interaction_config.yaml
│   │   │   ├── retool_multiturn_grpo.yaml
│   │   │   ├── search_multiturn_grpo.yaml
│   │   │   ├── search_multiturn_grpo_one_step_off.yaml
│   │   │   └── tool_config/
│   │   │       ├── geo3k_tool_config.yaml
│   │   │       ├── gsm8k_tool_config.yaml
│   │   │       ├── mcp_server.json
│   │   │       ├── mcp_tool_config.yaml
│   │   │       ├── sandbox_fusion_tool_config.yaml
│   │   │       └── search_tool_config.yaml
│   │   ├── geo3k/
│   │   │   ├── run_qwen2.5-3b_geo3k_multiturn.sh
│   │   │   ├── run_qwen2.5-3b_geo3k_multiturn_4xgpu.sh
│   │   │   └── run_qwen2.5-3b_megatron_geo3k_multiturn.sh
│   │   ├── gsm8k_toolcall_shaping/
│   │   │   ├── gsm8k_toolcall_shaping.py
│   │   │   └── run_gsm8k_grpo_toolcall_shaping.sh
│   │   ├── run_qwen0.5b_gsm8k_multiturn_curriculum.sh
│   │   ├── run_qwen2.5-0.5b_gsm8k_multiturn_w_interaction.sh
│   │   ├── run_qwen2.5-3b_gsm8k_multiturn.sh
│   │   ├── run_qwen2.5-3b_gsm8k_multiturn_4xgpu.sh
│   │   ├── run_qwen2.5-3b_gsm8k_multiturn_4xgpu_server.sh
│   │   ├── run_qwen2.5-3b_gsm8k_multiturn_server.sh
│   │   ├── run_qwen2.5-3b_gsm8k_multiturn_vllm_fsdp.sh
│   │   ├── run_qwen2.5-3b_gsm8k_tool_agent_mlflow.sh
│   │   ├── run_qwen2.5-3b_megatron_gsm8k_multiturn.sh
│   │   ├── run_qwen3-4b_gsm8k_multiturn.sh
│   │   ├── run_qwen3_4b_dapo_multiturn.sh
│   │   └── search_r1_like/
│   │       ├── local_dense_retriever/
│   │       │   ├── download.py
│   │       │   └── retrieval_server.py
│   │       └── run_qwen2.5-3b_instruct_search_multiturn.sh
│   ├── skypilot/
│   │   ├── README.md
│   │   ├── verl-grpo.yaml
│   │   ├── verl-multiturn-tools.yaml
│   │   └── verl-ppo.yaml
│   ├── slurm/
│   │   └── ray_on_slurm.slurm
│   ├── split_placement/
│   │   ├── README.md
│   │   ├── config/
│   │   │   └── ppo_trainer_split.yaml
│   │   ├── main_ppo_split.py
│   │   ├── run_deepseek7b_llm.sh
│   │   └── split_monkey_patch.py
│   ├── tuning/
│   │   ├── 0.5b/
│   │   │   └── qwen2-0.5b_grpo-lora_1_h100_fsdp_vllm.sh
│   │   ├── 1.5b/
│   │   │   └── qwen2-1.5b_grpo-lora_1_h100_fsdp_vllm.sh
│   │   ├── 14b/
│   │   │   ├── qwen2-14b_grpo-lora_2_h100_fsdp_vllm.sh
│   │   │   └── qwen2_14b_grpo_4_h800_fsdp_vllm.sh
│   │   ├── 32b/
│   │   │   ├── qwen2-32b_grpo-lora_4_h100_fsdp_vllm.sh
│   │   │   └── qwen2_32B_grpo_8_h20_megatron_vllm.sh
│   │   ├── 3b/
│   │   │   └── qwen2-3b_grpo-lora_1_h100_fsdp_vllm.sh
│   │   ├── 70b/
│   │   │   ├── qwen2-70b_grpo_32_h20_fsdp_vllm.sh
│   │   │   ├── qwen2-70b_grpo_32_h800_fsdp_vllm.sh
│   │   │   └── qwen2-72b_grpo-lora_8_h100_fsdp_vllm.sh
│   │   └── 7b/
│   │       ├── qwen2-7b_grpo-lora_1_h100_fsdp_vllm.sh
│   │       └── qwen2-7b_grpo_2_h800_fsdp_vllm.sh
│   └── tutorial/
│       └── agent_loop_get_started/
│           ├── agent_loop_tutorial.ipynb
│           └── sandbox.py
├── pyproject.toml
├── requirements-cuda.txt
├── requirements-npu.txt
├── requirements-test.txt
├── requirements.txt
├── requirements_sglang.txt
├── scripts/
│   ├── __init__.py
│   ├── converter_hf_to_mcore.py
│   ├── diagnose.py
│   ├── generate_trainer_config.sh
│   ├── init_random_model.py
│   ├── install_sglang_mcore_npu.sh
│   ├── install_vllm_sglang_mcore.sh
│   ├── legacy_model_merger.py
│   ├── megatron_merge_lora.py
│   ├── print_cfg.py
│   ├── rollout_viewer.py
│   └── veomni/
│       ├── moe_merge.py
│       └── moe_split.py
├── setup.py
├── tests/
│   ├── README.md
│   ├── __init__.py
│   ├── checkpoint_engine/
│   │   ├── __init__.py
│   │   ├── test_correctness_on_gpu.py
│   │   ├── test_correctness_on_npu.py
│   │   ├── test_special_server_adapter.py
│   │   └── test_utils.py
│   ├── experimental/
│   │   ├── agent_loop/
│   │   │   ├── agent_utils.py
│   │   │   ├── qwen_vl_tool_chat_template.jinja2
│   │   │   ├── test_agent_loop_extra_fields_schema_on_cpu.py
│   │   │   ├── test_basic_agent_loop.py
│   │   │   ├── test_gpt_oss_tool_parser.py
│   │   │   ├── test_multi_modal.py
│   │   │   └── test_standalone_rollout.py
│   │   ├── reward_loop/
│   │   │   ├── reward_fn.py
│   │   │   ├── test_agent_reward_loop_colocate.py
│   │   │   ├── test_agent_reward_loop_standalone.py
│   │   │   ├── test_async_token_bucket_on_cpu.py
│   │   │   ├── test_math_verify.py
│   │   │   ├── test_rate_limited_reward_manager_on_cpu.py
│   │   │   ├── test_reward_model_disrm.py
│   │   │   └── test_reward_model_genrm.py
│   │   └── vla/
│   │       └── test_sim_envs.py
│   ├── interactions/
│   │   ├── __init__.py
│   │   ├── test_gsm8k_interaction.py
│   │   └── test_interaction_registry.py
│   ├── kill_github_tests.sh
│   ├── models/
│   │   ├── test_engine.py
│   │   ├── test_tiled_mlp_accuracy.py
│   │   ├── test_transformer.py
│   │   └── test_transformers_ulysses.py
│   ├── single_controller/
│   │   ├── __init__.py
│   │   ├── base/
│   │   │   └── test_decorator.py
│   │   ├── check_worker_alive/
│   │   │   └── main.py
│   │   ├── detached_worker/
│   │   │   ├── README.md
│   │   │   ├── client.py
│   │   │   ├── run.sh
│   │   │   └── server.py
│   │   ├── test_auto_padding_on_cpu.py
│   │   ├── test_colocated_workers.py
│   │   ├── test_colocated_workers_fused.py
│   │   ├── test_data_transfer.py
│   │   ├── test_decorator_on_cpu.py
│   │   ├── test_device_mesh_register.py
│   │   ├── test_driverfunc_to_worker.py
│   │   ├── test_fused_workers_on_cpu.py
│   │   ├── test_get_set_dispatch_collect_cpu.py
│   │   ├── test_high_level_scheduling_api.py
│   │   ├── test_nested_worker.py
│   │   ├── test_ray_collectives.py
│   │   ├── test_ray_local_envs_on_cpu.py
│   │   ├── test_ray_utils_on_cpu.py
│   │   ├── test_rvdz.py
│   │   ├── test_split_resource_pool.py
│   │   ├── test_worker_group_basics.py
│   │   └── test_worker_group_torch.py
│   ├── special_distributed/
│   │   ├── README.md
│   │   ├── run_all.sh
│   │   ├── test_fsdp_ckpt.py
│   │   ├── test_mcore_config_converter.py
│   │   ├── test_tensor_dict.py
│   │   └── test_torch_functional.py
│   ├── special_e2e/
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── check_custom_rwd_fn.py
│   │   ├── check_results.py
│   │   ├── envs/
│   │   │   ├── __init__.py
│   │   │   └── digit_completion/
│   │   │       ├── __init__.py
│   │   │       ├── task.py
│   │   │       └── tokenizer.py
│   │   ├── generation/
│   │   │   ├── run_gen_qwen05.sh
│   │   │   └── run_gen_qwen05_server.sh
│   │   ├── ppo_trainer/
│   │   │   ├── expert_parallel/
│   │   │   │   ├── qwen2moe_minimal.json
│   │   │   │   └── qwen3moe_minimal.json
│   │   │   ├── run_function_reward.sh
│   │   │   ├── run_model_reward.sh
│   │   │   ├── run_single_gpu.sh
│   │   │   └── run_single_gpu_with_engine.sh
│   │   ├── run_dapo.sh
│   │   ├── run_fully_async_policy.sh
│   │   ├── run_geo3k_fsdp_sgl_multiturn_w_tool.sh
│   │   ├── run_grpo_lora_with_merge.sh
│   │   ├── run_gsm8k_fsdp_sgl_multiturn_sf_tool.sh
│   │   ├── run_gsm8k_fsdp_sgl_multiturn_w_tool.sh
│   │   ├── run_one_step_off_policy.sh
│   │   ├── run_ppo_trainer_megatron.sh
│   │   ├── run_ppo_trainer_torchtitan.sh
│   │   ├── run_ppo_trainer_veomni.sh
│   │   ├── run_test.sh
│   │   └── sft/
│   │       ├── compare_sft_engine_results.py
│   │       ├── run_sft.sh
│   │       ├── run_sft_engine.sh
│   │       └── test_sft_engine_all.sh
│   ├── special_npu/
│   │   ├── nightly_ci_ascend/
│   │   │   ├── run_grpo_qwen25-7b-instruct_fsdp_npu.sh
│   │   │   ├── run_grpo_qwen25-vl-3b-instruct_fsdp_npu.sh
│   │   │   └── run_ppo_qwen3-8b_fsdp_npu.sh
│   │   ├── run_qwen2_5_05b_grpo.sh
│   │   ├── run_qwen2_5_05b_grpo_mindspeed.sh
│   │   ├── run_qwen2_5_05b_sft_peft_sp2.sh
│   │   ├── run_qwen2_5_vl_3b_npu.sh
│   │   ├── run_qwen3_06b_ppo.sh
│   │   └── run_qwen3_30b_grpo_mindspeed.sh
│   ├── special_sanity/
│   │   ├── check_api_docs.py
│   │   ├── check_dataproto_usage.py
│   │   ├── check_device_api_usage.py
│   │   ├── check_docs_time_info.py
│   │   ├── check_docstrings.py
│   │   ├── check_license.py
│   │   ├── check_pr_description.py
│   │   ├── check_pr_title.py
│   │   ├── test_config_docs.py
│   │   ├── test_import.py
│   │   ├── type_coverage_check.py
│   │   ├── validate_imported_docs.py
│   │   └── validate_structure.py
│   ├── special_standalone/
│   │   ├── README.md
│   │   └── test_memory_buffers.py
│   ├── test_base_config_on_cpu.py
│   ├── test_protocol_on_cpu.py
│   ├── test_protocol_v2_on_cpu.py
│   ├── trainer/
│   │   ├── __init__.py
│   │   ├── config/
│   │   │   ├── __init__.py
│   │   │   ├── legacy_ppo_megatron_trainer.yaml
│   │   │   ├── legacy_ppo_trainer.yaml
│   │   │   ├── test_algo_config_on_cpu.py
│   │   │   └── test_legacy_config_on_cpu.py
│   │   └── ppo/
│   │       ├── __init__.py
│   │       ├── test_core_algos_on_cpu.py
│   │       ├── test_metric_utils_on_cpu.py
│   │       ├── test_rollout_corr.py
│   │       └── test_rollout_corr_integration.py
│   ├── utils/
│   │   ├── _test_module.py
│   │   ├── ckpt/
│   │   │   ├── test_checkpoint_cleanup_on_cpu.py
│   │   │   └── test_esi_save_ckpt_on_cpu.py
│   │   ├── dataset/
│   │   │   ├── test_create_rl_sampler_on_cpu.py
│   │   │   ├── test_multiturn_sft_dataset_on_cpu.py
│   │   │   ├── test_rl_collate_fn_on_cpu.py
│   │   │   └── test_rl_dataset_on_cpu.py
│   │   ├── debug/
│   │   │   └── test_metrics.py
│   │   ├── megatron/
│   │   │   └── test_pipeline_parallel.py
│   │   ├── reward_score/
│   │   │   ├── reward_score/
│   │   │   │   └── test_sandbox_fusion_on_cpu.py
│   │   │   └── test_sandbox_on_cpu.py
│   │   ├── test_activation_offload.py
│   │   ├── test_bucketed_weight_transfer.py
│   │   ├── test_check_ipc_version_support_on_npu.py
│   │   ├── test_check_profiler_output.py
│   │   ├── test_config_on_cpu.py
│   │   ├── test_flops_counter.py
│   │   ├── test_fs_on_cpu.py
│   │   ├── test_fsdp2_peft_wrapping.py
│   │   ├── test_fsdp_lora_merge.py
│   │   ├── test_groupwise.py
│   │   ├── test_import_utils_on_cpu.py
│   │   ├── test_linear_cross_entropy.py
│   │   ├── test_mlflow_key_sanitization.py
│   │   ├── test_model_on_cpu.py
│   │   ├── test_normalize_peft_param_name.py
│   │   ├── test_normalize_peft_param_name_on_cpu.py
│   │   ├── test_nvtx_profile.py
│   │   ├── test_padding_on_cpu.py
│   │   ├── test_prepare_micro_batches_with_group_size.py
│   │   ├── test_rollout_skip_on_cpu.py
│   │   ├── test_rollout_trace_on_cpu.py
│   │   ├── test_seqlen_balancing.py
│   │   ├── test_server_profiler.py
│   │   ├── test_shared_memory.py
│   │   ├── test_special_linear_cross_entropy_tp.py
│   │   ├── test_special_mstx_profile.py
│   │   ├── test_temp_env_on_cpu.py
│   │   ├── test_timeout_decorator_cpu.py
│   │   ├── test_tokenizer_normalize_on_cpu.py
│   │   ├── test_torch_functional.py
│   │   └── test_torch_profile.py
│   └── workers/
│       ├── actor/
│       │   └── test_special_dp_actor.py
│       ├── config/
│       │   ├── test_actor_config_on_cpu.py
│       │   ├── test_critic_config_on_cpu.py
│       │   ├── test_engine_config_on_cpu.py
│       │   ├── test_model_config_on_cpu.py
│       │   └── test_optim_config_on_cpu.py
│       ├── critic/
│       │   └── test_special_dp_critic.py
│       ├── reward_manager/
│       │   └── test_registry_on_cpu.py
│       ├── rollout/
│       │   ├── perf/
│       │   │   └── vllm_async_rollout.py
│       │   ├── resource/
│       │   │   └── tool_configs/
│       │   │       ├── mcp_server.json
│       │   │       ├── mcp_tool_config
│       │   │       ├── sandbox_fusion_tool_config
│       │   │       └── search_tool_config
│       │   ├── rollout_sglang/
│       │   │   └── test_http_server_engine.py
│       │   ├── rollout_trtllm/
│       │   │   ├── __init__.py
│       │   │   ├── test_adapter.py
│       │   │   ├── test_async_server.py
│       │   │   └── test_trtllm_rollout_utils.py
│       │   ├── rollout_vllm/
│       │   │   ├── run_fsdp_vllm.py
│       │   │   └── test_vllm_abort.py
│       │   ├── test_hf_rollout.py
│       │   ├── test_sglang_async_rollout_multimodal_delta.py
│       │   ├── test_sglang_rollout_sharding_manager.py
│       │   └── test_vllm_cli_args_on_cpu.py
│       ├── test_fsdp_attn_implementation.py
│       └── test_fsdp_workers.py
└── verl/
    ├── __init__.py
    ├── base_config.py
    ├── checkpoint_engine/
    │   ├── README.md
    │   ├── __init__.py
    │   ├── base.py
    │   ├── hccl_checkpoint_engine.py
    │   ├── kimi_checkpoint_engine.py
    │   ├── mooncake_checkpoint_engine.py
    │   ├── nccl_checkpoint_engine.py
    │   └── nixl_checkpoint_engine.py
    ├── experimental/
    │   ├── __init__.py
    │   ├── agent_loop/
    │   │   ├── __init__.py
    │   │   ├── agent_loop.py
    │   │   ├── prometheus_utils.py
    │   │   ├── single_turn_agent_loop.py
    │   │   ├── tool_agent_loop.py
    │   │   ├── tool_parser.py
    │   │   └── utils.py
    │   ├── dataset/
    │   │   ├── __init__.py
    │   │   └── sampler.py
    │   ├── dynamic_dataset/
    │   │   ├── __init__.py
    │   │   └── dynamicgen_dataset.py
    │   ├── fully_async_policy/
    │   │   ├── README.md
    │   │   ├── README_zh.md
    │   │   ├── agent_loop/
    │   │   │   ├── __init__.py
    │   │   │   └── agent_loop.py
    │   │   ├── config/
    │   │   │   ├── fully_async_ppo_megatron_trainer.yaml
    │   │   │   └── fully_async_ppo_trainer.yaml
    │   │   ├── detach_utils.py
    │   │   ├── fully_async_main.py
    │   │   ├── fully_async_rollouter.py
    │   │   ├── fully_async_trainer.py
    │   │   ├── message_queue.py
    │   │   ├── shell/
    │   │   │   ├── dapo_30b_a3b_base_math_fsdp.sh
    │   │   │   ├── dapo_7b_async_retool.sh
    │   │   │   ├── dapo_7b_math_fsdp2_16_16.sh
    │   │   │   ├── dapo_7b_math_fsdp2_32_32.sh
    │   │   │   ├── dapo_7b_math_fsdp2_4_12.sh
    │   │   │   ├── dapo_7b_math_fsdp2_4_4.sh
    │   │   │   ├── dapo_7b_math_fsdp2_64_64.sh
    │   │   │   ├── dapo_7b_math_fsdp2_64_64_mis.sh
    │   │   │   ├── dapo_7b_math_fsdp2_8_8.sh
    │   │   │   ├── geo3k_qwen25vl_7b_megatron_4_4.sh
    │   │   │   ├── grpo_30b_a3b_base_math_megatron_96_32.sh
    │   │   │   ├── grpo_30b_a3b_base_math_megatron_96_32_mis.sh
    │   │   │   └── runtime_env.yaml
    │   │   └── unittest/
    │   │       └── simple_streaming_demo.py
    │   ├── one_step_off_policy/
    │   │   ├── README.md
    │   │   ├── config/
    │   │   │   ├── one_step_off_ppo_megatron_trainer.yaml
    │   │   │   └── one_step_off_ppo_trainer.yaml
    │   │   ├── main_ppo.py
    │   │   ├── ray_trainer.py
    │   │   └── shell/
    │   │       ├── dapo_7b_math_fsdp2_4_12.sh
    │   │       ├── dapo_7b_math_fsdp2_64_64.sh
    │   │       ├── dapo_7b_math_fsdp2_64_64_ris.sh
    │   │       ├── dapo_7b_math_fsdp2_colocate.sh
    │   │       ├── dapo_7b_math_fsdp2_sglang_4_12.sh
    │   │       ├── dapo_7b_math_fsdp2_sglang_colocate.sh
    │   │       ├── dapo_7b_math_megatron_4_12.sh
    │   │       ├── dapo_7b_math_megatron_colocate.sh
    │   │       ├── grpo_0.6b_gsm8k_fsdp2_2_6.sh
    │   │       ├── grpo_0.6b_gsm8k_fsdp2_sglang_2_6.sh
    │   │       ├── grpo_3b_gsm8k_fsdp2_2_6.sh
    │   │       └── grpo_qwen3_8b_gsm8k_fsdp2_8_8_npu.sh
    │   ├── reward_loop/
    │   │   ├── __init__.py
    │   │   ├── reward_loop.py
    │   │   ├── reward_manager/
    │   │   │   ├── __init__.py
    │   │   │   ├── base.py
    │   │   │   ├── dapo.py
    │   │   │   ├── gdpo.py
    │   │   │   ├── limited.py
    │   │   │   ├── naive.py
    │   │   │   ├── registry.py
    │   │   │   └── remote.py
    │   │   ├── reward_model.py
    │   │   └── router/
    │   │       ├── inner_sglang_router.py
    │   │       └── naive_router.py
    │   ├── separation/
    │   │   ├── __init__.py
    │   │   ├── engine_workers.py
    │   │   ├── ray_trainer.py
    │   │   └── utils.py
    │   └── vla/
    │       ├── README.md
    │       ├── config/
    │       │   ├── rob_ppo_trainer.yaml
    │       │   └── rob_sac_trainer.yaml
    │       ├── dp_rob.py
    │       ├── env_loop.py
    │       ├── envs/
    │       │   ├── __init__.py
    │       │   ├── action_utils.py
    │       │   ├── isaac_env/
    │       │   │   ├── __init__.py
    │       │   │   └── isaac_env.py
    │       │   └── libero_env/
    │       │       ├── __init__.py
    │       │       ├── libero_env.py
    │       │       ├── utils.py
    │       │       └── venv.py
    │       ├── fsdp_workers.py
    │       ├── main_ppo.py
    │       ├── main_sac.py
    │       ├── models/
    │       │   ├── __init__.py
    │       │   ├── modules/
    │       │   │   └── mlp.py
    │       │   ├── openvla_oft/
    │       │   │   ├── __init__.py
    │       │   │   ├── configuration_prismatic.py
    │       │   │   ├── constants.py
    │       │   │   ├── modeling_prismatic.py
    │       │   │   ├── processing_prismatic.py
    │       │   │   └── train_utils.py
    │       │   ├── pi0_torch/
    │       │   │   ├── __init__.py
    │       │   │   ├── configuration_pi0_torch.py
    │       │   │   ├── model/
    │       │   │   │   ├── modeling_pi0.py
    │       │   │   │   └── paligemma_with_expert.py
    │       │   │   ├── modeling_pi0_torch.py
    │       │   │   ├── pi0_utils.py
    │       │   │   └── policy/
    │       │   │       ├── __init__.py
    │       │   │       ├── base.py
    │       │   │       └── libero_policy.py
    │       │   └── register_vla_models.py
    │       ├── naive_rollout_rob.py
    │       ├── prepare_libero_dataset.py
    │       ├── requirements_vla.txt
    │       ├── rob_ray_trainer.py
    │       ├── run_pi05_libero_sac.sh
    │       ├── run_pi05_libero_sac_disagg.sh
    │       ├── run_simpleVLA_isaac_disagg.sh
    │       ├── run_simpleVLA_libero_grpo.sh
    │       ├── sac/
    │       │   ├── base.py
    │       │   ├── naive_rollout_pi05.py
    │       │   ├── replay_pool.py
    │       │   ├── sac_actor.py
    │       │   └── sac_ray_trainer.py
    │       └── workers/
    │           └── env/
    │               ├── env_loop_wg_test.py
    │               ├── env_manager.py
    │               └── env_worker.py
    ├── interactions/
    │   ├── __init__.py
    │   ├── base.py
    │   ├── gsm8k_interaction.py
    │   ├── utils/
    │   │   ├── __init__.py
    │   │   └── interaction_registry.py
    │   └── weather_interaction.py
    ├── model_merger/
    │   ├── __init__.py
    │   ├── __main__.py
    │   ├── base_model_merger.py
    │   ├── fsdp_model_merger.py
    │   └── megatron_model_merger.py
    ├── models/
    │   ├── README.md
    │   ├── __init__.py
    │   ├── llama/
    │   │   ├── __init__.py
    │   │   └── megatron/
    │   │       ├── __init__.py
    │   │       ├── checkpoint_utils/
    │   │       │   ├── __init__.py
    │   │       │   ├── llama_loader.py
    │   │       │   ├── llama_loader_depracated.py
    │   │       │   └── llama_saver.py
    │   │       ├── layers/
    │   │       │   ├── __init__.py
    │   │       │   ├── parallel_attention.py
    │   │       │   ├── parallel_decoder.py
    │   │       │   ├── parallel_linear.py
    │   │       │   ├── parallel_mlp.py
    │   │       │   └── parallel_rmsnorm.py
    │   │       └── modeling_llama_megatron.py
    │   ├── mcore/
    │   │   ├── __init__.py
    │   │   ├── bridge.py
    │   │   ├── config_converter.py
    │   │   ├── loader.py
    │   │   ├── mbridge.py
    │   │   ├── model_forward.py
    │   │   ├── model_forward_1f1b_overlap.py
    │   │   ├── model_forward_fused.py
    │   │   ├── model_initializer.py
    │   │   ├── mtp_patch.py
    │   │   ├── patch.py
    │   │   ├── qwen2_5_vl/
    │   │   │   ├── __init__.py
    │   │   │   ├── attention.py
    │   │   │   ├── model.py
    │   │   │   ├── rope_utils.py
    │   │   │   ├── vision_config.py
    │   │   │   ├── vision_model.py
    │   │   │   └── vision_transformer_block.py
    │   │   ├── readme.md
    │   │   ├── registry.py
    │   │   ├── saver.py
    │   │   ├── util.py
    │   │   └── weight_converter.py
    │   ├── qwen2/
    │   │   ├── __init__.py
    │   │   └── megatron/
    │   │       ├── __init__.py
    │   │       ├── checkpoint_utils/
    │   │       │   ├── __init__.py
    │   │       │   ├── qwen2_loader.py
    │   │       │   ├── qwen2_loader_depracated.py
    │   │       │   └── qwen2_saver.py
    │   │       ├── layers/
    │   │       │   ├── __init__.py
    │   │       │   ├── parallel_attention.py
    │   │       │   ├── parallel_decoder.py
    │   │       │   ├── parallel_linear.py
    │   │       │   ├── parallel_mlp.py
    │   │       │   └── parallel_rmsnorm.py
    │   │       └── modeling_qwen2_megatron.py
    │   ├── registry.py
    │   ├── transformers/
    │   │   ├── __init__.py
    │   │   ├── apertus.py
    │   │   ├── dense_common.py
    │   │   ├── glm4v.py
    │   │   ├── kimi_vl.py
    │   │   ├── llama.py
    │   │   ├── monkey_patch.py
    │   │   ├── npu_patch.py
    │   │   ├── qwen2.py
    │   │   ├── qwen2_vl.py
    │   │   ├── qwen3_vl.py
    │   │   └── tiled_mlp.py
    │   └── weight_loader_registry.py
    ├── protocol.py
    ├── py.typed
    ├── single_controller/
    │   ├── __init__.py
    │   ├── base/
    │   │   ├── __init__.py
    │   │   ├── decorator.py
    │   │   ├── worker.py
    │   │   └── worker_group.py
    │   └── ray/
    │       ├── __init__.py
    │       └── base.py
    ├── third_party/
    │   ├── __init__.py
    │   ├── torch/
    │   │   ├── __init__.py
    │   │   └── distributed/
    │   │       ├── __init__.py
    │   │       ├── _state_dict_utils.py
    │   │       └── checkpoint/
    │   │           ├── __init__.py
    │   │           └── state_dict.py
    │   └── vllm/
    │       └── __init__.py
    ├── tools/
    │   ├── __init__.py
    │   ├── base_tool.py
    │   ├── geo3k_tool.py
    │   ├── gsm8k_tool.py
    │   ├── image_zoom_in_tool.py
    │   ├── mcp_base_tool.py
    │   ├── mcp_search_tool.py
    │   ├── sandbox_fusion_tools.py
    │   ├── schemas.py
    │   ├── search_tool.py
    │   └── utils/
    │       ├── __init__.py
    │       ├── mcp_clients/
    │       │   ├── McpClientManager.py
    │       │   └── utils.py
    │       ├── search_r1_like_utils.py
    │       └── tool_registry.py
    ├── trainer/
    │   ├── README.md
    │   ├── __init__.py
    │   ├── config/
    │   │   ├── __init__.py
    │   │   ├── _generated_ppo_megatron_trainer.yaml
    │   │   ├── _generated_ppo_torchtitan_trainer.yaml
    │   │   ├── _generated_ppo_trainer.yaml
    │   │   ├── _generated_ppo_veomni_trainer.yaml
    │   │   ├── actor/
    │   │   │   ├── actor.yaml
    │   │   │   ├── dp_actor.yaml
    │   │   │   ├── megatron_actor.yaml
    │   │   │   ├── torchtitan_actor.yaml
    │   │   │   └── veomni_actor.yaml
    │   │   ├── algorithm/
    │   │   │   └── rollout_correction.yaml
    │   │   ├── algorithm.py
    │   │   ├── config.py
    │   │   ├── critic/
    │   │   │   ├── critic.yaml
    │   │   │   ├── dp_critic.yaml
    │   │   │   ├── megatron_critic.yaml
    │   │   │   ├── torchtitan_critic.yaml
    │   │   │   └── veomni_critic.yaml
    │   │   ├── data/
    │   │   │   └── legacy_data.yaml
    │   │   ├── engine/
    │   │   │   ├── automodel.yaml
    │   │   │   ├── fsdp.yaml
    │   │   │   ├── megatron.yaml
    │   │   │   ├── torchtitan.yaml
    │   │   │   └── veomni.yaml
    │   │   ├── evaluation.yaml
    │   │   ├── legacy_reward_impl.yaml
    │   │   ├── model/
    │   │   │   └── hf_model.yaml
    │   │   ├── model_engine/
    │   │   │   ├── dp.yaml
    │   │   │   ├── torchtitan.yaml
    │   │   │   └── veomni.yaml
    │   │   ├── npu_profile/
    │   │   │   └── npu_profile.yaml
    │   │   ├── optim/
    │   │   │   ├── automodel.yaml
    │   │   │   ├── fsdp.yaml
    │   │   │   ├── megatron.yaml
    │   │   │   ├── torchtitan.yaml
    │   │   │   └── veomni.yaml
    │   │   ├── ppo_megatron_trainer.yaml
    │   │   ├── ppo_trainer.yaml
    │   │   ├── profiler/
    │   │   │   └── profiler.yaml
    │   │   ├── ref/
    │   │   │   ├── dp_ref.yaml
    │   │   │   ├── megatron_ref.yaml
    │   │   │   ├── ref.yaml
    │   │   │   ├── torchtitan_ref.yaml
    │   │   │   └── veomni_ref.yaml
    │   │   ├── reward/
    │   │   │   └── reward.yaml
    │   │   ├── rollout/
    │   │   │   └── rollout.yaml
    │   │   └── sft_trainer_engine.yaml
    │   ├── constants_ppo.py
    │   ├── main_eval.py
    │   ├── main_generation_server.py
    │   ├── main_ppo.py
    │   ├── ppo/
    │   │   ├── __init__.py
    │   │   ├── core_algos.py
    │   │   ├── metric_utils.py
    │   │   ├── prefix_grouper_utils.py
    │   │   ├── ray_trainer.py
    │   │   ├── reward.py
    │   │   ├── rollout_corr_helper.py
    │   │   └── utils.py
    │   ├── runtime_env.yaml
    │   ├── sft_trainer.py
    │   └── sft_trainer_ray.py
    ├── utils/
    │   ├── __init__.py
    │   ├── activation_offload.py
    │   ├── attention_utils.py
    │   ├── chat_template.py
    │   ├── checkpoint/
    │   │   ├── __init__.py
    │   │   ├── checkpoint_handler.py
    │   │   ├── checkpoint_manager.py
    │   │   ├── fsdp_checkpoint_manager.py
    │   │   └── megatron_checkpoint_manager.py
    │   ├── config.py
    │   ├── dataset/
    │   │   ├── README.md
    │   │   ├── __init__.py
    │   │   ├── dataset_utils.py
    │   │   ├── multiturn_sft_dataset.py
    │   │   ├── rl_dataset.py
    │   │   ├── rm_dataset.py
    │   │   └── vision_utils.py
    │   ├── debug/
    │   │   ├── __init__.py
    │   │   ├── metrics.py
    │   │   ├── performance.py
    │   │   └── trajectory_tracker.py
    │   ├── device.py
    │   ├── distributed.py
    │   ├── experimental/
    │   │   ├── __init__.py
    │   │   └── torch_functional.py
    │   ├── flops_counter.py
    │   ├── fp8_utils.py
    │   ├── fs.py
    │   ├── fsdp_utils.py
    │   ├── groupwise.py
    │   ├── hdfs_io.py
    │   ├── import_utils.py
    │   ├── kernel/
    │   │   ├── __init__.py
    │   │   ├── fp8_kernel.py
    │   │   ├── kernels.py
    │   │   └── linear_cross_entropy.py
    │   ├── logger/
    │   │   ├── __init__.py
    │   │   └── aggregate_logger.py
    │   ├── logging_utils.py
    │   ├── megatron/
    │   │   ├── __init__.py
    │   │   ├── dist_checkpointing.py
    │   │   ├── memory.py
    │   │   ├── optimizer.py
    │   │   ├── pipeline_parallel.py
    │   │   ├── router_replay_patch.py
    │   │   ├── router_replay_utils.py
    │   │   ├── sequence_parallel.py
    │   │   └── tensor_parallel.py
    │   ├── megatron_peft_utils.py
    │   ├── megatron_utils.py
    │   ├── memory_utils.py
    │   ├── metric/
    │   │   ├── __init__.py
    │   │   └── utils.py
    │   ├── model.py
    │   ├── net_utils.py
    │   ├── npu_flash_attn_utils.py
    │   ├── profiler/
    │   │   ├── __init__.py
    │   │   ├── config.py
    │   │   ├── empty_annotations.py
    │   │   ├── mstx_profile.py
    │   │   ├── nvtx_profile.py
    │   │   ├── performance.py
    │   │   ├── profile.py
    │   │   └── torch_profile.py
    │   ├── py_functional.py
    │   ├── qat/
    │   │   ├── __init__.py
    │   │   ├── core.py
    │   │   ├── linear.py
    │   │   ├── quantizer.py
    │   │   └── vllm_patch.py
    │   ├── ray_utils.py
    │   ├── rendezvous/
    │   │   ├── __init__.py
    │   │   └── ray_backend.py
    │   ├── reward_score/
    │   │   ├── __init__.py
    │   │   ├── geo3k.py
    │   │   ├── gsm8k.py
    │   │   ├── math_batch.py
    │   │   ├── math_dapo.py
    │   │   ├── math_reward.py
    │   │   ├── math_verify.py
    │   │   ├── prime_code/
    │   │   │   ├── README.md
    │   │   │   ├── __init__.py
    │   │   │   ├── testing_util.py
    │   │   │   └── utils.py
    │   │   ├── prime_math/
    │   │   │   ├── __init__.py
    │   │   │   ├── grader.py
    │   │   │   └── math_normalize.py
    │   │   ├── rlla.py
    │   │   ├── sandbox_fusion/
    │   │   │   ├── __init__.py
    │   │   │   └── utils.py
    │   │   └── search_r1_like_qa_em.py
    │   ├── rollout_skip.py
    │   ├── rollout_trace.py
    │   ├── seqlen_balancing.py
    │   ├── sglang/
    │   │   └── sglang_fp8_utils.py
    │   ├── tensordict_utils.py
    │   ├── tokenizer.py
    │   ├── torch_dtypes.py
    │   ├── torch_functional.py
    │   ├── tracking.py
    │   ├── transformers_compat.py
    │   ├── trtllm/
    │   │   └── trtllm_fp8_utils.py
    │   ├── ulysses.py
    │   └── vllm/
    │       ├── __init__.py
    │       ├── npu_vllm_patch.py
    │       ├── patch.py
    │       ├── utils.py
    │       └── vllm_fp8_utils.py
    ├── version/
    │   └── version
    └── workers/
        ├── __init__.py
        ├── actor/
        │   ├── __init__.py
        │   ├── base.py
        │   ├── dp_actor.py
        │   └── megatron_actor.py
        ├── config/
        │   ├── __init__.py
        │   ├── actor.py
        │   ├── critic.py
        │   ├── engine.py
        │   ├── megatron_peft.py
        │   ├── model.py
        │   ├── optimizer.py
        │   ├── reward.py
        │   └── rollout.py
        ├── critic/
        │   ├── __init__.py
        │   ├── base.py
        │   ├── dp_critic.py
        │   └── megatron_critic.py
        ├── engine/
        │   ├── __init__.py
        │   ├── automodel/
        │   │   ├── __init__.py
        │   │   ├── transformer_impl.py
        │   │   └── utils.py
        │   ├── base.py
        │   ├── fsdp/
        │   │   ├── __init__.py
        │   │   ├── transformer_impl.py
        │   │   └── utils.py
        │   ├── megatron/
        │   │   ├── __init__.py
        │   │   ├── transformer_impl.py
        │   │   └── utils.py
        │   ├── mindspeed/
        │   │   ├── __init__.py
        │   │   └── transformer_impl.py
        │   ├── torchtitan/
        │   │   ├── __init__.py
        │   │   ├── transformer_impl.py
        │   │   └── utils.py
        │   ├── utils.py
        │   └── veomni/
        │       ├── __init__.py
        │       ├── transformer_impl.py
        │       └── utils.py
        ├── engine_workers.py
        ├── fsdp_workers.py
        ├── megatron_workers.py
        ├── reward_manager/
        │   ├── __init__.py
        │   ├── abstract.py
        │   ├── batch.py
        │   ├── dapo.py
        │   ├── naive.py
        │   ├── prime.py
        │   └── registry.py
        ├── rollout/
        │   ├── __init__.py
        │   ├── base.py
        │   ├── hf_rollout.py
        │   ├── naive/
        │   │   ├── __init__.py
        │   │   └── naive_rollout.py
        │   ├── replica.py
        │   ├── schemas.py
        │   ├── sglang_rollout/
        │   │   ├── __init__.py
        │   │   ├── async_sglang_server.py
        │   │   ├── http_server_engine.py
        │   │   ├── sglang_rollout.py
        │   │   └── utils.py
        │   ├── tokenizer.py
        │   ├── trtllm_rollout/
        │   │   ├── trtllm_async_rollout.md
        │   │   ├── trtllm_async_server.py
        │   │   ├── trtllm_rollout.py
        │   │   └── trtllm_worker_extension.py
        │   ├── utils.py
        │   └── vllm_rollout/
        │       ├── __init__.py
        │       ├── bucketed_weight_transfer.py
        │       ├── utils.py
        │       ├── vllm_async_server.py
        │       └── vllm_rollout.py
        ├── sharding_manager/
        │   ├── __init__.py
        │   ├── base.py
        │   └── fsdp_ulysses.py
        └── utils/
            ├── __init__.py
            ├── losses.py
            └── padding.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gemini/config.yaml
================================================
have_fun: false
code_review:
  disable: false
  comment_severity_threshold: HIGH
  max_review_comments: -1
  pull_request_opened:
    help: false
    summary: false
    code_review: true
ignore_patterns: []


================================================
FILE: .git-blame-ignore-revs
================================================
# Local usage: git config blame.ignoreRevsFile .git-blame-ignore-revs

# [dev] feat: immigrate from yapf & pylint to ruff based on pre-commit
# Changed 268 files, +10k/-9k lines. This is the biggest formatter change.
b00f77d8559b48d57a33c0132a5ba1c81891a536

# [ci] refactor: reduce ruff line-length from 300 to 120
# Changed 238 files, +6k/-1k lines. Global formatting change.
00a10a8ef389556f957a2f36132b2358fd6a109f

# [Lint] fix: linting errors in all files
# Changed 179 files, +1k/-3k lines. Global lint fix.
8e5ad4688a13de81727c014a3c2e2fb26324bc20


================================================
FILE: .github/CODEOWNERS
================================================
/docs @eric-haibin-lin @zhaochenyang20 @hongpeng-guo
/docs/amd_tutorial @yushengsu-thu
/docs/slang_multiturn @zhaochenyang20 @SwordFaith
/docs/ascend_tutorial @FightingZhen

/third_party/sglang @zhaochenyang20 @SwordFaith
/third_party/vllm @PeterSH6 @wuxibin89

/examples/grpo_trainer @vermouth1992 @PeterSH6 @tardis-key @FightingZhen @ji-huazhong

/verl/single_controller @zw0610 @wuxibin89 @hongpeng-guo
/verl/trainer @eric-haibin-lin @vermouth1992 @tongyx361 @PeterSH6
/verl/models/mcore @ISEEKYAN @vermouth1992
/verl/models/transformers @vermouth1992 @PeterSH6 @tardis-key @FightingZhen @ji-huazhong
/verl/workers/engine @eric-haibin-lin @vermouth1992 @ZihengJiang
/verl/workers/roles @eric-haibin-lin @vermouth1992 @ZihengJiang
/verl/workers/engine/fsdp @eric-haibin-lin @vermouth1992 @ZihengJiang
/verl/workers/rollout/vllm_rollout @wuxibin89 @PeterSH6 @chenhaiq
/verl/workers/rollout/sglang_rollout @zhaochenyang20 @SwordFaith @chenhaiq
/verl/workers/actor/megatron_actor.py @ISEEKYAN @vermouth1992
/verl/workers/critic/megatron_critic.py @ISEEKYAN @vermouth1992
/verl/workers/megatron_workers.py @ISEEKYAN @vermouth1992
/verl/experimental @wuxibin89 @ArronHZG

/tests/single_controller @zw0610 @wuxibin89
/tests/trainer @eric-haibin-lin @vermouth1992 @tongyx361 @PeterSH6
/tests/workers/rollout/vllm_rollout @wuxibin89 @PeterSH6 @chenhaiq


================================================
FILE: .github/ISSUE_TEMPLATE/bug-report.yml
================================================
# modified from https://github.com/huggingface/transformers/blob/main/.github/ISSUE_TEMPLATE/bug-report.yml?plain=1
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve verl
labels: [ "bug" ]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to fill out this bug report! 🤗

  - type: textarea
    id: system-info
    attributes:
      label: System Info
      description: Please share your system info with us. You can run the command `python scripts/diagnose.py` and copy-paste its output below.
      placeholder: verl version, platform, python version, ...
    validations:
      required: true

  - type: checkboxes
    id: information-scripts-examples
    attributes:
      label: Information
      description: 'The problem arises when using:'
      options:
        - label: "The official example scripts"
        - label: "My own modified scripts"

  - type: checkboxes
    id: information-tasks
    attributes:
      label: Tasks
      description: "The tasks I am working on are:"
      options:
        - label: "An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)"
        - label: "My own task or dataset (give details below)"

  - type: textarea
    id: reproduction
    validations:
      required: true
    attributes:
      label: Reproduction
      description: |
        Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
        Please include relevant config information with your code.
        If you have code snippets, error messages, stack traces please provide them here as well.
        Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
        Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.

      placeholder: |
        Steps to reproduce the behavior:

          1.
          2.
          3.


  - type: textarea
    id: expected-behavior
    validations:
      required: true
    attributes:
      label: Expected behavior
      description: "A clear and concise description of what you would expect to happen."

================================================
FILE: .github/ISSUE_TEMPLATE/config.yml
================================================
blank_issues_enabled: true
version: 0.1


================================================
FILE: .github/ISSUE_TEMPLATE/feature-request.yml
================================================
# modified from https://github.com/huggingface/transformers/blob/main/.github/ISSUE_TEMPLATE/feature-request.yml?plain=1
name: "\U0001F680 Feature request"
description: Submit a proposal/request for a new verl feature
labels: [ "Feature request" ]
body:
  - type: textarea
    id: feature-request
    validations:
      required: true
    attributes:
      label: Feature request
      description: |
        A clear and concise description of the feature proposal. Please provide a link to the paper and code in case they exist.

  - type: textarea
    id: motivation
    validations:
      required: true
    attributes:
      label: Motivation
      description: |
        Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.


  - type: textarea
    id: contribution
    validations:
      required: true
    attributes:
      label: Your contribution
      description: |
        Is there any way that you could help, e.g. by submitting a PR? Make sure to read the CONTRIBUTING.MD [readme](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md)

================================================
FILE: .github/PULL_REQUEST_TEMPLATE.md
================================================
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `veomni`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`, `fully_async`, `one_step_off`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
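
As a rough illustration only (this is not the repository's actual checker, which lives in `tests/special_sanity/check_pr_title.py`), the title convention above could be validated with a sketch like the one below. The helper name `is_valid_pr_title` and the regular expression are assumptions made for this example.

```python
import re

# Allowed values copied from the checklist above.
ALLOWED_MODULES = {
    "fsdp", "megatron", "veomni", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool", "ckpt",
    "doc", "data", "cfg", "reward", "fully_async", "one_step_off",
}
ALLOWED_TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Optional "[BREAKING]" prefix, then "[modules] type: description".
TITLE_RE = re.compile(r"^(\[BREAKING\])?\[([^\]]+)\]\s+(\w+):\s+(\S.*)$")


def is_valid_pr_title(title: str) -> bool:
    """Hypothetical helper: True if the title follows the convention above."""
    match = TITLE_RE.match(title.strip())
    if match is None:
        return False
    # Multiple modules are separated by commas, e.g. "[megatron, fsdp, doc]".
    modules = [m.strip() for m in match.group(2).split(",")]
    modules_ok = all(m in ALLOWED_MODULES for m in modules)
    return modules_ok and match.group(3) in ALLOWED_TYPES


if __name__ == "__main__":
    print(is_valid_pr_title("[BREAKING][fsdp, megatron] feat: dynamic batching"))  # True
    print(is_valid_pr_title("fix typo"))  # False
```

Running it locally before opening a PR can catch a malformed title early; the CI check remains the source of truth.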

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
- [ ] If your PR is related to the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.


================================================
FILE: .github/dependabot.yml
================================================
## Enabled the dependabot to check the dependencies of the project
## Dependabot will open pull requests to update dependencies automatically

version: 2
updates:
  - package-ecosystem: pip
    directory: "/"
    schedule:
      interval: weekly

================================================
FILE: .github/workflows/README.md
================================================
### Adding a New Workflow

When adding a new workflow for continuous integration (CI), you have two runner options: a fixed runner or an elastically launched Vemlp machine.

- **Fixed Runner**: To use a fixed runner, specify it in your workflow using the `runs-on` keyword, like `runs-on: [L20x8]`.
- **Vemlp Runner**: Opting for a Vemlp machine allows you to launch tasks elastically.

Here is a template to assist you. This template is designed for using Vemlp machines. Currently, for each workflow, you need to create a `setup` and a `cleanup` job. When using this template, the main parts you need to modify are the `IMAGE` environment variable and the specific `job steps`.

```yaml
name: Your Default Workflow

on:
  push:
    branches:
      - main
      - v0.*
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      - ".github/workflows/template.yml"

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

permissions:
  contents: read

env:
  IMAGE: "your vemlp image" # e.g. "verl-ci-cn-beijing.cr.volces.com/verlai/verl:sgl059.dev2"
  DYNAMIC_RUNNER_URL: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner" # public veFaas api

jobs:
  setup:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      task-id: ${{ steps.create-runner.outputs.task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1 
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_URL }}"
          image: "${{ env.IMAGE }}"

  your_job:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'default-runner' }}"]
    steps:
      - run: echo "Replace this with your job steps"

  cleanup:
    runs-on: ubuntu-latest
    needs: [setup, your_job]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_URL }}"
          task-id: "${{ needs.setup.outputs.task-id }}"
```

### Model and Dataset
To keep CI from depending on the network, we pre-download models and datasets to an NFS mount on the CI machine. Models are stored under \${HOME}/models and datasets under \${HOME}/models/hf_data.
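
A minimal sketch, assuming the directory layout described above, of how a test could look up a pre-downloaded model or dataset before falling back to the network. `resolve_local_asset` is a hypothetical helper written for this example; it is not part of verl.

```python
from pathlib import Path
from typing import Optional


def resolve_local_asset(
    name: str, kind: str = "model", home: Optional[Path] = None
) -> Optional[Path]:
    """Return the pre-downloaded path for `name`, or None if it is not cached.

    Assumed layout: models under <home>/models, datasets under <home>/models/hf_data.
    """
    base = home if home is not None else Path.home()
    root = base / "models"
    if kind == "dataset":
        root = root / "hf_data"
    candidate = root / name
    return candidate if candidate.exists() else None


if __name__ == "__main__":
    # Prints the cached path if it exists on this machine, otherwise None.
    print(resolve_local_asset("gsm8k", kind="dataset"))
```

Returning `None` instead of raising lets the caller decide whether a missing cache entry should skip the test or trigger a download.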

================================================
FILE: .github/workflows/check-pr-title.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default, tests are run with a GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml`, run pytest on all test scripts without the `on_cpu.py` suffix.
#   - Since cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded in them when
#     - new workflow yaml is added to `.github/workflows`
#     - new tests are added to workflow mentioned in 2.


on:
  pull_request:
    types: [opened, edited, synchronize]

jobs:
  check-title:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Run PR title checker
        run: python3 tests/special_sanity/check_pr_title.py
        env:
          PR_TITLE: ${{ github.event.pull_request.title }}

      - name: Run PR description checker
        run: python3 tests/special_sanity/check_pr_description.py
        env:
          PR_TITLE: ${{ github.event.pull_request.title }}
          GITHUB_EVENT_PATH: ${{ github.event_path }}


================================================
FILE: .github/workflows/cpu_unit_tests.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default, tests are run with a GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml`, run pytest on all test scripts without the `on_cpu.py` suffix.
#   - Since cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded in them when
#     - new workflow yaml is added to `.github/workflows`
#     - new tests are added to workflow mentioned in 2.

name: cpu_unit_tests

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches
  push:
    branches:
      - main
      - v0.*
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      - .github/workflows/cpu_unit_tests.yml

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permissions for repository contents.
permissions:
  contents: read

env:
  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:vllm017.dev2"
  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

jobs:
  setup:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-image: "${{ env.IMAGE }}"

  cpu_unit_tests:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 20 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
      TORCH_COMPILE_DISABLE: 1
      TORCHINDUCTOR_DISABLE: 1
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
          pip3 install --upgrade "transformers>=5.0.0"
      - name: Download datasets
        run: |
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
          python3 examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/models/hf_data/hiyouga/geometry3k
      - name: Running CPU unit tests
        run: |
          echo '[pytest]' > pytest.ini
          echo 'python_files = *_on_cpu.py' >> pytest.ini
          pytest -s -x --asyncio-mode=auto tests/
  cleanup:
    runs-on: ubuntu-latest
    needs: [setup, cpu_unit_tests]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


================================================
FILE: .github/workflows/doc.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default, tests are run with a GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml`: runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml`: runs pytest on all scripts without the `on_cpu.py` suffix
#   - Since the cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded from them when
#     - a new workflow yaml is added to `.github/workflows`
#     - new tests are added to the workflows mentioned in 2.


name: doc_test

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches
  push:
    branches:
      - main
      - v0.*
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      - "docs/**"
      - .github/workflows/doc.yml

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare permissions: read repository contents, plus write access for GitHub Pages deployment.
permissions:
  contents: read      # for checkout
  pages: write        # for deploy-pages
  id-token: write     # for deploy-pages

jobs:
  doc_test:
    runs-on: ubuntu-latest
    timeout-minutes: 5 # Increase this timeout value as needed
    strategy:
      matrix:
        python-version: ["3.10"]
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
          pip install -r docs/requirements-docs.txt

      - name: Run doc make html
        run: |
          cd docs
          make clean
          make html SPHINXOPTS="--keep-going -w _build/sphinx.log"
          if grep -q ": ERROR:" _build/sphinx.log; then
            echo "🚨 Sphinx doc build contained ERRORs - see _build/sphinx.log"
            exit 1
          fi
          if grep -q "WARNING: document isn't included in any toctree" _build/sphinx.log; then
            echo "🚨 Sphinx doc build contained WARNING. Please include newly added docs in index.rst. See _build/sphinx.log for details"
            exit 1
          fi
          if grep -q "WARNING: Inline emphasis" _build/sphinx.log; then
            echo "🚨 Sphinx doc build contained WARNING. Please check inline emphasis is correct. See _build/sphinx.log for details"
            exit 1
          fi
          if grep -q "WARNING: Definition list ends without a blank line" _build/sphinx.log; then
            echo "🚨 Sphinx doc build contained WARNING. Please check if the indentation is correct. See _build/sphinx.log for details"
            exit 1
          fi


================================================
FILE: .github/workflows/docker-build-ascend-a2.yml
================================================
name: docker-build-ascend-a2

on:
  workflow_dispatch:
  push:
    branches: ["main"]
    paths:
      - "docker/ascend/Dockerfile.ascend_8.5.0_a2"
      - ".github/workflows/docker-build-ascend-a2.yml"
  release:
    types: [published]
  schedule:
    - cron: "0 16 * * *"

jobs:
  build-ascend-image-a2:
    if: ${{ github.event_name != 'pull_request' && github.repository_owner == 'verl-project' }}
    runs-on: ubuntu-latest
    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}-build-ascend-image-a2
      cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
    steps:
      - name: Remove unnecessary parts in github actions runners to free up disk space
        uses: jlumbroso/free-disk-space@v1.3.1
        with:
          tool-cache: true

      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Get base image name and tag
        id: base_image
        run: |
          BASE_IMAGE_FULL=$(grep '^FROM' ./docker/ascend/Dockerfile.ascend_8.5.0_a2 | head -1 | cut -d' ' -f2)
          echo "Base image full: $BASE_IMAGE_FULL" 
          BASE_IMAGE_TAG=$(echo "$BASE_IMAGE_FULL" | cut -d':' -f2)
          echo "Base image tag: $BASE_IMAGE_TAG"
          NEW_IMAGE_NAME="verl-$BASE_IMAGE_TAG"
          echo "New image name: $NEW_IMAGE_NAME"  
          echo "base_image_tag=$BASE_IMAGE_TAG" >> "$GITHUB_OUTPUT"
          echo "new_image_name=$NEW_IMAGE_NAME" >> "$GITHUB_OUTPUT"

      - name: Get image tag
        id: version
        run: |
          BRANCH_NAME=$(echo "${{ github.ref }}" | sed 's/refs\/heads\///g' | sed 's/[^a-zA-Z0-9._-]/_/g')
          if [ "${{ github.event_name }}" = "release" ]; then
            echo "tag=${{ steps.base_image.outputs.new_image_name }}-${{ github.event.release.tag_name }}" >> "$GITHUB_OUTPUT"
          elif [ "$BRANCH_NAME" = "main" ]; then
            echo "tag=${{ steps.base_image.outputs.new_image_name }}-latest" >> "$GITHUB_OUTPUT"
          fi

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Quay.io
        uses: docker/login-action@v3
        with:
          registry: quay.io
          username: ${{ secrets.QUAY_USERNAME }}
          password: ${{ secrets.QUAY_PASSWORD }}

      - name: Clean Docker cache before build
        run: |
          docker system prune -a -f --volumes || true

      - name: Build and push images to Quay
        uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          file: ./docker/ascend/Dockerfile.ascend_8.5.0_a2
          push: true
          tags: |
            quay.io/ascend/verl:${{ steps.version.outputs.tag }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          build-args: |
            BUILDKIT_INLINE_CACHE=1


================================================
FILE: .github/workflows/docker-build-ascend-a3.yml
================================================
name: docker-build-ascend-a3

on:
  workflow_dispatch:
  push:
    branches: ["main"]
    paths:
      - "docker/ascend/Dockerfile.ascend_8.5.0_a3"
      - ".github/workflows/docker-build-ascend-a3.yml"
  release:
    types: [published]
  schedule:
    - cron: "0 19 * * *"

jobs:
  build-ascend-image-a3:
    if: ${{ github.event_name != 'pull_request' && github.repository_owner == 'verl-project' }}
    runs-on: ubuntu-latest
    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}-build-ascend-image-a3
      cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
    steps:
      - name: Remove unnecessary parts in github actions runners to free up disk space
        uses: jlumbroso/free-disk-space@v1.3.1
        with:
          tool-cache: true

      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Get base image name and tag
        id: base_image
        run: |
          BASE_IMAGE_FULL=$(grep '^FROM' ./docker/ascend/Dockerfile.ascend_8.5.0_a3 | head -1 | cut -d' ' -f2)
          echo "Base image full: $BASE_IMAGE_FULL" 
          BASE_IMAGE_TAG=$(echo "$BASE_IMAGE_FULL" | cut -d':' -f2)
          echo "Base image tag: $BASE_IMAGE_TAG"
          NEW_IMAGE_NAME="verl-$BASE_IMAGE_TAG"
          echo "New image name: $NEW_IMAGE_NAME"  
          echo "base_image_tag=$BASE_IMAGE_TAG" >> "$GITHUB_OUTPUT"
          echo "new_image_name=$NEW_IMAGE_NAME" >> "$GITHUB_OUTPUT"

      - name: Get image tag
        id: version
        run: |
          BRANCH_NAME=$(echo "${{ github.ref }}" | sed 's/refs\/heads\///g' | sed 's/[^a-zA-Z0-9._-]/_/g')
          if [ "${{ github.event_name }}" = "release" ]; then
            echo "tag=${{ steps.base_image.outputs.new_image_name }}-${{ github.event.release.tag_name }}" >> "$GITHUB_OUTPUT"
          elif [ "$BRANCH_NAME" = "main" ]; then
            echo "tag=${{ steps.base_image.outputs.new_image_name }}-latest" >> "$GITHUB_OUTPUT"
          fi

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Quay.io
        uses: docker/login-action@v3
        with:
          registry: quay.io
          username: ${{ secrets.QUAY_USERNAME }}
          password: ${{ secrets.QUAY_PASSWORD }}

      - name: Clean Docker cache before build
        run: |
          docker system prune -a -f --volumes || true

      - name: Build and push images to Quay
        uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          file: ./docker/ascend/Dockerfile.ascend_8.5.0_a3
          push: true
          tags: |
            quay.io/ascend/verl:${{ steps.version.outputs.tag }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          build-args: |
            BUILDKIT_INLINE_CACHE=1


================================================
FILE: .github/workflows/e2e_ascend.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default, tests are run with a GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml`: runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml`: runs pytest on all scripts without the `on_cpu.py` suffix
#   - Since the cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded from them when
#     - a new workflow yaml is added to `.github/workflows`
#     - new tests are added to the workflows mentioned in 2.

name: e2e_ascend

on:
  # Trigger the workflow on push (main and v0.* branches)
  # or pull request (main branch only)
  push:
    branches:
      - main
      - v0.*
  pull_request:
    branches:
      - main
    paths:
      - ".github/workflows/e2e_ascend.yml"
      - "examples/data_preprocess/**"
      - "examples/grpo_trainer/**"
      - "examples/ppo_trainer/**"
      - "examples/sft/**"
      - "verl/experimental/one_step_off_policy/**"
      - "tests/special_npu/**"
      - "tests/special_sanity/check_device_api_usage.py"
      - "verl/**"
      - "pyproject.toml"
      - "requirements-npu.txt"
      - "setup.py"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

permissions:
  contents: read

jobs:
  llm_rl_job:
    if: github.repository_owner == 'verl-project'
    name: E2E Ascend testing for RL training scenarios of LLM models
    runs-on: linux-aarch64-a2b3-8
    timeout-minutes: 120
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/modelfoundry/ascend-ci/verl/verl:verl-8.5.0-910b-ubuntu22.04-py3.11-latest
      options: >-
        --shm-size 16g
    env:
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - name: Check npu and CANN info
        run: |
          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
          npu-smi info
      - name: Check initial pip list from image
        run: |
          pip list
      - name: Checkout verl-project/verl repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          clean: true
      - name: Install the current repository
        run: |
          pip install -r requirements-npu.txt
          pip install -e .
      - name: Check final pip list
        run: |
          pip list
      - name: Preprocess gsm8k dataset
        run: |
          python examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/.cache/datasets/openai/gsm8k
      - name: Running gsm8k e2e training tests with PPO on ASCEND NPU (FSDP backend)
        run: |
          ray stop --force
          bash tests/special_npu/run_qwen3_06b_ppo.sh
          rm -rf $HOME/ckpts
      - name: Running gsm8k e2e training tests with GRPO on ASCEND NPU (FSDP backend)
        run: |
          ray stop --force
          bash tests/special_npu/run_qwen2_5_05b_grpo.sh
          rm -rf $HOME/ckpts
      - name: Running gsm8k e2e training tests with GRPO on ASCEND NPU (MindSpeed backend)
        run: |
          ray stop --force
          USE_DIST_CKPT=True bash tests/special_npu/run_qwen2_5_05b_grpo_mindspeed.sh
          rm -rf $HOME/dist_ckpt/qwen2_5_05b_grpo_mindspeed
          rm -rf $HOME/ckpts
      - name: Running gsm8k e2e training tests with GRPO on ASCEND NPU (MindSpeed backend, MoE Model)
        run: |
          ray stop --force
          USE_DIST_CKPT=True USE_DUMMY_MODEL=True DUMMY_MODEL_CONFIG_PATH=tests/special_e2e/ppo_trainer/expert_parallel/qwen3moe_minimal.json DUMMY_MODEL_PATH=$HOME/dist_ckpt/qwen3_30b_grpo_mindspeed bash tests/special_npu/run_qwen3_30b_grpo_mindspeed.sh

  vlm_rl_job:
    if: github.repository_owner == 'verl-project'
    name: E2E Ascend testing for RL training scenarios of VLM models
    runs-on: linux-aarch64-a2b3-8
    timeout-minutes: 120
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/modelfoundry/ascend-ci/verl/verl:verl-8.5.0-910b-ubuntu22.04-py3.11-latest
      options: >-
        --shm-size 16g
    env:
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - name: Check npu and CANN info
        run: |
          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
          npu-smi info
      - name: Check initial pip list from image
        run: |
          pip list
      - name: Checkout verl-project/verl repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          clean: true
      - name: Install the current repository
        run: |
          pip install -r requirements-npu.txt
          pip install -e .
      - name: Check final pip list
        run: |
          pip list
      - name: Preprocess geo3k dataset
        run: |
          python examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/.cache/datasets/hiyouga/geometry3k
      - name: Running geo3k e2e training tests with GRPO on ASCEND NPU
        run: |
          ray stop --force
          bash tests/special_npu/run_qwen2_5_vl_3b_npu.sh
          rm -rf $HOME/ckpts


================================================
FILE: .github/workflows/e2e_fully_async_policy.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default, tests are run with a GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml`: runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml`: runs pytest on all scripts without the `on_cpu.py` suffix
#   - Since the cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded from them when
#     - a new workflow yaml is added to `.github/workflows`
#     - new tests are added to the workflows mentioned in 2.

name: e2e_fully_async_policy

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches
  # For push, for now only anti-patterns are specified so it is more conservative
  # and achieves higher coverage.
  push:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      - "!**/*.md"
      - "!**/*.sh"
      # Other entrypoints
      - "!examples/*trainer*"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      - "verl/experimental/fully_async_policy/**"
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      - "!**/*.md"
      - "!**/*.sh"
      # Other entrypoints
      - "!examples/**"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      # Home
      - "verl/experimental/fully_async_policy/**"
      # Entrypoints
      - ".github/workflows/e2e_fully_async_policy.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "tests/special_e2e/run_fully_async_policy.sh"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permission for repository contents.
permissions:
  contents: read

env:
  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:vllm017.dev2"
  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

jobs:
  setup:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-image: "${{ env.IMAGE }}"

  # Test FSDP2 strategy
  e2e_fully_async_policy_fsdp2:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 10 # Increase timeout for async training
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
      ACTOR_STRATEGY: "fsdp2"
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
          pip3 install cupy-cuda12x==13.6.0
      - name: Prepare GSM8K dataset
        run: |
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
      - name: Running the E2E test with fully_async_policy algorithm (FSDP2)
        run: |
          ray stop --force
          bash tests/special_e2e/run_fully_async_policy.sh

  # Test Megatron strategy
  e2e_fully_async_policy_megatron:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 10 # Increase timeout for async training
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
      ACTOR_STRATEGY: "megatron"
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
          pip3 install cupy-cuda12x==13.6.0
      - name: Prepare GSM8K dataset
        run: |
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
      - name: Running the E2E test with fully_async_policy algorithm (Megatron)
        run: |
          ray stop --force
          bash tests/special_e2e/run_fully_async_policy.sh

  cleanup:
    runs-on: ubuntu-latest
    needs: [setup, e2e_fully_async_policy_fsdp2, e2e_fully_async_policy_megatron]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


================================================
FILE: .github/workflows/e2e_fully_async_policy_ascend.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default, tests are run with a GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml`: runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml`: runs pytest on all scripts without the `on_cpu.py` suffix
#   - Since the cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded from them when
#     - a new workflow yaml is added to `.github/workflows`
#     - new tests are added to the workflows mentioned in 2.

name: e2e_fully_async_policy_ascend

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches
  # For push, for now only anti-patterns are specified so it is more conservative
  # and achieves higher coverage.
  push:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      - "!**/*.md"
      - "!**/*.sh"
      # Other entrypoints
      - "!examples/*trainer*"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      - "verl/experimental/fully_async_policy/**"
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      - "!**/*.md"
      - "!**/*.sh"
      # Other entrypoints
      - "!examples/**"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      # Home
      - "verl/experimental/fully_async_policy/**"
      # Entrypoints
      - ".github/workflows/e2e_fully_async_policy_ascend.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "tests/special_e2e/run_fully_async_policy.sh"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permission for repository contents.
permissions:
  contents: read

jobs:
  # Test FSDP2 strategy
  e2e_fully_async_policy_fsdp2_ascend:
    if: github.repository_owner == 'verl-project'
    runs-on: linux-aarch64-a2b3-8
    timeout-minutes: 60 # Increase this timeout value as needed
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/modelfoundry/ascend-ci/verl/verl:verl-8.5.0-910b-ubuntu22.04-py3.11-latest
      options: >-
        --shm-size 16g
    env:
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
      ACTOR_STRATEGY: "fsdp2"
      device_name: "npu"
    steps:
      - name: Check npu and CANN info
        run: |
          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
          npu-smi info
      - name: Check initial pip list from image
        run: |
          pip list
      - name: Checkout verl-project/verl repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          clean: true
      - name: Install the current repository
        run: |
          pip install -r requirements-npu.txt
          pip install --no-deps -e .
      - name: Check final pip list
        run: |
          pip list
      - name: Prepare weights
        run: |
          ln -s /root/.cache/models ~/models
      - name: Prepare GSM8K dataset
        run: |
          python examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/.cache/datasets/openai/gsm8k
      - name: Running the E2E test with fully_async_policy algorithm (FSDP2)
        run: |
          ray stop --force
          bash tests/special_e2e/run_fully_async_policy.sh

  # Test Megatron strategy
  e2e_fully_async_policy_megatron_ascend:
    if: github.repository_owner == 'verl-project'
    runs-on: linux-aarch64-a2b3-8
    timeout-minutes: 60 # Increase this timeout value as needed
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/modelfoundry/ascend-ci/verl/verl:verl-8.5.0-910b-ubuntu22.04-py3.11-latest
      options: >-
        --shm-size 16g
    env:
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
      ACTOR_STRATEGY: "megatron"
      device_name: "npu"
    steps:
      - name: Check npu and CANN info
        run: |
          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
          npu-smi info
      - name: Check initial pip list from image
        run: |
          pip list
      - name: Checkout verl-project/verl repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          clean: true
      - name: Install the current repository
        run: |
          pip install -r requirements-npu.txt
          pip install --no-deps -e .
      - name: Check final pip list
        run: |
          pip list
      - name: Prepare weights
        run: |
          ln -s /root/.cache/models ~/models
      - name: Prepare GSM8K dataset
        run: |
          python examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/.cache/datasets/openai/gsm8k
      - name: Running the E2E test with fully_async_policy algorithm (Megatron)
        run: |
          ray stop --force
          bash tests/special_e2e/run_fully_async_policy.sh


================================================
FILE: .github/workflows/e2e_one_step_off_policy.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml`, run pytest on all test scripts without the `on_cpu.py` suffix.
#   - Since cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded in them when
#     - new workflow yaml is added to `.github/workflows`
#     - new tests are added to the workflows mentioned in 2.

name: e2e_one_step_off_policy

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches
  # For push, for now only anti-patterns are specified so it is more conservative
  # and achieves higher coverage.
  push:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      - "!**/*.md"
      - "!**/*.sh"
      # Other entrypoints
      - "!examples/*trainer*"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      - "verl/experimental/one_step_off_policy/**"
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      - "!**/*.md"
      - "!**/*.sh"
      # Other entrypoints
      - "!examples/**"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      # Home
      - "verl/experimental/one_step_off_policy/**"
      # Entrypoints
      - ".github/workflows/e2e_one_step_off_policy.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "tests/special_e2e/run_one_step_off_policy.sh"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permissions for repository contents.
permissions:
  contents: read

env:
  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:vllm017.dev2"
  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

jobs:
  setup:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-image: "${{ env.IMAGE }}"

  # Test FSDP2 strategy
  e2e_one_step_off_policy_fsdp2:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 10 # Increase timeout for async training
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
      ACTOR_STRATEGY: "fsdp2"
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
          pip3 install cupy-cuda12x==13.6.0
      - name: Prepare GSM8K dataset
        run: |
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
      - name: Running the E2E test with one_step_off_policy algorithm (FSDP2)
        run: |
          ray stop --force
          bash tests/special_e2e/run_one_step_off_policy.sh

  # Test Megatron strategy
  e2e_one_step_off_policy_megatron:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 10 # Increase timeout for async training
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
      ACTOR_STRATEGY: "megatron"
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
          pip3 install cupy-cuda12x==13.6.0
      - name: Prepare GSM8K dataset
        run: |
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
      - name: Running the E2E test with one_step_off_policy algorithm (Megatron)
        run: |
          ray stop --force
          bash tests/special_e2e/run_one_step_off_policy.sh

  cleanup:
    runs-on: ubuntu-latest
    needs:
      [setup, e2e_one_step_off_policy_fsdp2, e2e_one_step_off_policy_megatron]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


================================================
FILE: .github/workflows/e2e_one_step_off_policy_ascend.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml`, run pytest on all test scripts without the `on_cpu.py` suffix.
#   - Since cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded in them when
#     - new workflow yaml is added to `.github/workflows`
#     - new tests are added to the workflows mentioned in 2.

name: e2e_one_step_off_policy_ascend

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches
  # For push, for now only anti-patterns are specified so it is more conservative
  # and achieves higher coverage.
  push:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      - "!**/*.md"
      - "!**/*.sh"
      # Other entrypoints
      - "!examples/*trainer*"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      - "verl/experimental/one_step_off_policy/**"
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      - "!**/*.md"
      - "!**/*.sh"
      # Other entrypoints
      - "!examples/**"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      # Home
      - "verl/experimental/one_step_off_policy/**"
      # Entrypoints
      - ".github/workflows/e2e_one_step_off_policy_ascend.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "tests/special_e2e/run_one_step_off_policy.sh"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permissions for repository contents.
permissions:
  contents: read

jobs:
  # Test FSDP2 strategy
  e2e_one_step_off_policy_fsdp2_ascend:
    if: github.repository_owner == 'verl-project'
    runs-on: linux-aarch64-a2b3-8
    timeout-minutes: 60 # Increase this timeout value as needed
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/modelfoundry/ascend-ci/verl/verl:verl-8.5.0-910b-ubuntu22.04-py3.11-latest
      options: >-
        --shm-size 16g
    env:
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
      ACTOR_STRATEGY: "fsdp2"
      device_name: "npu"
    steps:
      - name: Check npu and CANN info
        run: |
          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
          npu-smi info
      - name: Check initial pip list from image
        run: |
          pip list
      - name: Checkout verl-project/verl repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          clean: true
      - name: Install the current repository
        run: |
          pip install -r requirements-npu.txt
          pip install --no-deps -e .
      - name: Check final pip list
        run: |
          pip list
      - name: Prepare weights
        run: |
          ln -s /root/.cache/models ~/models
      - name: Prepare GSM8K dataset
        run: |
          python examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/.cache/datasets/openai/gsm8k
      - name: Running the E2E test with one_step_off_policy algorithm (FSDP2)
        run: |
          ray stop --force
          bash tests/special_e2e/run_one_step_off_policy.sh

  # Test Megatron strategy
  e2e_one_step_off_policy_megatron_ascend:
    if: github.repository_owner == 'verl-project'
    runs-on: linux-aarch64-a2b3-8
    timeout-minutes: 60 # Increase this timeout value as needed
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/modelfoundry/ascend-ci/verl/verl:verl-8.5.0-910b-ubuntu22.04-py3.11-latest
      options: >-
        --shm-size 16g
    env:
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
      ACTOR_STRATEGY: "megatron"
      device_name: "npu"
    steps:
      - name: Check npu and CANN info
        run: |
          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
          npu-smi info
      - name: Check initial pip list from image
        run: |
          pip list
      - name: Checkout verl-project/verl repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          clean: true
      - name: Install the current repository
        run: |
          pip install -r requirements-npu.txt
          pip install --no-deps -e .
      - name: Check final pip list
        run: |
          pip list
      - name: Prepare weights
        run: |
          ln -s /root/.cache/models ~/models
      - name: Prepare GSM8K dataset
        run: |
          python examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/.cache/datasets/openai/gsm8k
      - name: Running the E2E test with one_step_off_policy algorithm (Megatron)
        run: |
          ray stop --force
          export PYTHONPATH=$PYTHONPATH:/Megatron-LM
          bash tests/special_e2e/run_one_step_off_policy.sh


================================================
FILE: .github/workflows/e2e_ppo_grpo_trainer_trtllm.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml`, run pytest on all test scripts without the `on_cpu.py` suffix.
#   - Since cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded in them when
#     - new workflow yaml is added to `.github/workflows`
#     - new tests are added to the workflows mentioned in 2.

name: e2e_ppo_trainer_megatron_trtllm

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches.
  # For push, for now only anti-patterns are specified so it is more conservative
  # and achieves higher coverage.
  push:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!verl/trainer/fsdp_sft_trainer.py"
      # Recipes
      - "!recipe/**"
      # FSDP
      - "!verl/workers/**/*dp_*.py"
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!docker/**"
      # Docs
      - "!**/*.md"
      - "!docs/**"
      - "!examples/**"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      # Recipes
      - "!recipe/**"
      # FSDP
      - "!verl/workers/**/*dp_*.py"
      # Entrypoints
      - "verl/workers/rollout/trtllm_rollout/**"
      - "tests/workers/rollout/rollout_trtllm/**"
      - ".github/workflows/e2e_ppo_grpo_trainer_trtllm.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "examples/data_preprocess/geo3k.py"
      - "examples/data_preprocess/dapo_multiturn_w_tool.py"
      - "examples/data_preprocess/aime2024_multiturn_w_tool.py"
      - "examples/grpo_trainer/run_qwen2-7b_math_trtllm.sh"
      - "examples/grpo_trainer/run_qwen2-7b_math_megatron_trtllm.sh"
      - "examples/grpo_trainer/run_qwen3-30b_dapo_megatron_fp8_trtllm.sh"
      # add back when ppo flow is ready
      # - "tests/special_e2e/run_ppo_trainer_megatron.sh"
      # - "verl/trainer/main_ppo.py"
      # - "verl/trainer/config/ppo_megatron_trainer.yaml"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permissions for repository contents.
permissions:
  contents: read

env:
  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:trtllm1.3.0rc4"
  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

jobs:
  setup:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-image: "${{ env.IMAGE }}"

  trtllm_unit_tests:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 30 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install pytest-asyncio
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
      - name: Run TRTLLM unit tests
        run: |
          export TRTLLM_TEST_MODEL_PATH_ROOT="${HOME}/models"
          ray stop --force
          pytest -v -s \
            tests/workers/rollout/rollout_trtllm/test_adapter.py \
            tests/workers/rollout/rollout_trtllm/test_async_server.py \
            tests/workers/rollout/rollout_trtllm/test_trtllm_rollout_utils.py

  e2e_grpo_trainer_fsdp-qwen2:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 30 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
      - name: Prepare GSM8K dataset
        run: |
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k --local_save_dir ${PWD}/data/gsm8k
      - name: Running GSM8K E2E training tests with FSDP on 8 L20 GPUs (Qwen)
        run: |
          ray stop --force
          DATADIR=${HOME}/data \
            bash examples/grpo_trainer/run_qwen2-7b_math_trtllm.sh 2 \
            trainer.total_training_steps=1 \
            data.train_files="['${PWD}/data/gsm8k/train.parquet']" \
            data.val_files="['${PWD}/data/gsm8k/test.parquet']" \
            trainer.logger='["console"]' \
            actor_rollout_ref.model.path="${HOME}/models/Qwen/Qwen2.5-0.5B-Instruct"
      - name: clean up
        run: |
          rm -rf checkpoints

  e2e_grpo_trainer_megatron-qwen2:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 30 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
      - name: Prepare GSM8K dataset
        run: |
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k --local_save_dir ${PWD}/data/gsm8k
      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen)
        run: |
          ray stop --force
          DATADIR=${HOME}/data \
          ACTOR_TP=2 \
            bash examples/grpo_trainer/run_qwen2-7b_math_megatron_trtllm.sh 2 \
            trainer.total_training_steps=1 \
            data.train_files="['${PWD}/data/gsm8k/train.parquet']" \
            data.val_files="['${PWD}/data/gsm8k/test.parquet']" \
            trainer.logger='["console"]' \
            actor_rollout_ref.model.path="${HOME}/models/Qwen/Qwen2.5-0.5B-Instruct"
      - name: clean up
        run: |
          rm -rf checkpoints

  e2e_grpo_trainer_fsdp-vlm:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 30 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
          pip3 install qwen_vl_utils
          pip3 install mathruler
      - name: Prepare GEO3K dataset
        run: |
          python3 examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/models/hf_data/hiyouga/geometry3k --local_save_dir ${PWD}/data/geo3k
      - name: Running GEO3K E2E training tests with FSDP on 8 L20 GPUs (VLM)
        run: |
          ray stop --force
          DATADIR=${HOME}/data \
            bash examples/grpo_trainer/run_qwen2_5_vl_3b_trtllm.sh 2 \
            trainer.total_training_steps=1 \
            data.train_files="['${PWD}/data/geo3k/train.parquet']" \
            data.val_files="['${PWD}/data/geo3k/test.parquet']" \
            trainer.logger='["console"]' \
            actor_rollout_ref.model.path="${HOME}/models/Qwen/Qwen3-VL-2B-Instruct"
      - name: clean up
        run: |
          rm -rf checkpoints
      - name: Prepare DAPO-Math-17k and AIME-2024 datasets (data_preprocess)
        run: |
          python3 examples/data_preprocess/dapo_multiturn_w_tool.py --local_save_dir ${PWD}/data/dapo-math-17k
          python3 examples/data_preprocess/aime2024_multiturn_w_tool.py --local_save_dir ${PWD}/data/aime-2024
      - name: Running DAPO E2E with FP8 TRT-LLM rollout (Qwen3-0.6B)
        run: |
          ray stop --force
          export INFER_TP=2 ACTOR_TP=2 ACTOR_PP=2 ACTOR_VPP=2 ACTOR_EP=1 ACTOR_CP=2 REF_TP=2 REF_PP=2 REF_VPP=2 REF_EP=1 REF_CP=2 GEN_MOE_TP=null GEN_MOE_EP=null
          export NNODES=1 GPUS_PER_NODE=8 TRTLLM_MOE_BACKEND=CUTLASS
          export DATA_DIR=${PWD} DAPO_MATH_TRAIN=${PWD}/data/dapo-math-17k/train.parquet AIME_VAL=${PWD}/data/aime-2024/train.parquet MODEL_PATH=${HOME}/models/Qwen/Qwen3-0.6B
          bash examples/grpo_trainer/run_qwen3-30b_dapo_megatron_fp8_trtllm.sh \
            reward_model.reward_kwargs.overlong_buffer_cfg.len=258 \
            reward_model.reward_kwargs.max_resp_len=512 \
            data.max_prompt_length=512 \
            data.max_response_length=512 \
            data.train_batch_size=32 \
            actor_rollout_ref.rollout.n=4 \
            actor_rollout_ref.rollout.max_num_seqs=16 \
            actor_rollout_ref.rollout.max_num_batched_tokens=1024 \
            actor_rollout_ref.rollout.max_model_len=1024 \
            actor_rollout_ref.actor.megatron.override_transformer_config.moe_grouped_gemm=False \
            actor_rollout_ref.actor.megatron.override_transformer_config.moe_permute_fusion=False \
            trainer.total_training_steps=1 \
            trainer.logger='["console"]'
      - name: clean up
        run: |
          rm -rf checkpoints

  cleanup:
    runs-on: ubuntu-latest
    needs: [setup, trtllm_unit_tests, e2e_grpo_trainer_fsdp-qwen2, e2e_grpo_trainer_megatron-qwen2, e2e_grpo_trainer_fsdp-vlm]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


================================================
FILE: .github/workflows/e2e_ppo_trainer.yml
================================================
name: e2e_ppo_trainer

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches
  # For push, for now only anti-patterns are specified so it is more conservative
  # and achieves higher coverage.
  push:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!verl/trainer/fsdp_sft_trainer.py"

      # Megatron
      - "!verl/workers/**/megatron_*.py"

  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!**/*.md"
      - "!docker/**"
      - "!examples/**"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      # Docs
      - "!docs/**"

      # Megatron
      - "!verl/workers/**/megatron_*.py"
      # Entrypoints
      - ".github/workflows/e2e_ppo_trainer.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "examples/data_preprocess/geo3k.py"
      - "tests/special_e2e/ppo_trainer/**"
      - "verl/trainer/main_ppo.py"
      - "verl/trainer/config/ppo_trainer.yaml"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permissions for repository contents.
permissions:
  contents: read

jobs:
  pre_commit_for_ppo:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.12"]
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install the current repository
        run: |
          pip install pre-commit hydra-core
          pip3 install --no-deps -e .
      - name: Set ruff --output-format=github
        run: |
          sed -i 's/--output-format=full/--output-format=github/' .pre-commit-config.yaml
          git add .pre-commit-config.yaml
      - uses: pre-commit/action@v3.0.1
        with:
          extra_args: "" # Overriding default "--all-files"



================================================
FILE: .github/workflows/e2e_ppo_trainer_megatron_sglang.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default tests are run with GPU available, except for the ones under `special_npu`, and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml`, run pytest on all scripts with file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml`, run pytest on all test scripts without the `on_cpu.py` suffix.
#   - Since cpu/gpu unit tests by default run all tests under `tests`, please make sure tests are manually excluded in them when
#     - new workflow yaml is added to `.github/workflows`
#     - new tests are added to the workflows mentioned in 2.

name: e2e_ppo_trainer_megatron_sglang

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches.
  # For push, for now only anti-patterns are specified so it is more conservative
  # and achieves higher coverage.
  push:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!verl/trainer/fsdp_sft_trainer.py"
      # FSDP
      - "!verl/workers/**/*dp_*.py"
      - "!verl/utils/fsdp_utils.py"
      - "!verl/utils/checkpoint/fsdp_checkpoint_manager.py"
      - "!verl/model_merger/fsdp_model_merger.py"
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!docker/**"
      # Docs
      - "!**/*.md"
      - "!docs/**"
      - "!examples/**"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      # FSDP
      - "!verl/workers/**/*dp_*.py"
      - "!verl/utils/fsdp_utils.py"
      - "!verl/utils/checkpoint/fsdp_checkpoint_manager.py"
      - "!verl/model_merger/fsdp_model_merger.py"
      # Entrypoints
      - "verl/workers/rollout/sglang_rollout/*"
      - ".github/workflows/e2e_ppo_trainer_megatron_sglang.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "examples/data_preprocess/geo3k.py"
      - "tests/special_e2e/run_ppo_trainer_megatron.sh"
      - "verl/trainer/main_ppo.py"
      - "verl/trainer/config/ppo_megatron_trainer.yaml"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permissions for repository contents.
permissions:
  contents: read

env:
  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:sgl059.dev2"
  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

jobs:
  setup:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-image: "${{ env.IMAGE }}"

  e2e_ppo_trainer_megatron-deepseek:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 60 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
      ENGINE: sglang
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install git+https://github.com/ISEEKYAN/mbridge.git@main --no-deps --no-build-isolation
          pip3 install --no-deps -e .
      - name: Prepare GSM8K dataset
        run: |
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek)
        run: |
          ray stop --force
          OPTIM_MEMORY_EFFICIENT=True ENGINE=sglang SAVE_FREQ=1 MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct bash tests/special_e2e/run_ppo_trainer_megatron.sh
      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (DeepSeek) in async mode with auto resume
        run: |
          ray stop --force
          export VLLM_USE_V1=1
          ray start --head
          ENGINE=sglang MODE=async RESUME_MODE=auto MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct TOTAL_TRAIN_STEPS=2 bash tests/special_e2e/run_ppo_trainer_megatron.sh
      - name: Profiling GRPO GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Deepseek)
        run: |
          ray stop --force
          PROFILE_ENABLE=True ENGINE=sglang ADV_ESTIMATOR=grpo USE_DYNAMIC_BSZ=False MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct bash tests/special_e2e/run_ppo_trainer_megatron.sh
          if [ -z "$( ls -A '/tmp/ray/session_latest/logs/nsight/' )" ]; then
            echo "[ERROR] not found any profiling files"
            exit 1
          else
            echo "[SUCCESS] profile success"
          fi
      - name: clean up
        run: |
          rm -rf checkpoints

  # Qwen3-0.6B: dense, tie_word_embeddings=True
  e2e_ppo_trainer_megatron-qwen3:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 60 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
      ENGINE: sglang
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
      - name: Prepare GSM8K dataset
        run: |
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen3) testing learning rate scheduler
        run: |
          ray stop --force
          ALL_OFFLOAD=True VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 LR_WARMUP_STEPS=1 TOTAL_TRAIN_STEPS=2 MODEL_ID=Qwen/Qwen3-0.6B bash tests/special_e2e/run_ppo_trainer_megatron.sh
      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with FP8 rollout
        run: |
          ray stop --force
          export VLLM_USE_V1=1
          ROLLOUT_QUANTIZATION=fp8 TOTAL_TRAIN_STEPS=2 MODEL_ID=Qwen/Qwen3-0.6B bash tests/special_e2e/run_ppo_trainer_megatron.sh
      - name: clean up
        run: |
          rm -rf checkpoints

  cleanup:
    runs-on: ubuntu-latest
    needs:
      [setup, e2e_ppo_trainer_megatron-deepseek, e2e_ppo_trainer_megatron-qwen3]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


================================================
FILE: .github/workflows/e2e_ppo_trainer_megatron_sglang_2.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests designed to run in dedicated environments

# Accelerators for tests
# - By default, tests are run with a GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml` runs pytest on all test scripts without the `on_cpu.py` suffix
#   - Since the CPU/GPU unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
#     - a new workflow yaml is added to `.github/workflows`
#     - new tests are added to the workflows mentioned in 2.

name: e2e_ppo_trainer_megatron_sglang_2

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches.
  # For push, only exclusion patterns are specified for now, so the trigger is more conservative
  # and achieves higher coverage.
  push:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!verl/trainer/fsdp_sft_trainer.py" # FSDP
      - "!verl/workers/**/*dp_*.py"
      - "!verl/utils/fsdp_utils.py"
      - "!verl/utils/checkpoint/fsdp_checkpoint_manager.py"
      - "!verl/model_merger/fsdp_model_merger.py"
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!docker/**"
      # Docs
      - "!**/*.md"
      - "!docs/**"
      - "!examples/**"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py" # FSDP
      - "!verl/workers/**/*dp_*.py"
      - "!verl/utils/fsdp_utils.py"
      - "!verl/utils/checkpoint/fsdp_checkpoint_manager.py"
      - "!verl/model_merger/fsdp_model_merger.py"
      # Entrypoints
      - "verl/workers/rollout/sglang_rollout/*"
      - ".github/workflows/e2e_ppo_trainer_megatron_sglang_2.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "examples/data_preprocess/geo3k.py"
      - "tests/special_e2e/run_ppo_trainer_megatron.sh"
      - "verl/trainer/main_ppo.py"
      - "verl/trainer/config/ppo_megatron_trainer.yaml"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permission for repository contents.
permissions:
  contents: read

env:
  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:sgl059.dev2"
  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

jobs:
  setup:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-image: "${{ env.IMAGE }}"

  e2e_ppo_trainer_fsdp_sglang:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 40 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
      - name: Prepare gsm8k dataset
        run: |
          ray stop --force
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm and save ckpt
        run: |
          ray stop --force
          ENGINE=sglang bash tests/special_e2e/ppo_trainer/run_function_reward.sh

  e2e_ppo_trainer_fsdp-qwen2_5vl-3b:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 60 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
      # Geo3k
      - name: Prepare GEO3K dataset
        run: |
          ray stop --force
          python3 examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/models/hf_data/hiyouga/geometry3k/
      - name: Running GEO3K VLM E2E training tests on 8 L20 GPUs with rmpad using function rm
        run: |
          ray stop --force
          TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
            MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
            MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
            ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
            ENGINE=sglang ROLLOUT_MODE=async GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \
            ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \
            bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Running GEO3K VLM E2E with rmpad using torch fused kernel (Qwen2.5-VL)
        run: |
          ray stop --force
          FUSED_KERNELS=True TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
            MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
            MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
            ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
            ENGINE=sglang ROLLOUT_MODE=async GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \
            ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \
            bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Running GEO3K VLM E2E with rmpad using triton fused kernel (Qwen2.5-VL)
        run: |
          ray stop --force
          FUSED_KERNELS=True FUSED_KERNEL_BACKEND=triton \
            TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
            MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
            MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
            ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
            ENGINE=sglang ROLLOUT_MODE=async GPU_MEMORY_UTILIZATION=0.6 ACTOR_FSDP_PARAM_OFFLOAD=True \
            ACTOR_FSDP_OPTIMIZER_OFFLOAD=True REF_FSDP_PARAM_OFFLOAD=True \
            bash tests/special_e2e/ppo_trainer/run_function_reward.sh

  cleanup:
    runs-on: ubuntu-latest
    needs:
      [setup, e2e_ppo_trainer_fsdp-qwen2_5vl-3b, e2e_ppo_trainer_fsdp_sglang]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


================================================
FILE: .github/workflows/e2e_ppo_trainer_megatron_vllm.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests designed to run in dedicated environments

# Accelerators for tests
# - By default, tests are run with a GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml` runs pytest on all test scripts without the `on_cpu.py` suffix
#   - Since the CPU/GPU unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
#     - a new workflow yaml is added to `.github/workflows`
#     - new tests are added to the workflows mentioned in 2.

name: e2e_ppo_trainer_megatron_vllm

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches.
  # For push, only exclusion patterns are specified for now, so the trigger is more conservative
  # and achieves higher coverage.
  push:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!verl/trainer/fsdp_sft_trainer.py"
      # FSDP
      - "!verl/workers/**/*dp_*.py"
      - "!verl/utils/fsdp_utils.py"
      - "!verl/utils/checkpoint/fsdp_checkpoint_manager.py"
      - "!verl/model_merger/fsdp_model_merger.py"
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!docker/**"
      # Docs
      - "!**/*.md"
      - "!docs/**"
      - "!examples/**"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      # FSDP
      - "!verl/workers/**/*dp_*.py"
      - "!verl/utils/fsdp_utils.py"
      - "!verl/utils/checkpoint/fsdp_checkpoint_manager.py"
      - "!verl/model_merger/fsdp_model_merger.py"
      # Entrypoints
      - ".github/workflows/e2e_ppo_trainer_megatron_vllm.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "examples/data_preprocess/geo3k.py"
      - "tests/special_e2e/run_ppo_trainer_megatron.sh"
      - "verl/trainer/main_ppo.py"
      - "verl/trainer/config/ppo_megatron_trainer.yaml"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permission for repository contents.
permissions:
  contents: read

env:
  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:vllm017.dev2"
  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

jobs:
  setup:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-image: "${{ env.IMAGE }}"

  # deepseek-ai/deepseek-coder-1.3b-instruct: dense, tie_word_embeddings=False
  e2e_ppo_trainer_megatron-deepseek:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 60 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps --force-reinstall .
          pip3 install git+https://github.com/ISEEKYAN/mbridge.git@main --no-deps --no-build-isolation
          pip3 install math-verify
      - name: Prepare GSM8K dataset
        run: |
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
      # Full training save&load
      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron, use mbridge e2e to pre-load and save (Deepseek)
        run: |
          ray stop --force
          ALL_OFFLOAD=True SAVE_FREQ=1 MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 USE_MBRIDGE=True USE_DIST_CKPT=False \
          bash tests/special_e2e/run_ppo_trainer_megatron.sh
      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron, use mbridge e2e to pre-load and save (Deepseek) after resuming
        run: |
          ray stop --force
          RESUME_MODE=auto MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct TOTAL_TRAIN_STEPS=2 SAVE_FREQ=1 COMMON_PP=4 COMMON_VPP=null COMMON_CP=1 USE_MBRIDGE=True USE_DIST_CKPT=False \
          bash tests/special_e2e/run_ppo_trainer_megatron.sh
      # LoRA training save&load
      - name: clean up and install Megatron-Bridge
        run: |
          rm -rf checkpoints
          pip3 install git+https://github.com/NVIDIA-NeMo/Megatron-Bridge.git@83a7c11 --no-deps --no-build-isolation
          pip3 install git+https://github.com/NVIDIA/Megatron-LM.git@5455f0a --no-deps --no-build-isolation
          pip3 install "nvidia-modelopt[torch]>=0.37.0" transformers==4.57.1
      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron, use Megatron-Bridge LoRA e2e to pre-load and save (Deepseek)
        run: |
          ray stop --force
          ALL_OFFLOAD=True SAVE_FREQ=1 MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct COMMON_PP=4 LORA_RANK=8 COMMON_VPP=null COMMON_CP=1 USE_MBRIDGE=True VANILLA_MBRIDGE=False VALUE_VANILLA_MBRIDGE=False USE_DIST_CKPT=False \
          bash tests/special_e2e/run_ppo_trainer_megatron.sh
      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron, use Megatron-Bridge LoRA e2e to pre-load and save (Deepseek) after resuming
        run: |
          ray stop --force
          RESUME_MODE=auto MODEL_ID=deepseek-ai/deepseek-coder-1.3b-instruct TOTAL_TRAIN_STEPS=2 SAVE_FREQ=1 COMMON_PP=4 LORA_RANK=8 COMMON_VPP=null COMMON_CP=1 USE_MBRIDGE=True VANILLA_MBRIDGE=False VALUE_VANILLA_MBRIDGE=False USE_DIST_CKPT=False \
          bash tests/special_e2e/run_ppo_trainer_megatron.sh
      - name: clean up
        run: |
          rm -rf checkpoints

  # Qwen3-0.6B: dense, tie_word_embeddings=True
  e2e_ppo_trainer_megatron-qwen3:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 60 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
          pip3 install math-verify
      - name: Prepare GSM8K dataset
        run: |
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron (Qwen3) testing learning rate scheduler
        run: |
          ray stop --force
          ALL_OFFLOAD=True VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 LR_WARMUP_STEPS=1 TOTAL_TRAIN_STEPS=2 MODEL_ID=Qwen/Qwen3-0.6B bash tests/special_e2e/run_ppo_trainer_megatron.sh
      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with FP8 rollout
        run: |
          ray stop --force
          export VLLM_USE_V1=1
          ROLLOUT_QUANTIZATION=fp8 TOTAL_TRAIN_STEPS=2 MODEL_ID=Qwen/Qwen3-0.6B bash tests/special_e2e/run_ppo_trainer_megatron.sh
      - name: clean up
        run: |
          rm -rf checkpoints

  cleanup:
    runs-on: ubuntu-latest
    needs:
      [setup, e2e_ppo_trainer_megatron-deepseek, e2e_ppo_trainer_megatron-qwen3]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


================================================
FILE: .github/workflows/e2e_ppo_trainer_megatron_vllm_2.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests designed to run in dedicated environments

# Accelerators for tests
# - By default, tests are run with a GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml` runs pytest on all test scripts without the `on_cpu.py` suffix
#   - Since the CPU/GPU unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
#     - a new workflow yaml is added to `.github/workflows`
#     - new tests are added to the workflows mentioned in 2.

name: e2e_ppo_trainer_megatron_vllm_2

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches.
  # For push, only exclusion patterns are specified for now, so the trigger is more conservative
  # and achieves higher coverage.
  push:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!verl/trainer/fsdp_sft_trainer.py"
      # FSDP
      - "!verl/workers/**/*dp_*.py"
      - "!verl/utils/fsdp_utils.py"
      - "!verl/utils/checkpoint/fsdp_checkpoint_manager.py"
      - "!verl/model_merger/fsdp_model_merger.py"
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!docker/**"
      # Docs
      - "!**/*.md"
      - "!docs/**"
      - "!examples/**"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      # FSDP
      - "!verl/workers/**/*dp_*.py"
      - "!verl/utils/fsdp_utils.py"
      - "!verl/utils/checkpoint/fsdp_checkpoint_manager.py"
      - "!verl/model_merger/fsdp_model_merger.py"
      # Entrypoints
      - ".github/workflows/e2e_ppo_trainer_megatron_vllm_2.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "examples/data_preprocess/geo3k.py"
      - "tests/special_e2e/run_ppo_trainer_megatron.sh"
      - "verl/trainer/main_ppo.py"
      - "verl/trainer/config/ppo_megatron_trainer.yaml"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permission for repository contents.
permissions:
  contents: read

env:
  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:vllm017.dev2"
  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

jobs:
  setup:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-image: "${{ env.IMAGE }}"

  e2e_ppo_trainer_megatron-moe-expert-parallel:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 60 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps --force-reinstall .
          pip3 install git+https://github.com/NVIDIA-NeMo/Megatron-Bridge.git@83a7c11 --no-deps --no-build-isolation
          pip3 install git+https://github.com/NVIDIA/Megatron-LM.git@5455f0a --no-deps --no-build-isolation
          pip3 install "nvidia-modelopt[torch]>=0.37.0" transformers==4.57.1
      - name: Prepare GSM8K dataset
        run: |
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron-Bridge (Qwen3-30B-A3B-Instruct-2507)
        run: |
          ray stop --force
          ADV_ESTIMATOR=grpo USE_DUMMY_MODEL=True DUMMY_MODEL_CONFIG_PATH=tests/special_e2e/ppo_trainer/expert_parallel/qwen2moe_minimal.json \
          PPO_MAX_TOKEN_LEN=1024 FWD_MAX_TOKEN_LEN=1024 \
          MAX_PROMPT_LENGTH=512 MAX_RESPONSE_LENGTH=512 \
          MODEL_ID=Qwen/Qwen3-30B-A3B-Instruct-2507 USE_MBRIDGE=True VANILLA_MBRIDGE=False VALUE_VANILLA_MBRIDGE=False \
          COMMON_PP=2 COMMON_VPP=null COMMON_CP=1 COMMON_TP=4 COMMON_EP=4 COMMON_ETP=1 INFER_TP=8 \
          USE_DIST_CKPT=True ALL_OFFLOAD=True SKIP_SAVE_HF_MODEL=1 bash tests/special_e2e/run_ppo_trainer_megatron.sh
      - name: Running GSM8K E2E training tests with 3D parallelism with FP8 rollout on 8 L20 GPUs with Megatron-Bridge (Qwen3-30B-A3B-Instruct-2507)
        run: |
          ray stop --force
          ADV_ESTIMATOR=grpo USE_DUMMY_MODEL=True DUMMY_MODEL_CONFIG_PATH=tests/special_e2e/ppo_trainer/expert_parallel/qwen2moe_minimal.json \
          PPO_MAX_TOKEN_LEN=1024 FWD_MAX_TOKEN_LEN=1024 \
          MAX_PROMPT_LENGTH=512 MAX_RESPONSE_LENGTH=512 \
          MODEL_ID=Qwen/Qwen3-30B-A3B-Instruct-2507 USE_MBRIDGE=True VANILLA_MBRIDGE=False VALUE_VANILLA_MBRIDGE=False \
          COMMON_PP=2 COMMON_VPP=null COMMON_CP=1 COMMON_TP=4 COMMON_EP=4 COMMON_ETP=1 INFER_TP=2 \
          USE_DIST_CKPT=True ALL_OFFLOAD=True SKIP_SAVE_HF_MODEL=1 ROLLOUT_QUANTIZATION=fp8 bash tests/special_e2e/run_ppo_trainer_megatron.sh
      - name: clean up
        run: |
          rm -rf checkpoints
      - name: Running GSM8K E2E training tests with 3D parallelism on 8 L20 GPUs with Megatron-Bridge LoRA (Qwen3-30B-A3B-Instruct-2507)
        run: |
          ray stop --force
          ADV_ESTIMATOR=grpo USE_DUMMY_MODEL=True DUMMY_MODEL_CONFIG_PATH=tests/special_e2e/ppo_trainer/expert_parallel/qwen2moe_minimal.json \
          PPO_MAX_TOKEN_LEN=1024 FWD_MAX_TOKEN_LEN=1024 \
          MAX_PROMPT_LENGTH=512 MAX_RESPONSE_LENGTH=512 LORA_RANK=8 CRITIC_LORA_RANK=8 \
          MODEL_ID=Qwen/Qwen3-30B-A3B-Instruct-2507 USE_MBRIDGE=True VANILLA_MBRIDGE=False VALUE_VANILLA_MBRIDGE=False \
          COMMON_PP=2 COMMON_VPP=null COMMON_CP=1 COMMON_TP=4 COMMON_EP=2 COMMON_ETP=1 INFER_TP=8 \
          USE_DIST_CKPT=False LORA_MERGE=True ALL_OFFLOAD=True SKIP_SAVE_HF_MODEL=1 bash tests/special_e2e/run_ppo_trainer_megatron.sh
      - name: clean up
        run: |
          rm -rf checkpoints

  e2e_ppo_trainer_fsdp_vllm:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 60 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
      - name: Prepare GSM8K dataset
        run: |
          ray stop --force
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
      # Function RM
      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm with validation and saving (FSDP_SIZE=8)
        run: |
          ray stop --force
          VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 SAVE_HF_MODEL=True VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-fsdp-size8" bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm after resuming
        run: |
          ray stop --force
          RESUME_MODE=auto VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-fsdp-size8" bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Test merging FSDP checkpoints (Qwen Actor)
        run: |
          exp_name="qwen2.5-0.5b-function-reward-minimal-fsdp-size8"
          python -m verl.model_merger test --backend fsdp --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm with validation and saving (DDP_SIZE=2, FSDP_SIZE=4)
        run: |
          ray stop --force
          VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 SAVE_HF_MODEL=True FSDP_SIZE=4 USE_KL=True VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-ddp-size2-fsdp-size4" bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Test merging DDP+FSDP checkpoints (Qwen Actor)
        run: |
          exp_name="qwen2.5-0.5b-function-reward-minimal-ddp-size2-fsdp-size4"
          python -m verl.model_merger test --backend fsdp --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm with validation and saving (FSDP2)
        run: |
          ray stop --force
          VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 SAVE_HF_MODEL=True VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-fsdp2-size8" STRATEGY=fsdp2 bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Test merging FSDP2 checkpoints (Qwen Actor)
        run: |
          exp_name="qwen2.5-0.5b-function-reward-minimal-fsdp2-size8"
          python -m verl.model_merger test --backend fsdp --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
      - name: Running GSM8K E2E without rmpad using function rm
        run: |
          ray stop --force
          RM_PAD=False bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm (GRPO)
        run: |
          ray stop --force
          CUSTOM_REWARD_FN=True ADV_ESTIMATOR=grpo USE_KL=True bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      # - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm (ReMax)
      #   run: |
      #     ray stop --force
      #     ADV_ESTIMATOR=remax USE_KL=True bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      # LoRA tests
      - name: Running GSM8K E2E training tests on 8 L20 GPUs with grpo lora using function rm with use_shm
        run: |
          ray stop --force
          ADV_ESTIMATOR=grpo USE_SHM=True LORA_RANK=32 LOAD_FORMAT=safetensors bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Running GSM8K E2E training tests on 8 L20 GPUs with grpo lora using function rm with use_shm and layered_summon
        run: |
          ray stop --force
          ADV_ESTIMATOR=grpo USE_SHM=True LORA_RANK=32 LOAD_FORMAT=safetensors LAYERED_SUMMON=True TOTAL_TRAIN_STEPS=1 SAVE_FREQ=1 FSDP_SIZE=4 VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal" bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Test GRPO LoRA checkpoints merging function
        run: |
          export EXP_NAME="qwen2.5-0.5b-function-reward-minimal"
          ls checkpoints/verl-test/${EXP_NAME}/global_step_1/actor
          cat checkpoints/verl-test/${EXP_NAME}/global_step_1/actor/huggingface/config.json
          python3 -m verl.model_merger merge --backend fsdp --local_dir checkpoints/verl-test/${EXP_NAME}/global_step_1/actor/ --target_dir checkpoints/verl-test/${EXP_NAME}/global_step_1/actor/huggingface
      - name: Running GSM8K E2E training tests on 8 L20 GPUs with grpo lora using function rm with use_shm and layered_summon with fsdp2
        run: |
          ray stop --force
          ADV_ESTIMATOR=grpo USE_SHM=True LORA_RANK=32 LOAD_FORMAT=safetensors LAYERED_SUMMON=True STRATEGY=fsdp2 bash tests/special_e2e/ppo_trainer/run_function_reward.sh

  e2e_ppo_trainer_fsdp-qwen2_5vl-3b:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 40 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
      # Geo3k
      - name: Prepare GEO3K dataset
        run: |
          python3 examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/models/hf_data/hiyouga/geometry3k/
      - name: Running GEO3K VLM GRPO E2E training tests on 8 L20 GPUs with rmpad using function rm
        run: |
          ray stop --force
          TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
            MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
            MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
            ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
            SP_SIZE=2 \
            bash tests/special_e2e/ppo_trainer/run_function_reward.sh

      - name: Running GEO3K VLM PPO E2E training tests on 8 L20 GPUs with rmpad using function rm
        run: |
          ray stop --force
          TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
            MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
            MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
            ADV_ESTIMATOR=gae RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
            SP_SIZE=2 \
            bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Running GEO3K VLM GRPO E2E lora training tests on 8 L20 GPUs with rmpad using function rm
        run: |
          ray stop --force
          TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
            MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
            MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
            ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
            SP_SIZE=2 \
            LORA_RANK=32 LORA_EXCLUDE=".*visual.*" \
            bash tests/special_e2e/ppo_trainer/run_function_reward.sh

  cleanup:
    runs-on: ubuntu-latest
    needs:
      [
        setup,
        e2e_ppo_trainer_megatron-moe-expert-parallel,
        e2e_ppo_trainer_fsdp-qwen2_5vl-3b,
        e2e_ppo_trainer_fsdp_vllm,
      ]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


================================================
FILE: .github/workflows/e2e_ppo_trainer_megatron_vllm_2_ascend.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests designed to run in dedicated environments

# Accelerators for tests
# - By default, tests run with a GPU available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml` runs pytest on all test scripts without the `on_cpu.py` suffix.
#   - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
#     - a new workflow yaml is added to `.github/workflows`
#     - new tests are added to a workflow mentioned in 2.

name: e2e_ppo_trainer_megatron_vllm_2_ascend

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches.
  # For push, for now only anti-patterns are specified so it is more conservative
  # and achieves higher coverage.
  push:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!verl/trainer/fsdp_sft_trainer.py"
      # FSDP
      - "!verl/workers/**/*dp_*.py"
      - "!verl/utils/fsdp_utils.py"
      - "!verl/utils/checkpoint/fsdp_checkpoint_manager.py"
      - "!verl/model_merger/fsdp_model_merger.py"
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!docker/**"
      # Docs
      - "!**/*.md"
      - "!docs/**"
      - "!examples/**"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      # FSDP
      - "!verl/workers/**/*dp_*.py"
      - "!verl/utils/fsdp_utils.py"
      - "!verl/utils/checkpoint/fsdp_checkpoint_manager.py"
      - "!verl/model_merger/fsdp_model_merger.py"
      # Entrypoints
      - ".github/workflows/e2e_ppo_trainer_megatron_vllm_2_ascend.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "examples/data_preprocess/geo3k.py"
      - "tests/special_e2e/run_ppo_trainer_megatron.sh"
      - "verl/trainer/main_ppo.py"
      - "verl/trainer/config/ppo_megatron_trainer.yaml"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permission for repository contents.
permissions:
  contents: read

jobs:
  e2e_ppo_trainer_fsdp_vllm_ascend:
    if: github.repository_owner == 'verl-project'
    runs-on: linux-aarch64-a2b3-8
    timeout-minutes: 90 # Increase this timeout value as needed
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/modelfoundry/ascend-ci/verl/verl:verl-8.5.0-910b-ubuntu22.04-py3.11-latest
      options: >-
        --shm-size 16g
    env:
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - name: Check npu and CANN info
        run: |
          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
          npu-smi info
      - name: Check initial pip list from image
        run: |
          pip list
      - name: Checkout verl-project/verl repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          clean: true
      - name: Install the current repository
        run: |
          pip install -r requirements-npu.txt
          pip install --no-deps -e .
      - name: Check final pip list
        run: |
          pip list
      - name: Prepare weights
        run: |
          ln -s /root/.cache/models ~/models
      - name: Prepare GSM8K dataset
        run: |
          python examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/.cache/datasets/openai/gsm8k
      # Function RM
      - name: Running GSM8K E2E training tests on Ascend NPUs with rmpad using function rm with validation and saving (DDP_SIZE=2, FSDP_SIZE=4)
        run: |
          ray stop --force
          VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 SAVE_HF_MODEL=True FSDP_SIZE=4 USE_KL=True VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-ddp-size2-fsdp-size4" bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Test merging DDP+FSDP checkpoints (Qwen Actor)
        run: |
          exp_name="qwen2.5-0.5b-function-reward-minimal-ddp-size2-fsdp-size4"
          python -m verl.model_merger test --backend fsdp --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
      - name: Running GSM8K E2E training tests on Ascend NPUs with rmpad using function rm with validation and saving (FSDP2)
        run: |
          ray stop --force
          VAL_BEFORE_TRAIN=True TEST_FREQ=1 SAVE_FREQ=1 SAVE_HF_MODEL=True VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal-fsdp2-size8" STRATEGY=fsdp2 bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Test merging FSDP2 checkpoints (Qwen Actor)
        run: |
          exp_name="qwen2.5-0.5b-function-reward-minimal-fsdp2-size8"
          python -m verl.model_merger test --backend fsdp --local_dir checkpoints/verl-test/${exp_name}/global_step_1/actor --test_hf_dir checkpoints/verl-test/${exp_name}/global_step_1/actor/huggingface
      - name: Running GSM8K E2E without rmpad using function rm
        run: |
          ray stop --force
          RM_PAD=False bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Running GSM8K E2E training tests on Ascend NPUs with rmpad using function rm (GRPO)
        run: |
          ray stop --force
          CUSTOM_REWARD_FN=True ADV_ESTIMATOR=grpo USE_KL=True bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Running GSM8K E2E training tests on Ascend NPUs with grpo lora using function rm with use_shm and layered_summon
        run: |
          ray stop --force
          ADV_ESTIMATOR=grpo USE_SHM=True LORA_RANK=32 LOAD_FORMAT=safetensors LAYERED_SUMMON=True TOTAL_TRAIN_STEPS=1 SAVE_FREQ=1 FSDP_SIZE=4 VERL_EXP_NAME="qwen2.5-0.5b-function-reward-minimal" bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Test GRPO LoRA checkpoints merging function
        run: |
          export EXP_NAME="qwen2.5-0.5b-function-reward-minimal"
          ls checkpoints/verl-test/${EXP_NAME}/global_step_1/actor
          cat checkpoints/verl-test/${EXP_NAME}/global_step_1/actor/huggingface/config.json
          python3 -m verl.model_merger merge --backend fsdp --local_dir checkpoints/verl-test/${EXP_NAME}/global_step_1/actor/ --target_dir checkpoints/verl-test/${EXP_NAME}/global_step_1/actor/huggingface
      - name: Running GSM8K E2E training tests on Ascend NPUs with grpo lora using function rm with use_shm and layered_summon with fsdp2
        run: |
          ray stop --force
          ADV_ESTIMATOR=grpo USE_SHM=True LORA_RANK=32 LOAD_FORMAT=safetensors LAYERED_SUMMON=True STRATEGY=fsdp2 bash tests/special_e2e/ppo_trainer/run_function_reward.sh

  e2e_ppo_trainer_fsdp-qwen2_5vl-3b_ascend:
    if: github.repository_owner == 'verl-project'
    runs-on: linux-aarch64-a2b3-8
    timeout-minutes: 60 # Increase this timeout value as needed
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/modelfoundry/ascend-ci/verl/verl:verl-8.5.0-910b-ubuntu22.04-py3.11-latest
      options: >-
        --shm-size 16g
    env:
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - name: Check npu and CANN info
        run: |
          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
          npu-smi info
      - name: Check initial pip list from image
        run: |
          pip list
      - name: Checkout verl-project/verl repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          clean: true
      - name: Install the current repository
        run: |
          pip install -r requirements-npu.txt
          pip install --no-deps -e .
          pip install trl==0.26.0
      - name: Check final pip list
        run: |
          pip list
      - name: Prepare weights
        run: |
          ln -s /root/.cache/models ~/models
      # Geo3k
      - name: Prepare GEO3K dataset
        run: |
          python examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/.cache/datasets/hiyouga/geometry3k
      - name: Running GEO3K VLM GRPO E2E training tests on Ascend NPUs with rmpad using function rm
        run: |
          ray stop --force
          TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
            MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
            MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
            ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
            SP_SIZE=2 \
            bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Running GEO3K VLM PPO E2E training tests on Ascend NPUs with rmpad using function rm
        run: |
          ray stop --force
          TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
            MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
            MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
            ADV_ESTIMATOR=gae RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
            SP_SIZE=2 \
            bash tests/special_e2e/ppo_trainer/run_function_reward.sh
      - name: Running GEO3K VLM GRPO E2E lora training tests on Ascend NPUs with rmpad using function rm
        run: |
          ray stop --force
          TRAIN_FILES=$HOME/data/geo3k/train.parquet VAL_FILES=$HOME/data/geo3k/test.parquet \
            MAX_PROMPT_LEN=1536 MAX_RESPONSE_LEN=1536 \
            MODEL_ID=Qwen/Qwen2.5-VL-3B-Instruct \
            ADV_ESTIMATOR=grpo RM_PAD=True USE_KL=True ENABLE_CHUNKED_PREFILL=False \
            SP_SIZE=2 \
            LORA_RANK=32 LORA_EXCLUDE=".*visual.*" \
            bash tests/special_e2e/ppo_trainer/run_function_reward.sh


================================================
FILE: .github/workflows/e2e_ppo_trainer_veomni_vllm.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests designed to run in dedicated environments

# Accelerators for tests
# - By default, tests run with a GPU available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml` runs pytest on all test scripts without the `on_cpu.py` suffix.
#   - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
#     - a new workflow yaml is added to `.github/workflows`
#     - new tests are added to a workflow mentioned in 2.

name: e2e_ppo_trainer_veomni_vllm

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches.
  # For push, for now only anti-patterns are specified so it is more conservative
  # and achieves higher coverage.
  push:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!verl/trainer/fsdp_sft_trainer.py"
      # Megatron
      - "!verl/workers/**/megatron_*.py"
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!docker/**"
      # Docs
      - "!**/*.md"
      - "!docs/**"
      - "!examples/**"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      # Megatron
      - "!verl/workers/**/megatron_*.py"
      # Entrypoints
      - ".github/workflows/e2e_ppo_trainer_veomni_vllm.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "examples/data_preprocess/geo3k.py"
      - "tests/special_e2e/run_ppo_trainer_veomni.sh"
      - "verl/trainer/main_ppo.py"
      - "verl/trainer/config/ppo_trainer.yaml"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permission for repository contents.
permissions:
  contents: read

env:
  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:vllm017.dev2"
  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

jobs:
  setup:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-image: "${{ env.IMAGE }}"

  e2e_ppo_trainer_veomni_vllm:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 60 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
          pip3 install git+https://github.com/ByteDance-Seed/VeOmni.git@v0.1.4
      - name: Prepare GSM8K dataset
        run: |
          ray stop --force
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
      - name: Prepare GEO3K dataset
        run: |
          ray stop --force
          python3 examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/models/hf_data/hiyouga/geometry3k/
      - name: Running GSM8K E2E training tests on 8 L20 GPUs with veomni engine (FSDP_SIZE=4, USP=2)
        run: |
          ray stop --force
          FSDP_SIZE=4 SP_SIZE=2 bash tests/special_e2e/run_ppo_trainer_veomni.sh
      - name: Running GEO3K E2E training tests on 8 L20 GPUs with veomni engine (FSDP_SIZE=8, USP=1)
        run: |
          ray stop --force
          MODEL_ID=Qwen/Qwen3-VL-2B-Instruct TRAIN_FILES=${HOME}/data/geo3k/train.parquet VAL_FILES=${HOME}/data/geo3k/test.parquet FSDP_SIZE=8 SP_SIZE=1 bash tests/special_e2e/run_ppo_trainer_veomni.sh

  cleanup:
    runs-on: ubuntu-latest
    needs:
      [
        setup,
        e2e_ppo_trainer_veomni_vllm,
      ]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


================================================
FILE: .github/workflows/e2e_sft_llm.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests designed to run in dedicated environments

# Accelerators for tests
# - By default, tests run with a GPU available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml` runs pytest on all test scripts without the `on_cpu.py` suffix.
#   - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
#     - a new workflow yaml is added to `.github/workflows`
#     - new tests are added to a workflow mentioned in 2.

name: e2e_sft_llm

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches
  push:
    branches:
      - main
      - v0.*
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!examples/**"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"

      # Megatron
      - "!verl/workers/**/megatron_*.py"
      # Entrypoints
      - ".github/workflows/e2e_sft_llm.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "tests/special_e2e/sft/**"
      - "verl/trainer/fsdp_sft_trainer.py"
      - "verl/trainer/config/sft_trainer.yaml"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permission for repository contents.
permissions:
  contents: read

env:
  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:sgl059.dev2"
  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

jobs:
  setup:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-image: "${{ env.IMAGE }}"
  e2e_sft_llm:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 30 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install peft
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
          pip3 install git+https://github.com/ByteDance-Seed/VeOmni.git@v0.1.4
      - name: Prepare gsm8k dataset
        run: |
          ray stop --force
          python3 examples/data_preprocess/gsm8k_multiturn_sft.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
      - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm
        run: |
          ray stop --force
          bash tests/special_e2e/sft/run_sft.sh
      - name: Running GSM8K E2E training tests on 8 L20 GPUs w/o rmpad using function rm
        run: |
          ray stop --force
          RM_PAD=False bash tests/special_e2e/sft/run_sft.sh
      - name: Running GSM8K E2E training tests on 8 L20 GPUs with sequence parallelism
        run: |
          ray stop --force
          SP_SIZE=2 bash tests/special_e2e/sft/run_sft.sh
      - name: Running GSM8K E2E training tests on 8 L20 GPUs with sequence parallelism and liger
        run: |
          ray stop --force
          SP_SIZE=2 LIGER=True bash tests/special_e2e/sft/run_sft.sh
      - name: Running GSM8K E2E training tests with LoRA
        run: |
          ray stop --force
          LORA_RANK=32 bash tests/special_e2e/sft/run_sft.sh
      - name: Run GSM8K E2E training and resume tests resuming from the checkpoint manager
        run: |
          ray stop --force
          LORA_RANK=32 RESUME_MODE=auto TOTAL_TRAIN_STEP=2 bash tests/special_e2e/sft/run_sft.sh
      # TODO: multiturn
      - name: Running GSM8K E2E training tests with multiturn and various configs and compare results
        run: |
          bash tests/special_e2e/sft/test_sft_engine_all.sh

  cleanup:
    runs-on: ubuntu-latest
    needs: [setup, e2e_sft_llm]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


================================================
FILE: .github/workflows/e2e_sft_llm_ascend.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default, tests run with a GPU available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml` runs pytest on all test scripts whose file name lacks the `on_cpu.py` suffix.
#   - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
#     - a new workflow yaml is added to `.github/workflows`
#     - new tests are added to a workflow mentioned in 2.

name: e2e_sft_llm_ascend

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches
  push:
    branches:
      - main
      - v0.*
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!examples/**"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"

      # Megatron
      - "!verl/workers/**/megatron_*.py"
      # Entrypoints
      - ".github/workflows/e2e_sft_llm_ascend.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "tests/special_e2e/sft"
      - "verl/trainer/fsdp_sft_trainer.py"
      - "verl/trainer/config/sft_trainer.yaml"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permissions for repository contents.
permissions:
  contents: read

jobs:
  e2e_sft_llm_ascend:
    if: github.repository_owner == 'verl-project'
    runs-on: linux-aarch64-a2b3-8
    timeout-minutes: 90 # Increase this timeout value as needed
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/modelfoundry/ascend-ci/verl/verl:verl-8.5.0-910b-ubuntu22.04-py3.11-latest
      options: >-
        --shm-size 16g
    env:
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - name: Check npu and CANN info
        run: |
          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
          npu-smi info
      - name: Check initial pip list from image
        run: |
          pip list
      - name: Checkout verl-project/verl repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          clean: true
      - name: Install the current repository
        run: |
          pip install -r requirements-npu.txt
          pip install -e .
          pip install git+https://github.com/ByteDance-Seed/VeOmni.git@v0.1.4
          pip install pandas==2.3.3
          pip uninstall -y mbridge
          pip install git+https://github.com/ISEEKYAN/mbridge.git@89eb10
      - name: Check final pip list
        run: |
          pip list
      - name: Prepare weights
        run: |
          ln -s /root/.cache/models ~/models
      - name: Prepare gsm8k dataset
        run: |
          python3 examples/data_preprocess/gsm8k_multiturn_sft.py --local_dataset_path ${HOME}/.cache/datasets/openai/gsm8k
      - name: Running GSM8K E2E training tests on 8 NPUs with rmpad using function rm
        run: |
          ray stop --force
          bash tests/special_e2e/sft/run_sft.sh
      - name: Running GSM8K E2E training tests on 8 NPUs w/o rmpad using function rm
        run: |
          ray stop --force
          RM_PAD=False bash tests/special_e2e/sft/run_sft.sh
      - name: Running GSM8K E2E training tests on 8 NPUs with sequence parallelism
        run: |
          ray stop --force
          SP_SIZE=2 bash tests/special_e2e/sft/run_sft.sh
      - name: Running GSM8K E2E training tests with LoRA
        run: |
          ray stop --force
          LORA_RANK=32 bash tests/special_e2e/sft/run_sft.sh
      - name: Run GSM8K E2E training and resume tests resuming from the checkpoint manager
        run: |
          ray stop --force
          LORA_RANK=32 RESUME_MODE=auto TOTAL_TRAIN_STEP=2 bash tests/special_e2e/sft/run_sft.sh
      - name: Running GSM8K E2E training tests with multiturn and various configs and compare results
        run: |
          ray stop --force
          rm -rf ~/verl/test/log
          mkdir -p ~/verl/test/log
          export VERL_FILE_LOGGER_ROOT=~/verl/test/log
          # test with single gpu as golden
          echo "run with single gpu as golden"
          BACKEND=fsdp SP_SIZE=1 FSDP_SIZE=1 NUM_GPUS=1 FSDP_STRATEGY=fsdp VERL_FILE_LOGGER_PATH=~/verl/test/log/golden.jsonl bash tests/special_e2e/sft/run_sft_engine.sh
          # test with fsdp 1
          echo "run with sp2 fsdp_size2 num_gpus8 fsdp_strategy fsdp pad_mode no_padding"
          BACKEND=fsdp SP_SIZE=2 FSDP_SIZE=2 NUM_GPUS=8 FSDP_STRATEGY=fsdp PAD_MODE=no_padding bash tests/special_e2e/sft/run_sft_engine.sh
          # test with fsdp 1, use_remove_padding False and pad_mode no_padding
          echo "run with sp1 fsdp_size-1 num_gpus8 fsdp_strategy fsdp pad_mode no_padding use_remove_padding False"
          BACKEND=fsdp SP_SIZE=1 FSDP_SIZE=-1 NUM_GPUS=8 FSDP_STRATEGY=fsdp PAD_MODE=no_padding USE_REMOVE_PADDING=False bash tests/special_e2e/sft/run_sft_engine.sh
          # test with fsdp 2
          echo "run with sp2 fsdp_size2 num_gpus8 fsdp_strategy fsdp2"
          BACKEND=fsdp SP_SIZE=2 FSDP_SIZE=2 NUM_GPUS=8 FSDP_STRATEGY=fsdp2 bash tests/special_e2e/sft/run_sft_engine.sh
          # test with veomni
          echo "run with sp2 fsdp_size4 num_gpus8 fsdp_strategy fsdp2"
          BACKEND=veomni SP_SIZE=2 FSDP_SIZE=4 NUM_GPUS=8 FSDP_STRATEGY=fsdp2 bash tests/special_e2e/sft/run_sft_engine.sh
          # test with megatron
          echo "run with tp2 pp2 vpp_null cp2 num_gpus8"
          BACKEND=megatron TP_SIZE=2 PP_SIZE=2 VPP_SIZE=NULL CP_SIZE=2 NUM_GPUS=8 bash tests/special_e2e/sft/run_sft_engine.sh
          # test with cp in ray
          echo "run with tp2 pp2 vpp_null cp2 num_gpus8 mode=ray"
          BACKEND=megatron TP_SIZE=2 PP_SIZE=2 VPP_SIZE=NULL CP_SIZE=2 NUM_GPUS=8 mode=ray bash tests/special_e2e/sft/run_sft_engine.sh
          rm -rf ~/verl/test/log


================================================
FILE: .github/workflows/e2e_sft_vlm.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default, tests run with a GPU available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml` runs pytest on all test scripts whose file name lacks the `on_cpu.py` suffix.
#   - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
#     - a new workflow yaml is added to `.github/workflows`
#     - new tests are added to a workflow mentioned in 2.

name: e2e_sft_vlm

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches
  push:
    branches:
      - main
      - v0.*
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      # Other entrypoints
      - "!examples/**"
      - "!tests/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"

      # Megatron
      - "!verl/workers/**/megatron_*.py"
      # Entrypoints
      - ".github/workflows/e2e_sft_vlm.yml"
      - "examples/data_preprocess/gsm8k.py"
      - "tests/special_e2e/sft"
      - "verl/trainer/fsdp_sft_trainer.py"
      - "verl/trainer/config/sft_trainer.yaml"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permissions for repository contents.
permissions:
  contents: read

env:
  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:sgl059.dev2"
  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

jobs:
  setup:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-image: "${{ env.IMAGE }}"
  e2e_sft_vlm:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 30 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install peft
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
          pip3 install git+https://github.com/ByteDance-Seed/VeOmni.git@v0.1.4
      - name: Prepare pokemon-gpt4o-captions dataset
        run: |
          ray stop --force
          python3 examples/data_preprocess/pokemon.py --local_dataset_path ${HOME}/models/hf_data/pokemon-gpt4o-captions
      - name: Running Pokemon E2E training tests with multiturn and various configs and compare results
        run: |
          MODEL_ID=Qwen/Qwen3-VL-2B-Instruct DATASET_DIR=~/data/pokemon-gpt4o-captions VPP_SIZE=null bash tests/special_e2e/sft/test_sft_engine_all.sh

  cleanup:
    runs-on: ubuntu-latest
    needs: [setup, e2e_sft_vlm]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


================================================
FILE: .github/workflows/gpu_unit_tests.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default, tests run with a GPU available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml` runs pytest on all test scripts whose file name lacks the `on_cpu.py` suffix.
#   - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
#     - a new workflow yaml is added to `.github/workflows`
#     - new tests are added to a workflow mentioned in 2.

name: GPU unit tests

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.4.x branches
  push:
    branches:
      - main
      - v0.4.x
    paths:
      - "**/*.py"
      - .github/workflows/gpu_unit_tests.yml
  pull_request:
    branches:
      - main
      - v0.4.x
    paths:
      # The order that you define paths patterns matters:
      # A matching negative pattern (prefixed with !) after a positive match will exclude the path.
      # A matching positive pattern after a negative match will include the path again.
      - "**/*.py"
      # Other entrypoints
      - "!examples/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      # Entrypoints
      - .github/workflows/gpu_unit_tests.yml
      - "tests/**test_*.py"
      # Ignore CPU tests
      - "!tests/*_on_cpu.py"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare read-only permissions for repository contents.
permissions:
  contents: read

env:
  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:sgl059.dev2"
  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

jobs:
  setup:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-image: "${{ env.IMAGE }}"

  gpu_unit_tests:
    if: github.repository_owner == 'verl-project'
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 60 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1"
      HF_HUB_ENABLE_HF_TRANSFER: 1
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install hf_transfer
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
          pip3 install cupy-cuda12x==13.6.0 pytest-asyncio
          pip3 install --ignore-installed blinker
          pip3 install --ignore-installed mlflow "numpy<2.0"
      - name: Run all GPU unit tests
        run: |
          pytest -s -x --ignore-glob="*on_npu.py" --ignore-glob="*test_special_*.py" --ignore-glob='*on_cpu.py' --ignore-glob="*test_vllm*" --ignore-glob="*_sglang*" --ignore-glob="*_hf_rollout*" --ignore-glob="tests/models/" --ignore-glob='tests/special*' --ignore-glob="tests/experimental" --ignore-glob="tests/workers/reward_model" --ignore-glob="*test_shared_memory*" --ignore-glob="tests/workers/rollout/rollout_trtllm" --ignore-glob="*test_bucketed_weight_transfer*" tests/
      - name: Testing LinearCrossEntropyTP Correctness, Computation Time and Memory Consumption
        run: |
          LOW_MEMORY=True torchrun --standalone --nnodes=1 --nproc-per-node=8 tests/utils/test_special_linear_cross_entropy_tp.py
      - name: Testing FSDP2 actor functionality
        run: |
          torchrun --standalone --nnodes=1 --nproc-per-node=2 tests/workers/actor/test_special_dp_actor.py
      - name: Testing FSDP2 critic functionality
        run: |
          torchrun --standalone --nnodes=1 --nproc-per-node=2 tests/workers/critic/test_special_dp_critic.py

  cleanup:
    runs-on: ubuntu-latest
    needs: [setup, gpu_unit_tests]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


================================================
FILE: .github/workflows/model.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default, tests run with a GPU available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml` runs pytest on all test scripts whose file name lacks the `on_cpu.py` suffix.
#   - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
#     - a new workflow yaml is added to `.github/workflows`
#     - new tests are added to a workflow mentioned in 2.

name: model

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches
  push:
    branches:
      - main
      - v0.*
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "verl/**/*.py"
      # Entrypoints
      - ".github/workflows/model.yml"
      - "tests/special_distributed/test_fsdp_ckpt.py"
      - "tests/special_distributed/test_tensor_dict.py"
      - "tests/models/**"
      - "tests/special_distributed/run_all.sh"

# Declare read-only permissions for repository contents.
permissions:
  contents: read

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

env:
  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:vllm017.dev2"
  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

jobs:
  setup:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-image: "${{ env.IMAGE }}"

  model_rmpad:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 20 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository and upgrade to latest transformers(4.54.0)/flash_attn, transformers 4.55.0 has strange behavior with model backward
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
          pip3 install --upgrade "transformers<5.0.0"
      - name: Running rmpad model tests on 8 L20 GPUs + flash_attn 2.5.8
        run: |
          pytest -s tests/models/test_transformer.py
      - name: Running rmpad model tests on 8 L20 GPUs + latest flash_attn
        run: |
          pytest -s tests/models/test_transformer.py
      - name: Running FSDP rmpad model tests on 8 L20 GPUs + latest flash_attn
        run: |
          STRATEGY=fsdp torchrun --nproc_per_node=8 tests/special_distributed/test_fsdp_ckpt.py
      - name: Running transformers ulysses tests on 8 L20 GPUs + latest transformers
        run: |
          torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py
      - name: Running transformers ulysses tests on 8 L20 GPUs + transformers 4.54.1
        run: |
          pip3 install transformers==4.54.1
          torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py
      - name: Run distributed test
        run: |
          bash tests/special_distributed/run_all.sh

  # TODO: Move this back to model_rmpad once FSDP2 is stable.
  # NOTE: List as an independent job to make rerun easier.
  model_rmpad_fsdp2_unstable:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 20 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository and upgrade to latest transformers/flash_attn
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
      - name: Running FSDP2 rmpad model tests on 8 L20 GPUs + latest flash_attn
        run: |
          STRATEGY=fsdp2 torchrun --nproc_per_node=8 tests/special_distributed/test_fsdp_ckpt.py

  model_engine:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 20 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
      - name: Download model config files
        run: |
          hf download Qwen/Qwen2.5-0.5B-Instruct --local-dir $HOME/models/Qwen/Qwen2.5-0.5B-Instruct

      - name: Running mcore engine tests on 8 L20 GPUs
        run: |
          ray stop --force
          pytest -s -x tests/models/test_engine.py

  cleanup:
    runs-on: ubuntu-latest
    needs: [setup, model_rmpad, model_rmpad_fsdp2_unstable, model_engine]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


================================================
FILE: .github/workflows/model_ascend.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default, tests run with a GPU available, except for those under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml` runs pytest on all test scripts whose file name lacks the `on_cpu.py` suffix.
#   - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
#     - a new workflow yaml is added to `.github/workflows`
#     - new tests are added to a workflow mentioned in 2.

name: model_ascend

on:
  # Trigger the workflow on push or pull request,
  # but only for the main and v0.* branches
  push:
    branches:
      - main
      - v0.*
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "verl/**/*.py"
      # Entrypoints
      - ".github/workflows/model_ascend.yml"
      - "tests/special_distributed/test_fsdp_ckpt.py"
      - "tests/special_distributed/test_tensor_dict.py"
      - "tests/models/**"
      - "tests/special_distributed/run_all.sh"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

permissions:
  contents: read

jobs:
  model_rmpad_ascend:
    if: github.repository_owner == 'verl-project'
    runs-on: linux-aarch64-a2b3-8
    timeout-minutes: 60 # Increase this timeout value as needed
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/modelfoundry/ascend-ci/verl/verl:verl-8.5.0-910b-ubuntu22.04-py3.11-latest
      options: >-
        --shm-size 16g
    env:
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - name: Check npu and CANN info
        run: |
          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
          npu-smi info
      - name: Check initial pip list from image
        run: |
          pip list
      - name: Checkout verl-project/verl repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          clean: true
      - name: Install the current repository
        run: |
          pip install -r requirements-npu.txt
          pip install --no-deps -e .[test]
      - name: Check final pip list
        run: |
          pip list
      - name: Prepare weights
        run: |
          ln -s /root/.cache/models ~/models
      - name: Running rmpad model tests on 8 NPUs
        run: |
          pytest -s tests/models/test_transformer.py
      - name: Running FSDP rmpad model tests on 8 NPUs
        run: |
          STRATEGY=fsdp torchrun --nproc_per_node=8 tests/special_distributed/test_fsdp_ckpt.py
      - name: Running transformers ulysses tests on 8 NPUs
        run: |
          torchrun --nproc_per_node=8 -m pytest tests/models/test_transformers_ulysses.py
      - name: Run distributed test
        run: |
          bash tests/special_distributed/run_all.sh

  # TODO: Move this back to model_rmpad once FSDP2 is stable.
  # NOTE: List as an independent job to make rerun easier.
  model_rmpad_fsdp2_unstable_ascend:
    if: github.repository_owner == 'verl-project'
    runs-on: linux-aarch64-a2b3-8
    timeout-minutes: 60
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/modelfoundry/ascend-ci/verl/verl:verl-8.5.0-910b-ubuntu22.04-py3.11-latest
      options: >-
        --shm-size 16g
    env:
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip install -r requirements-npu.txt
          pip install --no-deps -e .[test]
      - name: Prepare weights
        run: |
          ln -s /root/.cache/models ~/models
      - name: Running FSDP2 rmpad model tests on 8 NPUs
        run: |
          STRATEGY=fsdp2 torchrun --nproc_per_node=8 tests/special_distributed/test_fsdp_ckpt.py


================================================
FILE: .github/workflows/nightly_ascend.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default, tests are run with a GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml` runs pytest on all test scripts whose file names lack the `on_cpu.py` suffix.
#   - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
#     - new workflow yaml is added to `.github/workflows`
#     - new tests are added to workflow mentioned in 2.

name: nightly_ci_ascend

on:
  # Trigger the workflow on a nightly schedule only:
  # every day at 17:00 UTC.
  schedule:
    - cron: "0 17 * * *"

# Declare permissions just read content.
permissions:
  contents: read

jobs:
  # Test ppo qwen3-8b fsdp+vllm
  nightlyCI_ppo-qwen3-8b-fsdp-vllm_ascend:
    if: github.repository_owner == 'verl-project'
    runs-on: linux-aarch64-a2b3-8
    timeout-minutes: 180 # Increase this timeout value as needed
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/modelfoundry/ascend-ci/verl/verl:verl-8.5.0-910b-ubuntu22.04-py3.11-latest
      options: >-
        --shm-size 16g
    env:
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - name: Check npu and CANN info
        run: |
          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
          npu-smi info
      - name: Check initial pip list from image
        run: |
          pip list
      - name: Checkout verl-project/verl repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          clean: true
      - name: Install the current repository
        run: |
          pip install -r requirements-npu.txt
          pip install --no-deps -e .
      - name: Check final pip list
        run: |
          pip list
      - name: Prepare weights
        run: |
          ln -s /root/.cache/models ~/models
      - name: Prepare GSM8K dataset
        run: |
          python examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/.cache/datasets/openai/gsm8k
      - name: Running nightlyCI_ppo-qwen3-8b-fsdp-vllm_ascend
        run: |
          ray stop --force
          bash tests/special_npu/nightly_ci_ascend/run_ppo_qwen3-8b_fsdp_npu.sh

  # Test grpo qwen25-7b-Instruct fsdp+vllm
  nightlyCI_grpo-qwen25-7b-Instruct-fsdp-vllm_ascend:
    if: github.repository_owner == 'verl-project'
    runs-on: linux-aarch64-a2b3-8
    timeout-minutes: 180 # Increase this timeout value as needed
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/modelfoundry/ascend-ci/verl/verl:verl-8.5.0-910b-ubuntu22.04-py3.11-latest
      options: >-
        --shm-size 16g
    env:
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - name: Check npu and CANN info
        run: |
          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
          npu-smi info
      - name: Check initial pip list from image
        run: |
          pip list
      - name: Checkout verl-project/verl repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          clean: true
      - name: Install the current repository
        run: |
          pip install -r requirements-npu.txt
          pip install --no-deps -e .
      - name: Check final pip list
        run: |
          pip list
      - name: Prepare weights
        run: |
          ln -s /root/.cache/models ~/models
      - name: Prepare GSM8K dataset
        run: |
          python examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/.cache/datasets/openai/gsm8k
      - name: Running nightlyCI_grpo-qwen25-7b-Instruct-fsdp-vllm_ascend
        run: |
          ray stop --force
          bash tests/special_npu/nightly_ci_ascend/run_grpo_qwen25-7b-instruct_fsdp_npu.sh

  # Test grpo qwen25-vl-3b-Instruct fsdp+vllm
  nightlyCI_grpo-qwen25-vl-3b-Instruct-fsdp-vllm_ascend:
    if: github.repository_owner == 'verl-project'
    runs-on: linux-aarch64-a2b3-8
    timeout-minutes: 180 # Increase this timeout value as needed
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/modelfoundry/ascend-ci/verl/verl:verl-8.5.0-910b-ubuntu22.04-py3.11-latest
      options: >-
        --shm-size 16g
    env:
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - name: Check npu and CANN info
        run: |
          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
          npu-smi info
      - name: Check initial pip list from image
        run: |
          pip list
      - name: Checkout verl-project/verl repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          clean: true
      - name: Install the current repository
        run: |
          pip install -r requirements-npu.txt
          pip install --no-deps -e .
      - name: Check final pip list
        run: |
          pip list
      - name: Prepare weights
        run: |
          ln -s /root/.cache/models ~/models
      - name: Preprocess geo3k dataset
        run: |
          python examples/data_preprocess/geo3k.py --local_dataset_path ${HOME}/.cache/datasets/hiyouga/geometry3k
      - name: Running nightlyCI_grpo-qwen25-vl-3b-Instruct-fsdp-vllm_ascend
        run: |
          ray stop --force
          bash tests/special_npu/nightly_ci_ascend/run_grpo_qwen25-vl-3b-instruct_fsdp_npu.sh


================================================
FILE: .github/workflows/npu_unit_tests.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default, tests are run with a GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml` runs pytest on all test scripts whose file names lack the `on_cpu.py` suffix.
#   - `npu_unit_tests.yml` runs pytest on all test scripts whose file names lack the `on_cpu.py` suffix, on Ascend devices.
#   - Since the cpu/gpu/npu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
#     - new workflow yaml is added to `.github/workflows`
#     - new tests are added to workflow mentioned in 2.

name: NPU unit tests

on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
      - v0.*
    paths:
      - "**/*.py"
      - .github/workflows/npu_unit_tests.yml
  pull_request:
    branches:
      - main
    paths:
      # The order that you define paths patterns matters:
      # A matching negative pattern (prefixed with !) after a positive match will exclude the path.
      # A matching positive pattern after a negative match will include the path again.
      - "**/*.py"
      # Other entrypoints
      - "!examples/**"
      - "!verl/trainer/main_*.py"
      - "!verl/trainer/fsdp_sft_trainer.py"
      - "!recipe/**"
      # Entrypoints
      - .github/workflows/npu_unit_tests.yml
      - "tests/**test_*.py"
      # Ignore CPU tests in any subdirectory (`*` alone does not match `/`)
      - "!tests/**/*_on_cpu.py"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare permissions just read content.
permissions:
  contents: read

jobs:
  npu_unit_tests:
    if: github.repository_owner == 'verl-project'
    runs-on: linux-aarch64-a2b3-8
    timeout-minutes: 60 # Increase this timeout value as needed
    container:
      image: swr.cn-southwest-2.myhuaweicloud.com/modelfoundry/ascend-ci/verl/verl:verl-8.5.0-910b-ubuntu22.04-py3.11-latest
      options: >-
        --shm-size 16g
    env:
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
    steps:
      - name: Check npu and CANN info
        run: |
          cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
          npu-smi info
      - name: Check initial pip list from image
        run: |
          pip list
      - name: Checkout verl-project/verl repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          clean: true
      - name: Install the current repository
        run: |
          pip install -r requirements-npu.txt
          pip install --no-deps -e .[test]
          pip install mlflow pytest-asyncio
      - name: Check final pip list
        run: |
          pip list
      - name: Prepare weights
        run: |
          ln -s /root/.cache/models ~/models
      - name: Run all NPU unit tests
        run: |
          pytest -s -x --ignore-glob="*test_special_*.py" --ignore-glob="*on_cpu.py" --ignore-glob="*test_vllm*" --ignore-glob="*_sglang*" --ignore-glob="*_hf_rollout*" --ignore-glob="tests/models/" --ignore-glob="tests/special*" --ignore-glob="tests/experimental" --ignore-glob="tests/workers/reward_model" --ignore-glob="*test_rvdz*" --ignore-glob="*test_ray_collectives*" --ignore-glob="*test_nvtx_profile*" --ignore-glob="tests/checkpoint_engine" --ignore-glob="*test_shared_memory*" --ignore-glob="tests/workers/rollout/rollout_trtllm" --ignore-glob="*test_fsdp_lora_merge*" --ignore-glob="*test_activation_offload*" --ignore-glob="*test_normalize_peft_param_name.py*" tests/
      - name: Testing activation offload
        run: |
          pytest -s -x tests/utils/test_activation_offload.py
      - name: Testing normalize peft param name
        run: |
          pytest -s -x tests/utils/test_normalize_peft_param_name.py
      - name: Testing FSDP2 actor functionality
        run: |
          torchrun --standalone --nnodes=1 --nproc-per-node=2 tests/workers/actor/test_special_dp_actor.py
      - name: Testing FSDP2 critic functionality
        run: |
          torchrun --standalone --nnodes=1 --nproc-per-node=2 tests/workers/critic/test_special_dp_critic.py
      - name: Running NPU profiling unit tests
        run: |
          pytest -s -x tests/utils/test_special_mstx_profile.py


================================================
FILE: .github/workflows/pre-commit.yml
================================================
# c.f. https://github.com/pre-commit/action?tab=readme-ov-file#using-this-action
name: pre-commit

# No need to avoid / cancel lightweight pre-commit jobs
on:
  schedule:
    - cron: "0 0 * * 0"
  pull_request:
  push:
    branches:
      - main
      - v0.*
  # Allow manual triggering
  workflow_dispatch:

# Declare permissions just read content.
permissions:
  contents: read

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.12"]
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install the current repository
        run: |
          pip install pre-commit hydra-core
          pip install --no-deps -e .
      - name: Set ruff --output-format=github
        run: |
          sed -i 's/--output-format=full/--output-format=github/' .pre-commit-config.yaml
          git add .pre-commit-config.yaml
      # Check "--all-files" by default
      - uses: pre-commit/action@v3.0.1


================================================
FILE: .github/workflows/precommit-autofix.yml
================================================
name: scheduled pre-commit autofix

on:
  schedule:
    # Every hour
    - cron: "0 * * * *"
  workflow_dispatch:

permissions:
  contents: write
  pull-requests: write

jobs:
  precommit:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Install pre-commit
        run: |
          python -m pip install --upgrade pip
          pip install pre-commit hydra-core

      - name: Run pre-commit
        run: |
          pre-commit run --all-files || true

      - name: Create or update PR
        uses: peter-evans/create-pull-request@v6
        with:
          branch: bot/precommit-autofix
          delete-branch: true
          title: "[ci] chore: scheduled pre-commit autofix"
          commit-message: "chore: auto-fix pre-commit issues"
          body: |
            This PR was created automatically by a scheduled GitHub Action.

            - Runs `pre-commit run --all-files`
            - Triggered hourly
          labels: |
            automated
            pre-commit


================================================
FILE: .github/workflows/reward_model_sglang.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default, tests are run with a GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml files in `.github/workflows/`. Here's an overview of all test configs:
# 1. A list of always-triggered CPU sanity tests: `check-pr-title.yml`, `secrets_scan.yml`, `pre-commit.yml`, `doc.yml`
# 2. Some heavy multi-GPU unit tests, such as `model.yml`, `vllm.yml`, `sgl.yml`
# 3. End-to-end tests: `e2e_*.yml`
# 4. Unit tests
#   - `cpu_unit_tests.yml` runs pytest on all scripts matching the file name pattern `tests/**/test_*_on_cpu.py`
#   - `gpu_unit_tests.yml` runs pytest on all test scripts whose file names lack the `on_cpu.py` suffix.
#   - Since the cpu/gpu unit tests run all tests under `tests` by default, please make sure tests are manually excluded from them when
#     - new workflow yaml is added to `.github/workflows`
#     - new tests are added to workflow mentioned in 2.
# name: Check PR Title

name: reward_model_sglang

on:
  # Trigger the workflow on push or pull request,
  # but only for the main branch
  push:
    branches:
      - main
      - v0.*
  pull_request:
    branches:
      - main
      - v0.*
    paths:
      - "verl/**/*.py"
      # Entrypoints
      - ".github/workflows/reward_model_sglang.yml"
      - "tests/experimental/reward_loop/**"

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

# Declare permissions just read content.
permissions:
  contents: read

env:
  IMAGE: "verl-ci-cn-beijing.cr.volces.com/verlai/verl:sgl059.dev2"
  DYNAMIC_RUNNER_ENDPOINT: "https://sd10g3clalm04ug7alq90.apigateway-cn-beijing.volceapi.com/runner"

jobs:
  setup:
    if: github.repository_owner == 'verl-project'
    runs-on: ubuntu-latest
    outputs:
      runner-label: ${{ steps.create-runner.outputs.runner-label }}
      mlp-task-id: ${{ steps.create-runner.outputs.mlp-task-id }}
    steps:
      - uses: actions/checkout@v4
      - id: create-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "create"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-image: "${{ env.IMAGE }}"

  reward_model_sglang:
    needs: setup
    runs-on: ["${{ needs.setup.outputs.runner-label || 'L20x8' }}"]
    timeout-minutes: 30 # Increase this timeout value as needed
    env:
      HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
      HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
      NO_PROXY: "localhost,127.0.0.1,hf-mirror.com"
      HF_ENDPOINT: "https://hf-mirror.com"
      HF_HUB_ENABLE_HF_TRANSFER: "0" # This is more stable
      SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK: "True"
      NCCL_SHM_DISABLE: "1"
      NCCL_P2P_DISABLE: "1"
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - name: Install the current repository
        run: |
          pip3 install -r requirements-test.txt
          pip3 install --no-deps -e .
          pip3 install sglang-router==0.2.2
      - name: Prepare gsm8k dataset
        run: |
          ray stop --force
          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k --local_dir ${HOME}/data/gsm8k
      - name: Running sglang generative reward model tests on 8 L20 GPUs
        run: |
          unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
          ROLLOUT_NAME=sglang pytest -s -x tests/experimental/reward_loop/test_reward_model_genrm.py
      - name: Running sglang discriminative reward model tests on 8 L20 GPUs
        run: |
          unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
          ROLLOUT_NAME=sglang pytest -s -x tests/experimental/reward_loop/test_reward_model_disrm.py
      - name: Running sglang agent loop with reward manager tests on 8 L20 GPUs
        run: |
          unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
          ROLLOUT_NAME=sglang pytest -s -x tests/experimental/reward_loop/test_agent_reward_loop_standalone.py
      - name: Running sglang agent loop with reward model colocate tests on 8 L20 GPUs
        run: |
          unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
          ROLLOUT_NAME=sglang pytest -s -x tests/experimental/reward_loop/test_agent_reward_loop_colocate.py

  cleanup:
    runs-on: ubuntu-latest
    needs: [setup, reward_model_sglang]
    if: always()
    steps:
      - id: destroy-runner
        uses: volcengine/vemlp-github-runner@v1
        with:
          mode: "destroy"
          faas-url: "${{ env.DYNAMIC_RUNNER_ENDPOINT }}"
          mlp-task-id: "${{ needs.setup.outputs.mlp-task-id }}"


================================================
FILE: .github/workflows/reward_model_vllm.yml
================================================
# # Tests layout

# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:
# - `tests/trainer` for testing functionality related to `verl/trainer`
# - `tests/models` for testing functionality related to `verl/models`
# - ...

# There are a few folders with `special_` prefix, created for special purposes:
# - `special_distributed`: unit tests that must run with multiple GPUs
# - `special_e2e`: end-to-end tests with training/generation scripts
# - `special_npu`: tests for NPUs
# - `special_sanity`: a suite of quick sanity tests
# - `special_standalone`: a set of tests that are designed to run in dedicated environments

# Accelerators for tests
# - By default, tests are run with a GPU available, except for the ones under `special_npu` and any test script whose name ends with `on_cpu.py`.
# - Test scripts with the `on_cpu.py` name suffix are run on CPU resources in a Linux environment.

# # Workflow layout

# All CI tests are configured by yaml

│   │   ├── run_deepseek7b_llm_pfppo.sh
│   │   ├── run_deepseek7b_llm_sandbox_fusion.sh
│   │   ├── run_deepseek7b_llm_sp2.sh
│   │   ├── run_deepseek_full_hh_rlhf.sh
│   │   ├── run_deepseek_math_gsm8k_megatron.sh
│   │   ├── run_deepseek_math_gsm8k_megatron_nsys.sh
│   │   ├── run_gemma.sh
│   │   ├── run_moonlight16b_a3b_gsm8k_megatron.sh
│   │   ├── run_qwen1.5_moe_a2.7b-gsm8k_megatron.sh
│   │   ├── run_qwen2-7b_math_gsm8k_megatron.sh
│   │   ├── run_qwen2-7b_rm.sh
│   │   ├── run_qwen2-7b_rm_reward_loop_colocate.sh
│   │   ├── run_qwen2-7b_rm_seq_balance.sh
│   │   ├── run_qwen2-7b_rm_seq_balance_fused_kernels.sh
│   │   ├── run_qwen2-7b_rm_seq_balance_nsys.sh
│   │   ├── run_qwen2-7b_seq_balance.sh
│   │   ├── run_qwen2-7b_sglang_seq_balance.sh
│   │   ├── run_qwen2.5-32b.sh
│   │   ├── run_qwen2.5-3b_rm_reward_loop_colocate.sh
│   │   └── run_qwen3-8b_npu.sh
│   ├── prefix_grouper/
│   │   ├── README.md
│   │   └── run_qwen3_prefix_grouper.sh
│   ├── ray/
│   │   └── tutorial.ipynb
│   ├── reinforce_plus_plus_trainer/
│   │   ├── run_qwen2-7b_math_rf.sh
│   │   └── run_qwen2-7b_math_rf_baseline.sh
│   ├── remax_trainer/
│   │   ├── run_qwen2.5-3b_seq_balance.sh
│   │   └── run_qwen2.5-7b_seq_balance.sh
│   ├── rloo_trainer/
│   │   └── run_qwen2-7b.sh
│   ├── rollout_correction/
│   │   ├── run_with_rollout_corr.sh
│   │   └── run_with_rollout_corr_multi_rs.sh
│   ├── router_replay/
│   │   ├── README.md
│   │   ├── run_qwen30_a3b_megatron_sglang.sh
│   │   └── run_qwen30_a3b_megatron_vllm.sh
│   ├── sapo_trainer/
│   │   ├── run_qwen30b_sapo.sh
│   │   └── run_qwen3_8b_sapo_npu.sh
│   ├── sft/
│   │   ├── gsm8k/
│   │   │   ├── run_deepseek_6b7.sh
│   │   │   ├── run_gemma_2b.sh
│   │   │   ├── run_gemma_7b.sh
│   │   │   ├── run_mimo_megatron_mtp.sh
│   │   │   ├── run_nemotron_nano_v3.sh
│   │   │   ├── run_qwen3_30b_automodel.sh
│   │   │   ├── run_qwen3_5_megatron.sh
│   │   │   ├── run_qwen3_8b_sft_peft_sp2_npu.sh
│   │   │   ├── run_qwen_05_automodel.sh
│   │   │   ├── run_qwen_05_peft.sh
│   │   │   ├── run_qwen_05_sp2.sh
│   │   │   ├── run_qwen_05_sp2_liger.sh
│   │   │   └── run_seed_oss_36b_sft.sh
│   │   ├── multiturn/
│   │   │   └── run_qwen_05_sp2.sh
│   │   └── vlm/
│   │       └── run_qwen3_vl_2b.sh
│   ├── sglang_multiturn/
│   │   ├── README.md
│   │   ├── config/
│   │   │   ├── geo3k_multiturn_grpo.yaml
│   │   │   ├── geo3k_multiturn_megatron_grpo.yaml
│   │   │   ├── gsm8k_multiturn_grpo.yaml
│   │   │   ├── gsm8k_multiturn_grpo_server.yaml
│   │   │   ├── gsm8k_multiturn_grpo_w_interaction.yaml
│   │   │   ├── gsm8k_multiturn_megatron_grpo.yaml
│   │   │   ├── interaction_config/
│   │   │   │   └── gsm8k_interaction_config.yaml
│   │   │   ├── retool_multiturn_grpo.yaml
│   │   │   ├── search_multiturn_grpo.yaml
│   │   │   ├── search_multiturn_grpo_one_step_off.yaml
│   │   │   └── tool_config/
│   │   │       ├── geo3k_tool_config.yaml
│   │   │       ├── gsm8k_tool_config.yaml
│   │   │       ├── mcp_server.json
│   │   │       ├── mcp_tool_config.yaml
│   │   │       ├── sandbox_fusion_tool_config.yaml
│   │   │       └── search_tool_config.yaml
│   │   ├── geo3k/
│   │   │   ├── run_qwen2.5-3b_geo3k_multiturn.sh
│   │   │   ├── run_qwen2.5-3b_geo3k_multiturn_4xgpu.sh
│   │   │   └── run_qwen2.5-3b_megatron_geo3k_multiturn.sh
│   │   ├── gsm8k_toolcall_shaping/
│   │   │   ├── gsm8k_toolcall_shaping.py
│   │   │   └── run_gsm8k_grpo_toolcall_shaping.sh
│   │   ├── run_qwen0.5b_gsm8k_multiturn_curriculum.sh
│   │   ├── run_qwen2.5-0.5b_gsm8k_multiturn_w_interaction.sh
│   │   ├── run_qwen2.5-3b_gsm8k_multiturn.sh
│   │   ├── run_qwen2.5-3b_gsm8k_multiturn_4xgpu.sh
│   │   ├── run_qwen2.5-3b_gsm8k_multiturn_4xgpu_server.sh
│   │   ├── run_qwen2.5-3b_gsm8k_multiturn_server.sh
│   │   ├── run_qwen2.5-3b_gsm8k_multiturn_vllm_fsdp.sh
│   │   ├── run_qwen2.5-3b_gsm8k_tool_agent_mlflow.sh
│   │   ├── run_qwen2.5-3b_megatron_gsm8k_multiturn.sh
│   │   ├── run_qwen3-4b_gsm8k_multiturn.sh
│   │   ├── run_qwen3_4b_dapo_multiturn.sh
│   │   └── search_r1_like/
│   │       ├── local_dense_retriever/
│   │       │   ├── download.py
│   │       │   └── retrieval_server.py
│   │       └── run_qwen2.5-3b_instruct_search_multiturn.sh
│   ├── skypilot/
│   │   ├── README.md
│   │   ├── verl-grpo.yaml
│   │   ├── verl-multiturn-tools.yaml
│   │   └── verl-ppo.yaml
│   ├── slurm/
│   │   └── ray_on_slurm.slurm
│   ├── split_placement/
│   │   ├── README.md
│   │   ├── config/
│   │   │   └── ppo_trainer_split.yaml
│   │   ├── main_ppo_split.py
│   │   ├── run_deepseek7b_llm.sh
│   │   └── split_monkey_patch.py
│   ├── tuning/
│   │   ├── 0.5b/
│   │   │   └── qwen2-0.5b_grpo-lora_1_h100_fsdp_vllm.sh
│   │   ├── 1.5b/
│   │   │   └── qwen2-1.5b_grpo-lora_1_h100_fsdp_vllm.sh
│   │   ├── 14b/
│   │   │   ├── qwen2-14b_grpo-lora_2_h100_fsdp_vllm.sh
│   │   │   └── qwen2_14b_grpo_4_h800_fsdp_vllm.sh
│   │   ├── 32b/
│   │   │   ├── qwen2-32b_grpo-lora_4_h100_fsdp_vllm.sh
│   │   │   └── qwen2_32B_grpo_8_h20_megatron_vllm.sh
│   │   ├── 3b/
│   │   │   └── qwen2-3b_grpo-lora_1_h100_fsdp_vllm.sh
│   │   ├── 70b/
│   │   │   ├── qwen2-70b_grpo_32_h20_fsdp_vllm.sh
│   │   │   ├── qwen2-70b_grpo_32_h800_fsdp_vllm.sh
│   │   │   └── qwen2-72b_grpo-lora_8_h100_fsdp_vllm.sh
│   │   └── 7b/
│   │       ├── qwen2-7b_grpo-lora_1_h100_fsdp_vllm.sh
│   │       └── qwen2-7b_grpo_2_h800_fsdp_vllm.sh
│   └── tutorial/
│       └── agent_loop_get_started/
│           ├── agent_loop_tutorial.ipynb
│           └── sandbox.py
├── pyproject.toml
├── requirements-cuda.txt
├── requirements-npu.txt
├── requirements-test.txt
├── requirements.txt
├── requirements_sglang.txt
├── scripts/
│   ├── __init__.py
│   ├── converter_hf_to_mcore.py
│   ├── diagnose.py
│   ├── generate_trainer_config.sh
│   ├── init_random_model.py
│   ├── install_sglang_mcore_npu.sh
│   ├── install_vllm_sglang_mcore.sh
│   ├── legacy_model_merger.py
│   ├── megatron_merge_lora.py
│   ├── print_cfg.py
│   ├── rollout_viewer.py
│   └── veomni/
│       ├── moe_merge.py
│       └── moe_split.py
├── setup.py
├── tests/
│   ├── README.md
│   ├── __init__.py
│   ├── checkpoint_engine/
│   │   ├── __init__.py
│   │   ├── test_correctness_on_gpu.py
│   │   ├── test_correctness_on_npu.py
│   │   ├── test_special_server_adapter.py
│   │   └── test_utils.py
│   ├── experimental/
│   │   ├── agent_loop/
│   │   │   ├── agent_utils.py
│   │   │   ├── qwen_vl_tool_chat_template.jinja2
│   │   │   ├── test_agent_loop_extra_fields_schema_on_cpu.py
│   │   │   ├── test_basic_agent_loop.py
│   │   │   ├── test_gpt_oss_tool_parser.py
│   │   │   ├── test_multi_modal.py
│   │   │   └── test_standalone_rollout.py
│   │   ├── reward_loop/
│   │   │   ├── reward_fn.py
│   │   │   ├── test_agent_reward_loop_colocate.py
│   │   │   ├── test_agent_reward_loop_standalone.py
│   │   │   ├── test_async_token_bucket_on_cpu.py
│   │   │   ├── test_math_verify.py
│   │   │   ├── test_rate_limited_reward_manager_on_cpu.py
│   │   │   ├── test_reward_model_disrm.py
│   │   │   └── test_reward_model_genrm.py
│   │   └── vla/
│   │       └── test_sim_envs.py
│   ├── interactions/
│   │   ├── __init__.py
│   │   ├── test_gsm8k_interaction.py
│   │   └── test_interaction_registry.py
│   ├── kill_github_tests.sh
│   ├── models/
│   │   ├── test_engine.py
│   │   ├── test_tiled_mlp_accuracy.py
│   │   ├── test_transformer.py
│   │   └── test_transformers_ulysses.py
│   ├── single_controller/
│   │   ├── __init__.py
│   │   ├── base/
│   │   │   └── test_decorator.py
│   │   ├── check_worker_alive/
│   │   │   └── main.py
│   │   ├── detached_worker/
│   │   │   ├── README.md
│   │   │   ├── client.py
│   │   │   ├── run.sh
│   │   │   └── server.py
│   │   ├── test_auto_padding_on_cpu.py
│   │   ├── test_colocated_workers.py
│   │   ├── test_colocated_workers_fused.py
│   │   ├── test_data_transfer.py
│   │   ├── test_decorator_on_cpu.py
│   │   ├── test_device_mesh_register.py
│   │   ├── test_driverfunc_to_worker.py
│   │   ├── test_fused_workers_on_cpu.py
│   │   ├── test_get_set_dispatch_collect_cpu.py
│   │   ├── test_high_level_scheduling_api.py
│   │   ├── test_nested_worker.py
│   │   ├── test_ray_collectives.py
│   │   ├── test_ray_local_envs_on_cpu.py
│   │   ├── test_ray_utils_on_cpu.py
│   │   ├── test_rvdz.py
│   │   ├── test_split_resource_pool.py
│   │   ├── test_worker_group_basics.py
│   │   └── test_worker_group_torch.py
│   ├── special_distributed/
│   │   ├── README.md
│   │   ├── run_all.sh
│   │   ├── test_fsdp_ckpt.py
│   │   ├── test_mcore_config_converter.py
│   │   ├── test_tensor_dict.py
│   │   └── test_torch_functional.py
│   ├── special_e2e/
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── check_custom_rwd_fn.py
│   │   ├── check_results.py
│   │   ├── envs/
│   │   │   ├── __init__.py
│   │   │   └── digit_completion/
│   │   │       ├── __init__.py
│   │   │       ├── task.py
│   │   │       └── tokenizer.py
│   │   ├── generation/
│   │   │   ├── run_gen_qwen05.sh
│   │   │   └── run_gen_qwen05_server.sh
│   │   ├── ppo_trainer/
│   │   │   ├── expert_parallel/
│   │   │   │   ├── qwen2moe_minimal.json
│   │   │   │   └── qwen3moe_minimal.json
│   │   │   ├── run_function_reward.sh
│   │   │   ├── run_model_reward.sh
│   │   │   ├── run_single_gpu.sh
│   │   │   └── run_single_gpu_with_engine.sh
│   │   ├── run_dapo.sh
│   │   ├── run_fully_async_policy.sh
│   │   ├── run_geo3k_fsdp_sgl_multiturn_w_tool.sh
│   │   ├── run_grpo_lora_with_merge.sh
│   │   ├── run_gsm8k_fsdp_sgl_multiturn_sf_tool.sh
│   │   ├── run_gsm8k_fsdp_sgl_multiturn_w_tool.sh
│   │   ├── run_one_step_off_policy.sh
│   │   ├── run_ppo_trainer_megatron.sh
│   │   ├── run_ppo_trainer_torchtitan.sh
│   │   ├── run_ppo_trainer_veomni.sh
│   │   ├── run_test.sh
│   │   └── sft/
│   │       ├── compare_sft_engine_results.py
│   │       ├── run_sft.sh
│   │       ├── run_sft_engine.sh
│   │       └── test_sft_engine_all.sh
│   ├── special_npu/
│   │   ├── nightly_ci_ascend/
│   │   │   ├── run_grpo_qwen25-7b-instruct_fsdp_npu.sh
│   │   │   ├── run_grpo_qwen25-vl-3b-instruct_fsdp_npu.sh
│   │   │   └── run_ppo_qwen3-8b_fsdp_npu.sh
│   │   ├── run_qwen2_5_05b_grpo.sh
│   │   ├── run_qwen2_5_05b_grpo_mindspeed.sh
│   │   ├── run_qwen2_5_05b_sft_peft_sp2.sh
│   │   ├── run_qwen2_5_vl_3b_npu.sh
│   │   ├── run_qwen3_06b_ppo.sh
│   │   └── run_qwen3_30b_grpo_mindspeed.sh
│   ├── special_sanity/
│   │   ├── check_api_docs.py
│   │   ├── check_dataproto_usage.py
│   │   ├── check_device_api_usage.py
│   │   ├── check_docs_time_info.py
│   │   ├── check_docstrings.py
│   │   ├── check_license.py
│   │   ├── check_pr_description.py
│   │   ├── check_pr_title.py
│   │   ├── test_config_docs.py
│   │   ├── test_import.py
│   │   ├── type_coverage_check.py
│   │   ├── validate_imported_docs.py
│   │   └── validate_structure.py
│   ├── special_standalone/
│   │   ├── README.md
│   │   └── test_memory_buffers.py
│   ├── test_base_config_on_cpu.py
│   ├── test_protocol_on_cpu.py
│   ├── test_protocol_v2_on_cpu.py
│   ├── trainer/
│   │   ├── __init__.py
│   │   ├── config/
│   │   │   ├── __init__.py
│   │   │   ├── legacy_ppo_megatron_trainer.yaml
│   │   │   ├── legacy_ppo_trainer.yaml
│   │   │   ├── test_algo_config_on_cpu.py
│   │   │   └── test_legacy_config_on_cpu.py
│   │   └── ppo/
│   │       ├── __init__.py
│   │       ├── test_core_algos_on_cpu.py
│   │       ├── test_metric_utils_on_cpu.py
│   │       ├── test_rollout_corr.py
│   │       └── test_rollout_corr_integration.py
│   ├── utils/
│   │   ├── _test_module.py
│   │   ├── ckpt/
│   │   │   ├── test_checkpoint_cleanup_on_cpu.py
│   │   │   └── test_esi_save_ckpt_on_cpu.py
│   │   ├── dataset/
│   │   │   ├── test_create_rl_sampler_on_cpu.py
│   │   │   ├── test_multiturn_sft_dataset_on_cpu.py
│   │   │   ├── test_rl_collate_fn_on_cpu.py
│   │   │   └── test_rl_dataset_on_cpu.py
│   │   ├── debug/
│   │   │   └── test_metrics.py
│   │   ├── megatron/
│   │   │   └── test_pipeline_parallel.py
│   │   ├── reward_score/
│   │   │   ├── reward_score/
│   │   │   │   └── test_sandbox_fusion_on_cpu.py
│   │   │   └── test_sandbox_on_cpu.py
│   │   ├── test_activation_offload.py
│   │   ├── test_bucketed_weight_transfer.py
│   │   ├── test_check_ipc_version_support_on_npu.py
│   │   ├── test_check_profiler_output.py
│   │   ├── test_config_on_cpu.py
│   │   ├── test_flops_counter.py
│   │   ├── test_fs_on_cpu.py
│   │   ├── test_fsdp2_peft_wrapping.py
│   │   ├── test_fsdp_lora_merge.py
│   │   ├── test_groupwise.py
│   │   ├── test_import_utils_on_cpu.py
│   │   ├── test_linear_cross_entropy.py
│   │   ├── test_mlflow_key_sanitization.py
│   │   ├── test_model_on_cpu.py
│   │   ├── test_normalize_peft_param_name.py
│   │   ├── test_normalize_peft_param_name_on_cpu.py
│   │   ├── test_nvtx_profile.py
│   │   ├── test_padding_on_cpu.py
│   │   ├── test_prepare_micro_batches_with_group_size.py
│   │   ├── test_rollout_skip_on_cpu.py
│   │   ├── test_rollout_trace_on_cpu.py
│   │   ├── test_seqlen_balancing.py
│   │   ├── test_server_profiler.py
│   │   ├── test_shared_memory.py
│   │   ├── test_special_linear_cross_entropy_tp.py
│   │   ├── test_special_mstx_profile.py
│   │   ├── test_temp_env_on_cpu.py
│   │   ├── test_timeout_decorator_cpu.py
│   │   ├── test_tokenizer_normalize_on_cpu.py
│   │   ├── test_torch_functional.py
│   │   └── test_torch_profile.py
│   └── workers/
│       ├── actor/
│       │   └── test_special_dp_actor.py
│       ├── config/
│       │   ├── test_actor_config_on_cpu.py
│       │   ├── test_critic_config_on_cpu.py
│       │   ├── test_engine_config_on_cpu.py
│       │   ├── test_model_config_on_cpu.py
│       │   └── test_optim_config_on_cpu.py
│       ├── critic/
│       │   └── test_special_dp_critic.py
│       ├── reward_manager/
│       │   └── test_registry_on_cpu.py
│       ├── rollout/
│       │   ├── perf/
│       │   │   └── vllm_async_rollout.py
│       │   ├── resource/
│       │   │   └── tool_configs/
│       │   │       ├── mcp_server.json
│       │   │       ├── mcp_tool_config
│       │   │       ├── sandbox_fusion_tool_config
│       │   │       └── search_tool_config
│       │   ├── rollout_sglang/
│       │   │   └── test_http_server_engine.py
│       │   ├── rollout_trtllm/
│       │   │   ├── __init__.py
│       │   │   ├── test_adapter.py
│       │   │   ├── test_async_server.py
│       │   │   └── test_trtllm_rollout_utils.py
│       │   ├── rollout_vllm/
│       │   │   ├── run_fsdp_vllm.py
│       │   │   └── test_vllm_abort.py
│       │   ├── test_hf_rollout.py
│       │   ├── test_sglang_async_rollout_multimodal_delta.py
│       │   ├── test_sglang_rollout_sharding_manager.py
│       │   └── test_vllm_cli_args_on_cpu.py
│       ├── test_fsdp_attn_implementation.py
│       └── test_fsdp_workers.py
└── verl/
    ├── __init__.py
    ├── base_config.py
    ├── checkpoint_engine/
    │   ├── README.md
    │   ├── __init__.py
    │   ├── base.py
    │   ├── hccl_checkpoint_engine.py
    │   ├── kimi_checkpoint_engine.py
    │   ├── mooncake_checkpoint_engine.py
    │   ├── nccl_checkpoint_engine.py
    │   └── nixl_checkpoint_engine.py
    ├── experimental/
    │   ├── __init__.py
    │   ├── agent_loop/
    │   │   ├── __init__.py
    │   │   ├── agent_loop.py
    │   │   ├── prometheus_utils.py
    │   │   ├── single_turn_agent_loop.py
    │   │   ├── tool_agent_loop.py
    │   │   ├── tool_parser.py
    │   │   └── utils.py
    │   ├── dataset/
    │   │   ├── __init__.py
    │   │   └── sampler.py
    │   ├── dynamic_dataset/
    │   │   ├── __init__.py
    │   │   └── dynamicgen_dataset.py
    │   ├── fully_async_policy/
    │   │   ├── README.md
    │   │   ├── README_zh.md
    │   │   ├── agent_loop/
    │   │   │   ├── __init__.py
    │   │   │   └── agent_loop.py
    │   │   ├── config/
    │   │   │   ├── fully_async_ppo_megatron_trainer.yaml
    │   │   │   └── fully_async_ppo_trainer.yaml
    │   │   ├── detach_utils.py
    │   │   ├── fully_async_main.py
    │   │   ├── fully_async_rollouter.py
    │   │   ├── fully_async_trainer.py
    │   │   ├── message_queue.py
    │   │   ├── shell/
    │   │   │   ├── dapo_30b_a3b_base_math_fsdp.sh
    │   │   │   ├── dapo_7b_async_retool.sh
    │   │   │   ├── dapo_7b_math_fsdp2_16_16.sh
    │   │   │   ├── dapo_7b_math_fsdp2_32_32.sh
    │   │   │   ├── dapo_7b_math_fsdp2_4_12.sh
    │   │   │   ├── dapo_7b_math_fsdp2_4_4.sh
    │   │   │   ├── dapo_7b_math_fsdp2_64_64.sh
    │   │   │   ├── dapo_7b_math_fsdp2_64_64_mis.sh
    │   │   │   ├── dapo_7b_math_fsdp2_8_8.sh
    │   │   │   ├── geo3k_qwen25vl_7b_megatron_4_4.sh
    │   │   │   ├── grpo_30b_a3b_base_math_megatron_96_32.sh
    │   │   │   ├── grpo_30b_a3b_base_math_megatron_96_32_mis.sh
    │   │   │   └── runtime_env.yaml
    │   │   └── unittest/
    │   │       └── simple_streaming_demo.py
    │   ├── one_step_off_policy/
    │   │   ├── README.md
    │   │   ├── config/
    │   │   │   ├── one_step_off_ppo_megatron_trainer.yaml
    │   │   │   └── one_step_off_ppo_trainer.yaml
    │   │   ├── main_ppo.py
    │   │   ├── ray_trainer.py
    │   │   └── shell/
    │   │       ├── dapo_7b_math_fsdp2_4_12.sh
    │   │       ├── dapo_7b_math_fsdp2_64_64.sh
    │   │       ├── dapo_7b_math_fsdp2_64_64_ris.sh
    │   │       ├── dapo_7b_math_fsdp2_colocate.sh
    │   │       ├── dapo_7b_math_fsdp2_sglang_4_12.sh
    │   │       ├── dapo_7b_math_fsdp2_sglang_colocate.sh
    │   │       ├── dapo_7b_math_megatron_4_12.sh
    │   │       ├── dapo_7b_math_megatron_colocate.sh
    │   │       ├── grpo_0.6b_gsm8k_fsdp2_2_6.sh
    │   │       ├── grpo_0.6b_gsm8k_fsdp2_sglang_2_6.sh
    │   │       ├── grpo_3b_gsm8k_fsdp2_2_6.sh
    │   │       └── grpo_qwen3_8b_gsm8k_fsdp2_8_8_npu.sh
    │   ├── reward_loop/
    │   │   ├── __init__.py
    │   │   ├── reward_loop.py
    │   │   ├── reward_manager/
    │   │   │   ├── __init__.py
    │   │   │   ├── base.py
    │   │   │   ├── dapo.py
    │   │   │   ├── gdpo.py
    │   │   │   ├── limited.py
    │   │   │   ├── naive.py
    │   │   │   ├── registry.py
    │   │   │   └── remote.py
    │   │   ├── reward_model.py
    │   │   └── router/
    │   │       ├── inner_sglang_router.py
    │   │       └── naive_router.py
    │   ├── separation/
    │   │   ├── __init__.py
    │   │   ├── engine_workers.py
    │   │   ├── ray_trainer.py
    │   │   └── utils.py
    │   └── vla/
    │       ├── README.md
    │       ├── config/
    │       │   ├── rob_ppo_trainer.yaml
    │       │   └── rob_sac_trainer.yaml
    │       ├── dp_rob.py
    │       ├── env_loop.py
    │       ├── envs/
    │       │   ├── __init__.py
    │       │   ├── action_utils.py
    │       │   ├── isaac_env/
    │       │   │   ├── __init__.py
    │       │   │   └── isaac_env.py
    │       │   └── libero_env/
    │       │       ├── __init__.py
    │       │       ├── libero_env.py
    │       │       ├── utils.py
    │       │       └── venv.py
    │       ├── fsdp_workers.py
    │       ├── main_ppo.py
    │       ├── main_sac.py
    │       ├── models/
    │       │   ├── __init__.py
    │       │   ├── modules/
    │       │   │   └── mlp.py
    │       │   ├── openvla_oft/
    │       │   │   ├── __init__.py
    │       │   │   ├── configuration_prismatic.py
    │       │   │   ├── constants.py
    │       │   │   ├── modeling_prismatic.py
    │       │   │   ├── processing_prismatic.py
    │       │   │   └── train_utils.py
    │       │   ├── pi0_torch/
    │       │   │   ├── __init__.py
    │       │   │   ├── configuration_pi0_torch.py
    │       │   │   ├── model/
    │       │   │   │   ├── modeling_pi0.py
    │       │   │   │   └── paligemma_with_expert.py
    │       │   │   ├── modeling_pi0_torch.py
    │       │   │   ├── pi0_utils.py
    │       │   │   └── policy/
    │       │   │       ├── __init__.py
    │       │   │       ├── base.py
    │       │   │       └── libero_policy.py
    │       │   └── register_vla_models.py
    │       ├── naive_rollout_rob.py
    │       ├── prepare_libero_dataset.py
    │       ├── requirements_vla.txt
    │       ├── rob_ray_trainer.py
    │       ├── run_pi05_libero_sac.sh
    │       ├── run_pi05_libero_sac_disagg.sh
    │       ├── run_simpleVLA_isaac_disagg.sh
    │       ├── run_simpleVLA_libero_grpo.sh
    │       ├── sac/
    │       │   ├── base.py
    │       │   ├── naive_rollout_pi05.py
    │       │   ├── replay_pool.py
    │       │   ├── sac_actor.py
    │       │   └── sac_ray_trainer.py
    │       └── workers/
    │           └── env/
    │               ├── env_loop_wg_test.py
    │               ├── env_manager.py
    │               └── env_worker.py
    ├── interactions/
    │   ├── __init__.py
    │   ├── base.py
    │   ├── gsm8k_interaction.py
    │   ├── utils/
    │   │   ├── __init__.py
    │   │   └── interaction_registry.py
    │   └── weather_interaction.py
    ├── model_merger/
    │   ├── __init__.py
    │   ├── __main__.py
    │   ├── base_model_merger.py
    │   ├── fsdp_model_merger.py
    │   └── megatron_model_merger.py
    ├── models/
    │   ├── README.md
    │   ├── __init__.py
    │   ├── llama/
    │   │   ├── __init__.py
    │   │   └── megatron/
    │   │       ├── __init__.py
    │   │       ├── checkpoint_utils/
    │   │       │   ├── __init__.py
    │   │       │   ├── llama_loader.py
    │   │       │   ├── llama_loader_depracated.py
    │   │       │   └── llama_saver.py
    │   │       ├── layers/
    │   │       │   ├── __init__.py
    │   │       │   ├── parallel_attention.py
    │   │       │   ├── parallel_decoder.py
    │   │       │   ├── parallel_linear.py
    │   │       │   ├── parallel_mlp.py
    │   │       │   └── parallel_rmsnorm.py
    │   │       └── modeling_llama_megatron.py
    │   ├── mcore/
    │   │   ├── __init__.py
    │   │   ├── bridge.py
    │   │   ├── config_converter.py
    │   │   ├── loader.py
    │   │   ├── mbridge.py
    │   │   ├── model_forward.py
    │   │   ├── model_forward_1f1b_overlap.py
    │   │   ├── model_forward_fused.py
    │   │   ├── model_initializer.py
    │   │   ├── mtp_patch.py
    │   │   ├── patch.py
    │   │   ├── qwen2_5_vl/
    │   │   │   ├── __init__.py
    │   │   │   ├── attention.py
    │   │   │   ├── model.py
    │   │   │   ├── rope_utils.py
    │   │   │   ├── vision_config.py
    │   │   │   ├── vision_model.py
    │   │   │   └── vision_transformer_block.py
    │   │   ├── readme.md
    │   │   ├── registry.py
    │   │   ├── saver.py
    │   │   ├── util.py
    │   │   └── weight_converter.py
    │   ├── qwen2/
    │   │   ├── __init__.py
    │   │   └── megatron/
    │   │       ├── __init__.py
    │   │       ├── checkpoint_utils/
    │   │       │   ├── __init__.py
    │   │       │   ├── qwen2_loader.py
    │   │       │   ├── qwen2_loader_depracated.py
    │   │       │   └── qwen2_saver.py
    │   │       ├── layers/
    │   │       │   ├── __init__.py
    │   │       │   ├── parallel_attention.py
    │   │       │   ├── parallel_decoder.py
    │   │       │   ├── parallel_linear.py
    │   │       │   ├── parallel_mlp.py
    │   │       │   └── parallel_rmsnorm.py
    │   │       └── modeling_qwen2_megatron.py
    │   ├── registry.py
    │   ├── transformers/
    │   │   ├── __init__.py
    │   │   ├── apertus.py
    │   │   ├── dense_common.py
    │   │   ├── glm4v.py
    │   │   ├── kimi_vl.py
    │   │   ├── llama.py
    │   │   ├── monkey_patch.py
    │   │   ├── npu_patch.py
    │   │   ├── qwen2.py
    │   │   ├── qwen2_vl.py
    │   │   ├── qwen3_vl.py
    │   │   └── tiled_mlp.py
    │   └── weight_loader_registry.py
    ├── protocol.py
    ├── py.typed
    ├── single_controller/
    │   ├── __init__.py
    │   ├── base/
    │   │   ├── __init__.py
    │   │   ├── decorator.py
    │   │   ├── worker.py
    │   │   └── worker_group.py
    │   └── ray/
    │       ├── __init__.py
    │       └── base.py
    ├── third_party/
    │   ├── __init__.py
    │   ├── torch/
    │   │   ├── __init__.py
    │   │   └── distributed/
    │   │       ├── __init__.py
    │   │       ├── _state_dict_utils.py
    │   │       └── checkpoint/
    │   │           ├── __init__.py
    │   │           └── state_dict.py
    │   └── vllm/
    │       └── __init__.py
    ├── tools/
    │   ├── __init__.py
    │   ├── base_tool.py
    │   ├── geo3k_tool.py
    │   ├── gsm8k_tool.py
    │   ├── image_zoom_in_tool.py
    │   ├── mcp_base_tool.py
    │   ├── mcp_search_tool.py
    │   ├── sandbox_fusion_tools.py
    │   ├── schemas.py
    │   ├── search_tool.py
    │   └── utils/
    │       ├── __init__.py
    │       ├── mcp_clients/
    │       │   ├── McpClientManager.py
    │       │   └── utils.py
    │       ├── search_r1_like_utils.py
    │       └── tool_registry.py
    ├── trainer/
    │   ├── README.md
    │   ├── __init__.py
    │   ├── config/
    │   │   ├── __init__.py
    │   │   ├── _generated_ppo_megatron_trainer.yaml
    │   │   ├── _generated_ppo_torchtitan_trainer.yaml
    │   │   ├── _generated_ppo_trainer.yaml
    │   │   ├── _generated_ppo_veomni_trainer.yaml
    │   │   ├── actor/
    │   │   │   ├── actor.yaml
    │   │   │   ├── dp_actor.yaml
    │   │   │   ├── megatron_actor.yaml
    │   │   │   ├── torchtitan_actor.yaml
    │   │   │   └── veomni_actor.yaml
    │   │   ├── algorithm/
    │   │   │   └── rollout_correction.yaml
    │   │   ├── algorithm.py
    │   │   ├── config.py
    │   │   ├── critic/
    │   │   │   ├── critic.yaml
    │   │   │   ├── dp_critic.yaml
    │   │   │   ├── megatron_critic.yaml
    │   │   │   ├── torchtitan_critic.yaml
    │   │   │   └── veomni_critic.yaml
    │   │   ├── data/
    │   │   │   └── legacy_data.yaml
    │   │   ├── engine/
    │   │   │   ├── automodel.yaml
    │   │   │   ├── fsdp.yaml
    │   │   │   ├── megatron.yaml
    │   │   │   ├── torchtitan.yaml
    │   │   │   └── veomni.yaml
    │   │   ├── evaluation.yaml
    │   │   ├── legacy_reward_impl.yaml
    │   │   ├── model/
    │   │   │   └── hf_model.yaml
    │   │   ├── model_engine/
    │   │   │   ├── dp.yaml
    │   │   │   ├── torchtitan.yaml
    │   │   │   └── veomni.yaml
    │   │   ├── npu_profile/
    │   │   │   └── npu_profile.yaml
    │   │   ├── optim/
    │   │   │   ├── automodel.yaml
    │   │   │   ├── fsdp.yaml
    │   │   │   ├── megatron.yaml
    │   │   │   ├── torchtitan.yaml
    │   │   │   └── veomni.yaml
    │   │   ├── ppo_megatron_trainer.yaml
    │   │   ├── ppo_trainer.yaml
    │   │   ├── profiler/
    │   │   │   └── profiler.yaml
    │   │   ├── ref/
    │   │   │   ├── dp_ref.yaml
    │   │   │   ├── megatron_ref.yaml
    │   │   │   ├── ref.yaml
    │   │   │   ├── torchtitan_ref.yaml
    │   │   │   └── veomni_ref.yaml
    │   │   ├── reward/
    │   │   │   └── reward.yaml
    │   │   ├── rollout/
    │   │   │   └── rollout.yaml
    │   │   └── sft_trainer_engine.yaml
    │   ├── constants_ppo.py
    │   ├── main_eval.py
    │   ├── main_generation_server.py
    │   ├── main_ppo.py
    │   ├── ppo/
    │   │   ├── __init__.py
    │   │   ├── core_algos.py
    │   │   ├── metric_utils.py
    │   │   ├── prefix_grouper_utils.py
    │   │   ├── ray_trainer.py
    │   │   ├── reward.py
    │   │   ├── rollout_corr_helper.py
    │   │   └── utils.py
    │   ├── runtime_env.yaml
    │   ├── sft_trainer.py
    │   └── sft_trainer_ray.py
    ├── utils/
    │   ├── __init__.py
    │   ├── activation_offload.py
    │   ├── attention_utils.py
    │   ├── chat_template.py
    │   ├── checkpoint/
    │   │   ├── __init__.py
    │   │   ├── checkpoint_handler.py
    │   │   ├── checkpoint_manager.py
    │   │   ├── fsdp_checkpoint_manager.py
    │   │   └── megatron_checkpoint_manager.py
    │   ├── config.py
    │   ├── dataset/
    │   │   ├── README.md
    │   │   ├── __init__.py
    │   │   ├── dataset_utils.py
    │   │   ├── multiturn_sft_dataset.py
    │   │   ├── rl_dataset.py
    │   │   ├── rm_dataset.py
    │   │   └── vision_utils.py
    │   ├── debug/
    │   │   ├── __init__.py
    │   │   ├── metrics.py
    │   │   ├── performance.py
    │   │   └── trajectory_tracker.py
    │   ├── device.py
    │   ├── distributed.py
    │   ├── experimental/
    │   │   ├── __init__.py
    │   │   └── torch_functional.py
    │   ├── flops_counter.py
    │   ├── fp8_utils.py
    │   ├── fs.py
    │   ├── fsdp_utils.py
    │   ├── groupwise.py
    │   ├── hdfs_io.py
    │   ├── import_utils.py
    │   ├── kernel/
    │   │   ├── __init__.py
    │   │   ├── fp8_kernel.py
    │   │   ├── kernels.py
    │   │   └── linear_cross_entropy.py
    │   ├── logger/
    │   │   ├── __init__.py
    │   │   └── aggregate_logger.py
    │   ├── logging_utils.py
    │   ├── megatron/
    │   │   ├── __init__.py
    │   │   ├── dist_checkpointing.py
    │   │   ├── memory.py
    │   │   ├── optimizer.py
    │   │   ├── pipeline_parallel.py
    │   │   ├── router_replay_patch.py
    │   │   ├── router_replay_utils.py
    │   │   ├── sequence_parallel.py
    │   │   └── tensor_parallel.py
    │   ├── megatron_peft_utils.py
    │   ├── megatron_utils.py
    │   ├── memory_utils.py
    │   ├── metric/
    │   │   ├── __init__.py
    │   │   └── utils.py
    │   ├── model.py
    │   ├── net_utils.py
    │   ├── npu_flash_attn_utils.py
    │   ├── profiler/
    │   │   ├── __init__.py
    │   │   ├── config.py
    │   │   ├── empty_annotations.py
    │   │   ├── mstx_profile.py
    │   │   ├── nvtx_profile.py
    │   │   ├── performance.py
    │   │   ├── profile.py
    │   │   └── torch_profile.py
    │   ├── py_functional.py
    │   ├── qat/
    │   │   ├── __init__.py
    │   │   ├── core.py
    │   │   ├── linear.py
    │   │   ├── quantizer.py
    │   │   └── vllm_patch.py
    │   ├── ray_utils.py
    │   ├── rendezvous/
    │   │   ├── __init__.py
    │   │   └── ray_backend.py
    │   ├── reward_score/
    │   │   ├── __init__.py
    │   │   ├── geo3k.py
    │   │   ├── gsm8k.py
    │   │   ├── math_batch.py
    │   │   ├── math_dapo.py
    │   │   ├── math_reward.py
    │   │   ├── math_verify.py
    │   │   ├── prime_code/
    │   │   │   ├── README.md
    │   │   │   ├── __init__.py
    │   │   │   ├── testing_util.py
    │   │   │   └── utils.py
    │   │   ├── prime_math/
    │   │   │   ├── __init__.py
    │   │   │   ├── grader.py
    │   │   │   └── math_normalize.py
    │   │   ├── rlla.py
    │   │   ├── sandbox_fusion/
    │   │   │   ├── __init__.py
    │   │   │   └── utils.py
    │   │   └── search_r1_like_qa_em.py
    │   ├── rollout_skip.py
    │   ├── rollout_trace.py
    │   ├── seqlen_balancing.py
    │   ├── sglang/
    │   │   └── sglang_fp8_utils.py
    │   ├── tensordict_utils.py
    │   ├── tokenizer.py
    │   ├── torch_dtypes.py
    │   ├── torch_functional.py
    │   ├── tracking.py
    │   ├── transformers_compat.py
    │   ├── trtllm/
    │   │   └── trtllm_fp8_utils.py
    │   ├── ulysses.py
    │   └── vllm/
    │       ├── __init__.py
    │       ├── npu_vllm_patch.py
    │       ├── patch.py
    │       ├── utils.py
    │       └── vllm_fp8_utils.py
    ├── version/
    │   └── version
    └── workers/
        ├── __init__.py
        ├── actor/
        │   ├── __init__.py
        │   ├── base.py
        │   ├── dp_actor.py
        │   └── megatron_actor.py
        ├── config/
        │   ├── __init__.py
        │   ├── actor.py
        │   ├── critic.py
        │   ├── engine.py
        │   ├── megatron_peft.py
        │   ├── model.py
        │   ├── optimizer.py
        │   ├── reward.py
        │   └── rollout.py
        ├── critic/
        │   ├── __init__.py
        │   ├── base.py
        │   ├── dp_critic.py
        │   └── megatron_critic.py
        ├── engine/
        │   ├── __init__.py
        │   ├── automodel/
        │   │   ├── __init__.py
        │   │   ├── transformer_impl.py
        │   │   └── utils.py
        │   ├── base.py
        │   ├── fsdp/
        │   │   ├── __init__.py
        │   │   ├── transformer_impl.py
        │   │   └── utils.py
        │   ├── megatron/
        │   │   ├── __init__.py
        │   │   ├── transformer_impl.py
        │   │   └── utils.py
        │   ├── mindspeed/
        │   │   ├── __init__.py
        │   │   └── transformer_impl.py
        │   ├── torchtitan/
        │   │   ├── __init__.py
        │   │   ├── transformer_impl.py
        │   │   └── utils.py
        │   ├── utils.py
        │   └── veomni/
        │       ├── __init__.py
        │       ├── transformer_impl.py
        │       └── utils.py
        ├── engine_workers.py
        ├── fsdp_workers.py
        ├── megatron_workers.py
        ├── reward_manager/
        │   ├── __init__.py
        │   ├── abstract.py
        │   ├── batch.py
        │   ├── dapo.py
        │   ├── naive.py
        │   ├── prime.py
        │   └── registry.py
        ├── rollout/
        │   ├── __init__.py
        │   ├── base.py
        │   ├── hf_rollout.py
        │   ├── naive/
        │   │   ├── __init__.py
        │   │   └── naive_rollout.py
        │   ├── replica.py
        │   ├── schemas.py
        │   ├── sglang_rollout/
        │   │   ├── __init__.py
        │   │   ├── async_sglang_server.py
        │   │   ├── http_server_engine.py
        │   │   ├── sglang_rollout.py
        │   │   └── utils.py
        │   ├── tokenizer.py
        │   ├── trtllm_rollout/
        │   │   ├── trtllm_async_rollout.md
        │   │   ├── trtllm_async_server.py
        │   │   ├── trtllm_rollout.py
        │   │   └── trtllm_worker_extension.py
        │   ├── utils.py
        │   └── vllm_rollout/
        │       ├── __init__.py
        │       ├── bucketed_weight_transfer.py
        │       ├── utils.py
        │       ├── vllm_async_server.py
        │       └── vllm_rollout.py
        ├── sharding_manager/
        │   ├── __init__.py
        │   ├── base.py
        │   └── fsdp_ulysses.py
        └── utils/
            ├── __init__.py
            ├── losses.py
            └── padding.py

SYMBOL INDEX (4887 symbols across 489 files)

FILE: docs/_static/js/resizable-sidebar.js
  function setupNavigationFix (line 136) | function setupNavigationFix() {

FILE: examples/data_preprocess/aime2024_multiturn_w_tool.py
  function make_map_fn (line 49) | def make_map_fn(split):

FILE: examples/data_preprocess/dapo_multiturn_w_tool.py
  function make_map_fn (line 49) | def make_map_fn(split):

FILE: examples/data_preprocess/full_hh_rlhf.py
  function generate_sft_dataset (line 30) | def generate_sft_dataset(target_hdfs_path_dir, local_dir="~/data/full_hh...
  function generate_rm_dataset (line 61) | def generate_rm_dataset(target_hdfs_path_dir, local_dir="~/data/full_hh_...
  function generate_rl_dataset (line 93) | def generate_rl_dataset(target_hdfs_path_dir, local_dir="~/data/full_hh_...

FILE: examples/data_preprocess/geo3k.py
  function make_map_fn (line 58) | def make_map_fn(split):

FILE: examples/data_preprocess/geo3k_multiturn_w_tool.py
  function make_map_fn (line 60) | def make_map_fn(split):

FILE: examples/data_preprocess/gsm8k.py
  function extract_solution (line 27) | def extract_solution(solution_str):
  function make_map_fn (line 60) | def make_map_fn(split):

FILE: examples/data_preprocess/gsm8k_multiturn_sft.py
  function extract_solution (line 27) | def extract_solution(solution_str):
  function make_map_fn (line 60) | def make_map_fn(split):

FILE: examples/data_preprocess/gsm8k_multiturn_w_interaction.py
  function extract_solution (line 29) | def extract_solution(solution_str):
  function make_map_fn (line 62) | def make_map_fn(split):

FILE: examples/data_preprocess/gsm8k_multiturn_w_tool.py
  function extract_solution (line 29) | def extract_solution(solution_str):
  function make_map_fn (line 62) | def make_map_fn(split):

FILE: examples/data_preprocess/gsm8k_tool_agent_loop.py
  function extract_solution (line 29) | def extract_solution(solution_str):
  function make_map_fn (line 62) | def make_map_fn(split):

FILE: examples/data_preprocess/hellaswag.py
  function preprocess (line 28) | def preprocess(text):
  function make_map_fn (line 62) | def make_map_fn(split):

FILE: examples/data_preprocess/math_dataset.py
  function extract_solution (line 28) | def extract_solution(solution_str):
  function make_map_fn (line 63) | def make_map_fn(split):

FILE: examples/data_preprocess/multiturn.py
  function main (line 24) | def main():

FILE: examples/data_preprocess/pokemon.py
  function map_fn (line 38) | def map_fn(row: dict):

FILE: examples/data_preprocess/preprocess_search_r1_dataset.py
  function process_single_row (line 45) | def process_single_row(row, current_split_name, row_index):
  function main (line 101) | def main():

FILE: examples/fapo_trainer/prepare_data.py
  function example_map_fn (line 27) | def example_map_fn(example, idx, process_fn, data_source, ability, split):
  function build_aime2024_dataset (line 39) | def build_aime2024_dataset():
  function build_aime2025_dataset (line 53) | def build_aime2025_dataset():
  function build_gpqa_diamond_dataset (line 67) | def build_gpqa_diamond_dataset():
  function build_dapo_train_dataset (line 107) | def build_dapo_train_dataset():

FILE: examples/fapo_trainer/reward_fn.py
  function verify (line 29) | def verify(
  function compute_score_baseline (line 45) | async def compute_score_baseline(
  function post_request (line 95) | async def post_request(router_address: str, payload: dict, endpoint: str...
  function compute_score_fapo (line 134) | async def compute_score_fapo(

FILE: examples/sglang_multiturn/gsm8k_toolcall_shaping/gsm8k_toolcall_shaping.py
  function toolcall_shaping_reward (line 23) | def toolcall_shaping_reward(
  function compute_score (line 46) | def compute_score(

FILE: examples/sglang_multiturn/search_r1_like/local_dense_retriever/retrieval_server.py
  function load_corpus (line 34) | def load_corpus(corpus_path: str):
  function load_docs (line 39) | def load_docs(corpus, doc_idxs):
  function load_model (line 44) | def load_model(model_path: str, use_fp16: bool = False):
  function pooling (line 54) | def pooling(pooler_output, last_hidden_state, attention_mask=None, pooli...
  class Encoder (line 66) | class Encoder:
    method __init__ (line 67) | def __init__(self, model_name, model_path, pooling_method, max_length,...
    method encode (line 78) | def encode(self, query_list: list[str], is_query=True) -> np.ndarray:
  class BaseRetriever (line 124) | class BaseRetriever:
    method __init__ (line 125) | def __init__(self, config):
    method _search (line 133) | def _search(self, query: str, num: int, return_score: bool):
    method _batch_search (line 136) | def _batch_search(self, query_list: list[str], num: int, return_score:...
    method search (line 139) | def search(self, query: str, num: int = None, return_score: bool = Fal...
    method batch_search (line 142) | def batch_search(self, query_list: list[str], num: int = None, return_...
  class BM25Retriever (line 146) | class BM25Retriever(BaseRetriever):
    method __init__ (line 147) | def __init__(self, config):
    method _check_contain_doc (line 157) | def _check_contain_doc(self):
    method _search (line 160) | def _search(self, query: str, num: int = None, return_score: bool = Fa...
    method _batch_search (line 193) | def _batch_search(self, query_list: list[str], num: int = None, return...
  class DenseRetriever (line 206) | class DenseRetriever(BaseRetriever):
    method __init__ (line 207) | def __init__(self, config):
    method _search (line 227) | def _search(self, query: str, num: int = None, return_score: bool = Fa...
    method _batch_search (line 240) | def _batch_search(self, query_list: list[str], num: int = None, return...
  function get_retriever (line 273) | def get_retriever(config):
  class Config (line 285) | class Config:
    method __init__ (line 291) | def __init__(
  class QueryRequest (line 320) | class QueryRequest(BaseModel):
  function retrieve_endpoint (line 330) | def retrieve_endpoint(request: QueryRequest):

FILE: examples/split_placement/main_ppo_split.py
  function _select_rm_score_fn (line 30) | def _select_rm_score_fn(data_source):
  class RewardManager (line 39) | class RewardManager:
    method __init__ (line 40) | def __init__(self, tokenizer, num_examine) -> None:
    method __call__ (line 44) | def __call__(self, data: DataProto, return_dict: bool = False):
  function main (line 96) | def main(config):
  function main_task (line 111) | def main_task(config):

FILE: examples/split_placement/split_monkey_patch.py
  function fit (line 38) | def fit(self):

FILE: examples/tutorial/agent_loop_get_started/sandbox.py
  class SandboxTool (line 22) | class SandboxTool(BaseTool):
    method __init__ (line 23) | def __init__(self, config: dict, tool_schema: OpenAIFunctionToolSchema):
    method code_interpreter (line 28) | async def code_interpreter(self, code: str) -> str:
    method get_openai_tool_schema (line 47) | def get_openai_tool_schema(self) -> OpenAIFunctionToolSchema:
    method execute (line 51) | async def execute(self, instance_id: str, parameters: dict, **kwargs) ...

FILE: scripts/converter_hf_to_mcore.py
  function _init_args (line 51) | def _init_args():
  function test_conversion (line 73) | def test_conversion(megatron_model_provider, tfconfig, output_path, model):
  function convert_checkpoint_from_transformers_to_megatron (line 122) | def convert_checkpoint_from_transformers_to_megatron(
  function safe_copy (line 209) | def safe_copy(
  function convert_checkpoint_from_transformers_to_megatron_qwen2_5_vl (line 223) | def convert_checkpoint_from_transformers_to_megatron_qwen2_5_vl(hfmodel,...
  function convert_checkpoint_from_transformers_to_megatron_dpskv3 (line 330) | def convert_checkpoint_from_transformers_to_megatron_dpskv3(
  function noop_context (line 434) | def noop_context() -> Any:
  function support_distributed_convert (line 438) | def support_distributed_convert(hf_config: AutoConfig) -> bool:
  function convert_hf_to_mcore (line 445) | def convert_hf_to_mcore(

FILE: scripts/diagnose.py
  function test_connection (line 50) | def test_connection(name, url, timeout=10):
  function check_python (line 70) | def check_python():
  function check_pip (line 78) | def check_pip():
  function _get_current_git_commit (line 89) | def _get_current_git_commit():
  function check_verl (line 101) | def check_verl():
  function check_os (line 126) | def check_os():
  function check_hardware (line 135) | def check_hardware():
  function check_network (line 151) | def check_network(args):
  function check_environment (line 170) | def check_environment():
  function check_pip_package_versions (line 177) | def check_pip_package_versions():
  function check_cuda_versions (line 187) | def check_cuda_versions():
  function _get_cpu_memory (line 208) | def _get_cpu_memory():
  function _get_gpu_info (line 216) | def _get_gpu_info():
  function _get_system_info (line 244) | def _get_system_info():
  function check_system_info (line 253) | def check_system_info():
  function parse_args (line 263) | def parse_args():

FILE: scripts/init_random_model.py
  function _init_args (line 37) | def _init_args():
  function check_output_path (line 51) | def check_output_path(output_path: str):
  function check_configs (line 60) | def check_configs(original_config: dict[str, Any], new_config: dict[str,...
  function init_random_model (line 77) | def init_random_model(hf_model_path, new_config_path, output_path, trust...

FILE: scripts/legacy_model_merger.py
  class ModelMergerConfig (line 77) | class ModelMergerConfig:
    method __post_init__ (line 91) | def __post_init__(self):
  class BaseModelMerger (line 99) | class BaseModelMerger(ABC):
    method __init__ (line 100) | def __init__(self, config: ModelMergerConfig):
    method get_transformers_auto_model_class (line 117) | def get_transformers_auto_model_class(self):
    method patch_model_generation_config (line 141) | def patch_model_generation_config(self, model):
    method save_lora_adapter (line 157) | def save_lora_adapter(self, state_dict: dict[str, torch.Tensor]):
    method save_hf_model_and_tokenizer (line 214) | def save_hf_model_and_tokenizer(self, state_dict: dict[str, torch.Tens...
    method upload_to_huggingface (line 243) | def upload_to_huggingface(self):
    method merge_and_save (line 251) | def merge_and_save(self):
  class FSDPModelMerger (line 255) | class FSDPModelMerger(BaseModelMerger):
    method _get_world_size (line 256) | def _get_world_size(self) -> int:
    method _load_rank_zero_state_dict (line 266) | def _load_rank_zero_state_dict(self, world_size: int) -> dict:
    method _extract_device_mesh_info (line 273) | def _extract_device_mesh_info(self, state_dict: dict, world_size: int)...
    method _calculate_shard_configuration (line 293) | def _calculate_shard_configuration(
    method _merge_by_placement (line 309) | def _merge_by_placement(self, tensors: list[torch.Tensor], placement: ...
    method _load_and_merge_state_dicts (line 320) | def _load_and_merge_state_dicts(
    method merge_and_save (line 383) | def merge_and_save(self):
    method _test_state_dict (line 406) | def _test_state_dict(self, state_dict: dict[str, torch.Tensor]):
  class MegatronModelMerger (line 440) | class MegatronModelMerger(BaseModelMerger):
    method __init__ (line 441) | def __init__(self, config: ModelMergerConfig):
    method _get_tp_pp_rank_from_sharded_dir (line 484) | def _get_tp_pp_rank_from_sharded_dir(self, sharded_dir: str) -> tuple[...
    method _check_megatron_checkpoint_path (line 498) | def _check_megatron_checkpoint_path(self, model_path: str) -> tuple[li...
    method _merge_across_tp (line 513) | def _merge_across_tp(
    method _load_state_dicts (line 569) | def _load_state_dicts(
    method _check_megatron_state_key (line 587) | def _check_megatron_state_key(self, key: str) -> bool:
    method _merge_state_dicts (line 611) | def _merge_state_dicts(
    method merge_and_save (line 663) | def merge_and_save(self):
    method _test_state_dict (line 685) | def _test_state_dict(self, state_dict: dict[str, torch.Tensor]):
    method _replace_name (line 706) | def _replace_name(self, megatron_name: str, name_mapping: dict[str, st...
  function main (line 718) | def main():

FILE: scripts/megatron_merge_lora.py
  class CustomSaveWorker (line 33) | class CustomSaveWorker(ActorRolloutRefWorker):
    method save_merged_weights (line 35) | def save_merged_weights(self, hf_ckpt_path):
  function main (line 56) | def main(config):
  function run_merge (line 69) | def run_merge(config) -> None:
  function main_task (line 84) | def main_task(config):

FILE: scripts/print_cfg.py
  function main (line 21) | def main(config):

FILE: scripts/rollout_viewer.py
  function check_textual_version (line 42) | def check_textual_version():
  function load_path (line 54) | async def load_path(p: Path, data: dict, mask_strs: str, idx: int, pbar):
  function load_dir (line 74) | async def load_dir(path: Path, data: dict[int, dict], pbar, mask_strs: s...
  class Highlighter (line 83) | class Highlighter(ReprHighlighter):
  function center_word_with_equals_exactly (line 90) | def center_word_with_equals_exactly(word: str, total_length: int, char: ...
  function highlight_keyword (line 100) | def highlight_keyword(content: str, keyword: Optional[str]):
  class JsonLineViewer (line 129) | class JsonLineViewer(App):
    method __init__ (line 175) | def __init__(self, step_num: int, data: dict[int, dict], pbar):
    method compose (line 200) | def compose(self) -> ComposeResult:
    method on_mount (line 250) | async def on_mount(self) -> None:
    method update_result_options (line 268) | def update_result_options(self, offset: int = 0, sort_desc: Optional[b...
    method update_content (line 292) | async def update_content(self, search_keyword: Optional[str] = None):
    method on_reqid_submitted (line 332) | async def on_reqid_submitted(self, event: Input.Submitted) -> None:
    method _update_fields_select (line 373) | def _update_fields_select(self, keys):
    method step_changed (line 395) | async def step_changed(self, event):
    method sample_changed (line 401) | async def sample_changed(self, event):
    method sort_changed (line 407) | async def sort_changed(self, event):
    method fields_changed (line 413) | async def fields_changed(self, event):
    method fields_all_changed (line 417) | async def fields_all_changed(self, event):
    method action_focus_previous (line 424) | def action_focus_previous(self):
    method action_focus_next (line 427) | def action_focus_next(self):
    method action_next_step (line 430) | async def action_next_step(self) -> None:
    method action_next_sample (line 438) | async def action_next_sample(self) -> None:
    method action_previous_step (line 446) | async def action_previous_step(self) -> None:
    method action_previous_sample (line 454) | async def action_previous_sample(self) -> None:
    method action_swith_render (line 462) | async def action_swith_render(self):
    method action_toggle_search (line 466) | def action_toggle_search(self) -> None:
    method action_cancel_search (line 469) | async def action_cancel_search(self) -> None:
    method _clear_search (line 474) | async def _clear_search(self):
    method on_search_submitted (line 480) | async def on_search_submitted(self, event: Input.Submitted) -> None:
    method action_next_search (line 507) | async def action_next_search(self) -> None:
    method action_page_up (line 521) | def action_page_up(self):
    method action_page_down (line 524) | def action_page_down(self):
    method action_page_home (line 527) | def action_page_home(self):
    method action_page_end (line 530) | def action_page_end(self):
  function _run (line 534) | async def _run(path: Path, mask_str: str):
  function run (line 556) | def run(

FILE: scripts/veomni/moe_merge.py
  class StateDictIterator (line 46) | class StateDictIterator:
    method __iter__ (line 49) | def __iter__(self) -> Generator[tuple[str, "torch.Tensor"], None, None]:
  function main (line 61) | def main(raw_hf_path, merge_hf_path):

FILE: scripts/veomni/moe_split.py
  class StateDictIterator (line 43) | class StateDictIterator:
    method __iter__ (line 46) | def __iter__(self) -> Generator[tuple[str, "torch.Tensor"], None, None]:
  function main (line 58) | def main(merge_hf_path, split_hf_path):

FILE: tests/checkpoint_engine/test_correctness_on_gpu.py
  function test_nccl_checkpoint_engine (line 34) | async def test_nccl_checkpoint_engine(
  function test_nixl_checkpoint_engine (line 83) | async def test_nixl_checkpoint_engine(
  function test_kimi_checkpoint_engine (line 139) | async def test_kimi_checkpoint_engine(

FILE: tests/checkpoint_engine/test_correctness_on_npu.py
  function test_hccl_checkpoint_engine (line 34) | async def test_hccl_checkpoint_engine(
  function test_kimi_checkpoint_engine (line 83) | async def test_kimi_checkpoint_engine(
  function test_mooncake_checkpoint_engine (line 130) | async def test_mooncake_checkpoint_engine(

FILE: tests/checkpoint_engine/test_special_server_adapter.py
  function init_config (line 34) | def init_config() -> DictConfig:
  function _run_update_weights_with_global_steps_none (line 57) | async def _run_update_weights_with_global_steps_none(
  function _run_server_manager_without_resume (line 83) | async def _run_server_manager_without_resume(
  function _run_server_manager_with_resume (line 124) | async def _run_server_manager_with_resume(
  function test_server_adapter (line 175) | async def test_server_adapter(init_config):

FILE: tests/checkpoint_engine/test_utils.py
  class TrainingWorkerTest (line 31) | class TrainingWorkerTest(TrainingWorker):
    method __init__ (line 32) | def __init__(self, config: TrainingWorkerConfig, checkpoint_engine_con...
    method update_weights (line 43) | async def update_weights(self, global_steps: int = None):
    method execute_checkpoint_engine (line 48) | def execute_checkpoint_engine(self, method: str, *args, **kwargs):
  class MockServerAdapter (line 52) | class MockServerAdapter(BaseRollout):
    method __init__ (line 53) | def __init__(self, config: RolloutConfig, model_config: HFModelConfig,...
    method resume (line 59) | async def resume(self, tags: list[str]):
    method release (line 62) | async def release(self):
    method update_weights (line 65) | async def update_weights(
    method check_weights (line 75) | def check_weights(self):
  class MockReplica (line 90) | class MockReplica(RolloutReplica):
    method init_hybrid (line 91) | async def init_hybrid(self, worker_group: RayWorkerGroup):
    method get_ray_class_with_init_args (line 101) | def get_ray_class_with_init_args(self) -> RayClassWithInitArgs:
    method launch_servers (line 105) | async def launch_servers(self):
  class CheckpointEngineWorkerTest (line 110) | class CheckpointEngineWorkerTest(CheckpointEngineWorker):
    method __init__ (line 111) | def __init__(
    method check_weights (line 118) | def check_weights(self):
  function create_trainer_worker_group (line 122) | def create_trainer_worker_group(
  function create_rollout_worker_group (line 150) | async def create_rollout_worker_group(

FILE: tests/experimental/agent_loop/agent_utils.py
  function init_agent_loop_manager (line 28) | def init_agent_loop_manager(config: DictConfig) -> AgentLoopManager | Ra...

FILE: tests/experimental/agent_loop/test_agent_loop_extra_fields_schema_on_cpu.py
  class _FakeServerManager (line 37) | class _FakeServerManager:
    method generate (line 38) | async def generate(
    method generate_for_partial (line 51) | async def generate_for_partial(
  class _FakeTokenizer (line 67) | class _FakeTokenizer:
    method apply_chat_template (line 70) | def apply_chat_template(
    method pad (line 83) | def pad(
    method decode (line 113) | def decode(self, ids: list[int] | torch.Tensor, skip_special_tokens: b...
  function _pad_1d (line 118) | def _pad_1d(ids: list[int], *, length: int, pad_id: int = 0) -> list[int]:
  function _to_internal (line 124) | def _to_internal(
  function test_agent_loop_extra_fields_schema_stable_for_training_concat_on_cpu (line 170) | async def test_agent_loop_extra_fields_schema_stable_for_training_concat...
  function test_agent_loop_postprocess_accepts_read_only_routed_experts_on_cpu (line 252) | async def test_agent_loop_postprocess_accepts_read_only_routed_experts_o...

FILE: tests/experimental/agent_loop/test_basic_agent_loop.py
  function init_config (line 36) | def init_config() -> DictConfig:
  function test_single_turn (line 68) | def test_single_turn(init_config):
  class WeatherTool (line 130) | class WeatherTool(BaseTool):
    method get_current_temperature (line 131) | def get_current_temperature(self, location: str, unit: str = "celsius"):
    method get_openai_tool_schema (line 148) | def get_openai_tool_schema(self) -> OpenAIFunctionToolSchema:
    method execute (line 152) | async def execute(self, instance_id: str, parameters: dict[str, Any], ...
  class WeatherToolWithData (line 160) | class WeatherToolWithData(BaseTool):
    method get_openai_tool_schema (line 161) | def get_openai_tool_schema(self) -> OpenAIFunctionToolSchema:
    method get_temperature_date (line 165) | def get_temperature_date(self, location: str, date: str, unit: str = "...
    method execute (line 184) | async def execute(self, instance_id: str, parameters: dict[str, Any], ...
  function test_tool_agent (line 192) | def test_tool_agent(init_config):
  function test_tool_agent_with_interaction (line 306) | def test_tool_agent_with_interaction(init_config):
  function test_get_trajectory_info (line 441) | async def test_get_trajectory_info():
  function ray_for_lb (line 464) | def ray_for_lb():
  class TestLoadBalancerRouting (line 470) | class TestLoadBalancerRouting:
    method test_distributes_across_servers (line 473) | def test_distributes_across_servers(self, ray_for_lb):
    method test_new_requests_route_to_least_loaded (line 478) | def test_new_requests_route_to_least_loaded(self, ray_for_lb):
    method test_release_rebalances (line 490) | def test_release_rebalances(self, ray_for_lb):
    method test_release_invalid_server_raises (line 501) | def test_release_invalid_server_raises(self, ray_for_lb):
    method test_release_without_inflight_raises (line 507) | def test_release_without_inflight_raises(self, ray_for_lb):
  class TestLoadBalancerStickySession (line 514) | class TestLoadBalancerStickySession:
    method test_same_request_id_same_server (line 517) | def test_same_request_id_same_server(self, ray_for_lb):

FILE: tests/experimental/agent_loop/test_gpt_oss_tool_parser.py
  function test_gpt_oss_tool_parser (line 22) | async def test_gpt_oss_tool_parser():

FILE: tests/experimental/agent_loop/test_multi_modal.py
  function parse_multi_modal_type (line 32) | def parse_multi_modal_type(messages: list[dict]) -> str:
  function init_config (line 47) | def init_config() -> DictConfig:
  class ImageGeneratorTool (line 75) | class ImageGeneratorTool(BaseTool):
    method generate_image (line 76) | def generate_image(self, description: str, size: str = "256x256"):
    method get_openai_tool_schema (line 113) | def get_openai_tool_schema(self) -> OpenAIFunctionToolSchema:
    method execute (line 117) | async def execute(self, instance_id: str, parameters: dict[str, Any], ...
  function test_multimodal_tool_agent (line 127) | def test_multimodal_tool_agent(init_config):
  function test_multimodal_single_turn_agent (line 297) | def test_multimodal_single_turn_agent(init_config):

FILE: tests/experimental/agent_loop/test_standalone_rollout.py
  function init_config (line 29) | def init_config() -> DictConfig:
  function test_standalone_rollout (line 48) | async def test_standalone_rollout(init_config, tp_size):
  function test_hybrid_rollout_with_ep (line 104) | def test_hybrid_rollout_with_ep(init_config):

FILE: tests/experimental/reward_loop/reward_fn.py
  function chat_complete (line 41) | async def chat_complete(router_address: str, chat_complete_request: dict):
  function compute_score_gsm8k (line 56) | async def compute_score_gsm8k(
  function compute_score_math_verify (line 87) | def compute_score_math_verify(

FILE: tests/experimental/reward_loop/test_agent_reward_loop_colocate.py
  function test_agent_reward_loop_standalone (line 34) | def test_agent_reward_loop_standalone():

FILE: tests/experimental/reward_loop/test_agent_reward_loop_standalone.py
  function test_agent_reward_loop_standalone (line 28) | def test_agent_reward_loop_standalone():

FILE: tests/experimental/reward_loop/test_async_token_bucket_on_cpu.py
  class TestAsyncTokenBucket (line 23) | class TestAsyncTokenBucket:
    method test_basic_acquire (line 27) | async def test_basic_acquire(self):
    method test_refill_mechanism (line 40) | async def test_refill_mechanism(self):
    method test_waiting_for_tokens (line 59) | async def test_waiting_for_tokens(self):
    method test_max_tokens_cap (line 75) | async def test_max_tokens_cap(self):
    method test_fractional_tokens (line 90) | async def test_fractional_tokens(self):
    method test_concurrent_acquires (line 102) | async def test_concurrent_acquires(self):
    method test_high_rate_limit (line 123) | async def test_high_rate_limit(self):
    method test_zero_initial_state (line 137) | async def test_zero_initial_state(self):
    method test_rate_limit_accuracy (line 149) | async def test_rate_limit_accuracy(self):
    method test_sequential_acquires (line 166) | async def test_sequential_acquires(self):
    method test_default_max_tokens (line 187) | async def test_default_max_tokens(self):
    method test_single_token_acquire (line 195) | async def test_single_token_acquire(self):
    method test_large_token_acquire (line 204) | async def test_large_token_acquire(self):
    method test_thread_safety_with_lock (line 217) | async def test_thread_safety_with_lock(self):
    method test_multiple_wait_cycles (line 237) | async def test_multiple_wait_cycles(self):
    method test_rapid_small_acquires (line 253) | async def test_rapid_small_acquires(self):

FILE: tests/experimental/reward_loop/test_math_verify.py
  function test_agent_reward_loop_standalone (line 27) | def test_agent_reward_loop_standalone():

FILE: tests/experimental/reward_loop/test_rate_limited_reward_manager_on_cpu.py
  class MockAPICounter (line 29) | class MockAPICounter:
    method __init__ (line 32) | def __init__(self):
    method record_call (line 37) | async def record_call(self):
    method reset (line 42) | def reset(self):
    method get_rate_per_second (line 46) | def get_rate_per_second(self, window_start: float = None):
  function mock_sync_reward_function (line 70) | def mock_sync_reward_function(
  function mock_async_reward_function (line 82) | async def mock_async_reward_function(
  function mock_slow_api_function (line 97) | async def mock_slow_api_function(
  function mock_failing_api_function (line 105) | async def mock_failing_api_function(
  function mock_dict_result_function (line 113) | async def mock_dict_result_function(
  function create_test_data_proto (line 124) | def create_test_data_proto(tokenizer, response_text: str, ground_truth: ...
  class TestRateLimitedRewardManager (line 143) | class TestRateLimitedRewardManager:
    method setup_and_teardown (line 147) | def setup_and_teardown(self):
    method tokenizer (line 160) | def tokenizer(self):
    method test_basic_reward_computation (line 165) | async def test_basic_reward_computation(self, tokenizer):
    method test_rpm_rate_limiting (line 183) | async def test_rpm_rate_limiting(self, tokenizer):
    method test_tpm_rate_limiting (line 218) | async def test_tpm_rate_limiting(self, tokenizer):
    method test_concurrency_limiting (line 254) | async def test_concurrency_limiting(self, tokenizer):
    method test_timeout_handling (line 287) | async def test_timeout_handling(self, tokenizer):
    method test_error_handling (line 311) | async def test_error_handling(self, tokenizer):
    method test_dict_result_format (line 330) | async def test_dict_result_format(self, tokenizer):
    method test_sync_reward_function (line 347) | async def test_sync_reward_function(self, tokenizer):
    method test_combined_rate_limits (line 362) | async def test_combined_rate_limits(self, tokenizer):
    method test_correct_vs_incorrect_answers (line 398) | async def test_correct_vs_incorrect_answers(self, tokenizer):
    method test_high_throughput (line 417) | async def test_high_throughput(self, tokenizer):
    method test_class_initialization_once (line 459) | async def test_class_initialization_once(self, tokenizer):
    method test_extra_info_handling (line 474) | async def test_extra_info_handling(self, tokenizer):

FILE: tests/experimental/reward_loop/test_reward_model_disrm.py
  function create_data_samples (line 27) | def create_data_samples(tokenizer) -> DataProto:
  function test_reward_model_manager (line 107) | def test_reward_model_manager():

FILE: tests/experimental/reward_loop/test_reward_model_genrm.py
  function create_data_samples (line 28) | def create_data_samples(tokenizer) -> DataProto:
  function test_reward_model_manager (line 108) | def test_reward_model_manager():

FILE: tests/experimental/vla/test_sim_envs.py
  function test_sim_env_creation_and_step (line 25) | def test_sim_env_creation_and_step(simulator_type):

FILE: tests/interactions/test_gsm8k_interaction.py
  class TestGsm8kInteraction (line 24) | class TestGsm8kInteraction:
    method setup_method (line 27) | def setup_method(self):
    method test_init (line 32) | def test_init(self):
    method test_start_interaction_with_instance_id (line 39) | async def test_start_interaction_with_instance_id(self):
    method test_start_interaction_without_instance_id (line 53) | async def test_start_interaction_without_instance_id(self):
    method test_start_interaction_without_ground_truth (line 65) | async def test_start_interaction_without_ground_truth(self):
    method test_generate_response_correct_answer_with_prefix (line 75) | async def test_generate_response_correct_answer_with_prefix(self):
    method test_generate_response_correct_answer_without_prefix (line 97) | async def test_generate_response_correct_answer_without_prefix(self):
    method test_generate_response_incorrect_answer (line 118) | async def test_generate_response_incorrect_answer(self):
    method test_generate_response_multiple_messages (line 139) | async def test_generate_response_multiple_messages(self):
    method test_generate_response_no_assistant_message (line 164) | async def test_generate_response_no_assistant_message(self):
    method test_calculate_score_direct_call (line 183) | async def test_calculate_score_direct_call(self):
    method test_calculate_score_with_kwargs (line 201) | async def test_calculate_score_with_kwargs(self):
    method test_finalize_interaction (line 219) | async def test_finalize_interaction(self):
    method test_finalize_interaction_with_kwargs (line 234) | async def test_finalize_interaction_with_kwargs(self):
    method test_finalize_nonexistent_interaction (line 249) | async def test_finalize_nonexistent_interaction(self):
    method test_full_interaction_workflow_correct (line 258) | async def test_full_interaction_workflow_correct(self):
    method test_full_interaction_workflow_incorrect (line 281) | async def test_full_interaction_workflow_incorrect(self):
    method test_multiple_concurrent_interactions (line 316) | async def test_multiple_concurrent_interactions(self):
    method test_edge_case_empty_messages (line 349) | async def test_edge_case_empty_messages(self):
    method test_edge_case_message_without_content (line 369) | async def test_edge_case_message_without_content(self):
    method test_inheritance_from_base_interaction (line 390) | def test_inheritance_from_base_interaction(self):
    method test_name_attribute_initialization (line 408) | def test_name_attribute_initialization(self):

FILE: tests/interactions/test_interaction_registry.py
  class TestInteractionRegistry (line 30) | class TestInteractionRegistry:
    method test_get_interaction_class (line 31) | def test_get_interaction_class(self):
    method test_initialize_single_interaction_from_config (line 41) | def test_initialize_single_interaction_from_config(self):
    method test_initialize_multiple_interactions_from_config (line 69) | def test_initialize_multiple_interactions_from_config(self):
    method test_initialize_interaction_without_explicit_name (line 111) | def test_initialize_interaction_without_explicit_name(self):
    method test_initialize_empty_config (line 132) | def test_initialize_empty_config(self):
    method test_invalid_class_name (line 146) | def test_invalid_class_name(self):
    method test_duplicate_interaction_names (line 162) | def test_duplicate_interaction_names(self):
    method test_auto_name_generation_edge_cases (line 185) | def test_auto_name_generation_edge_cases(self):

FILE: tests/models/test_engine.py
  function get_test_language_model (line 56) | def get_test_language_model(device_count):
  function create_training_config (line 65) | def create_training_config(model_type, strategy, device_count, model):
  function test_actor_engine (line 114) | def test_actor_engine(strategy):
  function create_value_model (line 234) | def create_value_model(language_model_path, output_path):
  function test_critic_engine (line 250) | def test_critic_engine(strategy):
  function create_actor_model (line 353) | def create_actor_model(tmp_path, config):
  function _worker (line 361) | def _worker(rank: int, world_size: int, rendezvous_file: str, strategy: ...
  function test_per_tensor_generator (line 431) | def test_per_tensor_generator(world_size, tmp_path, config, strategy):

FILE: tests/models/test_tiled_mlp_accuracy.py
  function setup_distributed (line 26) | def setup_distributed():
  function create_model (line 34) | def create_model(model_name="Qwen/Qwen3-1.7B", num_layers=2):
  function apply_fsdp2 (line 51) | def apply_fsdp2(model, device_mesh):
  function run_forward_backward (line 59) | def run_forward_backward(model, input_ids, labels):
  function compare_results (line 78) | def compare_results(logits1, grads1, logits2, grads2, rank):
  function main (line 119) | def main():

FILE: tests/models/test_transformer.py
  function test_hf_casual_models (line 47) | def test_hf_casual_models():
  function test_hf_value_models (line 117) | def test_hf_value_models():
  function test_attn_implementation_override (line 172) | def test_attn_implementation_override():
  function test_fsdp_worker_attn_implementation_integration (line 207) | def test_fsdp_worker_attn_implementation_integration():

FILE: tests/models/test_transformers_ulysses.py
  class SequenceParallelConfig (line 49) | class SequenceParallelConfig:
  function test_configs (line 55) | def test_configs():
  function sync_model_parameters_global (line 92) | def sync_model_parameters_global(layer):
  function test_hf_casual_fwd_bwd (line 99) | def test_hf_casual_fwd_bwd(test_config):
  function _hf_casual_fwd (line 112) | def _hf_casual_fwd(config, sp_size, dp_size):
  function _hf_casual_fwd_bwd (line 191) | def _hf_casual_fwd_bwd(config, sp_size, dp_size):

FILE: tests/single_controller/base/test_decorator.py
  function reset_dispatch_registry (line 29) | def reset_dispatch_registry():
  function test_register_new_dispatch_mode (line 38) | def test_register_new_dispatch_mode(reset_dispatch_registry):
  function test_update_existing_dispatch_mode (line 60) | def test_update_existing_dispatch_mode(reset_dispatch_registry):

FILE: tests/single_controller/check_worker_alive/main.py
  class TestActor (line 27) | class TestActor(Worker):
    method __init__ (line 28) | def __init__(self) -> None:
    method foo (line 32) | def foo(self, wait_time):

FILE: tests/single_controller/detached_worker/client.py
  function compute_position_id_with_mask (line 27) | def compute_position_id_with_mask(mask):

FILE: tests/single_controller/detached_worker/server.py
  class Trainer (line 44) | class Trainer(Worker):
    method __init__ (line 45) | def __init__(self):
    method init_model (line 74) | def init_model(self):
    method train_model (line 117) | def train_model(self, data: DataProto) -> DataProto:

FILE: tests/single_controller/test_auto_padding_on_cpu.py
  class Actor (line 30) | class Actor(Worker):
    method __init__ (line 31) | def __init__(self) -> None:
    method add (line 35) | def add(self, data: DataProto):
  function test_auto_padding (line 40) | def test_auto_padding():

FILE: tests/single_controller/test_colocated_workers.py
  class Actor (line 30) | class Actor(Worker):
    method __init__ (line 31) | def __init__(self) -> None:
    method add (line 35) | def add(self, data: DataProto):
  class Critic (line 41) | class Critic(Worker):
    method __init__ (line 42) | def __init__(self, config) -> None:
    method sub (line 47) | async def sub(self, data: DataProto):
  function test_colocated_workers (line 52) | def test_colocated_workers():

FILE: tests/single_controller/test_colocated_workers_fused.py
  class Actor (line 30) | class Actor(Worker):
    method __init__ (line 31) | def __init__(self) -> None:
    method add (line 35) | def add(self, data: DataProto):
  class Critic (line 41) | class Critic(Worker):
    method __init__ (line 42) | def __init__(self, config) -> None:
    method sub (line 47) | def sub(self, data: DataProto):
  function test_colocated_workers_fused (line 52) | def test_colocated_workers_fused():

FILE: tests/single_controller/test_data_transfer.py
  class DummyWorker (line 34) | class DummyWorker(Worker):
    method __init__ (line 35) | def __init__(self):
    method do_nothing (line 40) | def do_nothing(self, data):
  function test_data_transfer (line 48) | def test_data_transfer():

FILE: tests/single_controller/test_decorator_on_cpu.py
  function ray_init_shutdown (line 32) | def ray_init_shutdown():
  class DecoratorTestWorker (line 40) | class DecoratorTestWorker(Worker):
    method __init__ (line 41) | def __init__(self, initial_value=0):
    method dp_compute (line 51) | def dp_compute(self, data: DataProto) -> DataProto:
    method async_dp_compute (line 59) | async def async_dp_compute(self, data: DataProto) -> DataProto:
    method dp_compute_td (line 67) | def dp_compute_td(self, data: TensorDict) -> TensorDict:
  function test_decorator_dp_compute (line 82) | def test_decorator_dp_compute(ray_init_shutdown):
  function test_decorator_async_function (line 118) | def test_decorator_async_function(ray_init_shutdown):
  function test_decorator_dp_compute_td (line 161) | def test_decorator_dp_compute_td(ray_init_shutdown):

FILE: tests/single_controller/test_device_mesh_register.py
  class TestActor (line 29) | class TestActor(Worker):
    method __init__ (line 30) | def __init__(self):
    method generate_data_proto (line 56) | def generate_data_proto(self, data: DataProto):
    method generate_tensordict (line 63) | def generate_tensordict(self, data: TensorDict):
    method train_data_proto (line 70) | def train_data_proto(self, data: DataProto):
    method train_tensordict (line 80) | def train_tensordict(self, data: TensorDict):
    method generate_nested_tensor (line 90) | def generate_nested_tensor(self, data: TensorDict):
  function test_dist_global_info_wg (line 100) | def test_dist_global_info_wg():

FILE: tests/single_controller/test_driverfunc_to_worker.py
  class ModelActor (line 32) | class ModelActor(Worker):
    method __init__ (line 33) | def __init__(self):
  class HackSelf (line 37) | class HackSelf:
    method __init__ (line 38) | def __init__(self):
  function get_aux_metrics (line 42) | def get_aux_metrics(self, test_proto):
  function test (line 55) | def test():

FILE: tests/single_controller/test_fused_workers_on_cpu.py
  class Actor (line 28) | class Actor(Worker):
    method __init__ (line 29) | def __init__(self) -> None:
    method add (line 33) | def add(self, x):
  class Critic (line 39) | class Critic(Worker):
    method __init__ (line 40) | def __init__(self, val) -> None:
    method sub (line 45) | def sub(self, x):
  class HybridWorker (line 57) | class HybridWorker(FusedBaseClass):
    method foo (line 59) | def foo(self, x):
  function test_fused_workers (line 63) | def test_fused_workers():

FILE: tests/single_controller/test_get_set_dispatch_collect_cpu.py
  function test_get_set_dispatch_collect_cpu (line 21) | def test_get_set_dispatch_collect_cpu():

FILE: tests/single_controller/test_high_level_scheduling_api.py
  class TestActor (line 25) | class TestActor(Worker):
    method __init__ (line 27) | def __init__(self, cuda_visible_devices=None) -> None:
    method get_node_id (line 30) | def get_node_id(self):
  function test (line 34) | def test():

FILE: tests/single_controller/test_nested_worker.py
  class TestActor (line 24) | class TestActor(Worker):
    method __init__ (line 26) | def __init__(self, x) -> None:
    method get (line 31) | def get(self):
  class TestHighLevelActor (line 35) | class TestHighLevelActor(Worker):
    method __init__ (line 36) | def __init__(self, x=None) -> None:
    method get (line 41) | def get(self):
  function test_nested_worker (line 45) | def test_nested_worker():

FILE: tests/single_controller/test_ray_collectives.py
  class Actor (line 33) | class Actor(Worker):
    method init (line 35) | def init(self):
    method send_tensors (line 41) | def send_tensors(self):
  class Rollout (line 47) | class Rollout(Worker):
    method init (line 49) | def init(self):
    method receive_tensors (line 59) | def receive_tensors(self):
    method get_tensors (line 67) | def get_tensors(self):
  function test_ray_collective_group (line 71) | def test_ray_collective_group():

FILE: tests/single_controller/test_ray_local_envs_on_cpu.py
  class TestActor (line 27) | class TestActor(Worker):
    method __init__ (line 28) | def __init__(self) -> None:
    method getenv (line 31) | def getenv(self, key):
  function test_basics (line 36) | def test_basics():
  function test_customized_worker_env (line 53) | def test_customized_worker_env():

FILE: tests/single_controller/test_ray_utils_on_cpu.py
  function init_ray (line 23) | def init_ray():
  function test_parallel_put_basic (line 29) | def test_parallel_put_basic(init_ray):
  function test_parallel_put_empty (line 37) | def test_parallel_put_empty(init_ray):
  function test_parallel_put_workers (line 43) | def test_parallel_put_workers(init_ray):

FILE: tests/single_controller/test_rvdz.py
  class TestWorker (line 19) | class TestWorker:
    method __init__ (line 20) | def __init__(self, rank, world_size, group_name):
    method init (line 26) | def init(self):
    method test (line 31) | def test(self):
  function test_rvdz (line 37) | def test_rvdz():

FILE: tests/single_controller/test_split_resource_pool.py
  class Actor (line 33) | class Actor(Worker):
    method __init__ (line 34) | def __init__(self, worker_id) -> None:
    method add (line 45) | def add(self, data: DataProto):
  function test_split_resource_pool_with_split_size (line 50) | def test_split_resource_pool_with_split_size():
  function test_split_resource_pool_with_split_size_list (line 78) | def test_split_resource_pool_with_split_size_list():
  function test_split_resource_pool_with_split_size_list_cross_nodes (line 112) | def test_split_resource_pool_with_split_size_list_cross_nodes():
  function test_split_resource_pool_with_split_twice (line 147) | def test_split_resource_pool_with_split_twice():

FILE: tests/single_controller/test_worker_group_basics.py
  function two_to_all_dispatch_fn (line 27) | def two_to_all_dispatch_fn(worker_group, *args, **kwargs):
  function get_ray_remote_options (line 42) | def get_ray_remote_options() -> str:
  class TestActor (line 56) | class TestActor(Worker):
    method __init__ (line 58) | def __init__(self, x) -> None:
    method foo (line 62) | def foo(self, y):
    method foo_rank_zero (line 66) | def foo_rank_zero(self, x, y):
    method foo_one_to_all (line 70) | def foo_one_to_all(self, x, y):
    method foo_all_to_all (line 74) | def foo_all_to_all(self, x, y):
    method foo_custom (line 78) | def foo_custom(self, x, y):
  function remote_call_wg (line 83) | def remote_call_wg(worker_names):
  function add_one (line 99) | def add_one(data):
  function test_basics (line 106) | def test_basics():

FILE: tests/single_controller/test_worker_group_torch.py
  class TestAllGatherActor (line 30) | class TestAllGatherActor(Worker):
    method __init__ (line 31) | def __init__(self, size) -> None:
    method init (line 35) | def init(self):
    method all_gather (line 40) | def all_gather(self):
  class TestAllGatherActorV2 (line 50) | class TestAllGatherActorV2(Worker):
    method __init__ (line 51) | def __init__(self, size) -> None:
    method all_gather (line 59) | def all_gather(self):
  function test_all_gather_torch (line 68) | def test_all_gather_torch():
  function test_all_gather_torch_v2 (line 94) | def test_all_gather_torch_v2():

FILE: tests/special_distributed/test_fsdp_ckpt.py
  function create_random_input_ids (line 31) | def create_random_input_ids(batch_size, seq_len, vocab_size):
  function test_fsdp_ckpt (line 50) | def test_fsdp_ckpt(strategy="fsdp"):

FILE: tests/special_distributed/test_mcore_config_converter.py
  function check_config_converter_results (line 36) | def check_config_converter_results(tf_config: TransformerConfig | MLATra...
  function modify_hf_config (line 67) | def modify_hf_config(name: str, hf_config: PretrainedConfig):
  function test_mcore_config_converter (line 74) | def test_mcore_config_converter():

FILE: tests/special_distributed/test_tensor_dict.py
  function test_all_gather_data_proto (line 28) | def test_all_gather_data_proto():
  function test_vocab_parallel_entropy (line 61) | def test_vocab_parallel_entropy():

FILE: tests/special_e2e/check_custom_rwd_fn.py
  function check_congratulations_in_file (line 18) | def check_congratulations_in_file(output_file):

FILE: tests/special_e2e/check_results.py
  function extract_reward_from_line (line 20) | def extract_reward_from_line(line):

FILE: tests/special_e2e/envs/digit_completion/task.py
  class DigitCompletion (line 19) | class DigitCompletion:
    method __init__ (line 35) | def __init__(self, max_number: int, max_diff: int, max_num_in_response...
    method __str__ (line 56) | def __str__(self):
    method get_state (line 63) | def get_state(self):
    method set_state (line 66) | def set_state(self, state):
    method prompt_length (line 71) | def prompt_length(self):
    method response_length (line 75) | def response_length(self):
    method add (line 80) | def add(self, a, b):
    method get_all_prompts (line 83) | def get_all_prompts(self):
    method sample_str_prompts (line 93) | def sample_str_prompts(self):
    method sample_batch_str_prompts (line 102) | def sample_batch_str_prompts(self, batch_size):
  function compute_attention_mask (line 109) | def compute_attention_mask(prompts, pad_token_id):
  function compute_position_id_with_mask (line 115) | def compute_position_id_with_mask(mask):
  function generate_ground_truth_response (line 119) | def generate_ground_truth_response(prompt: str):
  function compute_reward (line 139) | def compute_reward(prompt: str, response: str, sequence_reward=1.0):

FILE: tests/special_e2e/envs/digit_completion/tokenizer.py
  class CharTokenizer (line 29) | class CharTokenizer(PreTrainedTokenizer):
    method __init__ (line 30) | def __init__(self, characters: Sequence[str], model_max_length: int, c...
    method vocab_size (line 83) | def vocab_size(self) -> int:
    method get_vocab (line 86) | def get_vocab(self):
    method _tokenize (line 89) | def _tokenize(self, text: str) -> list[str]:
    method _convert_token_to_id (line 92) | def _convert_token_to_id(self, token: str) -> int:
    method _convert_id_to_token (line 95) | def _convert_id_to_token(self, index: int) -> str:
    method convert_tokens_to_string (line 98) | def convert_tokens_to_string(self, tokens):
    method build_inputs_with_special_tokens (line 101) | def build_inputs_with_special_tokens(
    method get_special_tokens_mask (line 111) | def get_special_tokens_mask(
    method get_config (line 129) | def get_config(self) -> dict:
    method from_config (line 137) | def from_config(cls, config: dict):
    method save_pretrained (line 144) | def save_pretrained(self, save_directory: str | os.PathLike, **kwargs):
    method from_pretrained (line 151) | def from_pretrained(cls, save_directory: str | os.PathLike, **kwargs):

FILE: tests/special_e2e/sft/compare_sft_engine_results.py
  function get_result (line 21) | def get_result(file):
  function compare_results (line 31) | def compare_results(golden_results, other_result):

FILE: tests/special_sanity/check_api_docs.py
  function iter_submodules (line 57) | def iter_submodules(root: ModuleType) -> Iterable[ModuleType]:
  function names_missing_doc (line 72) | def names_missing_doc(mod: ModuleType) -> list[str]:
  function check_module (line 92) | def check_module(qualname: str) -> list[str]:
  function autodiscover_packages (line 106) | def autodiscover_packages() -> list[str]:
  function main (line 115) | def main() -> None:

FILE: tests/special_sanity/check_docs_time_info.py
  function is_allowed (line 41) | def is_allowed(path: Path) -> bool:
  function main (line 52) | def main():

FILE: tests/special_sanity/check_docstrings.py
  class DocstringChecker (line 25) | class DocstringChecker(ast.NodeVisitor):
    method __init__ (line 28) | def __init__(self, filename: str):
    method visit_FunctionDef (line 34) | def visit_FunctionDef(self, node: ast.FunctionDef):
    method visit_AsyncFunctionDef (line 45) | def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef):
    method visit_ClassDef (line 56) | def visit_ClassDef(self, node: ast.ClassDef):
    method _has_docstring (line 67) | def _has_docstring(self, node) -> bool:
  function check_file_docstrings (line 72) | def check_file_docstrings(filepath: str) -> list[tuple[str, str, int]]:
  function main (line 88) | def main():

FILE: tests/special_sanity/check_license.py
  function get_py_files (line 49) | def get_py_files(path_arg: Path) -> Iterable[Path]:

FILE: tests/special_sanity/check_pr_description.py
  class TemplateFileError (line 24) | class TemplateFileError(Exception):
  class PRBodyLoadError (line 28) | class PRBodyLoadError(Exception):
  class PRDescriptionError (line 32) | class PRDescriptionError(Exception):
  function load_template (line 40) | def load_template(path):
  function load_pr_body (line 58) | def load_pr_body(event_path):
  function check_pr_description (line 67) | def check_pr_description(body, template_lines):
  function main (line 84) | def main():

FILE: tests/special_sanity/test_config_docs.py
  function validate_yaml_format (line 19) | def validate_yaml_format(yaml_lines):
  function test_trainer_config_doc (line 60) | def test_trainer_config_doc():

FILE: tests/special_sanity/test_import.py
  function test_import (line 16) | def test_import():
  function test_single_controller_import (line 22) | def test_single_controller_import():

FILE: tests/special_sanity/type_coverage_check.py
  function get_changed_files (line 27) | def get_changed_files() -> list[Path]:
  function get_changed_lines (line 34) | def get_changed_lines(file_path: Path) -> set[int]:
  function should_check_type (line 61) | def should_check_type(arg_name: str) -> bool:
  function has_type_annotations (line 69) | def has_type_annotations(node: ast.AST, debug: bool = False) -> int:
  function check_file (line 87) | def check_file(
  function main (line 116) | def main() -> None:

FILE: tests/special_sanity/validate_imported_docs.py
  function _parse_args (line 32) | def _parse_args() -> argparse.Namespace:
  function _import_attr (line 57) | def _import_attr(module_name: str, attr_name: str):
  function _check_file (line 63) | def _check_file(py_file: pathlib.Path, project_root: pathlib.Path, allow...
  function main (line 110) | def main() -> None:

FILE: tests/special_sanity/validate_structure.py
  function discover_allowed_modules (line 39) | def discover_allowed_modules(impl_root: Path, extra: list[str]) -> set[s...
  function find_violations (line 46) | def find_violations(tests_root: Path, allowed: set[str], allowed_files: ...
  function main (line 66) | def main() -> None:

FILE: tests/special_standalone/test_memory_buffers.py
  function test_memory_buffers (line 26) | def test_memory_buffers():

FILE: tests/test_base_config_on_cpu.py
  function base_config_mock (line 21) | def base_config_mock():
  function test_getitem_success (line 28) | def test_getitem_success(base_config_mock):
  function test_getitem_nonexistent_attribute (line 33) | def test_getitem_nonexistent_attribute(base_config_mock):
  function test_getitem_invalid_key_type (line 39) | def test_getitem_invalid_key_type(base_config_mock):

FILE: tests/test_protocol_on_cpu.py
  function test_union_tensor_dict (line 36) | def test_union_tensor_dict():
  function test_union_numpy_dict (line 51) | def test_union_numpy_dict():
  function test_tensor_dict_constructor (line 141) | def test_tensor_dict_constructor():
  function test_tensor_dict_make_iterator (line 155) | def test_tensor_dict_make_iterator():
  function test_reorder (line 184) | def test_reorder():
  function test_chunk_concat (line 195) | def test_chunk_concat():
  function test_concat_metrics_from_multiple_workers (line 219) | def test_concat_metrics_from_multiple_workers():
  function test_concat_with_empty_and_non_list_meta_info (line 249) | def test_concat_with_empty_and_non_list_meta_info():
  function test_concat_first_worker_missing_metrics (line 272) | def test_concat_first_worker_missing_metrics():
  function test_concat_non_list_metrics (line 295) | def test_concat_non_list_metrics():
  function test_concat_merge_different_non_metric_keys (line 315) | def test_concat_merge_different_non_metric_keys():
  function test_concat_conflicting_non_metric_keys (line 339) | def test_concat_conflicting_non_metric_keys():
  function test_pop (line 357) | def test_pop():
  function test_repeat (line 370) | def test_repeat():
  function test_dataproto_pad_unpad (line 395) | def test_dataproto_pad_unpad():
  function test_dataproto_fold_unfold (line 447) | def test_dataproto_fold_unfold():
  function test_torch_save_data_proto (line 470) | def test_torch_save_data_proto():
  function test_len (line 486) | def test_len():
  function test_dataproto_index (line 506) | def test_dataproto_index():
  function test_old_vs_new_from_single_dict (line 570) | def test_old_vs_new_from_single_dict():
  function test_dataproto_no_batch (line 607) | def test_dataproto_no_batch():
  function test_sample_level_repeat (line 617) | def test_sample_level_repeat():
  function test_dataproto_unfold_column_chunks (line 642) | def test_dataproto_unfold_column_chunks():
  function test_dataproto_chunk_after_index (line 708) | def test_dataproto_chunk_after_index():
  function test_to_tensordict (line 754) | def test_to_tensordict():
  function test_from_tensordict (line 768) | def test_from_tensordict():
  function test_to_tensordict_with_nested_lists (line 785) | def test_to_tensordict_with_nested_lists():
  function test_to_tensordict_with_nested_dicts (line 810) | def test_to_tensordict_with_nested_dicts():
  function test_to_tensordict_with_complex_nested_structures (line 834) | def test_to_tensordict_with_complex_nested_structures():
  function test_to_tensordict_and_back_with_nested_data (line 862) | def test_to_tensordict_and_back_with_nested_data():
  function test_to_tensordict_agent_loop_scenario (line 926) | def test_to_tensordict_agent_loop_scenario():
  function test_serialize_deserialize_single_tensor (line 993) | def test_serialize_deserialize_single_tensor():
  function test_serialize_deserialize_tensordict_regular_tensors (line 1010) | def test_serialize_deserialize_tensordict_regular_tensors():
  function test_serialize_deserialize_tensordict_nested_tensors (line 1039) | def test_serialize_deserialize_tensordict_nested_tensors():
  function test_serialize_deserialize_tensordict_mixed_types (line 1092) | def test_serialize_deserialize_tensordict_mixed_types():
  function test_serialize_deserialize_tensordict_with_device (line 1177) | def test_serialize_deserialize_tensordict_with_device():
  function test_serialize_dataproto_with_empty_tensordict (line 1208) | def test_serialize_dataproto_with_empty_tensordict():

FILE: tests/test_protocol_v2_on_cpu.py
  function test_union_tensor_dict (line 30) | def test_union_tensor_dict():
  function test_tensor_dict_constructor (line 67) | def test_tensor_dict_constructor():
  function test_index_select_tensor_dict (line 92) | def test_index_select_tensor_dict():
  function test_tensordict_with_images (line 131) | def test_tensordict_with_images():
  function test_tensordict_with_packing (line 159) | def test_tensordict_with_packing():
  function test_tensordict_eq (line 185) | def test_tensordict_eq():
  function test_tensor_dict_make_iterator (line 248) | def test_tensor_dict_make_iterator():
  function test_reorder (line 300) | def test_reorder():
  function test_chunk_concat (line 313) | def test_chunk_concat():
  function test_pop (line 350) | def test_pop():
  function test_get (line 382) | def test_get():
  function test_repeat (line 412) | def test_repeat():
  function test_dataproto_pad_unpad (line 437) | def test_dataproto_pad_unpad():
  function test_torch_save_data_proto (line 488) | def test_torch_save_data_proto():
  function test_len (line 506) | def test_len():
  function test_dataproto_index (line 523) | def test_dataproto_index():
  function test_select (line 583) | def test_select():
  function test_dataproto_no_batch (line 596) | def test_dataproto_no_batch():
  function test_sample_level_repeat (line 607) | def test_sample_level_repeat():
  function test_dataproto_chunk_after_index (line 633) | def test_dataproto_chunk_after_index():
  function test_concat_nested_tensor (line 676) | def test_concat_nested_tensor():
  function test_concat_tensordict (line 731) | def test_concat_tensordict():
  function test_chunk_tensordict (line 781) | def test_chunk_tensordict():
  function test_assign_non_tensor_stack_with_nested_lists (line 840) | def test_assign_non_tensor_stack_with_nested_lists():
  function test_assign_non_tensor_stack_with_nested_dicts (line 855) | def test_assign_non_tensor_stack_with_nested_dicts():
  function test_assign_non_tensor_stack_with_complex_nested (line 870) | def test_assign_non_tensor_stack_with_complex_nested():
  function test_assign_non_tensor_handles_wrappers (line 889) | def test_assign_non_tensor_handles_wrappers():
  function test_assign_non_tensor_stack_batch_size_check (line 904) | def test_assign_non_tensor_stack_batch_size_check():
  function test_assign_non_tensor_with_auto_detection (line 912) | def test_assign_non_tensor_with_auto_detection():
  function test_get_tensordict_with_nested_lists (line 935) | def test_get_tensordict_with_nested_lists():
  function test_get_tensordict_with_nested_dicts (line 950) | def test_get_tensordict_with_nested_dicts():
  function test_get_tensordict_with_complex_nested_structures (line 962) | def test_get_tensordict_with_complex_nested_structures():
  function test_get_tensordict_agent_loop_scenario (line 977) | def test_get_tensordict_agent_loop_scenario():
  function test_contiguous (line 1040) | def test_contiguous():

FILE: tests/trainer/config/test_algo_config_on_cpu.py
  class TestAlgoConfig (line 30) | class TestAlgoConfig(unittest.TestCase):
    method setUp (line 33) | def setUp(self):
    method test_dataclass_creation_from_dict (line 56) | def test_dataclass_creation_from_dict(self):
    method test_dataclass_creation_from_omega_config (line 69) | def test_dataclass_creation_from_omega_config(self):
    method test_nested_configs (line 77) | def test_nested_configs(self):
    method test_default_values (line 92) | def test_default_values(self):
    method test_get_method_backward_compatibility (line 105) | def test_get_method_backward_compatibility(self):
    method test_post_init_nested_configs (line 117) | def test_post_init_nested_configs(self):
    method test_config_init_from_yaml (line 127) | def test_config_init_from_yaml(self):
  class TestAlgoCompute (line 140) | class TestAlgoCompute(unittest.TestCase):
    method setUp (line 143) | def setUp(self):
    method test_advantage_estimator_with_cfg (line 157) | def test_advantage_estimator_with_cfg(self):
    method test_grpo_advantage_estimator_with_cfg (line 182) | def test_grpo_advantage_estimator_with_cfg(self):

FILE: tests/trainer/config/test_legacy_config_on_cpu.py
  class TestConfigComparison (line 35) | class TestConfigComparison(unittest.TestCase):
    method _compare_configs_recursively (line 55) | def _compare_configs_recursively(
    method test_ppo_trainer_config_matches_legacy (line 114) | def test_ppo_trainer_config_matches_legacy(self):
    method test_ppo_megatron_trainer_config_matches_legacy (line 138) | def test_ppo_megatron_trainer_config_matches_legacy(self):
    method test_load_component (line 160) | def test_load_component(self):

FILE: tests/trainer/ppo/test_core_algos_on_cpu.py
  function mock_test_fn (line 34) | def mock_test_fn():
  class TestRegisterAdvEst (line 38) | class TestRegisterAdvEst(unittest.TestCase):
    method setUp (line 39) | def setUp(self):
    method tearDown (line 48) | def tearDown(self) -> None:
    method test_register_new_function (line 52) | def test_register_new_function(self):
    method test_register_with_enum (line 62) | def test_register_with_enum(self):
    method test_duplicate_registration_same_function (line 76) | def test_duplicate_registration_same_function(self):
    method test_duplicate_registration_different_function (line 83) | def test_duplicate_registration_different_function(self):
    method test_decorator_preserves_function (line 96) | def test_decorator_preserves_function(self):
    method test_multiple_registrations (line 105) | def test_multiple_registrations(self):
    method test_get_adv_estimator_fn_valid_names (line 121) | def test_get_adv_estimator_fn_valid_names(self):
    method test_get_adv_estimator_fn_invalid_name (line 131) | def test_get_adv_estimator_fn_invalid_name(self):
    method test_get_adv_estimator_fn_case_sensitive (line 137) | def test_get_adv_estimator_fn_case_sensitive(self):
  function test_multi_turn_compute_gae_advantage_return (line 143) | def test_multi_turn_compute_gae_advantage_return():
  function _make_group_index (line 200) | def _make_group_index(batch_size: int, num_groups: int) -> np.ndarray:
  function _rand_mask (line 214) | def _rand_mask(batch_size: int, seq_len: int) -> torch.Tensor:
  function test_rloo_and_vectorized_equivalence (line 230) | def test_rloo_and_vectorized_equivalence(batch_size: int, seq_len: int, ...
  function test_grpo_and_vectorized_equivalence (line 270) | def test_grpo_and_vectorized_equivalence(batch_size: int, seq_len: int, ...

FILE: tests/trainer/ppo/test_metric_utils_on_cpu.py
  class TestReduceMetrics (line 41) | class TestReduceMetrics(unittest.TestCase):
    method test_reduce_metrics_basic (line 44) | def test_reduce_metrics_basic(self):
    method test_reduce_metrics_empty (line 55) | def test_reduce_metrics_empty(self):
    method test_reduce_metrics_single_value (line 64) | def test_reduce_metrics_single_value(self):
  class TestMetric (line 74) | class TestMetric(unittest.TestCase):
    method test_init_with_string_aggregation (line 77) | def test_init_with_string_aggregation(self):
    method test_init_with_enum_aggregation (line 83) | def test_init_with_enum_aggregation(self):
    method test_init_with_value (line 89) | def test_init_with_value(self):
    method test_init_with_invalid_aggregation (line 94) | def test_init_with_invalid_aggregation(self):
    method test_append_float (line 99) | def test_append_float(self):
    method test_append_int (line 106) | def test_append_int(self):
    method test_append_tensor (line 113) | def test_append_tensor(self):
    method test_append_non_scalar_tensor_raises (line 120) | def test_append_non_scalar_tensor_raises(self):
    method test_append_metric (line 126) | def test_append_metric(self):
    method test_extend_with_list (line 136) | def test_extend_with_list(self):
    method test_extend_with_metric (line 142) | def test_extend_with_metric(self):
    method test_extend_aggregation_mismatch_raises (line 153) | def test_extend_aggregation_mismatch_raises(self):
    method test_aggregate_mean (line 161) | def test_aggregate_mean(self):
    method test_aggregate_sum (line 167) | def test_aggregate_sum(self):
    method test_aggregate_min (line 173) | def test_aggregate_min(self):
    method test_aggregate_max (line 179) | def test_aggregate_max(self):
    method test_aggregate_dp_sum_mean (line 185) | def test_aggregate_dp_sum_mean(self):
    method test_aggregate_dp_min_max (line 215) | def test_aggregate_dp_min_max(self):
    method test_aggregate_dp_mismatched_lengths (line 245) | def test_aggregate_dp_mismatched_lengths(self):
    method test_from_dict (line 256) | def test_from_dict(self):
    method test_init_list (line 267) | def test_init_list(self):
    method test_reduce_metrics_with_metric (line 277) | def test_reduce_metrics_with_metric(self):
  class TestComputeDataMetrics (line 292) | class TestComputeDataMetrics(unittest.TestCase):
    method setUp (line 295) | def setUp(self):
    method test_compute_data_metrics_with_critic (line 320) | def test_compute_data_metrics_with_critic(self):
    method test_compute_data_metrics_without_critic (line 338) | def test_compute_data_metrics_without_critic(self):
  class TestComputeTimingMetrics (line 352) | class TestComputeTimingMetrics(unittest.TestCase):
    method setUp (line 355) | def setUp(self):
    method test_compute_timing_metrics (line 377) | def test_compute_timing_metrics(self, mock_compute_response_info):
  class TestComputeThroughputMetrics (line 403) | class TestComputeThroughputMetrics(unittest.TestCase):
    method setUp (line 406) | def setUp(self):
    method test_compute_throughout_metrics (line 414) | def test_compute_throughout_metrics(self):
  class TestBootstrapMetric (line 435) | class TestBootstrapMetric(unittest.TestCase):
    method test_bootstrap_metric_basic (line 438) | def test_bootstrap_metric_basic(self):
    method test_bootstrap_metric_empty (line 462) | def test_bootstrap_metric_empty(self):
  class TestCalcMajVal (line 468) | class TestCalcMajVal(unittest.TestCase):
    method test_calc_maj_val_basic (line 471) | def test_calc_maj_val_basic(self):
    method test_calc_maj_val_tie (line 484) | def test_calc_maj_val_tie(self):
  class TestProcessValidationMetrics (line 501) | class TestProcessValidationMetrics(unittest.TestCase):
    method test_process_validation_metrics_basic (line 504) | def test_process_validation_metrics_basic(self):
    method test_process_validation_metrics_with_pred (line 527) | def test_process_validation_metrics_with_pred(self):

FILE: tests/trainer/ppo/test_rollout_corr.py
  function test_basic_rollout_correction (line 41) | def test_basic_rollout_correction():
  function test_each_supported_rollout_rs_option (line 143) | def test_each_supported_rollout_rs_option(option: str, threshold):
  function test_rollout_rs_multiple_options (line 168) | def test_rollout_rs_multiple_options():
  function test_metrics_completeness (line 194) | def test_metrics_completeness():
  function test_offpolicy_metrics (line 254) | def test_offpolicy_metrics():
  function test_mask_mode (line 311) | def test_mask_mode():

FILE: tests/trainer/ppo/test_rollout_corr_integration.py
  class TestRolloutISIntegration (line 28) | class TestRolloutISIntegration:
    method sample_data (line 32) | def sample_data(self):
    method config_with_rollout_is (line 46) | def config_with_rollout_is(self):
    method test_policy_loss_with_rollout_is (line 60) | def test_policy_loss_with_rollout_is(self, sample_data, config_with_ro...
    method test_rollout_is_weights_computation (line 96) | def test_rollout_is_weights_computation(self, sample_data):
    method test_all_aggregation_levels (line 120) | def test_all_aggregation_levels(self, sample_data):
    method test_both_bounding_modes (line 146) | def test_both_bounding_modes(self, sample_data):
    method test_offpolicy_metrics (line 172) | def test_offpolicy_metrics(self, sample_data):
    method test_metrics_only_mode (line 186) | def test_metrics_only_mode(self, sample_data, config_with_rollout_is):
  class TestRolloutCorrectionConfigNormalization (line 234) | class TestRolloutCorrectionConfigNormalization:
    method test_alias_normalization_and_threshold_parsing (line 237) | def test_alias_normalization_and_threshold_parsing(self):
    method test_missing_threshold_raises (line 250) | def test_missing_threshold_raises(self):
    method test_float_threshold_conversion_in_factory (line 255) | def test_float_threshold_conversion_in_factory(self):

FILE: tests/utils/_test_module.py
  class TestClass (line 17) | class TestClass:
    method __init__ (line 20) | def __init__(self, value=None):
    method get_value (line 23) | def get_value(self):
  function test_function (line 30) | def test_function():

FILE: tests/utils/ckpt/test_checkpoint_cleanup_on_cpu.py
  class TestCheckpointCleanupLogic (line 22) | class TestCheckpointCleanupLogic:
    method setup (line 26) | def setup(self):
    method manager (line 33) | def manager(self, monkeypatch):
    method _create_checkpoint_dir (line 56) | def _create_checkpoint_dir(self, step: int) -> str:
    method test_max_ckpt_1_preserves_existing_before_save (line 64) | def test_max_ckpt_1_preserves_existing_before_save(self, manager):
    method test_max_ckpt_1_deletes_old_after_save (line 76) | def test_max_ckpt_1_deletes_old_after_save(self, manager):
    method test_max_ckpt_2_keeps_one_before_save (line 88) | def test_max_ckpt_2_keeps_one_before_save(self, manager):
    method test_max_ckpt_0_keeps_all (line 100) | def test_max_ckpt_0_keeps_all(self, manager):
    method test_full_save_cycle_max_ckpt_1 (line 115) | def test_full_save_cycle_max_ckpt_1(self, manager):

FILE: tests/utils/ckpt/test_esi_save_ckpt_on_cpu.py
  class TestShouldSaveCkptEsi (line 22) | class TestShouldSaveCkptEsi(TestCase):
    method test_no_expiration_timestamp (line 23) | def test_no_expiration_timestamp(self):
    method test_mlp_expiration_valid (line 29) | def test_mlp_expiration_valid(self):
    method test_mlp_expiration_passed (line 35) | def test_mlp_expiration_passed(self):
    method test_mlp_invalid_timestamp (line 41) | def test_mlp_invalid_timestamp(self):
    method test_mlp_expiration_not_reached (line 46) | def test_mlp_expiration_not_reached(self):
    method test_aws_expiration_not_reached (line 52) | def test_aws_expiration_not_reached(self):
    method test_redundant_time (line 59) | def test_redundant_time(self):
    method test_zero_max_steps_duration (line 66) | def test_zero_max_steps_duration(self):

FILE: tests/utils/dataset/test_create_rl_sampler_on_cpu.py
  class RandomCurriculumSampler (line 29) | class RandomCurriculumSampler(AbstractCurriculumSampler):
    method __init__ (line 30) | def __init__(
    method __iter__ (line 40) | def __iter__(self):
    method __len__ (line 43) | def __len__(self) -> int:
    method update (line 46) | def update(self, batch) -> None:
  class MockIncorrectSampler (line 50) | class MockIncorrectSampler:
    method __init__ (line 53) | def __init__(self, data_source, data_config):
  class MockChatDataset (line 57) | class MockChatDataset(Dataset):
    method __init__ (line 58) | def __init__(self):
    method __getitem__ (line 70) | def __getitem__(self, index):
    method __len__ (line 73) | def __len__(self):
  function test_create_custom_curriculum_samper (line 77) | def test_create_custom_curriculum_samper():
  function test_create_custom_curriculum_samper_wrong_class (line 94) | def test_create_custom_curriculum_samper_wrong_class():

FILE: tests/utils/dataset/test_multiturn_sft_dataset_on_cpu.py
  function test_multiturn_sft_dataset (line 47) | def test_multiturn_sft_dataset(model_path: str, ignore_input_ids_mismatc...
  function generate_image (line 239) | def generate_image(description: str, size: str = "256x256"):
  function vlm_data_file (line 253) | def vlm_data_file():
  function test_multiturn_sft_vlm_dataset_on_cpu (line 349) | def test_multiturn_sft_vlm_dataset_on_cpu(model_path, vlm_data_file):
  function test_multiturn_sft_vlm_dataloader_on_cpu (line 411) | def test_multiturn_sft_vlm_dataloader_on_cpu(model_path, vlm_data_file):

FILE: tests/utils/dataset/test_rl_collate_fn_on_cpu.py
  function test_rl_collate_fn (line 17) | def test_rl_collate_fn():

FILE: tests/utils/dataset/test_rl_dataset_on_cpu.py
  function get_gsm8k_data (line 28) | def get_gsm8k_data():
  function test_rl_dataset (line 36) | def test_rl_dataset():
  function test_rl_dataset_with_max_samples (line 67) | def test_rl_dataset_with_max_samples():
  function test_image_rl_data (line 83) | def test_image_rl_data():
  function video_data_file (line 131) | def video_data_file():
  function test_video_rl_data (line 166) | def test_video_rl_data(video_data_file):

FILE: tests/utils/debug/test_metrics.py
  class TestMetrics (line 22) | class TestMetrics(unittest.TestCase):
    method test_calculate_debug_metrics (line 23) | def test_calculate_debug_metrics(self):

FILE: tests/utils/megatron/test_pipeline_parallel.py
  function test_make_batch_generator_no_vpp (line 21) | def test_make_batch_generator_no_vpp():
  function test_make_batch_generator_with_vpp (line 28) | def test_make_batch_generator_with_vpp():
  function test_make_batch_generator_empty (line 40) | def test_make_batch_generator_empty():
  function test_get_dynamic_pipeline_shards (line 63) | def test_get_dynamic_pipeline_shards(layer_num, pp_size, gt):

FILE: tests/utils/reward_score/reward_score/test_sandbox_fusion_on_cpu.py
  function test_integration_success_correct (line 78) | def test_integration_success_correct():
  function test_integration_success_wrong_output (line 89) | def test_integration_success_wrong_output():
  function test_integration_compile_error (line 99) | def test_integration_compile_error():
  function test_integration_runtime_error (line 108) | def test_integration_runtime_error():
  function test_integration_runtime_timeout (line 117) | def test_integration_runtime_timeout():
  function test_integration_concurrency_high_load (line 127) | def test_integration_concurrency_high_load():
  function test_unit_concurrency_order (line 254) | def test_unit_concurrency_order(mock_call_sandbox_api):
  function test_unit_api_timeout_error_concurrent (line 298) | def test_unit_api_timeout_error_concurrent(mock_call_sandbox_api):
  function _mock_api_call_for_concurrency_tracking (line 351) | def _mock_api_call_for_concurrency_tracking(
  function _process_pool_worker_for_concurrency_test (line 391) | def _process_pool_worker_for_concurrency_test(
  function test_multiprocess_global_concurrency_limit_with_semaphore (line 458) | def test_multiprocess_global_concurrency_limit_with_semaphore():
  function test_unit_invalid_input_format (line 556) | def test_unit_invalid_input_format():
  function test_unit_input_output_mismatch (line 572) | def test_unit_input_output_mismatch():
  function test_integration_concurrency_all_timeout (line 581) | def test_integration_concurrency_all_timeout():
  function test_fn_name_success_single_case (line 633) | def test_fn_name_success_single_case():
  function test_none_and_empty_stdin_passed_correctly (line 672) | def test_none_and_empty_stdin_passed_correctly():
  function test_assert_case_success (line 696) | def test_assert_case_success():

FILE: tests/utils/reward_score/test_sandbox_on_cpu.py
  function test_parallelism (line 95) | def test_parallelism():
  function test_prime_code (line 118) | def test_prime_code():
  function test_prime_code_sandbox_fusion (line 130) | def test_prime_code_sandbox_fusion():
  function test_continuous_score_consistency (line 147) | def test_continuous_score_consistency():
  function test_check_correctness (line 176) | def test_check_correctness():
  function test_prime_math (line 186) | def test_prime_math():

FILE: tests/utils/test_activation_offload.py
  function create_random_input_ids (line 33) | def create_random_input_ids(batch_size, seq_len, vocab_size):
  function _fsdp_activation_offloading_test (line 52) | def _fsdp_activation_offloading_test(rank, world_size, rendezvous_file, ...
  function test_activation_offloading (line 166) | def test_activation_offloading(world_size, strategy, tmp_path):

FILE: tests/utils/test_bucketed_weight_transfer.py
  function _unique_zmq_handle (line 37) | def _unique_zmq_handle():
  function _generate_weights (line 41) | def _generate_weights(weight_specs, seed):
  function _sender_fn (line 64) | def _sender_fn(zmq_handle, weight_specs, seed, bucket_size_mb, use_shm):
  function _receiver_fn (line 77) | def _receiver_fn(zmq_handle, use_shm, result_queue):
  function _transfer_and_validate (line 98) | def _transfer_and_validate(weight_specs, bucket_size_mb, use_shm):
  class TestBucketedWeightTransferSHM (line 149) | class TestBucketedWeightTransferSHM:
    method test_single_small_weight (line 152) | def test_single_small_weight(self):
    method test_multiple_weights_single_bucket (line 156) | def test_multiple_weights_single_bucket(self):
    method test_multiple_buckets (line 164) | def test_multiple_buckets(self):
    method test_mixed_dtypes (line 169) | def test_mixed_dtypes(self):
    method test_empty_weights (line 177) | def test_empty_weights(self):
  class TestBucketedWeightTransferIPC (line 185) | class TestBucketedWeightTransferIPC:
    method test_single_small_weight (line 188) | def test_single_small_weight(self):
    method test_multiple_weights_single_bucket (line 192) | def test_multiple_weights_single_bucket(self):
    method test_multiple_buckets (line 200) | def test_multiple_buckets(self):
    method test_mixed_dtypes (line 204) | def test_mixed_dtypes(self):
    method test_empty_weights (line 212) | def test_empty_weights(self):
    method test_exact_bucket_boundary (line 215) | def test_exact_bucket_boundary(self):

FILE: tests/utils/test_check_ipc_version_support_on_npu.py
  class TestCheckIPCVersionSupport (line 22) | class TestCheckIPCVersionSupport(unittest.TestCase):
    method setUp (line 25) | def setUp(self):
    method tearDown (line 30) | def tearDown(self):
    method test_standard_version_with_support (line 34) | def test_standard_version_with_support(self):
    method test_standard_version_newer (line 40) | def test_standard_version_newer(self):
    method test_rc_version_format (line 46) | def test_rc_version_format(self):
    method test_exact_rc_version (line 53) | def test_exact_rc_version(self):
    method test_t_suffix_version (line 60) | def test_t_suffix_version(self):
    method test_t_suffix_version_older (line 67) | def test_t_suffix_version_older(self):
    method test_software_version_below_minimum (line 74) | def test_software_version_below_minimum(self):
    method test_cann_version_below_minimum (line 80) | def test_cann_version_below_minimum(self):
    method test_both_versions_below_minimum (line 87) | def test_both_versions_below_minimum(self):
    method test_invalid_software_version (line 94) | def test_invalid_software_version(self):
    method test_invalid_cann_version (line 100) | def test_invalid_cann_version(self):
    method test_rc_with_more_parts (line 106) | def test_rc_with_more_parts(self):
    method test_standard_with_more_parts (line 112) | def test_standard_with_more_parts(self):
    method test_rc_edge_case_versions (line 118) | def test_rc_edge_case_versions(self):
    method test_major_version_differences (line 128) | def test_major_version_differences(self):
  class TestGetNPUVersions (line 139) | class TestGetNPUVersions(unittest.TestCase):
    method test_get_npu_versions_success (line 146) | def test_get_npu_versions_success(self, mock_file, mock_exists, mock_m...
    method test_get_npu_versions_missing_software_version (line 163) | def test_get_npu_versions_missing_software_version(self, mock_run):
    method test_get_npu_versions_unsupported_architecture (line 176) | def test_get_npu_versions_unsupported_architecture(self, mock_file, mo...
    method test_get_npu_versions_cann_path_not_exists (line 192) | def test_get_npu_versions_cann_path_not_exists(self, mock_file, mock_e...
    method test_get_npu_versions_info_file_not_exists (line 208) | def test_get_npu_versions_info_file_not_exists(self, mock_file, mock_e...
    method test_get_npu_versions_missing_cann_version (line 226) | def test_get_npu_versions_missing_cann_version(self, mock_file, mock_e...

FILE: tests/utils/test_check_profiler_output.py
  class DeviceCheckConfig (line 29) | class DeviceCheckConfig:
  class ProfilerChecker (line 40) | class ProfilerChecker:
    method __init__ (line 45) | def __init__(self, device_type: str, profiler_dir: str):
    method _init_device_config (line 56) | def _init_device_config(self):
    method _validate_stage_dirs (line 80) | def _validate_stage_dirs(self, stage: str) -> bool:
    method check (line 104) | def check(self) -> bool:
  function parse_args (line 122) | def parse_args():
  function main (line 141) | def main():

FILE: tests/utils/test_config_on_cpu.py
  class TestDataclass (line 25) | class TestDataclass(BaseConfig):
  class TestTrainConfig (line 31) | class TestTrainConfig(BaseConfig):
  class TestConfigOnCPU (line 46) | class TestConfigOnCPU(unittest.TestCase):
    method setUp (line 55) | def setUp(self):
    method test_omega_conf_to_dataclass (line 58) | def test_omega_conf_to_dataclass(self):
    method test_nested_omega_conf_to_dataclass (line 65) | def test_nested_omega_conf_to_dataclass(self):
  class TestPrintCfgCommand (line 74) | class TestPrintCfgCommand(unittest.TestCase):
    method test_command_with_override (line 77) | def test_command_with_override(self):

FILE: tests/utils/test_flops_counter.py
  class Config (line 24) | class Config:
    method __init__ (line 25) | def __init__(self, config_dict):
  function test_flops_counter (line 454) | def test_flops_counter(config_type: str):

FILE: tests/utils/test_fs_on_cpu.py
  function test_record_and_check_directory_structure (line 21) | def test_record_and_check_directory_structure(tmp_path):
  function test_copy_from_hdfs_with_mocks (line 43) | def test_copy_from_hdfs_with_mocks(tmp_path, monkeypatch):
  function test_always_recopy_flag (line 66) | def test_always_recopy_flag(tmp_path, monkeypatch):

FILE: tests/utils/test_fsdp2_peft_wrapping.py
  class MockDecoderLayer (line 30) | class MockDecoderLayer(nn.Module):
    method __init__ (line 33) | def __init__(self, hidden_size=64):
  class MockModulesToSaveWrapper (line 39) | class MockModulesToSaveWrapper(nn.Module):
    method __init__ (line 46) | def __init__(self, original_module):
  class MockCausalLM (line 52) | class MockCausalLM(nn.Module):
    method __init__ (line 57) | def __init__(self, vocab_size=1000, hidden_size=64, num_layers=2, tie_...
  class TestFSDP2PeftWrapping (line 69) | class TestFSDP2PeftWrapping(unittest.TestCase):
    method _get_wrapped_names (line 72) | def _get_wrapped_names(self, model, cls_names):
    method test_vanilla_model_wraps_layers_and_embedding (line 79) | def test_vanilla_model_wraps_layers_and_embedding(self):
    method test_peft_wrapped_model_wraps_embed_tokens_by_name (line 89) | def test_peft_wrapped_model_wraps_embed_tokens_by_name(self):
    method test_tied_embeddings_skips_name_based_wrapping (line 101) | def test_tied_embeddings_skips_name_based_wrapping(self):
    method test_peft_wrapped_tied_embeddings_skips_wrapping (line 110) | def test_peft_wrapped_tied_embeddings_skips_wrapping(self):
    method test_no_duplicate_wrapping_for_vanilla_embedding (line 121) | def test_no_duplicate_wrapping_for_vanilla_embedding(self):

FILE: tests/utils/test_fsdp_lora_merge.py
  function _test_merged_lora_context_worker (line 36) | def _test_merged_lora_context_worker(
  function test_merged_lora_context_qwen2 (line 161) | def test_merged_lora_context_qwen2(world_size, strategy, backup_adapters...
  function test_merged_lora_context_gptoss (line 190) | def test_merged_lora_context_gptoss(world_size, strategy, backup_adapter...

FILE: tests/utils/test_groupwise.py
  function test_as_torch_index_basic_integers (line 27) | def test_as_torch_index_basic_integers():
  function test_as_torch_index_near_integer_floats (line 36) | def test_as_torch_index_near_integer_floats():
  function test_as_torch_index_factorization_mixed (line 43) | def test_as_torch_index_factorization_mixed():
  function test_group_mean_std_simple (line 51) | def test_group_mean_std_simple():
  function test_group_mean_std_empty (line 68) | def test_group_mean_std_empty():
  function test_group_mean_std_default_device_no_force_env (line 75) | def test_group_mean_std_default_device_no_force_env(monkeypatch):

FILE: tests/utils/test_import_utils_on_cpu.py
  function test_load_extern_object_class (line 25) | def test_load_extern_object_class():
  function test_load_extern_object_function (line 42) | def test_load_extern_object_function():
  function test_load_extern_object_constant (line 55) | def test_load_extern_object_constant():
  function test_load_extern_object_nonexistent_file (line 64) | def test_load_extern_object_nonexistent_file():
  function test_load_extern_object_nonexistent_type (line 70) | def test_load_extern_object_nonexistent_type():
  function test_load_extern_object_none_path (line 76) | def test_load_extern_object_none_path():
  function test_load_extern_object_invalid_module (line 82) | def test_load_extern_object_invalid_module():

FILE: tests/utils/test_linear_cross_entropy.py
  function run_torch_entropy (line 49) | def run_torch_entropy(
  function run_verl_original_entropy (line 65) | def run_verl_original_entropy(
  function run_verl_torch_fused_entropy (line 83) | def run_verl_torch_fused_entropy(
  class TestLinearCrossEntropy (line 100) | class TestLinearCrossEntropy:
    method __init__ (line 101) | def __init__(self, test_case_idx: int, temperature: float = 1.5) -> None:
    method cleanup (line 105) | def cleanup(self):
    method generate_hyper (line 113) | def generate_hyper(self):
    method generate_forward_inputs (line 146) | def generate_forward_inputs(self):
    method generate_backward_inputs (line 160) | def generate_backward_inputs(self):
    method verify_correctness (line 165) | def verify_correctness(self, iterations=5):
    method check_storage (line 323) | def check_storage(self, method_name, run_forward):
    method check_storage_all (line 345) | def check_storage_all(self):
  function test_lce_non_divisible_vocab_padding (line 352) | def test_lce_non_divisible_vocab_padding():

FILE: tests/utils/test_mlflow_key_sanitization.py
  class TestMlflowLoggingAdapter (line 21) | class TestMlflowLoggingAdapter(unittest.TestCase):
    method test_sanitize_key_and_warning (line 22) | def test_sanitize_key_and_warning(self):

FILE: tests/utils/test_model_on_cpu.py
  function test_update_model_config (line 30) | def test_update_model_config(override_kwargs):

FILE: tests/utils/test_normalize_peft_param_name.py
  function _test_normalize_peft_with_fsdp_worker (line 37) | def _test_normalize_peft_with_fsdp_worker(rank, world_size, rendezvous_f...
  function test_normalize_peft_param_name_with_fsdp (line 200) | def test_normalize_peft_param_name_with_fsdp(world_size, strategy, tmp_p...

FILE: tests/utils/test_normalize_peft_param_name_on_cpu.py
  function create_base_model (line 23) | def create_base_model():
  function create_peft_model (line 36) | def create_peft_model():
  function base_model (line 46) | def base_model():
  function peft_model (line 52) | def peft_model():
  function test_normalize_peft_param_name_keys_match_base_model (line 57) | def test_normalize_peft_param_name_keys_match_base_model():
  function test_normalize_peft_param_name_removes_lora_keys (line 86) | def test_normalize_peft_param_name_removes_lora_keys(peft_model):
  function test_normalize_peft_param_name_removes_base_model_prefix (line 102) | def test_normalize_peft_param_name_removes_base_model_prefix(peft_model):
  function test_normalize_peft_param_name_removes_base_layer_suffix (line 118) | def test_normalize_peft_param_name_removes_base_layer_suffix(peft_model):
  function test_normalize_peft_param_name_tensor_shapes_match (line 134) | def test_normalize_peft_param_name_tensor_shapes_match(base_model, peft_...
  function test_normalize_peft_param_name_empty_dict (line 150) | def test_normalize_peft_param_name_empty_dict():
  function test_normalize_peft_param_name_filters_lora_patterns (line 165) | def test_normalize_peft_param_name_filters_lora_patterns(lora_key_pattern):

FILE: tests/utils/test_nvtx_profile.py
  class TestProfilerConfig (line 24) | class TestProfilerConfig(unittest.TestCase):
    method test_config_init (line 25) | def test_config_init(self):
    method test_frozen_config (line 51) | def test_frozen_config(self):
  class TestNsightSystemsProfiler (line 73) | class TestNsightSystemsProfiler(unittest.TestCase):
    method setUp (line 84) | def setUp(self):
    method test_initialization (line 89) | def test_initialization(self):
    method test_start_stop_profiling (line 93) | def test_start_stop_profiling(self):
    method test_annotate_decorator (line 118) | def test_annotate_decorator(self):

FILE: tests/utils/test_padding_on_cpu.py
  function test_padding_conversion_with_log_probs (line 21) | def test_padding_conversion_with_log_probs():
  function test_padding_conversion_without_log_probs (line 99) | def test_padding_conversion_without_log_probs():
  function test_padding_roundtrip (line 130) | def test_padding_roundtrip():
  function test_no_padding_2_padding_varying_lengths (line 178) | def test_no_padding_2_padding_varying_lengths():

FILE: tests/utils/test_prepare_micro_batches_with_group_size.py
  function _make_batch (line 34) | def _make_batch(seq_lens: list[int], force_group_size: int, max_token_le...
  function _verify_group_integrity (line 70) | def _verify_group_integrity(batch_idx_list: list[list[int]], force_group...
  function test_force_group_size_2_basic (line 99) | def test_force_group_size_2_basic():
  function test_force_group_size_4_basic (line 116) | def test_force_group_size_4_basic():
  function test_force_group_size_reconstruction (line 150) | def test_force_group_size_reconstruction():
  function test_force_group_size_single_micro_batch (line 174) | def test_force_group_size_single_micro_batch():
  function test_force_group_size_large_group (line 191) | def test_force_group_size_large_group():
  function test_force_group_size_1_unchanged (line 227) | def test_force_group_size_1_unchanged():

FILE: tests/utils/test_rollout_skip_on_cpu.py
  function temp_dir (line 28) | def temp_dir():
  function build_generate_fn (line 36) | def build_generate_fn(gen_bs, n):
  function mock_rollout_wg (line 56) | def mock_rollout_wg(request):
  class TestRolloutSkip (line 74) | class TestRolloutSkip:
    method test_initialization (line 75) | def test_initialization(self, capsys):
    method test_generate_without_wrap (line 95) | def test_generate_without_wrap(self, mock_rollout_wg):
    method test_dump (line 110) | def test_dump(self, mock_rollout_wg, capsys):
    method test_generate_with_wrap (line 125) | def test_generate_with_wrap(self, mock_rollout_wg, capsys):

FILE: tests/utils/test_rollout_trace_on_cpu.py
  function reset_rollout_trace_config_singleton (line 25) | def reset_rollout_trace_config_singleton():
  function mock_weave_client (line 31) | def mock_weave_client():
  class TracedClass (line 46) | class TracedClass:
    method my_method (line 50) | async def my_method(self, a, b="default"):
    method middle_method (line 56) | async def middle_method(self, a, b="default"):
    method my_method_with_exception (line 62) | async def my_method_with_exception(self):
    method upper_method (line 65) | async def upper_method(self):
  class UntracedClass (line 71) | class UntracedClass:
    method my_method (line 73) | async def my_method(self, x):
  function test_rollout_trace_on_untraced_class (line 77) | async def test_rollout_trace_on_untraced_class():
  function test_rollout_trace_with_tracer (line 83) | async def test_rollout_trace_with_tracer(mock_weave_client):
  function test_rollout_trace_with_exception (line 102) | async def test_rollout_trace_with_exception(mock_weave_client):
  function test_rollout_trace_with_dummy_backend (line 121) | async def test_rollout_trace_with_dummy_backend(mock_weave_client):
  function test_trace_disabled_with_trace_false (line 131) | async def test_trace_disabled_with_trace_false(mock_weave_client):
  function test_trace_false_disables_nested_trace_ops (line 157) | async def test_trace_false_disables_nested_trace_ops(mock_weave_client):
  function test_trace_enabled_restored_after_exception (line 182) | async def test_trace_enabled_restored_after_exception(mock_weave_client):
  function test_rollout_trace_with_real_weave_backend (line 211) | async def test_rollout_trace_with_real_weave_backend():
  function test_rollout_trace_with_real_mlflow_backend (line 232) | async def test_rollout_trace_with_real_mlflow_backend():

FILE: tests/utils/test_seqlen_balancing.py
  function test_seqlen_balancing (line 31) | def test_seqlen_balancing():
  function test_dynamic_batch (line 50) | def test_dynamic_batch():
  function _worker (line 64) | def _worker(rank, world_size, init_method, max_token_len, use_same_dp, m...
  function test_dataproto_split_uneven (line 128) | def test_dataproto_split_uneven():
  function test_seqlen_balancing_distributed_params (line 182) | def test_seqlen_balancing_distributed_params(tmp_path):
  function test_group_balanced_partitions (line 205) | def test_group_balanced_partitions():
  function test_group_balanced_partitions_single_sample_groups (line 237) | def test_group_balanced_partitions_single_sample_groups():
  function test_group_balanced_partitions_equal_size (line 254) | def test_group_balanced_partitions_equal_size():

FILE: tests/utils/test_server_profiler.py
  class TestServerProfilerArgs (line 28) | class TestServerProfilerArgs(unittest.TestCase):
    method test_build_vllm_profiler_args (line 29) | def test_build_vllm_profiler_args(self):
    method test_build_sglang_profiler_args (line 52) | def test_build_sglang_profiler_args(self):
  class TestServerProfilerFunctionality (line 63) | class TestServerProfilerFunctionality(unittest.IsolatedAsyncioTestCase):
    method test_vllm_start_stop_profile (line 64) | async def test_vllm_start_stop_profile(self):
    method test_sglang_start_stop_profile (line 93) | async def test_sglang_start_stop_profile(self):

FILE: tests/utils/test_shared_memory.py
  class TestSharedMemory (line 24) | class TestSharedMemory(unittest.TestCase):
    method setUp (line 27) | def setUp(self):
    method tearDown (line 35) | def tearDown(self):
    method test_create_shared_memory_new (line 41) | def test_create_shared_memory_new(self):
    method test_create_shared_memory_attach_existing (line 56) | def test_create_shared_memory_attach_existing(self):
    method test_rebuild_shared_memory_default_dtype (line 78) | def test_rebuild_shared_memory_default_dtype(self):
    method test_rebuild_shared_memory_custom_dtype (line 101) | def test_rebuild_shared_memory_custom_dtype(self):
    method test_shared_memory_data_integrity (line 124) | def test_shared_memory_data_integrity(self):
    method test_shared_memory_different_dtypes (line 145) | def test_shared_memory_different_dtypes(self):
    method test_shared_memory_multiple_operations (line 176) | def test_shared_memory_multiple_operations(self):
  function child_process_function (line 200) | def child_process_function(name, size, test_data_bytes):
  class TestSharedMemoryIntegration (line 230) | class TestSharedMemoryIntegration(unittest.TestCase):
    method test_cross_process_shared_memory (line 233) | def test_cross_process_shared_memory(self):

FILE: tests/utils/test_special_linear_cross_entropy_tp.py
  function run_torch_entropy (line 57) | def run_torch_entropy(
  class TorchEntropyTP (line 79) | class TorchEntropyTP(torch.autograd.Function):
    method forward (line 86) | def forward(
    method backward (line 128) | def backward(ctx, g_logprobs: torch.Tensor, g_entropy: torch.Tensor):
  class TestLinearCrossEntropy_TensorParallel (line 181) | class TestLinearCrossEntropy_TensorParallel:
    method __init__ (line 182) | def __init__(self):
    method initialize (line 192) | def initialize(self, test_case_idx: int, temperature: float = 1.5):
    method shutdown (line 196) | def shutdown(self):
    method cleanup (line 199) | def cleanup(self):
    method generate_hyper (line 207) | def generate_hyper(self):
    method generate_forward_inputs (line 242) | def generate_forward_inputs(self):
    method generate_backward_inputs (line 256) | def generate_backward_inputs(self):
    method verify_torch_itself (line 261) | def verify_torch_itself(self, iterations: int = 5):
    method check_torch_storage (line 331) | def check_torch_storage(self):
    method verify_kernel_correctness (line 364) | def verify_kernel_correctness(self, iterations: int = 5):
    method check_kernel_storage (line 455) | def check_kernel_storage(self):

FILE: tests/utils/test_special_mstx_profile.py
  class TestNPUProfilerInitialization (line 23) | class TestNPUProfilerInitialization(unittest.TestCase):
    method setUp (line 24) | def setUp(self):
    method test_init_with_default_config (line 27) | def test_init_with_default_config(self):
    method test_init_with_disabled_config (line 33) | def test_init_with_disabled_config(self):
    method test_init_with_all_ranks_true (line 39) | def test_init_with_all_ranks_true(self):
    method test_init_with_ranks_list (line 45) | def test_init_with_ranks_list(self):
    method test_init_with_rank_not_in_ranks (line 51) | def test_init_with_rank_not_in_ranks(self):
  class TestNPUProfilerStart (line 58) | class TestNPUProfilerStart(unittest.TestCase):
    method setUp (line 59) | def setUp(self):
    method test_start_when_enabled_and_this_rank (line 65) | def test_start_when_enabled_and_this_rank(self, mock_get_profiler):
    method test_start_when_not_this_rank (line 73) | def test_start_when_not_this_rank(self, mock_get_profiler):
    method test_start_discrete_mode_does_not_increase_count (line 81) | def test_start_discrete_mode_does_not_increase_count(self, mock_get_pr...
    method test_multiple_start_calls_do_not_increase_count (line 89) | def test_multiple_start_calls_do_not_increase_count(self, mock_get_pro...
  class TestNPUProfilerStartStopInteraction (line 97) | class TestNPUProfilerStartStopInteraction(unittest.TestCase):
    method setUp (line 98) | def setUp(self):
    method test_start_stop_cycle (line 104) | def test_start_stop_cycle(self, mock_get_profiler):
    method test_multiple_instances_share_define_count (line 118) | def test_multiple_instances_share_define_count(self, mock_get_profiler):
  class TestNPUProfilerAnnotate (line 132) | class TestNPUProfilerAnnotate(unittest.TestCase):
    method setUp (line 133) | def setUp(self):
    method test_annotate_decorator_applied_correctly (line 138) | def test_annotate_decorator_applied_correctly(self):
    method test_annotate_when_profiler_disabled (line 166) | def test_annotate_when_profiler_disabled(self):
    method test_annotate_when_this_step_disabled (line 189) | def test_annotate_when_this_step_disabled(self):
    method test_annotate_discrete_mode_enabled (line 212) | def test_annotate_discrete_mode_enabled(self):
    method test_annotate_with_default_message (line 250) | def test_annotate_with_default_message(self):

FILE: tests/utils/test_temp_env_on_cpu.py
  function clean_env (line 23) | def clean_env():
  function test_set_new_env_var (line 42) | def test_set_new_env_var():
  function test_restore_existing_env_var (line 56) | def test_restore_existing_env_var():
  function test_env_var_restored_on_exception (line 69) | def test_env_var_restored_on_exception():
  function test_nested_context_managers (line 85) | def test_nested_context_managers():
  function test_multiple_different_vars (line 103) | def test_multiple_different_vars():
  function test_empty_string_value (line 118) | def test_empty_string_value():
  function test_overwrite_with_empty_string (line 128) | def test_overwrite_with_empty_string():
  function test_context_manager_returns_none (line 139) | def test_context_manager_returns_none():

FILE: tests/utils/test_timeout_decorator_cpu.py
  function quick_task (line 30) | def quick_task(x):
  function slow_task (line 37) | def slow_task(x):
  function task_raises_value_error (line 44) | def task_raises_value_error():  # Now truly not globally decorated
  function top_level_decorated_quick_task_signal (line 52) | def top_level_decorated_quick_task_signal():
  function top_level_decorated_slow_task_signal (line 62) | def top_level_decorated_slow_task_signal():
  function run_target_and_put_in_queue (line 69) | def run_target_and_put_in_queue(target_func, q):
  function set_macos_start_method (line 83) | def set_macos_start_method():
  function test_quick_task (line 97) | def test_quick_task():  # Renamed from test_multiprocessing_quick_task
  function test_slow_task_timeout (line 104) | def test_slow_task_timeout():  # Renamed from test_multiprocessing_slow_...
  function test_internal_exception (line 113) | def test_internal_exception():  # Renamed from test_multiprocessing_inte...
  function test_signal_quick_task_main_process (line 127) | def test_signal_quick_task_main_process():  # Removed self
  function test_signal_slow_task_main_process_timeout (line 139) | def test_signal_slow_task_main_process_timeout():  # Removed self
  function test_signal_in_thread_does_not_timeout (line 155) | def test_signal_in_thread_does_not_timeout():
  function test_in_thread_timeout (line 200) | def test_in_thread_timeout():

FILE: tests/utils/test_tokenizer_normalize_on_cpu.py
  class DummyBatchEncoding (line 21) | class DummyBatchEncoding:
    method __init__ (line 22) | def __init__(self, input_ids):
  class DummyToList (line 26) | class DummyToList:
    method __init__ (line 27) | def __init__(self, data):
    method tolist (line 30) | def tolist(self):
  function test_normalize_token_ids_valid_outputs (line 53) | def test_normalize_token_ids_valid_outputs(tokenized_output, expected):
  function test_normalize_token_ids_invalid_outputs (line 66) | def test_normalize_token_ids_invalid_outputs(tokenized_output):

FILE: tests/utils/test_torch_functional.py
  function _worker_mean (line 31) | def _worker_mean(rank: int, world_size: int, rendezvous_file: str):
  function test_masked_mean (line 68) | def test_masked_mean(value, mask, gt):
  function test_distributed_mean_max_min_std (line 75) | def test_distributed_mean_max_min_std(world_size, tmp_path):
  function _worker_mask (line 87) | def _worker_mask(rank: int, world_size: int, rendezvous_file: str):
  function test_distributed_masked_mean (line 113) | def test_distributed_masked_mean(world_size, tmp_path):
  function test_expand_as_nested (line 125) | def test_expand_as_nested():

FILE: tests/utils/test_torch_profile.py
  class TestTorchProfile (line 24) | class TestTorchProfile(unittest.TestCase):
    method setUp (line 25) | def setUp(self):
    method test_get_torch_profiler (line 30) | def test_get_torch_profiler(self, mock_profile):
    method test_profiler_lifecycle (line 47) | def test_profiler_lifecycle(self, mock_get_profiler):
    method test_discrete_mode (line 71) | def test_discrete_mode(self, mock_get_profiler):

FILE: tests/workers/actor/test_special_dp_actor.py
  class MockTransformerModel (line 28) | class MockTransformerModel(nn.Module):
    method __init__ (line 31) | def __init__(self, vocab_size=1000, hidden_size=64):
    method forward (line 41) | def forward(self, input_ids, attention_mask=None, position_ids=None, u...
  class TestDataParallelPPOActor (line 55) | class TestDataParallelPPOActor(unittest.TestCase):
    method setUpClass (line 59) | def setUpClass(cls):
    method setUp (line 83) | def setUp(self):
    method tearDownClass (line 108) | def tearDownClass(cls):
    method _create_test_data_for_compute_log_prob (line 113) | def _create_test_data_for_compute_log_prob(self):
    method _create_test_data_for_update_policy (line 140) | def _create_test_data_for_update_policy(self):
    method test_compute_log_prob (line 173) | def test_compute_log_prob(self):
    method test_compute_log_prob_without_entropy (line 193) | def test_compute_log_prob_without_entropy(self):
    method test_update_policy (line 209) | def test_update_policy(self):
    method test_dataparallelppoactor_initialization (line 233) | def test_dataparallelppoactor_initialization(self):
    method test_dataparallelppoactor_with_qwen3_model (line 243) | def test_dataparallelppoactor_with_qwen3_model(self):

FILE: tests/workers/config/test_actor_config_on_cpu.py
  class TestActorConfig (line 27) | class TestActorConfig(unittest.TestCase):
    method test_config_inheritance (line 30) | def test_config_inheritance(self):
    method test_actor_config_from_yaml (line 66) | def test_actor_config_from_yaml(self):
    method test_fsdp_actor_config_from_yaml (line 78) | def test_fsdp_actor_config_from_yaml(self):
    method test_megatron_actor_config_from_yaml (line 90) | def test_megatron_actor_config_from_yaml(self):
    method test_config_get_method (line 102) | def test_config_get_method(self):
    method test_config_dict_like_access (line 123) | def test_config_dict_like_access(self):
    method test_frozen_fields_modification_raises_exception (line 147) | def test_frozen_fields_modification_raises_exception(self):
    method test_actor_config_validation_exceptions (line 171) | def test_actor_config_validation_exceptions(self):
    method test_fsdp_actor_config_validation_exceptions (line 217) | def test_fsdp_actor_config_validation_exceptions(self):
    method test_actor_config_validate_method_exceptions (line 233) | def test_actor_config_validate_method_exceptions(self):

FILE: tests/workers/config/test_critic_config_on_cpu.py
  class TestCriticConfig (line 34) | class TestCriticConfig:
    method config_dir (line 38) | def config_dir(self):
    method test_megatron_critic_config_instantiation_from_yaml (line 42) | def test_megatron_critic_config_instantiation_from_yaml(self, config_d...
    method test_fsdp_critic_config_instantiation_from_yaml (line 74) | def test_fsdp_critic_config_instantiation_from_yaml(self, config_dir):
    method test_config_inheritance_hierarchy (line 107) | def test_config_inheritance_hierarchy(self):
    method test_config_dict_interface (line 122) | def test_config_dict_interface(self):
    method test_frozen_fields_immutability (line 139) | def test_frozen_fields_immutability(self):
    method test_batch_size_fields_modifiable (line 162) | def test_batch_size_fields_modifiable(self):
    method test_profiler_config_type_validation (line 183) | def test_profiler_config_type_validation(self):
    method test_critic_config_validation_logic (line 211) | def test_critic_config_validation_logic(self):
    method test_micro_batch_size_divisibility_validation (line 254) | def test_micro_batch_size_divisibility_validation(self):
    method test_fsdp_sequence_parallelism_validation (line 279) | def test_fsdp_sequence_parallelism_validation(self):

FILE: tests/workers/config/test_engine_config_on_cpu.py
  class TestMcoreEngineConfig (line 20) | class TestMcoreEngineConfig:
    method test_default_values (line 21) | def test_default_values(self):
    method test_post_init_validation (line 27) | def test_post_init_validation(self):
    method test_mutable_fields (line 36) | def test_mutable_fields(self):
    method test_offload_flags (line 43) | def test_offload_flags(self, offload_field):
  class TestFSDPEngineConfigCPU (line 48) | class TestFSDPEngineConfigCPU:
    method test_default_values (line 49) | def test_default_values(self):
    method test_offload_combinations (line 59) | def test_offload_combinations(self, offload_params):
    method test_wrap_policy_configuration (line 64) | def test_wrap_policy_configuration(self):

FILE: tests/workers/config/test_model_config_on_cpu.py
  class TestHFModelConfigCPU (line 23) | class TestHFModelConfigCPU:
    method test_target_modules_accepts_list_via_omegaconf (line 26) | def test_target_modules_accepts_list_via_omegaconf(self):
    method test_target_modules_accepts_none_via_omegaconf (line 55) | def test_target_modules_accepts_none_via_omegaconf(self):
    method test_target_modules_accepts_string_via_omegaconf (line 70) | def test_target_modules_accepts_string_via_omegaconf(self):
    method test_target_modules_raises_on_invalid_type (line 85) | def test_target_modules_raises_on_invalid_type(self):

FILE: tests/workers/config/test_optim_config_on_cpu.py
  class TestFSDPOptimizerConfigCPU (line 20) | class TestFSDPOptimizerConfigCPU:
    method test_default_configuration (line 21) | def test_default_configuration(self):
    method test_valid_lr_scheduler_types (line 28) | def test_valid_lr_scheduler_types(self, lr_scheduler_type):
    method test_valid_warmup_style_types (line 33) | def test_valid_warmup_style_types(self, warmup_style):
    method test_invalid_lr_scheduler_type (line 37) | def test_invalid_lr_scheduler_type(self):
    method test_invalid_warmup_style_type (line 41) | def test_invalid_warmup_style_type(self):
    method test_num_cycles_configuration (line 46) | def test_num_cycles_configuration(self, num_cycles):

FILE: tests/workers/critic/test_special_dp_critic.py
  class TestCriticWorker (line 33) | class TestCriticWorker(unittest.TestCase):
    method setUpClass (line 35) | def setUpClass(cls):
    method tearDownClass (line 52) | def tearDownClass(cls):
    method setUp (line 57) | def setUp(self):
    method tearDown (line 88) | def tearDown(self):
    method _create_test_data_for_compute_values (line 94) | def _create_test_data_for_compute_values(self, batch_size=2, seq_len=1...
    method _create_test_data_for_update_critic (line 119) | def _create_test_data_for_update_critic(self, batch_size=2, seq_len=10...
    method test_init_model (line 149) | def test_init_model(self):
    method test_compute_values (line 159) | def test_compute_values(self):
    method test_update_critic (line 177) | def test_update_critic(self):
    method test_critic_attn_implementation_override_functionality (line 202) | def test_critic_attn_implementation_override_functionality(self, mock_...
    method test_critic_model_config_structure (line 260) | def test_critic_model_config_structure(self):
    method test_critic_hydra_config_compatibility (line 290) | def test_critic_hydra_config_compatibility(self):
    method test_critic_backward_compatibility (line 310) | def test_critic_backward_compatibility(self):
    method test_critic_and_actor_independent_configuration (line 333) | def test_critic_and_actor_independent_configuration(self):

FILE: tests/workers/reward_manager/test_registry_on_cpu.py
  function setup (line 22) | def setup():
  function test_get_existing_manager (line 29) | def test_get_existing_manager(setup):
  function test_get_nonexistent_manager (line 35) | def test_get_nonexistent_manager(setup):
  function test_case_sensitivity (line 42) | def test_case_sensitivity(setup):
  function test_empty_registry (line 50) | def test_empty_registry(setup):
  function test_register_new_class (line 58) | def test_register_new_class(setup):
  function test_register_different_classes_same_name (line 69) | def test_register_different_classes_same_name(setup):
  function test_decorator_returns_original_class (line 85) | def test_decorator_returns_original_class(setup):

FILE: tests/workers/rollout/perf/vllm_async_rollout.py
  function init_config (line 48) | def init_config(n_gpus_per_node) -> DictConfig:
  function initialize (line 77) | def initialize(config, backend) -> tuple[AgentLoopManager | RayWorkerGro...
  function perf_rollout (line 107) | def perf_rollout(mode, backend, n_gpus_per_node, num_steps):

FILE: tests/workers/rollout/rollout_sglang/test_http_server_engine.py
  function event_loop (line 63) | def event_loop():
  function basic_adapter_kwargs (line 71) | def basic_adapter_kwargs():
  function router_adapter_kwargs (line 82) | def router_adapter_kwargs():
  function non_master_adapter_kwargs (line 95) | def non_master_adapter_kwargs():
  function mock_launch_server_process (line 106) | def mock_launch_server_process():
  function mock_multiprocessing_process (line 119) | def mock_multiprocessing_process():
  function mock_requests_session (line 132) | def mock_requests_session():
  function mock_requests_post (line 148) | def mock_requests_post():
  function mock_requests_get (line 161) | def mock_requests_get():
  function mock_aiohttp_session (line 174) | def mock_aiohttp_session():
  function mock_kill_process_tree (line 193) | def mock_kill_process_tree():
  function sglang_test_model_path (line 203) | def sglang_test_model_path():
  function real_adapter_kwargs (line 215) | def real_adapter_kwargs(sglang_test_model_path):
  function mock_server_args_post_init (line 226) | def mock_server_args_post_init():
  class TestLaunchServerProcess (line 236) | class TestLaunchServerProcess:
    method test_launch_server_process_success (line 239) | def test_launch_server_process_success(
    method test_launch_server_process_non_master (line 264) | def test_launch_server_process_non_master(self, mock_multiprocessing_p...
    method test_launch_server_process_timeout (line 279) | def test_launch_server_process_timeout(self, mock_multiprocessing_proc...
    method test_launch_server_process_died (line 305) | def test_launch_server_process_died(self, real_adapter_kwargs):
  class TestHttpServerEngineAdapter (line 322) | class TestHttpServerEngineAdapter:
    method test_init_with_router_registration (line 325) | def test_init_with_router_registration(self, mock_launch_server_proces...
    method test_init_without_router (line 334) | def test_init_without_router(self, mock_launch_server_process, basic_a...
    method test_register_with_router_failure (line 342) | def test_register_with_router_failure(self, mock_launch_server_process...
    method test_make_request_success (line 353) | def test_make_request_success(self, mock_launch_server_process, basic_...
    method test_make_request_get_method (line 372) | def test_make_request_get_method(self, mock_launch_server_process, bas...
    method test_make_request_non_master (line 387) | def test_make_request_non_master(self, mock_launch_server_process):
    method test_make_request_retry_logic (line 395) | def test_make_request_retry_logic(self, mock_launch_server_process, ba...
    method test_make_request_http_error (line 414) | def test_make_request_http_error(self, mock_launch_server_process, bas...
    method test_make_request_max_attempts_exceeded (line 426) | def test_make_request_max_attempts_exceeded(self, mock_launch_server_p...
    method test_update_weights_from_tensor_strict (line 439) | def test_update_weights_from_tensor_strict(self, mock_launch_server_pr...
    method test_update_weights_from_tensor_empty (line 473) | def test_update_weights_from_tensor_empty(self, mock_launch_server_pro...
    method test_update_weights_from_tensor_none (line 502) | def test_update_weights_from_tensor_none(self, mock_launch_server_proc...
    method test_generate (line 531) | def test_generate(self, mock_launch_server_process, basic_adapter_kwar...
    method test_flush_cache (line 555) | def test_flush_cache(self, mock_launch_server_process, basic_adapter_k...
    method test_flush_cache_non_master (line 574) | def test_flush_cache_non_master(self, mock_launch_server_process):
    method test_memory_management_methods (line 582) | def test_memory_management_methods(self, mock_launch_server_process, b...
    method test_generation_control_methods (line 599) | def test_generation_control_methods(self, mock_launch_server_process, ...
    method test_shutdown (line 606) | def test_shutdown(self, mock_launch_server_process, mock_kill_process_...
    method test_shutdown_with_errors (line 622) | def test_shutdown_with_errors(self, mock_launch_server_process, mock_k...
    method test_empty_and_none_parameters (line 643) | def test_empty_and_none_parameters(self, mock_launch_server_process, b...
    method test_large_payload_handling (line 667) | def test_large_payload_handling(self, mock_launch_server_process, basi...
    method test_timeout_edge_cases (line 690) | def test_timeout_edge_cases(self, mock_launch_server_process):
    method test_extreme_configuration_values (line 702) | def test_extreme_configuration_values(self, mock_launch_server_process):
  class TestAsyncHttpServerEngineAdapter (line 721) | class TestAsyncHttpServerEngineAdapter:
    method test_init (line 724) | def test_init(self, mock_launch_server_process, basic_adapter_kwargs):
    method test_make_async_request_success (line 731) | async def test_make_async_request_success(self, mock_launch_server_pro...
    method test_make_async_request_get_method (line 764) | async def test_make_async_request_get_method(self, mock_launch_server_...
    method test_make_async_request_non_master (line 793) | async def test_make_async_request_non_master(self, mock_launch_server_...
    method test_async_generate (line 802) | async def test_async_generate(self, mock_launch_server_process, basic_...
    method test_async_memory_management (line 819) | async def test_async_memory_management(self, mock_launch_server_proces...
  class TestErrorRecovery (line 840) | class TestErrorRecovery:
    method test_flush_cache_recovery (line 843) | def test_flush_cache_recovery(self, mock_launch_server_process, basic_...
    method test_flush_cache_max_attempts (line 860) | def test_flush_cache_max_attempts(self, mock_launch_server_process, ba...
    method test_network_partition_recovery (line 872) | def test_network_partition_recovery(self, mock_launch_server_process, ...
  class TestResourceManagement (line 889) | class TestResourceManagement:
    method test_resource_cleanup_on_exception (line 892) | def test_resource_cleanup_on_exception(
    method test_multiple_shutdown_calls (line 909) | def test_multiple_shutdown_calls(self, mock_launch_server_process, bas...
  class TestDataTypeHandling (line 919) | class TestDataTypeHandling:
    method test_complex_data_structures (line 922) | def test_complex_data_structures(self, mock_launch_server_process, bas...
  class TestIntegration (line 956) | class TestIntegration:
    method test_error_scenarios (line 959) | def test_error_scenarios(self, mock_launch_server_process, basic_adapt...

FILE: tests/workers/rollout/rollout_trtllm/test_adapter.py
  class TestAsyncTRTLLMHttpAdapter (line 27) | class TestAsyncTRTLLMHttpAdapter:
    method _build_async_session (line 28) | def _build_async_session(
    method test_make_async_request_get_method (line 48) | async def test_make_async_request_get_method(self):
    method test_make_async_request_post_method (line 70) | async def test_make_async_request_post_method(self):
    method test_make_async_request_http_error (line 94) | async def test_make_async_request_http_error(self):
    method test_make_async_request_max_attempts_exceeded (line 120) | async def test_make_async_request_max_attempts_exceeded(self):
  class TestTRTLLMServerAdapter (line 135) | class TestTRTLLMServerAdapter:
    method test_init_without_device_mesh (line 136) | def test_init_without_device_mesh(self):

FILE: tests/workers/rollout/rollout_trtllm/test_async_server.py
  class TestTRTLLMReplica (line 30) | class TestTRTLLMReplica:
    method test_placement_group_with_sub_ray_resource_pool (line 31) | def test_placement_group_with_sub_ray_resource_pool(self):
    method test_placement_group_with_ray_resource_pool (line 69) | def test_placement_group_with_ray_resource_pool(self):
  class TestTRTLLMHttpServer (line 110) | class TestTRTLLMHttpServer:
    method _build_rollout_config (line 112) | def _build_rollout_config(*, response_length: int | None = None, free_...
    method _create_server (line 137) | def _create_server(rollout_config, model_config, *, name: str):
    method test_async_generate (line 169) | def test_async_generate(self):
    method test_async_memory_management (line 215) | def test_async_memory_management(self):

FILE: tests/workers/rollout/rollout_trtllm/test_trtllm_rollout_utils.py
  function create_test_image (line 35) | def create_test_image(width: int = 224, height: int = 224) -> Image.Image:
  function create_rollout_config_dict (line 47) | def create_rollout_config_dict():
  function create_model_config_dict (line 77) | def create_model_config_dict(model_path: str):
  function get_tokenizer (line 87) | def get_tokenizer(model_path: str):
  function get_processor (line 91) | def get_processor(model_path: str):
  class TestUnimodalTRTLLMRollout (line 101) | class TestUnimodalTRTLLMRollout:
    method ray_context (line 103) | def ray_context(self):
    method trtllm_replica (line 111) | def trtllm_replica(self, ray_context):
    method tokenizer (line 134) | def tokenizer(self):
    method test_unimodal_generate (line 145) | def test_unimodal_generate(self, trtllm_replica, tokenizer, prompt):
    method test_unimodal_batch_generate (line 185) | def test_unimodal_batch_generate(self, trtllm_replica, tokenizer):
  class TestMultimodalTRTLLMRollout (line 230) | class TestMultimodalTRTLLMRollout:
    method ray_context (line 232) | def ray_context(self):
    method trtllm_vlm_replica (line 240) | def trtllm_vlm_replica(self, ray_context):
    method tokenizer (line 263) | def tokenizer(self):
    method processor (line 267) | def processor(self):
    method test_multimodal_generate_with_image (line 278) | def test_multimodal_generate_with_image(self, trtllm_vlm_replica, proc...
    method test_multimodal_different_image_sizes (line 336) | def test_multimodal_different_image_sizes(self, trtllm_vlm_replica, pr...
    method test_multimodal_text_only_fallback (line 376) | def test_multimodal_text_only_fallback(self, trtllm_vlm_replica, token...
  class TestTRTLLMServerLifecycle (line 413) | class TestTRTLLMServerLifecycle:
    method ray_context (line 415) | def ray_context(self):
    method trtllm_replica_lifecycle (line 423) | def trtllm_replica_lifecycle(self, ray_context):
    method tokenizer (line 446) | def tokenizer(self):
    method test_wake_sleep_cycle (line 449) | def test_wake_sleep_cycle(self, trtllm_replica_lifecycle, tokenizer):

FILE: tests/workers/rollout/rollout_vllm/run_fsdp_vllm.py
  function _pre_process_inputs (line 30) | def _pre_process_inputs(pad_token_id, prompt_token_ids: torch.Tensor) ->...
  function main (line 36) | def main():

FILE: tests/workers/rollout/rollout_vllm/test_vllm_abort.py
  function test_vllm_abort (line 29) | def test_vllm_abort():

FILE: tests/workers/rollout/test_hf_rollout.py
  function prepare_input_dataproto (line 48) | def prepare_input_dataproto(tokenizer, config, validate):
  function prepare_fsdp_model (line 75) | def prepare_fsdp_model(model, world_size):
  function test_hf_rollout (line 100) | def test_hf_rollout(n: int = 1, do_sample: bool = True, validate: bool =...

FILE: tests/workers/rollout/test_sglang_async_rollout_multimodal_delta.py
  function _test_add_tool_response_messages_image_delta (line 31) | def _test_add_tool_response_messages_image_delta(processor, image_list, ...
  function test_add_tool_response_messages_image_delta (line 157) | def test_add_tool_response_messages_image_delta():
  function test_add_tool_response_messages_image_delta_resize_image (line 179) | def test_add_tool_response_messages_image_delta_resize_image():

FILE: tests/workers/rollout/test_sglang_rollout_sharding_manager.py
  function test_get_named_tensor_buckets (line 50) | def test_get_named_tensor_buckets(named_tensors, bucket_size_mb, gt_grou...

FILE: tests/workers/rollout/test_vllm_cli_args_on_cpu.py
  class TestBuildCliArgsFromConfig (line 22) | class TestBuildCliArgsFromConfig:
    method test_string_value (line 25) | def test_string_value(self):
    method test_integer_value (line 31) | def test_integer_value(self):
    method test_float_value (line 37) | def test_float_value(self):
    method test_bool_true (line 43) | def test_bool_true(self):
    method test_bool_false (line 49) | def test_bool_false(self):
    method test_none_value (line 55) | def test_none_value(self):
    method test_list_values (line 61) | def test_list_values(self):
    method test_empty_list (line 67) | def test_empty_list(self):
    method test_list_with_strings (line 73) | def test_list_with_strings(self):
    method test_dict_value (line 79) | def test_dict_value(self):
    method test_mixed_config (line 87) | def test_mixed_config(self):
    method test_preserves_order (line 113) | def test_preserves_order(self):
    method test_empty_config (line 119) | def test_empty_config(self):
    method test_single_element_list (line 125) | def test_single_element_list(self):

FILE: tests/workers/test_fsdp_attn_implementation.py
  class TestFSDPAttnImplementation (line 43) | class TestFSDPAttnImplementation:
    method test_attn_implementation_extraction_logic (line 46) | def test_attn_implementation_extraction_logic(self):
    method test_attn_implementation_passed_to_autoconfig (line 71) | def test_attn_implementation_passed_to_autoconfig(self, mock_model_fro...
    method test_attn_implementation_passed_to_model (line 109) | def test_attn_implementation_passed_to_model(self, mock_model_from_pre...
    method test_override_config_integration (line 144) | def test_override_config_integration(self):
    method test_hydra_plus_prefix_config (line 165) | def test_hydra_plus_prefix_config(self):
    method test_backward_compatibility (line 194) | def test_backward_compatibility(self):
    method test_critic_attn_implementation_extraction_logic (line 214) | def test_critic_attn_implementation_extraction_logic(self):
    method test_critic_attn_implementation_passed_to_autoconfig (line 238) | def test_critic_attn_implementation_passed_to_autoconfig(self, mock_co...
    method test_critic_override_config_integration (line 277) | def test_critic_override_config_integration(self):
    method test_critic_hydra_plus_prefix_config (line 302) | def test_critic_hydra_plus_prefix_config(self):
    method test_both_actor_and_critic_configuration (line 331) | def test_both_actor_and_critic_configuration(self):
    method test_critic_backward_compatibility (line 358) | def test_critic_backward_compatibility(self):
  function test_attn_implementation_fix_integration (line 379) | def test_attn_implementation_fix_integration():
  function test_critic_attn_implementation_fix_integration (line 411) | def test_critic_attn_implementation_fix_integration():
  function test_complete_training_configuration (line 438) | def test_complete_training_configuration():

FILE: tests/workers/test_fsdp_workers.py
  function test_actor_rollout_ref_worker_actor_ref_model (line 21) | def test_actor_rollout_ref_worker_actor_ref_model():

FILE: verl/__init__.py
  function _sync_all_patch (line 90) | def _sync_all_patch(self):

FILE: verl/base_config.py
  class BaseConfig (line 22) | class BaseConfig(collections.abc.Mapping):
    method __setattr__ (line 33) | def __setattr__(self, name: str, value):
    method get (line 40) | def get(self, key: str, default: Any = None) -> Any:
    method __getitem__ (line 55) | def __getitem__(self, key: str):
    method __iter__ (line 70) | def __iter__(self):
    method __len__ (line 79) | def __len__(self):

FILE: verl/checkpoint_engine/base.py
  class TensorMeta (line 30) | class TensorMeta(TypedDict):
  class CheckpointEngineRegistry (line 37) | class CheckpointEngineRegistry:
    method register (line 42) | def register(backend: str):
    method get (line 56) | def get(cls, backend: str) -> type["CheckpointEngine"]:
    method new (line 68) | def new(cls, backend: str, *args, **kwargs) -> "CheckpointEngine":
  class CheckpointEngine (line 84) | class CheckpointEngine(ABC):
    method prepare (line 99) | def prepare(self) -> dict[str, Any]:
    method build_topology (line 116) | def build_topology(
    method init_process_group (line 143) | def init_process_group(self, **kwargs):
    method finalize (line 152) | def finalize(self):
    method send_weights (line 162) | async def send_weights(self, weights: Generator[tuple[str, torch.Tenso...
    method receive_weights (line 171) | async def receive_weights(self) -> Generator[tuple[str, torch.Tensor],...
  class CheckpointEngineWithCache (line 180) | class CheckpointEngineWithCache(CheckpointEngine):
    method get_weights (line 188) | async def get_weights(self) -> Generator[tuple[str, torch.Tensor], Non...
  class ColocatedCheckpointEngine (line 198) | class ColocatedCheckpointEngine(CheckpointEngine):
    method __init__ (line 209) | def __init__(self, bucket_size: int, is_master: bool = False) -> None:
    method prepare (line 213) | def prepare(self):
    method init_process_group (line 216) | def init_process_group(self, **kwargs):
    method finalize (line 219) | def finalize(self):
    method build_topology (line 223) | def build_topology(cls, *args, **kwargs):
    method send_weights (line 226) | def send_weights(self, weights: Generator[tuple[str, torch.Tensor], No...
    method receive_weights (line 234) | def receive_weights(self) -> Generator[tuple[str, torch.Tensor], None,...
  class CheckpointEngineWorker (line 244) | class CheckpointEngineWorker(Worker):
    method __init__ (line 253) | def __init__(
    method update_weights (line 286) | async def update_weights(self, global_steps: int = None):
    method execute_checkpoint_engine (line 291) | def execute_checkpoint_engine(self, method: str, *args, **kwargs):
    method get_replica_rank (line 295) | def get_replica_rank(self) -> int:
    method is_leader_rank (line 300) | def is_leader_rank(self) -> bool:
  class CheckpointEngineManager (line 308) | class CheckpointEngineManager:
    method __init__ (line 337) | def __init__(
    method build_process_group (line 349) | def build_process_group(self, rollout: RayWorkerGroup):
    method add_replicas (line 376) | def add_replicas(self, replicas: list[RolloutReplica]):
    method remove_replicas (line 384) | def remove_replicas(self, replicas: list[RolloutReplica]):
    method sleep_replicas (line 394) | async def sleep_replicas(self):
    method wake_up_replicas (line 399) | async def wake_up_replicas(self):
    method update_weights (line 404) | async def update_weights(self, global_steps: int = None):

FILE: verl/checkpoint_engine/hccl_checkpoint_engine.py
  class MasterMetadata (line 38) | class MasterMetadata:
  class BroadcastOperation (line 45) | class BroadcastOperation:
    method __init__ (line 57) | def __init__(
    method _run (line 75) | def _run(self):
    method wait_for_complete (line 87) | async def wait_for_complete(self) -> dict[str, TensorMeta]:
  class HCCLCheckpointEngine (line 97) | class HCCLCheckpointEngine(CheckpointEngine):
    method __init__ (line 109) | def __init__(
    method prepare (line 131) | def prepare(self) -> MasterMetadata:
    method finalize (line 141) | def finalize(self):
    method build_topology (line 155) | def build_topology(cls, trainer_world_size: int, rollout_world_size: i...
    method _start_zmq_server (line 168) | def _start_zmq_server(self):
    method _connect_zmq_client (line 182) | def _connect_zmq_client(self, metadata: MasterMetadata):
    method init_process_group (line 195) | def init_process_group(self, rank: int, world_size: int, master_metada...
    method send_weights (line 230) | async def send_weights(self, weights: Generator[tuple[str, torch.Tenso...
    method receive_weights (line 303) | async def receive_weights(self) -> AsyncGenerator[tuple[str, torch.Ten...

FILE: verl/checkpoint_engine/kimi_checkpoint_engine.py
  function ckpt_get_named_tensor_buckets (line 37) | def ckpt_get_named_tensor_buckets(
  function receive_tensor (line 66) | async def receive_tensor(
  class MasterMetadata (line 176) | class MasterMetadata:
  class BroadcastOperation (line 183) | class BroadcastOperation:
    method __init__ (line 193) | def __init__(
    method _run (line 208) | def _run(self):
    method wait_for_complete (line 212) | async def wait_for_complete(self) -> list[ParameterMeta]:
  class KIMICheckpointEngine (line 223) | class KIMICheckpointEngine(CheckpointEngine):
    method __init__ (line 234) | def __init__(
    method prepare (line 248) | def prepare(self) -> MasterMetadata:
    method finalize (line 259) | def finalize(self):
    method build_topology (line 268) | def build_topology(cls, trainer_world_size: int, rollout_world_size: i...
    method init_process_group (line 285) | def init_process_group(
    method send_weights (line 321) | async def send_weights(self, weights: Generator[tuple[str, torch.Tenso...
    method receive_weights (line 362) | async def receive_weights(self) -> AsyncGenerator[tuple[str, torch.Ten...

FILE: verl/checkpoint_engine/mooncake_checkpoint_engine.py
  class MooncakeCheckpointEngine (line 35) | class MooncakeCheckpointEngine(CheckpointEngine):
    method __init__ (line 45) | def __init__(
    method prepare (line 88) | def prepare(self) -> dict[str, Any]:
    method build_topology (line 98) | def build_topology(cls, trainer_world_size: int, rollout_world_size: i...
    method init_process_group (line 111) | def init_process_group(self, rank: int, world_size: int, metadata: dic...
    method finalize (line 135) | def finalize(self):
    method wait_for_complete (line 142) | async def wait_for_complete(self, buf: torch.Tensor):
    method send_weights (line 150) | async def send_weights(self, weights: Generator[tuple[str, torch.Tenso...
    method receive_weights (line 222) | async def receive_weights(self) -> AsyncGenerator[tuple[str, torch.Ten...

FILE: verl/checkpoint_engine/nccl_checkpoint_engine.py
  class MasterMetadata (line 38) | class MasterMetadata:
  class BroadcastOperation (line 43) | class BroadcastOperation:
    method __init__ (line 55) | def __init__(
    method _run (line 74) | def _run(self):
    method wait_for_complete (line 86) | async def wait_for_complete(self) -> dict[str, TensorMeta]:
  class NCCLCheckpointEngine (line 97) | class NCCLCheckpointEngine(CheckpointEngine):
    method __init__ (line 109) | def __init__(
    method prepare (line 128) | def prepare(self) -> MasterMetadata:
    method finalize (line 140) | def finalize(self):
    method build_topology (line 154) | def build_topology(cls, trainer_world_size: int, rollout_world_size: i...
    method _start_zmq_server (line 167) | def _start_zmq_server(self):
    method _connect_zmq_client (line 181) | def _connect_zmq_client(self, metadata: MasterMetadata):
    method init_process_group (line 194) | def init_process_group(self, rank: int, world_size: int, master_metada...
    method send_weights (line 224) | async def send_weights(self, weights: Generator[tuple[str, torch.Tenso...
    method receive_weights (line 297) | async def receive_weights(self) -> AsyncGenerator[tuple[str, torch.Ten...

FILE: verl/checkpoint_engine/nixl_checkpoint_engine.py
  class NixlAgentMetadata (line 42) | class NixlAgentMetadata:
  class NixlAgent (line 49) | class NixlAgent:
    method __init__ (line 54) | def __init__(self):
    method __getattr__ (line 63) | def __getattr__(self, name):
    method get_agent_metadata (line 75) | def get_agent_metadata(self) -> NixlAgentMetadata:
    method start_zmq_server (line 83) | def start_zmq_server(self):
    method add_remote_agent (line 97) | def add_remote_agent(self, metadata: NixlAgentMetadata) -> str:
    method remove_remote_agent (line 113) | def remove_remote_agent(self, agent_name: str):
    method send_message (line 118) | def send_message(self, agent_name, message: dict):
    method read_message (line 122) | async def read_message(self, agent_name: str) -> dict:
    method get_notification (line 128) | async def get_notification(self, remote_name: str) -> bytes:
  class ReadableOperation (line 137) | class ReadableOperation:
    method __init__ (line 150) | def __init__(
    method wait_for_complete (line 164) | async def wait_for_complete(self):
  class ReadOperation (line 171) | class ReadOperation:
    method __init__ (line 184) | def __init__(self, agent: NixlAgent, remote_agent: str, local_descs: n...
    method read_metadata (line 194) | async def read_metadata(self) -> dict:
    method begin_read (line 205) | def begin_read(self):
    method wait_for_complete (line 215) | async def wait_for_complete(self):
  class NIXLCheckpointEngine (line 233) | class NIXLCheckpointEngine(CheckpointEngine):
    method __init__ (line 246) | def __init__(
    method prepare (line 259) | def prepare(self) -> NixlAgentMetadata:
    method build_topology (line 283) | def build_topology(cls, trainer_world_size: int, rollout_world_size: i...
    method init_process_group (line 301) | def init_process_group(
    method finalize (line 343) | def finalize(self):
    method send_weights (line 365) | async def send_weights(self, weights: Generator[tuple[str, torch.Tenso...
    method receive_weights (line 435) | async def receive_weights(self) -> AsyncGenerator[tuple[str, torch.Ten...

FILE: verl/experimental/agent_loop/agent_loop.py
  class GlobalRequestLoadBalancer (line 58) | class GlobalRequestLoadBalancer:
    method __init__ (line 61) | def __init__(self, server_actor_ids: list[str], max_cache_size: int = ...
    method acquire_server (line 68) | def acquire_server(self, request_id: str) -> str:
    method release_server (line 82) | def release_server(self, server_id: str) -> None:
  function _get_rollout_and_model_config (line 91) | def _get_rollout_and_model_config(config: DictConfig) -> tuple[DictConfi...
  class AsyncLLMServerManager (line 99) | class AsyncLLMServerManager:
    method __init__ (line 106) | def __init__(
    method _acquire_server (line 123) | async def _acquire_server(self, request_id: str) -> tuple[str, ray.act...
    method _release_server (line 130) | def _release_server(self, server_id: str) -> None:
    method generate (line 136) | async def generate(
  class AgentLoopMetrics (line 169) | class AgentLoopMetrics(BaseModel):
  class AgentLoopOutput (line 177) | class AgentLoopOutput(BaseModel):
  class _InternalAgentLoopOutput (line 202) | class _InternalAgentLoopOutput(AgentLoopOutput):
  class DictConfigWrap (line 229) | class DictConfigWrap:
    method __init__ (line 232) | def __init__(self, config: DictConfig):
  class AgentLoopBase (line 236) | class AgentLoopBase(ABC):
    method __init__ (line 249) | def __init__(
    method process_vision_info (line 270) | async def process_vision_info(self, messages: list[dict]) -> dict:
    method apply_chat_template (line 291) | async def apply_chat_template(
    method run (line 360) | async def run(self, sampling_params: dict[str, Any], **kwargs) -> Agen...
  function register (line 381) | def register(agent_name: str):
  class AgentLoopWorker (line 392) | class AgentLoopWorker:
    method __init__ (line 401) | def __init__(
    method generate_sequences (line 454) | async def generate_sequences(self, batch: DataProto) -> DataProto:
    method _run_agent_loop (line 535) | async def _run_agent_loop(
    method _agent_loop_postprocess (line 569) | async def _agent_loop_postprocess(self, output, **kwargs) -> _Internal...
    method _compute_multi_modal_inputs (line 693) | def _compute_multi_modal_inputs(self, output, input_ids) -> dict[str, ...
    method _compute_position_ids (line 728) | def _compute_position_ids(self, input_ids, attention_mask, multi_modal...
    method _compute_score (line 759) | async def _compute_score(self, output, prompts, responses, attention_m...
    method _postprocess (line 789) | def _postprocess(
  function get_trajectory_info (line 880) | async def get_trajectory_info(step, index, validate):
  class AgentLoopManager (line 902) | class AgentLoopManager:
    method __init__ (line 915) | def __init__(
    method create (line 938) | async def create(
    method _initialize_llm_servers (line 952) | async def _initialize_llm_servers(self):
    method _init_agent_loop_workers (line 999) | async def _init_agent_loop_workers(self):
    method _init_global_load_balancer (line 1023) | async def _init_global_load_balancer(self) -> None:
    method generate_sequences (line 1030) | async def generate_sequences(self, prompts: DataProto) -> DataProto:
    method _performance_metrics (line 1056) | def _performance_metrics(self, metrics: list[list[dict[str, str]]], ou...
    method clear_kv_cache (line 1084) | async def clear_kv_cache(self):
    method start_profile (line 1089) | async def start_profile(self, **kwargs):
    method stop_profile (line 1094) | async def stop_profile(self):

FILE: verl/experimental/agent_loop/prometheus_utils.py
  function update_prometheus_config (line 28) | def update_prometheus_config(config: PrometheusConfig, server_addresses:...

FILE: verl/experimental/agent_loop/single_turn_agent_loop.py
  class SingleTurnAgentLoop (line 28) | class SingleTurnAgentLoop(AgentLoopBase):
    method __init__ (line 31) | def __init__(self, *args, **kwargs):
    method run (line 36) | async def run(self, sampling_params: dict[str, Any], **kwargs) -> Agen...

FILE: verl/experimental/agent_loop/tool_agent_loop.py
  class AgentState (line 44) | class AgentState(Enum):
  class AgentData (line 52) | class AgentData:
    method __init__ (line 56) | def __init__(
  class ToolAgentLoop (line 96) | class ToolAgentLoop(AgentLoopBase):
    method __init__ (line 97) | def __init__(self, *args, **kwargs):
    method run (line 124) | async def run(self, sampling_params: dict[str, Any], **kwargs) -> Agen...
    method _handle_pending_state (line 203) | async def _handle_pending_state(self, agent_data: AgentData, sampling_...
    method _handle_generating_state (line 214) | async def _handle_generating_state(
    method _handle_processing_tools_state (line 281) | async def _handle_processing_tools_state(self, agent_data: AgentData) ...
    method _handle_interacting_state (line 384) | async def _handle_interacting_state(self, agent_data: AgentData) -> Ag...
    method _call_tool (line 421) | async def _call_tool(
    method _initialize_interactions (line 471) | def _initialize_interactions(self, interaction_config_file):

FILE: verl/experimental/agent_loop/tool_parser.py
  class FunctionCall (line 31) | class FunctionCall(BaseModel):
  class ToolParser (line 44) | class ToolParser(ABC):
    method __init__ (line 47) | def __init__(self, tokenizer) -> None:
    method extract_tool_calls (line 51) | async def extract_tool_calls(
    method get_tool_parser (line 66) | def get_tool_parser(cls, name: str, tokenizer):
    method register (line 72) | def register(cls, name: str):
  class HermesToolParser (line 81) | class HermesToolParser(ToolParser):
    method __init__ (line 84) | def __init__(self, tokenizer) -> None:
    method extract_tool_calls (line 92) | async def extract_tool_calls(
  class GptOssToolParser (line 117) | class GptOssToolParser(ToolParser):
    method __init__ (line 126) | def __init__(self, tokenizer) -> None:
    method extract_tool_calls (line 141) | async def extract_tool_calls(
  class Qwen3XMLToolParser (line 174) | class Qwen3XMLToolParser(ToolParser):
    method __init__ (line 183) | def __init__(self, tokenizer):
    method _parse_xml_function_call (line 195) | def _parse_xml_function_call(
    method _get_function_calls (line 299) | def _get_function_calls(self, model_output: str) -> list[str]:
    method extract_tool_calls (line 316) | async def extract_tool_calls(

FILE: verl/experimental/agent_loop/utils.py
  function resolve_config_path (line 19) | def resolve_config_path(config_path: str) -> str:
  function format_gpt_oss_tool_response_manually (line 78) | def format_gpt_oss_tool_response_manually(tool_response: str, tool_call_...
  function add_generation_prompt_for_gpt_oss (line 90) | def add_generation_prompt_for_gpt_oss(message_content: str) -> str:
  function build_gpt_oss_tool_response_text (line 101) | def build_gpt_oss_tool_response_text(messages: list[dict[str, Any]], too...

FILE: verl/experimental/dataset/sampler.py
  class AbstractSampler (line 23) | class AbstractSampler(Sampler[int]):
    method __init__ (line 27) | def __init__(
  class AbstractCurriculumSampler (line 35) | class AbstractCurriculumSampler(AbstractSampler):
    method update (line 39) | def update(self, batch: DataProto) -> None:

FILE: verl/experimental/dynamic_dataset/dynamicgen_dataset.py
  class AbstractDataGenerator (line 38) | class AbstractDataGenerator(ABC):
    method __init__ (line 39) | def __init__(self, config: DictConfig):
    method generate (line 43) | def generate(self, dataset: Dataset) -> datasets.Dataset:
  class MockDataGenerator (line 54) | class MockDataGenerator(AbstractDataGenerator):
    method __init__ (line 60) | def __init__(self, config: DictConfig = None):
    method generate (line 63) | def generate(self, dataset: Dataset) -> datasets.Dataset:
  class DynamicGenDataset (line 68) | class DynamicGenDataset(RLHFDataset):
    method __init__ (line 74) | def __init__(
    method append_dataframe (line 100) | def append_dataframe(self, new_dataframe: datasets.Dataset):
    method on_batch_end (line 106) | def on_batch_end(self, batch: DataProto) -> None:

FILE: verl/experimental/fully_async_policy/agent_loop/agent_loop.py
  class FullyAsyncLLMServerManager (line 40) | class FullyAsyncLLMServerManager(AsyncLLMServerManager):
    method generate (line 46) | async def generate(
  class FullyAsyncAgentLoopWorker (line 127) | class FullyAsyncAgentLoopWorker(AgentLoopWorker):
    method __init__ (line 128) | def __init__(
  class FullyAsyncAgentLoopManager (line 139) | class FullyAsyncAgentLoopManager(AgentLoopManager):
    method __init__ (line 140) | def __init__(
    method generate_sequences_single (line 151) | async def generate_sequences_single(self, prompts: DataProto) -> DataP...
    method _select_best_worker (line 163) | def _select_best_worker(self):

FILE: verl/experimental/fully_async_policy/detach_utils.py
  class RolloutSample (line 28) | class RolloutSample:
  class ValidateMetrics (line 43) | class ValidateMetrics:
  function prepare_single_generation_data (line 50) | def prepare_single_generation_data(batch_dict, config) -> DataProto:
  function addition_process (line 84) | def addition_process(output: DataProto):
  function assemble_batch_from_rollout_samples (line 94) | def assemble_batch_from_rollout_samples(
  class MetricsAggregator (line 189) | class MetricsAggregator:
    method __init__ (line 192) | def __init__(self, total_gpus: int):
    method _init_aggregation_rules (line 207) | def _init_aggregation_rules(self) -> dict[str, dict[str, list[str]]]:
    method add_step_metrics (line 225) | def add_step_metrics(self, metrics: dict[str, Any], sample_count: int,...
    method _get_aggregation_type (line 241) | def _get_aggregation_type(self, metric_name: str) -> str:
    method _aggregate_single_metric (line 263) | def _aggregate_single_metric(self, metric_name: str, values: list[floa...
    method get_aggregated_metrics (line 302) | def get_aggregated_metrics(self) -> dict[str, Any]:
    method _special_metrics_aggergate (line 321) | def _special_metrics_aggergate(self, aggregated: dict[str, Any]) -> di...
    method reset (line 341) | def reset(self):
    method get_current_stats (line 348) | def get_current_stats(self) -> dict[str, Any]:
  function task_exception_handler (line 358) | def task_exception_handler(task: asyncio.Task):
  function safe_create_task (line 369) | def safe_create_task(coro, name: str, task_set: set = None):

FILE: verl/experimental/fully_async_policy/fully_async_main.py
  class FullyAsyncTaskRunner (line 35) | class FullyAsyncTaskRunner:
    method __init__ (line 40) | def __init__(self):
    method run (line 45) | def run(self, config):
    method _initialize_components (line 50) | def _initialize_components(self, config) -> None:
    method _create_rollouter (line 118) | def _create_rollouter(self, config) -> None:
    method _create_trainer (line 136) | def _create_trainer(self, config) -> None:
    method _run_training_loop (line 158) | def _run_training_loop(self):
  function main (line 195) | def main(config):

FILE: verl/experimental/fully_async_policy/fully_async_rollouter.py
  class FullyAsyncRollouter (line 43) | class FullyAsyncRollouter(SeparateRayPPOTrainer):
    method __init__ (line 50) | def __init__(
    method _init_async_objects (line 183) | def _init_async_objects(self):
    method set_message_queue_client (line 193) | async def set_message_queue_client(self, message_queue_client: Message...
    method set_max_required_samples (line 198) | async def set_max_required_samples(self):
    method get_rollout_wg (line 223) | def get_rollout_wg(self):
    method get_replicas (line 227) | def get_replicas(self):
    method get_max_queue_size (line 231) | def get_max_queue_size(self):
    method get_total_train_steps (line 234) | def get_total_train_steps(self):
    method reset_staleness (line 237) | async def reset_staleness(self):
    method do_validate (line 263) | def do_validate(self) -> ValidateMetrics:
    method save_checkpoint (line 270) | async def save_checkpoint(self, local_global_step_folder: str):
    method load_checkpoint (line 286) | def load_checkpoint(self):
    method _validate_config (line 344) | def _validate_config(self):
    method init_workers (line 350) | async def init_workers(self):
    method _create_actor_rollout_classes (line 362) | def _create_actor_rollout_classes(self):
    method _init_models (line 366) | def _init_models(self):
    method _create_continuous_iterator (line 371) | def _create_continuous_iterator(self):
    method _init_async_rollout_manager (line 380) | async def _init_async_rollout_manager(self):
    method _feed_samples (line 400) | async def _feed_samples(self):
    method _processor_worker (line 433) | async def _processor_worker(self):
    method _process_single_sample_streaming (line 500) | async def _process_single_sample_streaming(self, rollout_sample: Rollo...
    method _streaming_generation_main (line 519) | async def _streaming_generation_main(self):
    method fit (line 578) | async def fit(self):
    method _async_monitor_loop (line 614) | async def _async_monitor_loop(self):
    method _should_pause_generation (line 643) | async def _should_pause_generation(self) -> bool:
    method get_statistics (line 667) | async def get_statistics(self) -> dict:

FILE: verl/experimental/fully_async_policy/fully_async_trainer.py
  class TrainingStopException (line 47) | class TrainingStopException(Exception):
  class FullyAsyncTrainer (line 54) | class FullyAsyncTrainer(SeparateRayPPOTrainer):
    method __init__ (line 60) | def __init__(
    method _setup_checkpoint_manager (line 189) | def _setup_checkpoint_manager(self, rollouter):
    method set_message_queue_client (line 198) | def set_message_queue_client(self, message_queue_client: MessageQueueC...
    method set_rollouter (line 202) | def set_rollouter(self, rollouter):
    method set_total_train_steps (line 208) | def set_total_train_steps(self, total_training_steps):
    method get_actor_wg (line 223) | def get_actor_wg(self):
    method _get_samples_from_queue (line 227) | async def _get_samples_from_queue(self) -> tuple[None, None] | tuple[i...
    method _create_actor_rollout_classes (line 286) | def _create_actor_rollout_classes(self):
    method _init_models (line 297) | def _init_models(self):
    method init_workers (line 314) | async def init_workers(self):
    method _init_reward_loop (line 327) | def _init_reward_loop(self):
    method _init_async_rollout_manager (line 332) | async def _init_async_rollout_manager(self):
    method fit (line 389) | async def fit(self):
    method fit_step (line 421) | async def fit_step(self, batch_dict: dict = None):
    method _fit_generate (line 462) | async def _fit_generate(self, batch: DataProto = None) -> DataProto | ...
    method _compute_old_log_prob (line 473) | def _compute_old_log_prob(self, batch: DataProto):
    method _fit_update_local_step (line 495) | def _fit_update_local_step(self):
    method _fit_update_weights (line 509) | async def _fit_update_weights(self):
    method _validate_process (line 535) | async def _validate_process(self):
    method _fit_validate (line 558) | async def _fit_validate(self, val_before_train=False):
    method _fit_save_checkpoint (line 599) | def _fit_save_checkpoint(self, force=False):
    method _fit_postprocess_step (line 626) | def _fit_postprocess_step(self):
    method _save_checkpoint (line 636) | def _save_checkpoint(self):
    method load_checkpoint (line 699) | async def load_checkpoint(self):
    method _collect_metrics_from_samples (line 756) | def _collect_metrics_from_samples(self, batch, metrics):

FILE: verl/experimental/fully_async_policy/message_queue.py
  class MessageQueue (line 27) | class MessageQueue:
    method __init__ (line 32) | def __init__(self, config: DictConfig, max_queue_size: int = 1000):
    method put_sample (line 55) | async def put_sample(self, sample: Any) -> bool:
    method get_sample (line 85) | async def get_sample(self) -> Any | None:
    method get_queue_size (line 105) | async def get_queue_size(self) -> int:
    method get_statistics (line 110) | async def get_statistics(self) -> dict[str, Any]:
    method clear_queue (line 121) | async def clear_queue(self):
    method shutdown (line 128) | async def shutdown(self):
    method get_memory_usage (line 136) | async def get_memory_usage(self) -> dict:
    method put_validate (line 168) | async def put_validate(self, data):
    method get_validate (line 172) | async def get_validate(self):
  class MessageQueueClient (line 180) | class MessageQueueClient:
    method __init__ (line 183) | def __init__(self, queue_actor: Any):
    method put_sample (line 186) | async def put_sample(self, sample: Any) -> bool:
    method put_validate (line 191) | async def put_validate(self, data: Any) -> bool:
    method get_validate_sync (line 195) | def get_validate_sync(self) -> Any | None:
    method get_sample (line 198) | async def get_sample(self) -> Any | None:
    method get_queue_size (line 203) | async def get_queue_size(self) -> int:
    method get_statistics (line 208) | async def get_statistics(self) -> dict[str, Any]:
    method clear_queue (line 213) | async def clear_queue(self):
    method shutdown (line 218) | async def shutdown(self):
    method get_memory_usage (line 223) | async def get_memory_usage(self) -> dict:
    method get_sample_sync (line 228) | def get_sample_sync(self) -> Any | None:
    method get_statistics_sync (line 232) | def get_statistics_sync(self) -> dict[str, Any]:

FILE: verl/experimental/fully_async_policy/unittest/simple_streaming_demo.py
  class SimpleStreamingSystem (line 20) | class SimpleStreamingSystem:
    method __init__ (line 23) | def __init__(self, max_concurrent_tasks: int = 4):
    method data_stream (line 30) | async def data_stream(self):
    method add_data_stream (line 47) | async def add_data_stream(self, data_list: list[dict]):
    method _process_data_async (line 61) | async def _process_data_async(self, data_item: dict):
    method _submit_worker (line 85) | async def _submit_worker(self):
    method _consumer_worker (line 120) | async def _consumer_worker(self):
    method run_demo (line 140) | async def run_demo(self):
  function main (line 169) | async def main():

FILE: verl/experimental/one_step_off_policy/main_ppo.py
  class OneStepTaskRunner (line 35) | class OneStepTaskRunner:
    method run (line 36) | def run(self, config):
  function main (line 111) | def main(config):

FILE: verl/experimental/one_step_off_policy/ray_trainer.py
  class OneStepOffRayTrainer (line 48) | class OneStepOffRayTrainer(SeparateRayPPOTrainer):
    method __init__ (line 49) | def __init__(
    method _create_actor_rollout_classes (line 141) | def _create_actor_rollout_classes(self):
    method _init_models (line 151) | def _init_models(self):
    method _init_async_rollout_manager (line 169) | def _init_async_rollout_manager(self):
    method _create_continuous_iterator (line 188) | def _create_continuous_iterator(self):
    method _async_gen_next_batch (line 197) | async def _async_gen_next_batch(self, continuous_iterator):
    method _launch_individual_rewards (line 252) | def _launch_individual_rewards(batch, config, tokenizer):
    method fit (line 256) | async def fit(self):
    method fit_step (line 318) | async def fit_step(self, batch_data_future, continuous_iterator):
    method _fit_generate (line 383) | async def _fit_generate(self, batch_data_future, continuous_iterator):

FILE: verl/experimental/reward_loop/reward_loop.py
  function migrate_legacy_reward_impl (line 38) | def migrate_legacy_reward_impl(config):
  class RewardLoopWorker (line 92) | class RewardLoopWorker:
    method __init__ (line 108) | def __init__(self, config: DictConfig, reward_router_address: str = No...
    method _init_reward_fn (line 118) | def _init_reward_fn(self):
    method compute_score_batch (line 133) | async def compute_score_batch(self, data: DataProto) -> list[dict]:
    method compute_score (line 140) | async def compute_score(self, data: DataProto) -> dict:
    method _post_request (line 153) | async def _post_request(self, payload: dict, endpoint: str, max_retrie...
    method _preprocess_reward_inputs (line 193) | async def _preprocess_reward_inputs(self, data: DataProto) -> str:
    method compute_score_disrm (line 229) | async def compute_score_disrm(self, data: DataProto) -> dict:
  class RewardLoopManager (line 271) | class RewardLoopManager:
    method __init__ (line 277) | def __init__(self, config: DictConfig, rm_resource_pool: RayResourcePo...
    method _init_reward_loop_workers (line 289) | def _init_reward_loop_workers(self):
    method compute_rm_score (line 308) | def compute_rm_score(self, data: DataProto) -> DataProto:
    method _run_all (line 344) | def _run_all(self, tasks: list[asyncio.Task]):

FILE: verl/experimental/reward_loop/reward_manager/base.py
  class RewardManagerBase (line 33) | class RewardManagerBase(ABC):
    method __init__ (line 36) | def __init__(self, config: DictConfig, tokenizer: AutoTokenizer, compu...
    method init_class (line 50) | def init_class(cls, config: DictConfig, tokenizer: AutoTokenizer):
    method run_single (line 57) | async def run_single(self, data: DataProto):

FILE: verl/experimental/reward_loop/reward_manager/dapo.py
  class DAPORewardManager (line 24) | class DAPORewardManager(RewardManagerBase):
    method __init__ (line 27) | def __init__(self, config, tokenizer, compute_score, reward_router_add...
    method run_single (line 52) | async def run_single(self, data: DataProto) -> dict:

FILE: verl/experimental/reward_loop/reward_manager/gdpo.py
  class GDPORewardManager (line 24) | class GDPORewardManager(RewardManagerBase):
    method __init__ (line 27) | def __init__(self, config, tokenizer, compute_score, reward_router_add...
    method run_single (line 35) | async def run_single(self, data: DataProto) -> dict:

FILE: verl/experimental/reward_loop/reward_manager/limited.py
  class AsyncTokenBucket (line 32) | class AsyncTokenBucket:
    method __init__ (line 83) | def __init__(self, rate_limit: float, max_tokens: float = None):
    method acquire (line 90) | async def acquire(self, num_tokens: float = 1.0) -> None:
  class RateLimitedRewardManager (line 174) | class RateLimitedRewardManager(RewardManagerBase):
    method init_class (line 265) | def init_class(cls, config: DictConfig, tokenizer: AutoTokenizer):
    method __init__ (line 341) | def __init__(
    method _compute_reward (line 367) | async def _compute_reward(
    method run_single (line 398) | async def run_single(self, data: DataProto) -> dict:
    method __call__ (line 471) | def __call__(self, data: DataProto, return_dict: bool = False):

FILE: verl/experimental/reward_loop/reward_manager/naive.py
  class NaiveRewardManager (line 24) | class NaiveRewardManager(RewardManagerBase):
    method __init__ (line 27) | def __init__(self, config, tokenizer, compute_score, reward_router_add...
    method run_single (line 34) | async def run_single(self, data: DataProto) -> dict:

FILE: verl/experimental/reward_loop/reward_manager/registry.py
  function register (line 24) | def register(name: str) -> Callable[[type[RewardManagerBase]], type[Rewa...
  function get_reward_manager_cls (line 41) | def get_reward_manager_cls(name: str) -> type[RewardManagerBase]:

FILE: verl/experimental/reward_loop/reward_manager/remote.py
  class RewardComputeWorker (line 27) | class RewardComputeWorker:
    method __init__ (line 32) | def __init__(self, compute_score_fn):
    method compute_score (line 36) | def compute_score(self, **kwargs) -> dict:
  class RemoteRewardManager (line 41) | class RemoteRewardManager(RewardManagerBase):
    method __init__ (line 50) | def __init__(self, config, tokenizer, compute_score, reward_router_add...
    method choose_reward_worker (line 72) | def choose_reward_worker(self):
    method run_single (line 75) | async def run_single(self, data: DataProto) -> dict:

FILE: verl/experimental/reward_loop/reward_model.py
  class RewardModelManager (line 27) | class RewardModelManager:
    method __init__ (line 30) | def __init__(
    method _initialize_llm_servers (line 50) | def _initialize_llm_servers(self):
    method _initialize_router (line 87) | def _initialize_router(self):
    method get_router_address (line 100) | def get_router_address(self):
    method wake_up (line 103) | def wake_up(self):
    method sleep (line 107) | def sleep(self):
    method _run_all (line 111) | def _run_all(self, tasks: list[asyncio.Task]):

FILE: verl/experimental/reward_loop/router/inner_sglang_router.py
  function launch_router_process (line 30) | def launch_router_process(

FILE: verl/experimental/reward_loop/router/naive_router.py
  function _read_async_response (line 34) | async def _read_async_response(resp: aiohttp.ClientResponse) -> dict[str...
  function launch_router_process (line 51) | def launch_router_process(
  function run_router (line 77) | def run_router(router_ip: str, router_port: int, worker_urls: list[str]):
  class NaiveRouter (line 82) | class NaiveRouter:
    method __init__ (line 83) | def __init__(
    method _on_startup (line 115) | async def _on_startup(self):
    method _on_shutdown (line 128) | async def _on_shutdown(self):
    method _make_async_request (line 135) | async def _make_async_request(self, request: Request, endpoint: str):
    method _select_worker (line 175) | def _select_worker(self) -> str:
    method _release_worker (line 181) | def _release_worker(self, url: str) -> None:

FILE: verl/experimental/separation/engine_workers.py
  class DetachActorWorker (line 35) | class DetachActorWorker(ActorRolloutRefWorker):
    method __init__ (line 44) | def __init__(self, config: DictConfig, role: str):
    method _get_strategy_handlers (line 56) | def _get_strategy_handlers(self):
    method save_model_to_cpu (line 91) | def save_model_to_cpu(self, n):
    method restore_model_from_cpu (line 104) | def restore_model_from_cpu(self, n):
    method clear_cpu_model (line 121) | def clear_cpu_model(self, n):

FILE: verl/experimental/separation/ray_trainer.py
  class SeparateRayPPOTrainer (line 55) | class SeparateRayPPOTrainer(RayPPOTrainer):
    method __init__ (line 62) | def __init__(
    method init_workers (line 108) | def init_workers(self):
    method _init_resource_pools (line 128) | def _init_resource_pools(self):
    method _create_worker_classes (line 132) | def _create_worker_classes(self):
    method _create_actor_rollout_classes (line 138) | def _create_actor_rollout_classes(self):
    method _create_critic_class (line 141) | def _create_critic_class(self):
    method _create_reference_policy_class (line 171) | def _create_reference_policy_class(self):
    method _create_reward_model_class (line 183) | def _create_reward_model_class(self):
    method _init_worker_groups (line 193) | def _init_worker_groups(self):
    method _init_models (line 227) | def _init_models(self):
    method _init_reward_loop (line 254) | def _init_reward_loop(self):
    method _init_async_rollout_manager (line 266) | def _init_async_rollout_manager(self):
    method fit (line 269) | def fit(self):
    method fit_step (line 336) | def fit_step(self, batch_dict: Any = None):
    method _fit_prepare_step (line 378) | def _fit_prepare_step(self):
    method _fit_start_profile (line 383) | def _fit_start_profile(self):
    method _fit_get_batch (line 392) | def _fit_get_batch(self, batch_dict: dict) -> DataProto:
    method _fit_generate (line 399) | def _fit_generate(self, batch: DataProto = None) -> DataProto:
    method _fit_compute_reward (line 470) | def _fit_compute_reward(self, batch: DataProto) -> DataProto:
    method _fit_compute_log_prob (line 484) | def _fit_compute_log_prob(self, batch: DataProto) -> DataProto:
    method _fit_compute_ref_log_prob (line 535) | def _fit_compute_ref_log_prob(self, batch: DataProto) -> DataProto:
    method _fit_compute_critic (line 543) | def _fit_compute_critic(self, batch: DataProto) -> DataProto:
    method _fit_compute_advantage (line 551) | def _fit_compute_advantage(self, batch) -> DataProto:
    method _fit_update_critic (line 607) | def _fit_update_critic(self, batch: DataProto) -> DataProto:
    method _fit_update_actor (line 617) | def _fit_update_actor(self, batch: DataProto) -> DataProto:
    method _fit_update_weights (line 630) | def _fit_update_weights(self):
    method _fit_dump_data (line 637) | def _fit_dump_data(self, batch: DataProto):
    method _fit_validate (line 645) | def _fit_validate(self):
    method _fit_save_checkpoint (line 657) | def _fit_save_checkpoint(self):
    method _fit_stop_profile (line 684) | def _fit_stop_profile(self):
    method _fit_collect_metrics (line 700) | def _fit_collect_metrics(self, batch):
    method _fit_torch_memory (line 714) | def _fit_torch_memory(self):
    method _fit_experimental (line 723) | def _fit_experimental(self, batch):
    method _fit_postprocess_step (line 734) | def _fit_postprocess_step(self):

FILE: verl/experimental/separation/utils.py
  function create_resource_pool_manager (line 22) | def create_resource_pool_manager(config, roles: list) -> ResourcePoolMan...
  function create_role_worker_mapping (line 57) | def create_role_worker_mapping(config):

FILE: verl/experimental/vla/dp_rob.py
  class RobDataParallelPPOActor (line 39) | class RobDataParallelPPOActor(BasePPOActor):
    method __init__ (line 40) | def __init__(
    method process_tensor (line 57) | def process_tensor(self, tensor, pad_id):
    method generate_traj_mask (line 65) | def generate_traj_mask(self, end_step, traj_len):
    method apply_mask_with_grad_control (line 78) | def apply_mask_with_grad_control(self, log_probs, entropy, mask):
    method _forward_micro_batch (line 95) | def _forward_micro_batch(self, micro_batch, temperature) -> tuple[torc...
    method _forward_micro_batch_update (line 139) | def _forward_micro_batch_update(
    method _optimizer_step (line 165) | def _optimizer_step(self):
    method compute_log_prob (line 175) | def compute_log_prob(self, data: DataProto, calculate_entropy=False) -...
    method update_policy (line 231) | def update_policy(self, data: DataProto):

FILE: verl/experimental/vla/env_loop.py
  class EnvLoop (line 30) | class EnvLoop:
    method __init__ (line 34) | def __init__(self, env_wg: RayWorkerGroup, rollout_wg: RayWorkerGroup,...
    method generate_sequences (line 61) | def generate_sequences(self, prompts: DataProto, reset_future: asyncio...
    method run (line 80) | async def run(self, prompts: DataProto, reset_results: DataProto) -> D...
    method _restructure_obs_data (line 146) | def _restructure_obs_data(self, data_proto: DataProto) -> list[DataPro...
    method _collate_trajectories (line 164) | def _collate_trajectories(self, trajectories: dict, initial_state_ids:...

FILE: verl/experimental/vla/envs/action_utils.py
  function prepare_actions_simplevla (line 28) | def prepare_actions_simplevla(
  function prepare_actions (line 38) | def prepare_actions(
  function to_tensor (line 54) | def to_tensor(array: dict | torch.Tensor | np.ndarray | list | Any, devi...
  function tile_images (line 87) | def tile_images(images: list[np.ndarray | torch.Tensor], nrows: int = 1)...
  function put_text_on_image (line 155) | def put_text_on_image(image: np.ndarray, lines: list[str], max_width: in...
  function put_info_on_image (line 203) | def put_info_on_image(
  function list_of_dict_to_dict_of_list (line 224) | def list_of_dict_to_dict_of_list(
  function save_rollout_video (line 247) | def save_rollout_video(rollout_images: list[np.ndarray], output_dir: str...
  function resize_image (line 265) | def resize_image(img: np.ndarray, resize_size: tuple[int, int]) -> np.nd...
  function center_crop_image (line 298) | def center_crop_image(image: Image.Image) -> Image.Image:

FILE: verl/experimental/vla/envs/isaac_env/isaac_env.py
  class IsaacEnv (line 34) | class IsaacEnv(gym.Env):
    method __init__ (line 35) | def __init__(self, cfg, rank, world_size):
    method _init_env (line 70) | def _init_env(self, task_id=0):
    method _init_metrics (line 116) | def _init_metrics(self):
    method _reset_metrics (line 120) | def _reset_metrics(self, env_idx=None):
    method _record_metrics (line 134) | def _record_metrics(self, step_reward, terminations, infos):
    method reset (line 151) | def reset(self, env_idx: Optional[int | list[int] | np.ndarray] = None...
    method step (line 163) | def step(self, actions=None, critic_values=None):
    method chunk_step (line 204) | def chunk_step(self, chunk_actions, chunk_values=None):
    method _calc_step_reward (line 241) | def _calc_step_reward(self, reward):
    method _wrap_obs (line 249) | def _wrap_obs(self, raw_obs):
    method _extract_image_and_state (line 258) | def _extract_image_and_state(self, obs):
    method add_new_frames (line 282) | def add_new_frames(self, obs, plot_infos):
    method flush_video (line 291) | def flush_video(self, video_sub_dir: Optional[str] = None):
    method close (line 303) | def close(self):
    method load_state (line 308) | def load_state(self, state_buffer: bytes):
    method get_state (line 311) | def get_state(self):
    method reset_envs_to_state_ids (line 314) | def reset_envs_to_state_ids(self, state_ids_list, task_ids_list):

FILE: verl/experimental/vla/envs/libero_env/libero_env.py
  function patched_get_task_init_states (line 42) | def patched_get_task_init_states(self, i):
  class LiberoEnv (line 55) | class LiberoEnv(gym.Env):
    method __init__ (line 56) | def __init__(self, cfg, rank, world_size, stage_id: int = 0):
    method _compose_seed (line 89) | def _compose_seed(self, env_id: int, rollout_id: Optional[int] = None,...
    method elapsed_steps (line 103) | def elapsed_steps(self):
    method get_all_state_ids (line 106) | def get_all_state_ids(self):
    method _init_env (line 110) | def _init_env(self):
    method get_env_fns (line 114) | def get_env_fns(self):
    method get_env_fn_params (line 128) | def get_env_fn_params(self, env_idx=None):
    method _compute_total_num_group_envs (line 158) | def _compute_total_num_group_envs(self):
    method _init_task_and_trial_ids (line 169) | def _init_task_and_trial_ids(self):
    method _get_random_reset_state_ids (line 172) | def _get_random_reset_state_ids(self, num_reset_states):
    method get_reset_state_ids_all (line 176) | def get_reset_state_ids_all(self):
    method _get_ordered_reset_state_ids (line 185) | def _get_ordered_reset_state_ids(self, num_reset_states):
    method _get_task_and_trial_ids_from_reset_state_ids (line 193) | def _get_task_and_trial_ids_from_reset_state_ids(self, reset_state_ids):
    method _get_reset_states (line 214) | def _get_reset_states(self, env_idx):
    method _init_metrics (line 222) | def _init_metrics(self):
    method _reset_metrics (line 227) | def _reset_metrics(self, env_idx=None):
    method _record_metrics (line 243) | def _record_metrics(self, step_reward, terminations, infos):
    method _extract_image_and_state (line 254) | def _extract_image_and_state(self, obs):
    method _wrap_obs (line 267) | def _wrap_obs(self, obs_list):
    method _reconfigure (line 279) | def _reconfigure(self, reset_state_ids, env_idx):
    method reset (line 297) | def reset(
    method step (line 325) | def step(self, actions=None, critic_values=None):
    method chunk_step (line 363) | def chunk_step(self, chunk_actions, chunk_values=None):
    method _calc_step_reward (line 398) | def _calc_step_reward(self, terminations):
    method add_new_frames (line 408) | def add_new_frames(self, raw_obs, plot_infos):
    method flush_video (line 418) | def flush_video(self, video_sub_dir: Optional[str] = None):
    method reset_envs_to_state_ids (line 430) | def reset_envs_to_state_ids(self, state_ids_list, task_ids_list):
    method load_state (line 440) | def load_state(self, state_buffer: bytes):

FILE: verl/experimental/vla/envs/libero_env/utils.py
  function get_libero_image (line 24) | def get_libero_image(obs: dict[str, np.ndarray]) -> np.ndarray:
  function get_libero_wrist_image (line 39) | def get_libero_wrist_image(obs: dict[str, np.ndarray]) -> np.ndarray:
  function quat2axisangle (line 54) | def quat2axisangle(quat: np.ndarray) -> np.ndarray:
  function normalize_gripper_action (line 81) | def normalize_gripper_action(action: np.ndarray, binarize: bool = True) ...
  function invert_gripper_action (line 112) | def invert_gripper_action(action: np.ndarray) -> np.ndarray:

FILE: verl/experimental/vla/envs/libero_env/venv.py
  function _worker (line 35) | def _worker(
  class ReconfigureSubprocEnvWorker (line 121) | class ReconfigureSubprocEnvWorker(SubprocEnvWorker):
    method __init__ (line 122) | def __init__(self, env_fn: Callable[[], gym.Env], share_memory: bool =...
    method reconfigure_env_fn (line 143) | def reconfigure_env_fn(self, env_fn_param):
  class ReconfigureSubprocEnv (line 148) | class ReconfigureSubprocEnv(SubprocVectorEnv):
    method __init__ (line 149) | def __init__(self, env_fns: list[Callable[[], gym.Env]], **kwargs: Any...
    method reconfigure_env_fns (line 155) | def reconfigure_env_fns(self, env_fns, id=None):

FILE: verl/experimental/vla/fsdp_workers.py
  class RobActorRolloutRefWorker (line 52) | class RobActorRolloutRefWorker(ActorRolloutRefWorker):
    method _build_rollout (line 60) | def _build_rollout(self, trust_remote_code=False):
    method switch_to_rollout (line 111) | def switch_to_rollout(self):
    method switch_to_train (line 117) | def switch_to_train(self):
    method rollout_mode (line 122) | async def rollout_mode(self):
    method trainer_mode (line 168) | async def trainer_mode(self):
    method generate_sequences (line 201) | def generate_sequences(self, prompts: DataProto):
    method init_model (line 240) | def init_model(self):

FILE: verl/experimental/vla/main_ppo.py
  function calculate_reward (line 35) | def calculate_reward(data: DataProto, return_dict: bool = False) -> torc...
  function main (line 48) | def main(config):
  function main_task (line 84) | def main_task(config):

FILE: verl/experimental/vla/main_sac.py
  function calculate_reward (line 36) | def calculate_reward(data: DataProto, return_dict: bool = False) -> torc...
  function main (line 46) | def main(config):
  function main_task (line 59) | def main_task(config):

FILE: verl/experimental/vla/models/modules/mlp.py
  class MLP (line 19) | class MLP(nn.Module):
    method __init__ (line 36) | def __init__(
    method _get_activation (line 68) | def _get_activation(self, name: str):
    method init_weights (line 90) | def init_weights(self, m: nn.Module):
    method forward (line 122) | def forward(self, x):

FILE: verl/experimental/vla/models/openvla_oft/configuration_prismatic.py
  class PrismaticConfig (line 88) | class PrismaticConfig(PretrainedConfig):
    method __init__ (line 92) | def __init__(
  class OpenVLAConfig (line 145) | class OpenVLAConfig(PrismaticConfig):
    method __init__ (line 148) | def __init__(

FILE: verl/experimental/vla/models/openvla_oft/constants.py
  class NormalizationType (line 35) | class NormalizationType(str, Enum):
  function detect_robot_platform (line 67) | def detect_robot_platform():

FILE: verl/experimental/vla/models/openvla_oft/modeling_prismatic.py
  function unpack_tuple (line 61) | def unpack_tuple(fn: Callable[[Any], tuple[Any]]) -> Callable[[Any], Any]:
  function _ls_new_forward (line 72) | def _ls_new_forward(self, x: torch.Tensor) -> torch.Tensor:
  function ls_apply_patch (line 76) | def ls_apply_patch(ls_module: LayerScale):
  class PrismaticVisionBackbone (line 83) | class PrismaticVisionBackbone(nn.Module):
    method __init__ (line 91) | def __init__(
    method _create_featurizer (line 131) | def _create_featurizer(self, model_id: str, img_size: int, act_layer: ...
    method _patch_layer_scales (line 157) | def _patch_layer_scales(self) -> None:
    method get_num_patches (line 175) | def get_num_patches(self) -> int:
    method get_num_images_in_input (line 184) | def get_num_images_in_input(self) -> int:
    method set_num_images_in_input (line 193) | def set_num_images_in_input(self, num_images_in_input: int) -> None:
    method forward (line 202) | def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
  class PrismaticProjector (line 247) | class PrismaticProjector(nn.Module):
    method __init__ (line 248) | def __init__(self, use_fused_vision_backbone: bool, vision_dim: int, l...
    method forward (line 266) | def forward(self, img_patches: torch.Tensor) -> torch.Tensor:
  class PrismaticCausalLMOutputWithPast (line 283) | class PrismaticCausalLMOutputWithPast(ModelOutput):
  class PrismaticPreTrainedModel (line 296) | class PrismaticPreTrainedModel(PreTrainedModel):
    method _init_weights (line 305) | def _init_weights(self, module: nn.Module) -> None:
    method _supports_sdpa (line 328) | def _supports_sdpa(self) -> bool:
  class PrismaticForConditionalGeneration (line 333) | class PrismaticForConditionalGeneration(PrismaticPreTrainedModel):
    method __init__ (line 334) | def __init__(self, config: PrismaticConfig) -> None:
    method get_input_embeddings (line 379) | def get_input_embeddings(self) -> nn.Module:
    method set_input_embeddings (line 382) | def set_input_embeddings(self, value: nn.Module) -> None:
    method get_output_embeddings (line 385) | def get_output_embeddings(self) -> nn.Module:
    method set_output_embeddings (line 388) | def set_output_embeddings(self, new_embeddings: nn.Module) -> None:
    method get_decoder (line 391) | def get_decoder(self) -> nn.Module:
    method set_decoder (line 394) | def set_decoder(self, decoder: nn.Module) -> None:
    method tie_weights (line 397) | def tie_weights(self) -> None:
    method resize_token_embeddings (line 400) | def resize_token_embeddings(
    method _replace_input_embeddings (line 411) | def _replace_input_embeddings(self, input_embeddings, all_actions_mask...
    method _process_action_masks (line 447) | def _process_action_masks(self, labels):
    method _process_vision_features (line 454) | def _process_vision_features(self, pixel_values, language_embeddings=N...
    method _process_proprio_features (line 465) | def _process_proprio_features(self, projected_patch_embeddings, propri...
    method _build_multimodal_attention (line 477) | def _build_multimodal_attention(self, input_embeddings, projected_patc...
    method _build_multimodal_labels (line 502) | def _build_multimodal_labels(self, labels, projected_patch_embeddings):
    method prepare_inputs_for_generation (line 701) | def prepare_inputs_for_generation(
    method _reorder_cache (line 739) | def _reorder_cache(self, *args, **kwargs) -> Any:
    method _prepare_input_for_action_prediction_verl (line 742) | def _prepare_input_for_action_prediction_verl(self, input_ids, attenti...
    method _prepare_labels_for_action_prediction_verl (line 765) | def _prepare_labels_for_action_prediction_verl(self, labels, input_ids):
    method _verl_discrete_compute_logits (line 780) | def _verl_discrete_compute_logits(
    method forward (line 1091) | def forward(
  class OpenVLAForActionPrediction (line 1336) | class OpenVLAForActionPrediction(PrismaticForConditionalGeneration):
    method __init__ (line 1340) | def __init__(self, config: OpenVLAConfig) -> None:
    method _prepare_input_for_action_prediction (line 1351) | def _prepare_input_for_action_prediction(self, input_ids, attention_ma...
    method _prepare_labels_for_action_prediction (line 1374) | def _prepare_labels_for_action_prediction(self, labels, input_ids):
    method _unnormalize_actions (line 1389) | def _unnormalize_actions(self, normalized_actions, unnorm_key=None):
    method _run_diffusion_prediction (line 1410) | def _run_diffusion_prediction(
    method _regression_or_discrete_prediction (line 1496) | def _regression_or_discrete_prediction(
    method _verl_discrete_prediction (line 1563) | def _verl_discrete_prediction(
    method predict_action (line 1715) | def predict_action(
    method generate_action_verl (line 1833) | def generate_action_verl(
    method _check_unnorm_key (line 1976) | def _check_unnorm_key(norm_stats: dict[str, dict[str, Any]], unnorm_ke...
    method get_action_dim (line 1992) | def get_action_dim(self, unnorm_key: Optional[str] = None) -> int:
    method get_action_stats (line 1997) | def get_action_stats(self, unnorm_key: Optional[str] = None) -> dict[s...

FILE: verl/experimental/vla/models/openvla_oft/processing_prismatic.py
  function letterbox_pad_transform (line 40) | def letterbox_pad_transform(image: Image.Image, padding_fill_value: tupl...
  class PrismaticImageProcessor (line 49) | class PrismaticImageProcessor(ImageProcessingMixin):
    method __init__ (line 52) | def __init__(
    method apply_transform (line 145) | def apply_transform(self, img: Image.Image) -> torch.Tensor:
    method preprocess (line 164) | def preprocess(
    method __call__ (line 186) | def __call__(self, images: Image.Image | list[Image.Image], **kwargs) ...
  class PrismaticProcessor (line 192) | class PrismaticProcessor(ProcessorMixin):
    method __init__ (line 197) | def __init__(
    method __call__ (line 204) | def __call__(
    method batch_decode (line 236) | def batch_decode(
    method decode (line 250) | def decode(
    method model_input_names (line 265) | def model_input_names(self) -> list[str]:

FILE: verl/experimental/vla/models/openvla_oft/train_utils.py
  function get_current_action_mask (line 24) | def get_current_action_mask(token_ids):
  function get_next_actions_mask (line 41) | def get_next_actions_mask(token_ids):
  function compute_token_accuracy (line 58) | def compute_token_accuracy(predicted_token_ids, ground_truth_token_ids, ...
  function compute_actions_l1_loss (line 64) | def compute_actions_l1_loss(action_tokenizer, predicted_token_ids, groun...

FILE: verl/experimental/vla/models/pi0_torch/configuration_pi0_torch.py
  class PI0TorchConfig (line 18) | class PI0TorchConfig(PretrainedConfig):
    method __init__ (line 21) | def __init__(self, **kwargs):

FILE: verl/experimental/vla/models/pi0_torch/model/modeling_pi0.py
  function get_safe_dtype (line 30) | def get_safe_dtype(dtype: torch.dtype, device: str | torch.device) -> to...
  function create_sinusoidal_pos_embedding (line 40) | def create_sinusoidal_pos_embedding(
  function make_att_2d_masks (line 62) | def make_att_2d_masks(pad_masks: torch.Tensor, att_masks: torch.Tensor) ...
  class PI0Model (line 98) | class PI0Model(ModelMixin, ConfigMixin):
    method __init__ (line 124) | def __init__(
    method forward (line 161) | def forward(
    method sample_noise (line 209) | def sample_noise(self, shape: tuple[int, ...], device: torch.device | ...
    method embed_prefix (line 228) | def embed_prefix(
    method embed_suffix (line 304) | def embed_suffix(
    method sample_actions (line 384) | def sample_actions(
    method denoise_step (line 443) | def denoise_step(

FILE: verl/experimental/vla/models/pi0_torch/model/paligemma_with_expert.py
  function get_transformers_siglip_vision_config (line 38) | def get_transformers_siglip_vision_config() -> SiglipVisionConfig:
  class GemmaRMSNorm (line 54) | class GemmaRMSNorm(nn.Module):
    method __init__ (line 55) | def __init__(self, dim: int, eps: float = 1e-6, use_ada_rms_norm: bool...
    method _norm (line 65) | def _norm(self, x):
    method forward (line 68) | def forward(self, x, cond: torch.Tensor | None = None):
    method extra_repr (line 82) | def extra_repr(self):
  class SiglipVisionTransformer (line 89) | class SiglipVisionTransformer(nn.Module):
    method __init__ (line 90) | def __init__(self, config: SiglipVisionConfig):
    method forward (line 105) | def forward(
  class PaliGemmaMultiModalProjector (line 150) | class PaliGemmaMultiModalProjector(nn.Module):
    method __init__ (line 151) | def __init__(self, vision_hidden_size: int = 1152, projection_dim: int...
    method forward (line 155) | def forward(self, image_features: torch.Tensor) -> torch.Tensor:
  class RoPEEmbedding (line 161) | class RoPEEmbedding(nn.Module):
    method __init__ (line 168) | def __init__(self, dim: int, max_wavelength: int = 10_000, max_seq_len...
    method forward (line 193) | def forward(self, x: torch.Tensor, positions: torch.LongTensor) -> tor...
  class GemmaAttentionWithExpert (line 223) | class GemmaAttentionWithExpert(nn.Module):
    method __init__ (line 224) | def __init__(
    method forward (line 300) | def forward(
  class GemmaMLP (line 414) | class GemmaMLP(nn.Module):
    method __init__ (line 415) | def __init__(self, hidden_size: int = 1024, intermediate_size: int = 4...
    method forward (line 424) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  class GemmaDecoderLayerWithExpert (line 430) | class GemmaDecoderLayerWithExpert(nn.Module):
    method __init__ (line 431) | def __init__(
    method gated_residual (line 496) | def gated_residual(self, x, y, gate):
    method forward (line 503) | def forward(
  class PaliGemmaWithExpertModel (line 574) | class PaliGemmaWithExpertModel(nn.Module):
    method __init__ (line 575) | def __init__(
    method embed_image (line 654) | def embed_image(self, image: torch.Tensor) -> torch.Tensor:
    method embed_language_tokens (line 661) | def embed_language_tokens(self, tokens: torch.Tensor) -> torch.Tensor:
    method forward (line 665) | def forward(

FILE: verl/experimental/vla/models/pi0_torch/modeling_pi0_torch.py
  function beta_schedule (line 43) | def beta_schedule(step, beta0, beta_min, T):
  class PI0ForActionPrediction (line 49) | class PI0ForActionPrediction(PreTrainedModel, SupportSACTraining):
    method __init__ (line 53) | def __init__(self, config: PI0TorchConfig):
    method _to (line 136) | def _to(self, device: torch.device | str):
    method forward (line 143) | def forward(
    method sample_actions (line 182) | def sample_actions(
    method from_pretrained (line 254) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
    method freeze_vision_tower (line 272) | def freeze_vision_tower(self) -> None:
    method bc_loss (line 281) | def bc_loss(
    method _multi_heads_value (line 321) | def _multi_heads_value(
    method _cross_attention_pool_prefix (line 334) | def _cross_attention_pool_prefix(
    method _gaussian_log_prob (line 356) | def _gaussian_log_prob(
    method flow_sde_beta (line 366) | def flow_sde_beta(self) -> torch.Tensor:
    method _sample_actions_flow_sde (line 375) | def _sample_actions_flow_sde(
    method _build_kv_cache_from_prefix (line 460) | def _build_kv_cache_from_prefix(
    method sac_init (line 482) | def sac_init(self):
    method sac_forward_actor (line 494) | def sac_forward_actor(
    method sac_forward_critic (line 519) | def sac_forward_critic(
    method sac_get_critic_parameters (line 558) | def sac_get_critic_parameters(self) -> list[torch.nn.Parameter]:
    method sac_get_named_actor_parameters (line 564) | def sac_get_named_actor_parameters(self) -> list[tuple[str, torch.nn.P...
    method sac_forward_state_features (line 569) | def sac_forward_state_features(
    method sac_update_target_network (line 583) | def sac_update_target_network(self, tau: float):

FILE: verl/experimental/vla/models/pi0_torch/pi0_utils.py
  class Normalize (line 26) | class Normalize:
    method __init__ (line 34) | def __init__(self, stats: dict[str, Any], *, use_quantiles: bool = Fal...
    method to (line 54) | def to(self, device: torch.device | str) -> None:
    method __call__ (line 62) | def __call__(self, x: torch.Tensor) -> torch.Tensor:
  class Unnormalize (line 72) | class Unnormalize:
    method __init__ (line 73) | def __init__(self, stats, *, use_quantiles: bool = False):
    method to (line 85) | def to(self, device: torch.device | str) -> None:
    method __call__ (line 93) | def __call__(self, x: torch.Tensor) -> torch.Tensor:
  class DeltaActions (line 103) | class DeltaActions:
    method __init__ (line 106) | def __init__(self):
    method to (line 110) | def to(self, device: torch.device | str) -> None:
    method __call__ (line 113) | def __call__(self, data: dict[str, Any]) -> dict[str, Any]:
  class AbsoluteActions (line 125) | class AbsoluteActions:
    method __init__ (line 128) | def __init__(self):
    method to (line 132) | def to(self, device: torch.device | str) -> None:
    method __call__ (line 135) | def __call__(self, data: dict[str, Any]) -> dict[str, Any]:
  class AlohaInputs (line 147) | class AlohaInputs:
    method __init__ (line 150) | def __init__(self, adapt_to_pi: bool = True) -> None:
    method to (line 154) | def to(self, device: torch.device | str) -> None:
    method _gripper_from_angular_inv (line 157) | def _gripper_from_angular_inv(self, value: torch.Tensor) -> torch.Tensor:
    method _gripper_to_angular (line 162) | def _gripper_to_angular(self, value: torch.Tensor) -> torch.Tensor:
    method _encode_actions_inv (line 184) | def _encode_actions_inv(self, actions: torch.Tensor) -> torch.Tensor:
    method _decode_state (line 190) | def _decode_state(self, state: torch.Tensor) -> torch.Tensor:
    method _decode_aloha (line 198) | def _decode_aloha(self, state: torch.Tensor) -> torch.Tensor:
    method __call__ (line 204) | def __call__(self, data: dict[str, Any]) -> dict[str, Any]:
    method _encode_actions_inv_batch (line 218) | def _encode_actions_inv_batch(self, actions: torch.Tensor) -> torch.Te...
    method _decode_state_batch (line 224) | def _decode_state_batch(self, state: torch.Tensor) -> torch.Tensor:
    method call_batch (line 230) | def call_batch(self, data: dict[str, Any]) -> dict[str, Any]:
  class AlohaOutputs (line 240) | class AlohaOutputs:
    method __init__ (line 243) | def __init__(self, original_action_dim: int, adapt_to_pi: bool = True):
    method to (line 255) | def to(self, device: torch.device | str) -> None:
    method _gripper_from_angular (line 258) | def _gripper_from_angular(self, value: torch.Tensor) -> torch.Tensor:
    method _encode_actions (line 270) | def _encode_actions(self, actions: torch.Tensor) -> torch.Tensor:
    method __call__ (line 277) | def __call__(self, data: dict[str, Any]) -> dict[str, Any]:
    method _encode_actions_batch (line 283) | def _encode_actions_batch(self, actions: torch.Tensor) -> torch.Tensor:
    method call_batch (line 289) | def call_batch(self, data: dict[str, Any]) -> dict[str, Any]:
  class PadStatesAndActions (line 294) | class PadStatesAndActions:
    method __init__ (line 297) | def __init__(self, action_dim: int) -> None:
    method _pad_to_dim (line 300) | def _pad_to_dim(self, x: torch.Tensor, target_dim: int, axis: int = -1...
    method __call__ (line 312) | def __call__(self, data: dict[str, Any]) -> dict[str, Any]:
  function _normalize (line 319) | def _normalize(x: torch.Tensor, min_val: float, max_val: float) -> torch...
  function _unnormalize (line 323) | def _unnormalize(x: torch.Tensor, min_val: float, max_val: float) -> tor...
  function resize_with_pad (line 327) | def resize_with_pad(img: torch.Tensor, width: int, height: int, pad_valu...
  class ImageTransform (line 366) | class ImageTransform:
    method __init__ (line 367) | def __init__(
    method __call__ (line 397) | def __call__(self, data: dict[str, torch.Tensor]) -> tuple[list[torch....
    method call_batch (line 437) | def call_batch(self, data: dict[str, torch.Tensor]) -> tuple[list[torc...
  class PromptTokenizerTransform (line 483) | class PromptTokenizerTransform:
    method __init__ (line 484) | def __init__(self, max_length: int, discrete_state_input: bool = False...
    method __call__ (line 489) | def __call__(self, data: dict[str, Any], tokenizer) -> tuple[torch.Ten...
    method call_batch (line 528) | def call_batch(self, data: dict[str, Any], tokenizer) -> tuple[torch.T...

FILE: verl/experimental/vla/models/pi0_torch/policy/base.py
  class Pi0Input (line 20) | class Pi0Input(ABC):
    method __init__ (line 21) | def __init__(self):
    method from_env_obs (line 42) | def from_env_obs(cls, env_obs) -> "Pi0Input": ...
  class Pi0Output (line 45) | class Pi0Output:
    method __init__ (line 46) | def __init__(self):
    method from_model_output (line 51) | def from_model_output(cls, model_output) -> "Pi0Output": ...

FILE: verl/experimental/vla/models/pi0_torch/policy/libero_policy.py
  class LiberoPi0Input (line 27) | class LiberoPi0Input(Pi0Input):
    method from_env_obs (line 30) | def from_env_obs(cls, env_obs: DataProto) -> "LiberoPi0Input":
  class LiberoPi0Output (line 68) | class LiberoPi0Output(Pi0Output):
    method from_model_output (line 71) | def from_model_output(cls, model_output: dict) -> "LiberoPi0Output":

FILE: verl/experimental/vla/models/register_vla_models.py
  function register_openvla_oft (line 34) | def register_openvla_oft() -> None:
  function register_pi0_torch_model (line 47) | def register_pi0_torch_model() -> None:
  function register_vla_models (line 58) | def register_vla_models() -> None:

FILE: verl/experimental/vla/naive_rollout_rob.py
  function pad_sequence_to_length (line 45) | def pad_sequence_to_length(tensors, max_seq_len, pad_token_id, left_pad=...
  function process_input (line 58) | def process_input(task_descriptions, images_and_states, processor):
  class NaiveRolloutRob (line 112) | class NaiveRolloutRob(BaseRollout):
    method __init__ (line 113) | def __init__(
    method _generate_one_step (line 136) | def _generate_one_step(self, prompts: dict, do_sample, temperature, ma...
    method generate_sequences (line 181) | def generate_sequences(self, prompts: DataProto) -> DataProto:
    method update_weights (line 197) | async def update_weights(self, weights_iterator, **kwargs):
    method release (line 214) | async def release(self):
    method resume (line 221) | async def resume(self, **kwargs):

FILE: verl/experimental/vla/prepare_libero_dataset.py
  function patched_get_task_init_states (line 29) | def patched_get_task_init_states(self, i):
  function compute_total_num_group_envs (line 42) | def compute_total_num_group_envs(task_suite: Benchmark):
  function build_dataset_for_suite (line 55) | def build_dataset_for_suite(task_suite_name: str, local_save_dir: str):
  function resolve_task_suites (line 160) | def resolve_task_suites(task_suite_name: str) -> list[str]:

FILE: verl/experimental/vla/rob_ray_trainer.py
  function compute_response_mask (line 51) | def compute_response_mask(config, data: DataProto) -> torch.Tensor:
  function flatten_trajectories (line 85) | def flatten_trajectories(data: DataProto) -> DataProto:
  class RobRayPPOTrainer (line 105) | class RobRayPPOTrainer(RayPPOTrainer):
    method _start_profiling (line 113) | def _start_profiling(self, do_profile: bool) -> None:
    method _stop_profiling (line 119) | def _stop_profiling(self, do_profile: bool) -> None:
    method init_workers (line 125) | def init_workers(self):
    method _get_gen_batch (line 195) | def _get_gen_batch(self, batch: DataProto) -> DataProto:
    method _reset_envs (line 206) | def _reset_envs(self, gen_batch: DataProto) -> asyncio.Future:
    method fit (line 213) | def fit(self):
    method _validate (line 561) | def _validate(self):

FILE: verl/experimental/vla/sac/base.py
  class SupportSACTraining (line 23) | class SupportSACTraining:
    method sac_init (line 39) | def sac_init(self):
    method sac_get_critic_parameters (line 42) | def sac_get_critic_parameters(self) -> list[torch.nn.Parameter]:
    method sac_get_named_actor_parameters (line 51) | def sac_get_named_actor_parameters(self) -> list[tuple[str, torch.nn.P...
    method sac_forward_critic (line 60) | def sac_forward_critic(
    method sac_forward_actor (line 85) | def sac_forward_actor(
    method sac_forward_state_features (line 106) | def sac_forward_state_features(self, s: dict[str, torch.Tensor]) -> Any:
    method bc_loss (line 122) | def bc_loss(
    method sac_update_target_network (line 132) | def sac_update_target_network(self, tau: float):
  class BaseSACActor (line 142) | class BaseSACActor(ABC):
    method update_policy (line 144) | def update_policy(self, data: DataProto) -> dict:

FILE: verl/experimental/vla/sac/naive_rollout_pi05.py
  class PI0RolloutRob (line 37) | class PI0RolloutRob(NaiveRolloutRob):
    method __init__ (line 38) | def __init__(
    method generate_sequences (line 55) | def generate_sequences(self, prompts: DataProto) -> DataProto:

FILE: verl/experimental/vla/sac/replay_pool.py
  class _DualPoolState (line 28) | class _DualPoolState:
  class SACReplayPool (line 37) | class SACReplayPool:
    method __init__ (line 47) | def __init__(
    method add_batch (line 65) | def add_batch(self, batch: TensorDict, task_ids: Sequence[Any]):
    method sample_batch (line 110) | def sample_batch(
    method insert_and_resample (line 193) | def insert_and_resample(
    method save (line 203) | def save(self, directory: str):
    method load (line 242) | def load(self, directory: str):
    method from_path (line 275) | def from_path(
    method _insert_block_to_pool (line 299) | def _insert_block_to_pool(
    method _get_or_create_task_pool (line 331) | def _get_or_create_task_pool(self, task_id: str, sample: TensorDict) -...
    method _extract_positive_mask (line 361) | def _extract_positive_mask(self, batch: TensorDict) -> torch.Tensor:
    method _pad_sampled_batch (line 367) | def _pad_sampled_batch(self, sampled_batch: TensorDict, target_batch_s...
    method _index_select_batch (line 389) | def _index_select_batch(self, batch: TensorDict, idx: torch.Tensor) ->...
    method _sample_from_task_


Condensed preview: 1128 files, each showing path, character count, and a content snippet. The full structured content is 8,818K chars.
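
Each entry in the preview carries three keys: "path", "chars", and "preview". A minimal sketch of reading that structure follows. It assumes "preview" holds the leading characters of the file, so a snippet shorter than "chars" means the file was cut off. The helper names `load_entries` and `summarize` are hypothetical, and the two sample entries are copied from the listing so the block runs without the full export.

```python
import json

# Two entries copied from the preview; a full export would list every file.
SAMPLE = r'''
[
  {
    "path": ".github/ISSUE_TEMPLATE/config.yml",
    "chars": 40,
    "preview": "blank_issues_enabled: true\nversion: 0.1\n"
  },
  {
    "path": "Notice.txt",
    "chars": 57,
    "preview": "Copyright 2023-2024 Bytedance Ltd. and/or its affiliates "
  }
]
'''

REQUIRED_KEYS = {"path", "chars", "preview"}


def load_entries(text):
    """Parse the preview array and check that every entry has the three keys."""
    entries = json.loads(text)
    for entry in entries:
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry is missing keys: {sorted(missing)}")
    return entries


def summarize(entries):
    """Return the total character count and the paths whose snippet is cut off."""
    total = sum(entry["chars"] for entry in entries)
    # Assumption: a snippet shorter than the recorded size was truncated.
    truncated = [e["path"] for e in entries if len(e["preview"]) < e["chars"]]
    return total, truncated


entries = load_entries(SAMPLE)
total, truncated = summarize(entries)
print(total, truncated)
```

Both sample files are short enough that the snippet is the whole file, so neither is reported as truncated. Larger files in the listing, such as README.md, would be.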

[
  {
    "path": ".gemini/config.yaml",
    "chars": 207,
    "preview": "have_fun: false\ncode_review:\n  disable: false\n  comment_severity_threshold: HIGH\n  max_review_comments: -1\n  pull_reques"
  },
  {
    "path": ".git-blame-ignore-revs",
    "chars": 556,
    "preview": "# Local uasge: git config blame.ignoreRevsFile .git-blame-ignore-revs\n\n# [dev] feat: immigrate from yapf & pylint to ruf"
  },
  {
    "path": ".github/CODEOWNERS",
    "chars": 1347,
    "preview": "/docs @eric-haibin-lin @zhaochenyang20 @hongpeng-guo\n/docs/amd_tutorial @yushengsu-thu\n/docs/slang_multiturn @zhaochenya"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug-report.yml",
    "chars": 2301,
    "preview": "# modified from https://github.com/huggingface/transformers/blob/main/.github/ISSUE_TEMPLATE/bug-report.yml?plain=1\nname"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/config.yml",
    "chars": 40,
    "preview": "blank_issues_enabled: true\nversion: 0.1\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature-request.yml",
    "chars": 1203,
    "preview": "# modified from https://github.com/huggingface/transformers/blob/main/.github/ISSUE_TEMPLATE/feature-request.yml?plain=1"
  },
  {
    "path": ".github/PULL_REQUEST_TEMPLATE.md",
    "chars": 2906,
    "preview": "### What does this PR do?\n\n> Add **concise** overview of what this PR aims to achieve or accomplish. Reference related G"
  },
  {
    "path": ".github/dependabot.yml",
    "chars": 245,
    "preview": "## Enabled the dependabot to check the dependencies of the project\n## Dependabot will open pull requests to update depen"
  },
  {
    "path": ".github/workflows/README.md",
    "chars": 2406,
    "preview": "### Adding a New Workflow\n\nWhen adding a new workflow for continuous integration (CI), you have two runner options: a fi"
  },
  {
    "path": ".github/workflows/check-pr-title.yml",
    "chars": 2494,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/cpu_unit_tests.yml",
    "chars": 4619,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/doc.yml",
    "chars": 4196,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/docker-build-ascend-a2.yml",
    "chars": 2926,
    "preview": "name: docker-build-ascend-a2\n\non:\n  workflow_dispatch:\n  push:\n    branches: [\"main\"]\n    paths:\n      - \"docker/ascend/"
  },
  {
    "path": ".github/workflows/docker-build-ascend-a3.yml",
    "chars": 2926,
    "preview": "name: docker-build-ascend-a3\n\non:\n  workflow_dispatch:\n  push:\n    branches: [\"main\"]\n    paths:\n      - \"docker/ascend/"
  },
  {
    "path": ".github/workflows/e2e_ascend.yml",
    "chars": 6483,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/e2e_fully_async_policy.yml",
    "chars": 6411,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/e2e_fully_async_policy_ascend.yml",
    "chars": 6352,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/e2e_one_step_off_policy.yml",
    "chars": 6463,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/e2e_one_step_off_policy_ascend.yml",
    "chars": 6416,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/e2e_ppo_grpo_trainer_trtllm.yml",
    "chars": 12297,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/e2e_ppo_trainer.yml",
    "chars": 2196,
    "preview": "name: e2e_ppo_trainer\n\non:\n  # Trigger the workflow on push or pull request,\n  # but only for the main branch\n  # For pu"
  },
  {
    "path": ".github/workflows/e2e_ppo_trainer_megatron_sglang.yml",
    "chars": 8541,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/e2e_ppo_trainer_megatron_sglang_2.yml",
    "chars": 8679,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/e2e_ppo_trainer_megatron_vllm.yml",
    "chars": 9674,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/e2e_ppo_trainer_megatron_vllm_2.yml",
    "chars": 16565,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/e2e_ppo_trainer_megatron_vllm_2_ascend.yml",
    "chars": 11103,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/e2e_ppo_trainer_veomni_vllm.yml",
    "chars": 5943,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/e2e_sft_llm.yml",
    "chars": 6075,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/e2e_sft_llm_ascend.yml",
    "chars": 7299,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/e2e_sft_vlm.yml",
    "chars": 5013,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/gpu_unit_tests.yml",
    "chars": 5882,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/model.yml",
    "chars": 7532,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/model_ascend.yml",
    "chars": 5358,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/nightly_ascend.yml",
    "chars": 6970,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/npu_unit_tests.yml",
    "chars": 5686,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/pre-commit.yml",
    "chars": 1204,
    "preview": "# c.f. https://github.com/pre-commit/action?tab=readme-ov-file#using-this-action\nname: pre-commit\n\n# No need to avoid / "
  },
  {
    "path": ".github/workflows/precommit-autofix.yml",
    "chars": 1257,
    "preview": "name: scheduled pre-commit autofix\n\non:\n  schedule:\n    # Every hour\n    - cron: \"0 * * * *\"\n  workflow_dispatch:\n\npermi"
  },
  {
    "path": ".github/workflows/reward_model_sglang.yml",
    "chars": 5563,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/reward_model_vllm.yml",
    "chars": 5497,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/reward_model_vllm_ascend.yml",
    "chars": 4728,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/sanity.yml",
    "chars": 4736,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/scorecard.yml",
    "chars": 2759,
    "preview": "# This workflow uses actions that are not certified by GitHub. They are provided\n# by a third-party and are governed by "
  },
  {
    "path": ".github/workflows/secrets_scan.yml",
    "chars": 486,
    "preview": "on:\n  push:\n    branches:\n      - main\n      - v0.*\n  pull_request:\n\npermissions:\n  contents: read\n\njobs:\n  test:\n    ru"
  },
  {
    "path": ".github/workflows/sgl.yml",
    "chars": 6170,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".github/workflows/type-coverage-check.yml",
    "chars": 934,
    "preview": "name: Type Annotation and Docstring Coverage\n\non:\n  pull_request:\n    paths:\n      - '**/*.py'\n      - '.github/workflow"
  },
  {
    "path": ".github/workflows/vllm.yml",
    "chars": 6747,
    "preview": "# # Tests layout\n\n# Each folder under tests/ corresponds to a test category for a sub-namespace in verl. For instance:\n#"
  },
  {
    "path": ".gitignore",
    "chars": 1376,
    "preview": "**/*.pt\n**/checkpoints\n**/wget-log\n**/_build/\n**/*.ckpt\n**/outputs\n**/*.tar.gz\n**/playground\n**/wandb\n\n/pyrightconfig.js"
  },
  {
    "path": ".gitmodules",
    "chars": 91,
    "preview": "[submodule \"recipe\"]\n\tpath = recipe\n\turl = https://github.com/verl-project/verl-recipe.git\n"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 1290,
    "preview": "repos:\n  - repo: https://github.com/astral-sh/ruff-pre-commit\n    rev: \"v0.12.2\"\n    hooks:\n      - id: ruff\n        arg"
  },
  {
    "path": ".readthedocs.yaml",
    "chars": 333,
    "preview": "# Read the Docs configuration file\n# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details\n\nversion:"
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 3774,
    "preview": "# Contributing to verl\n\nThank you for considering a contribution to verl! We welcome contributions of any kind - bug fix"
  },
  {
    "path": "LICENSE",
    "chars": 11358,
    "preview": "\n                                 Apache License\n                           Version 2.0, January 2004\n                  "
  },
  {
    "path": "Notice.txt",
    "chars": 57,
    "preview": "Copyright 2023-2024 Bytedance Ltd. and/or its affiliates "
  },
  {
    "path": "README.md",
    "chars": 36359,
    "preview": "<div align=\"center\">\n 👋 Hi, everyone!\n    verl is a RL training library initiated by <b>ByteDance Seed team</b> and main"
  },
  {
    "path": "docker/Dockerfile.isaaclab230",
    "chars": 5325,
    "preview": "\n#FROM nvcr.nju.edu.cn/nvidia/isaac-lab:2.3.0\nFROM isaac-lab-base:latest\n\nENV ACCEPT_EULA=Y\nENTRYPOINT []\n\n# desktop\nRUN"
  },
  {
    "path": "docker/Dockerfile.stable.sglang",
    "chars": 2279,
    "preview": "# sgl059\n\nFROM lmsysorg/sglang:v0.5.9\n\nARG PIP_NO_CACHE_DIR=1\n\nRUN pip install pybind11\n\nRUN pip install nvidia-mathdx\n\n"
  },
  {
    "path": "docker/Dockerfile.stable.trtllm",
    "chars": 2680,
    "preview": "# Base image from NGC TensorRT-LLM, which includes a pre-installed TensorRT-LLM.\n# For available images, visit: https://"
  },
  {
    "path": "docker/Dockerfile.stable.vllm",
    "chars": 4058,
    "preview": "# vllm017\n\nFROM nvidia/cuda:12.9.1-devel-ubuntu22.04\n\nARG DEBIAN_FRONTEND=noninteractive\nARG PIP_NO_CACHE_DIR=1\n\nRUN apt"
  },
  {
    "path": "docker/README.md",
    "chars": 2697,
    "preview": "# Dockerfiles of verl\n\nWe provide pre-built Docker images for quick setup. And from this version, we utilize a new image"
  },
  {
    "path": "docker/ascend/Dockerfile.ascend.sglang_8.3.rc1_a2",
    "chars": 3760,
    "preview": "# Pull base image\nFROM swr.cn-south-1.myhuaweicloud.com/ascendhub/cann:8.3.rc1-910b-ubuntu22.04-py3.11\n\nARG ASCEND_CANN_"
  },
  {
    "path": "docker/ascend/Dockerfile.ascend.sglang_8.3.rc1_a3",
    "chars": 3545,
    "preview": "# Pull base image\nFROM swr.cn-south-1.myhuaweicloud.com/ascendhub/cann:8.3.rc1-a3-ubuntu22.04-py3.11\n\nARG ASCEND_CANN_PA"
  },
  {
    "path": "docker/ascend/Dockerfile.ascend_8.2.rc1_a2",
    "chars": 2578,
    "preview": "# Pull base image\nFROM swr.cn-south-1.myhuaweicloud.com/ascendhub/cann:8.2.rc1-910b-ubuntu22.04-py3.11\n\n# Prepare requir"
  },
  {
    "path": "docker/ascend/Dockerfile.ascend_8.2.rc1_a3",
    "chars": 2576,
    "preview": "# Pull base image\nFROM swr.cn-south-1.myhuaweicloud.com/ascendhub/cann:8.2.rc1-a3-ubuntu22.04-py3.11\n\n# Prepare required"
  },
  {
    "path": "docker/ascend/Dockerfile.ascend_8.3.rc1_a2",
    "chars": 2797,
    "preview": "# Pull base image\nFROM swr.cn-south-1.myhuaweicloud.com/ascendhub/cann:8.3.rc1-910b-ubuntu22.04-py3.11\n\n# Prepare requir"
  },
  {
    "path": "docker/ascend/Dockerfile.ascend_8.3.rc1_a3",
    "chars": 2795,
    "preview": "# Pull base image\nFROM swr.cn-south-1.myhuaweicloud.com/ascendhub/cann:8.3.rc1-a3-ubuntu22.04-py3.11\n\n# Prepare required"
  },
  {
    "path": "docker/ascend/Dockerfile.ascend_8.5.0_a2",
    "chars": 2822,
    "preview": "# Pull base image\nFROM swr.cn-south-1.myhuaweicloud.com/ascendhub/cann:8.5.0-910b-ubuntu22.04-py3.11\n\nARG SOC_VERSION=\"a"
  },
  {
    "path": "docker/ascend/Dockerfile.ascend_8.5.0_a3",
    "chars": 2824,
    "preview": "# Pull base image\nFROM swr.cn-south-1.myhuaweicloud.com/ascendhub/cann:8.5.0-a3-ubuntu22.04-py3.11\n\nARG SOC_VERSION=\"asc"
  },
  {
    "path": "docker/aws/Dockerfile.extention.awsefa",
    "chars": 2105,
    "preview": "# Base Image support aws EFA\n# Build Image with frameworks based on this\nFROM verlai/verl:app-verl0.6-transformers4.56.1"
  },
  {
    "path": "docker/aws/Dockerfile.ngc.vllm0.8.sagemaker",
    "chars": 1791,
    "preview": "# Using a pre-built image from AWS DLC which contains the current version of python (3.10) and supported cuda version (1"
  },
  {
    "path": "docker/rocm/Apptainerfile.rocm",
    "chars": 1519,
    "preview": "Bootstrap: docker\n\n# Support - Traing: fsdp; Inference: vllm\n# FROM: rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6."
  },
  {
    "path": "docker/rocm/Dockerfile.rocm",
    "chars": 10458,
    "preview": "# FROM \"compute-artifactory.amd.com:5000/rocm-plus-docker/framework/compute-rocm-rel-6.4:94_ubuntu22.04_py3.10_pytorch_r"
  },
  {
    "path": "docker/rocm/Dockerfile.rocm7",
    "chars": 4795,
    "preview": "# default base image\nARG REMOTE_VLLM=\"1\"\nARG COMMON_WORKDIR=/app\nARG BASE_IMAGE=rocm/vllm-dev:base\n\nFROM ${BASE_IMAGE} A"
  },
  {
    "path": "docker/rocm/Dockerfile.rocm_verl-0.3.0.post1",
    "chars": 1499,
    "preview": "#  Build the docker in the repo dir:\n# docker build -f docker/Dockerfile.rocm -t verl-rocm:03.04.2015 .\n# docker images "
  },
  {
    "path": "docker/rocm/Dockerfile.rocm_verl-0.4.1",
    "chars": 10480,
    "preview": "# FROM \"compute-artifactory.amd.com:5000/rocm-plus-docker/framework/compute-rocm-rel-6.4:94_ubuntu22.04_py3.10_pytorch_r"
  },
  {
    "path": "docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.sglang.vllm.mcore0.12",
    "chars": 1926,
    "preview": "# Start from the verl base image\n# Dockerfile.base\nFROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4\n\n# Defi"
  },
  {
    "path": "docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.sglang.vllm.mcore0.12.deepep",
    "chars": 3447,
    "preview": "# Start from the verl base image\n# Dockerfile.base\nFROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4\n\n# Defi"
  },
  {
    "path": "docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.sglang.vllm.mcore0.13.preview",
    "chars": 3453,
    "preview": "# Start from the verl base image\n# Dockerfile.base\nFROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4\n\n# Defi"
  },
  {
    "path": "docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.vllm.mcore0.12",
    "chars": 2229,
    "preview": "# Start from the verl base image\n# Dockerfile.base\nFROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4\n\n# Defi"
  },
  {
    "path": "docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.vllm.mcore0.12.deepep",
    "chars": 3750,
    "preview": "# Start from the verl base image\n# Dockerfile.base\nFROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4\n\n# Defi"
  },
  {
    "path": "docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.app.vllm.mcore0.13.preview",
    "chars": 3663,
    "preview": "# Start from the verl base image\n# Dockerfile.base\nFROM verlai/verl:base-verl0.4-cu124-cudnn9.8-torch2.6-fa2.7.4\n\n# Defi"
  },
  {
    "path": "docker/verl0.4-cu124-torch2.6-fa2.7.4/Dockerfile.base",
    "chars": 5477,
    "preview": "# Base Docker Image of verl, with CUDA/Torch/FlashAttn/Apex/TransformerEngine, without other frameworks\n# Target: verlai"
  },
  {
    "path": "docker/verl0.4-cu124-torch2.6-fa2.7.4/README.md",
    "chars": 1005,
    "preview": "# verl image with verl v0.4.x\n\n## Important packages version\n\n```txt\ncuda==12.4\ncudnn==9.8.0\ntorch==2.6.0\nflash_attn=2.7"
  },
  {
    "path": "docker/verl0.5-cu126-torch2.7-fa2.7.4/Dockerfile.app.sglang0.4.10.post2.mcore0.13",
    "chars": 1680,
    "preview": "# Start from the verl base image\n# Dockerfile.base\nFROM verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4\n\n# De"
  },
  {
    "path": "docker/verl0.5-cu126-torch2.7-fa2.7.4/Dockerfile.app.sglang0.4.9.post6.mcore0.13",
    "chars": 1679,
    "preview": "# Start from the verl base image\n# Dockerfile.base\nFROM verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4\n\n# De"
  },
  {
    "path": "docker/verl0.5-cu126-torch2.7-fa2.7.4/Dockerfile.app.vllm.mcore0.13",
    "chars": 1575,
    "preview": "# Start from the verl base image\n# Dockerfile.base\nFROM verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4\n\n# De"
  },
  {
    "path": "docker/verl0.5-cu126-torch2.7-fa2.7.4/Dockerfile.app.vllm.mcore0.15",
    "chars": 1627,
    "preview": "# Start from the verl base image\n# Dockerfile.base\nFROM iseekyan/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.7.4-h10"
  },
  {
    "path": "docker/verl0.5-cu126-torch2.7-fa2.7.4/Dockerfile.base.torch2.7.1",
    "chars": 6184,
    "preview": "# Base Docker Image of verl, with CUDA/Torch/FlashAttn/Apex/TransformerEngine, without other frameworks\n# Target: verlai"
  },
  {
    "path": "docker/verl0.5-cu126-torch2.7-fa2.7.4/README.md",
    "chars": 721,
    "preview": "# verl image with verl v0.5\n\n## Important packages version\n\n```txt\ncuda==12.6\ncudnn==9.8.0\ntorch==2.7.1\nflash_attn=2.7.4"
  },
  {
    "path": "docker/verl0.5-cu126-torch2.7.1-fa2.8.0/Dockerfile.app.sglang.mcore0.12",
    "chars": 1724,
    "preview": "# Start from the verl base image\n# Dockerfile.base\nFROM verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.8.0\n\n# De"
  },
  {
    "path": "docker/verl0.5-cu126-torch2.7.1-fa2.8.0/Dockerfile.app.sglang.mcore0.13.preview",
    "chars": 1732,
    "preview": "# Start from the verl base image\n# Dockerfile.base\nFROM verlai/verl:base-verl0.5-cu126-cudnn9.8-torch2.7.1-fa2.8.0\n\n# De"
  },
  {
    "path": "docker/verl0.5-cu126-torch2.7.1-fa2.8.0/Dockerfile.base",
    "chars": 6053,
    "preview": "# Base Docker Image of verl, with CUDA/Torch/FlashAttn/Apex/TransformerEngine, without other frameworks\n# Target: verlai"
  },
  {
    "path": "docker/verl0.5-cu126-torch2.7.1-fa2.8.0/README.md",
    "chars": 621,
    "preview": "# verl image with verl v0.5\n\n## Important packages version\n\n```txt\ncuda==12.6\ncudnn==9.8.0\ntorch==2.7.1\nflash_attn=2.8.0"
  },
  {
    "path": "docker/verl0.5-preview-cu128-torch2.7.1-fa2.8.0/Dockerfile.app.sglang.megatron",
    "chars": 1691,
    "preview": "# Start from the verl base image\n# Dockerfile.base\nFROM verlai/verl:base-verl0.5-preview-cu128-cudnn9.8-torch2.7.1-fa2.8"
  },
  {
    "path": "docker/verl0.5-preview-cu128-torch2.7.1-fa2.8.0/Dockerfile.base",
    "chars": 4587,
    "preview": "# Base Docker Image of verl, with CUDA/Torch/FlashAttn/Apex/TransformerEngine, without other frameworks\n# Target: verlai"
  },
  {
    "path": "docker/verl0.5-preview-cu128-torch2.7.1-fa2.8.0/README.md",
    "chars": 657,
    "preview": "# verl image with verl v0.5\n\n## Important packages version\n\n```txt\ncuda==12.8\ncudnn==9.8.0\ntorch==2.7.1\nflash_attn=2.8.0"
  },
  {
    "path": "docker/verl0.6-cu128-torch2.8.0-fa2.7.4/Dockerfile.app.sglang",
    "chars": 179,
    "preview": "FROM verlai/verl:base-verl0.6-cu128-cudnn9.8-torch2.8.0-fa2.7.4\n\nRUN pip install --no-cache-dir \"sglang[all]==0.5.2\"\nRUN"
  },
  {
    "path": "docker/verl0.6-cu128-torch2.8.0-fa2.7.4/Dockerfile.base",
    "chars": 4068,
    "preview": "# Start from the NVIDIA official image (ubuntu-24.04 + cuda-12.8 + python-3.12)\n# https://docs.nvidia.com/deeplearning/f"
  },
  {
    "path": "docker/verl0.6-cu128-torch2.8.0-fa2.7.4/Dockerfile.vllm011.mcore_gpt-oss",
    "chars": 514,
    "preview": "FROM nvcr.io/nvidia/nemo:25.07.gpt_oss\n\nRUN git clone -b v0.11.0 --depth 1 https://github.com/vllm-project/vllm.git /opt"
  },
  {
    "path": "docker/verl0.6.1-experimental/Dockerfile.sglang056exp",
    "chars": 2694,
    "preview": "# Dockerfile for verlai/verl:sgl056.exp\nFROM lmsysorg/sglang:v0.5.6.post1\n\nRUN pip install pybind11\n\nRUN pip install nvi"
  },
  {
    "path": "docker/verl0.6.1-experimental/Dockerfile.vllm012exp",
    "chars": 2665,
    "preview": "# dockerfile for verlai/verl:vll012.exp\nFROM nvcr.io/nvidia/pytorch:25.11-py3\n\nRUN git clone -b v0.12.0 --depth 1 https:"
  },
  {
    "path": "docs/Makefile",
    "chars": 602,
    "preview": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line.\nSPHINXOPTS    =\nSPHI"
  },
  {
    "path": "docs/README.md",
    "chars": 618,
    "preview": "# verl documentations\n\n## Build the docs\n\n```bash\n# If you want to view auto-generated API docstring, please make sure v"
  },
  {
    "path": "docs/README_vllm0.7.md",
    "chars": 3027,
    "preview": "# Upgrading to vllm >= 0.7\n\nNote: verl+vllm 0.8.3 is now stable. Please see ``docs/README_vllm0.8.md`` for upgrade guide"
  },
  {
    "path": "docs/README_vllm0.8.md",
    "chars": 1527,
    "preview": "# Upgrading to vLLM >= 0.8\n\nLast updated: 05/04/2025.\n\n## Installation\n\nNote: This version of verl+vLLM 0.8+ supports **"
  },
  {
    "path": "docs/_static/custom.css",
    "chars": 4891,
    "preview": "/* Make the documentation use full screen width */\n.wy-nav-content {\n    max-width: none !important;\n    width: 100% !im"
  },
  {
    "path": "docs/_static/js/resizable-sidebar.js",
    "chars": 8973,
    "preview": "// Resizable sidebar functionality\ndocument.addEventListener('DOMContentLoaded', function() {\n    const sidebar = docume"
  },
  {
    "path": "docs/_static/js/runllm-widget.js",
    "chars": 618,
    "preview": "document.addEventListener(\"DOMContentLoaded\", function () {\n    var script = document.createElement(\"script\");\n    scrip"
  },
  {
    "path": "docs/advance/agent_loop.rst",
    "chars": 9961,
    "preview": "Agent Loop\n==========\n\nLast updated: 07/17/2025.\n\n.. versionadded:: 0.4.2\n   [status: alpha]\n\n.. warning::\n   Agent Loop"
  },
  {
    "path": "docs/advance/async-on-policy-distill.md",
    "chars": 13439,
    "preview": "# Recipe: Async On-Policy Knowledge Distillation Trainer\n\n**Authors:** Brilliant Hanabi, furunding\n\n**Last updated:** 20"
  },
  {
    "path": "docs/advance/attention_implementation.rst",
    "chars": 4358,
    "preview": ".. _attention-implementation-override:\n\nAttention Implementation Override\n==================================\n\nLast updat"
  },
  {
    "path": "docs/advance/checkpoint.rst",
    "chars": 7652,
    "preview": ".. _checkpoint-page:\n\nUsing Checkpoints to Support Fault Tolerance Training\n============================================"
  },
  {
    "path": "docs/advance/dpo_extension.rst",
    "chars": 9704,
    "preview": "Extend to other RL(HF) algorithms\n=================================\n\nLast updated: 02/25/2025.\n\nWe already implemented t"
  },
  {
    "path": "docs/advance/fp8.md",
    "chars": 6549,
    "preview": "# FP8 RL in verl\n\nLast updated: 03/05/2026\n\nverl supports two FP8 modes for accelerating RL training:\n\n| Mode | Training"
  },
  {
    "path": "docs/advance/fsdp_extension.rst",
    "chars": 4264,
    "preview": "\nAdd models with the FSDP backend\n==================================\n\nLast updated: 02/09/2025.\n\nModel\n-----------------"
  },
  {
    "path": "docs/advance/fully_async.md",
    "chars": 39106,
    "preview": "# Recipe: Fully Async Policy Trainer\n\n**Author:** `https://github.com/meituan-search`\n\nLast updated: 02/05/2026.\n\nThis d"
  },
  {
    "path": "docs/advance/grafana_prometheus.md",
    "chars": 8492,
    "preview": "# Use Prometheus and Grafana to Monitor Rollout\n\n**Author:** `https://github.com/meituan-search`\n\nLast updated: 12/05/20"
  },
  {
    "path": "docs/advance/megatron_extension.rst",
    "chars": 766,
    "preview": "Add models with the Megatron-LM backend\n=========================================\n\nLast updated: 04/25/2025.\n\nModel\n----"
  },
  {
    "path": "docs/advance/mtp.md",
    "chars": 7179,
    "preview": "# Guide to Using MTP in SFT/RL Training and Inference\n\n**Author**: `https://github.com/meituan-search`\n\nLast updated: 02"
  },
  {
    "path": "docs/advance/one_step_off.md",
    "chars": 14897,
    "preview": "# Recipe: One Step Off Policy Async Trainer\n\n**Author:** `https://github.com/meituan-search`\n\nLast updated: 07/17/2025.\n"
  },
  {
    "path": "docs/advance/placement.rst",
    "chars": 456,
    "preview": "Ray API Design Tutorial\n=======================================\n\nLast updated: 10/30/2024.\n\nWe provide a tutorial for ou"
  },
  {
    "path": "docs/advance/ppo_lora.rst",
    "chars": 11369,
    "preview": "RL(HF) algorithms with LoRA Support\n===========================================\n\nLast updated: 02/03/2026.\n\nWe support L"
  },
  {
    "path": "docs/advance/reward_loop.rst",
    "chars": 13236,
    "preview": "Reward Loop\n===========\n\n.. _yyding: https://yyding1.github.io\n\nAuthor: `Yuyang Ding <https://yyding1.github.io>`_\n\nLast"
  },
  {
    "path": "docs/advance/rollout_skip.rst",
    "chars": 2285,
    "preview": "RolloutSkip Function Usage Documentation\n========================================\n\nLast updated: 08/01/2025.\n\nApplicable"
  },
  {
    "path": "docs/advance/rollout_trace.rst",
    "chars": 8833,
    "preview": "Trace Function Usage Instructions\n========================================\n\nLast updated: 07/10/2025.\n\nApplicable Scenar"
  },
  {
    "path": "docs/advance/rope.rst",
    "chars": 1168,
    "preview": "RoPE Scaling override\n=======================================\n\nLast updated: 05/14/2025.\n\nSome models such as `Qwen/Qwen"
  },
  {
    "path": "docs/algo/baseline.md",
    "chars": 12036,
    "preview": "# Algorithm Baselines\n\nLast updated: 06/18/2025.\n\n## Math related datasets\n\n### GSM8k\n\nAssuming GSM8k/math dataset is pr"
  },
  {
    "path": "docs/algo/collabllm.md",
    "chars": 6215,
    "preview": "# Recipe: CollabLLM \n\nLast updated: 09/22/2025.\n\n> Open-Source Algorithm Implementation & Expriement Running: [Haiquan C"
  },
  {
    "path": "docs/algo/dapo.md",
    "chars": 10546,
    "preview": "# Recipe: Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO)\n\nLast updated: 06/19/2025.\n\n> Open-Source Algor"
  },
  {
    "path": "docs/algo/dppo.md",
    "chars": 4958,
    "preview": "# Divergence Proximal Policy Optimization (DPPO)\n\nLast updated: 02/25/2026.\n\n\n<div align=\"center\">\n\n## Rethinking the Tr"
  },
  {
    "path": "docs/algo/entropy.md",
    "chars": 7904,
    "preview": "# Recipe: Entropy Mechanism\n\nLast updated: 06/27/2025.\n\n\n<div align=\"center\">\n\n  The Entropy Mechanism of Reinforcement "
  },
  {
    "path": "docs/algo/gpg.md",
    "chars": 1563,
    "preview": "# GPG: Group Policy Gradient\n\nLast updated: 07/03/2025.\n\nGroup Policy Gradient (GPG) is a minimalist reinforcement learn"
  },
  {
    "path": "docs/algo/grpo.md",
    "chars": 5761,
    "preview": "# Group Relative Policy Optimization (GRPO)\n\nLast updated: 05/31/2025.\n\nIn reinforcement learning, classic algorithms li"
  },
  {
    "path": "docs/algo/opo.md",
    "chars": 2250,
    "preview": "# On-Policy RL with Optimal Reward Baseline (OPO)\n\nLast updated: 06/02/2025.\n\nLoose on-policy constraints and suboptimal"
  },
  {
    "path": "docs/algo/otb.md",
    "chars": 4773,
    "preview": "# Optimal Token Baseline (OTB)\n\nLast updated: 02/23/2026.\n\n📝 [ArXiv](https://www.arxiv.org/abs/2602.07078) | 📒 [Blog](ht"
  },
  {
    "path": "docs/algo/ppo.md",
    "chars": 6886,
    "preview": "# Proximal Policy Optimization (PPO)\n\nLast updated: 06/19/2025.\n\nProximal Policy Optimization (PPO) is a family of polic"
  },
  {
    "path": "docs/algo/rollout_corr.md",
    "chars": 53018,
    "preview": "# Rollout Correction\n\n**Author:** [Yingru Li](https://richardli.xyz/)\n\nLast updated: 10/30/2025.\n\n---\n\n> **📖 Documentati"
  },
  {
    "path": "docs/algo/rollout_corr_math.md",
    "chars": 47622,
    "preview": "# Mathematical Formulations of Rollout Correction Methods in `verl`\n\n**Author:** [Yingru Li](https://richardli.xyz)\n**La"
  },
  {
    "path": "docs/algo/spin.md",
    "chars": 11476,
    "preview": "# Recipe: Self-Play Fine-Tuning (SPIN)\n\nLast updated: 05/31/2025.\n\n`verl` provides a recipe inspired by the paper **\"Sel"
  },
  {
    "path": "docs/algo/sppo.md",
    "chars": 3214,
    "preview": "# Recipe: Self-Play Preference Optimization (SPPO)\n\nLast updated: 05/28/2025.\n\nverl provides a community recipe implemen"
  },
  {
    "path": "docs/amd_tutorial/amd_build_dockerfile_page.rst",
    "chars": 28799,
    "preview": "Getting started with AMD (ROCM Kernel)\n=====================================================\n\nLast updated: 07/06/2025.\n"
  },
  {
    "path": "docs/amd_tutorial/amd_vllm_page.rst",
    "chars": 2579,
    "preview": "verl performance tuning for AMD (ROCm Kernel)\n=====================================================\n\nLast updated: 11/13"
  },
  {
    "path": "docs/api/data.rst",
    "chars": 2276,
    "preview": "Data interface\n=========================\n\nLast updated: 05/19/2025 (API docstrings are auto-generated).\n\nDataProto is th"
  },
  {
    "path": "docs/api/single_controller.rst",
    "chars": 1040,
    "preview": "Single Controller interface\n============================\n\nLast updated: 05/27/2025 (API docstrings are auto-generated).\n"
  },
  {
    "path": "docs/api/trainer.rst",
    "chars": 855,
    "preview": "Trainer Interface\n================================\n\nLast updated: 06/08/2025 (API docstrings are auto-generated).\n\nTrain"
  },
  {
    "path": "docs/api/utils.rst",
    "chars": 1843,
    "preview": "Utilities\n============\n\nLast updated: 05/19/2025 (API docstrings are auto-generated).\n\nThis section documents the utilit"
  },
  {
    "path": "docs/ascend_tutorial/contribution_guide/ascend_ci_guide_zh.rst",
    "chars": 5981,
    "preview": "NPU-CI 添加指导\n===========\n\nLast updated: 02/02/2026.\n\n我们在 verl 上增加基于华为昇腾设备的CI用例添加指导。\n\nverl 仓库使用 GitHub Actions 作为 CI 平台，通过"
  },
  {
    "path": "docs/ascend_tutorial/examples/ascend_performance_analysis_guide.md",
    "chars": 6925,
    "preview": "# Ascend Performance Analysis Guide\n\nLast updated: 02/24/2026.\n\n## 背景介绍\n\n随着DeepSeek-R1的发布，大模型强化学习（RL）训练受到广泛关注。在昇腾NPU环境下，"
  },
  {
    "path": "docs/ascend_tutorial/examples/ascend_retool_best_pratice.rst",
    "chars": 8330,
    "preview": "Ascend Retool Best Practice\n===================================\n\nLast updated: 03/01/2026.\n\n引言\n-------------------------"
  },
  {
    "path": "docs/ascend_tutorial/examples/ascend_sglang_best_practices.rst",
    "chars": 9118,
    "preview": "Ascend SGLang Best Practice\n===================================\n\nLast updated: 01/27/2026.\n\n.. _Qwen3-30B: https://githu"
  },
  {
    "path": "docs/ascend_tutorial/examples/dapo_multi_model_optimization_practice.md",
    "chars": 9330,
    "preview": "# DAPO multi model optimization practice\n\n## DAPO 介绍\n\nLast updated: 03/04/2026.\n\nDAPO的论文可以参考：[DAPO](https://arxiv.org/pd"
  },
  {
    "path": "docs/ascend_tutorial/examples/gspo_optimization_practice.md",
    "chars": 12902,
    "preview": "# NPU Qwen3-32B GSPO Optimization Practice\r\n\r\nLast updated: 02/26/2026.\r\n\r\n本文章对应脚本地址：[qwen3_32b_gspo_npu](https://github"
  },
  {
    "path": "docs/ascend_tutorial/examples/run_qwen3_32B_megatron_1k_256k_npu.md",
    "chars": 5773,
    "preview": "# Long Sequence Qwen3-32B 1k-to-256k Example\n\nLast updated: 6/3/2026.\n\n本章对Qwen3-32B进行了长序列开发。Qwen3-32B的模型能力为最长推到40k\n\n## 全"
  },
  {
    "path": "docs/ascend_tutorial/faq/faq.rst",
    "chars": 27,
    "preview": "Last updated: 03/16/2026.\r\n"
  },
  {
    "path": "docs/ascend_tutorial/features/ascend_backend_features.md",
    "chars": 13689,
    "preview": "# Ascend Backend Features Guide\n==================================================================================\n\nLast"
  },
  {
    "path": "docs/ascend_tutorial/features/ascend_consistency.rst",
    "chars": 1211,
    "preview": "推理一致性指导\n====================================\n\n在昇腾设备上对齐verl和vLLM两个框架下的推理结果。\n\nLast updated: 11/17/2025.\n\n这是一份在昇腾设备上对齐verl和"
  },
  {
    "path": "docs/ascend_tutorial/profiling/ascend_profiling_en.rst",
    "chars": 18428,
    "preview": "Profiling Data Collection Guide\n========================================================================================"
  },
  {
    "path": "docs/ascend_tutorial/profiling/ascend_profiling_zh.rst",
    "chars": 13045,
    "preview": "Profiling采集指导\n==================================================================================\n\nLast updated: 12/20/20"
  },
  {
    "path": "docs/ascend_tutorial/quick_start/ascend_quick_start.rst",
    "chars": 22615,
    "preview": "Ascend Quickstart\n===================================\n\nLast updated: 03/03/2026.\n\n\n\n关键更新\n-------------------------------"
  },
  {
    "path": "docs/ascend_tutorial/quick_start/ascend_sglang_quick_start.rst",
    "chars": 5447,
    "preview": "Ascend Quickstart with SGLang Backend\n===================================\n\nLast updated: 01/27/2026.\n\n我们在 verl 上增加对华为昇腾设"
  },
  {
    "path": "docs/ascend_tutorial/quick_start/dockerfile_build_guidance.rst",
    "chars": 3266,
    "preview": "Ascend Dockerfile Build Guidance\n===================================\n\nLast updated: 03/03/2025.\n\n\n镜像获取 & 公开镜像地址\n--------"
  },
  {
    "path": "docs/blog/v0.7.md",
    "chars": 16929,
    "preview": "# verl 0.7 release blog\n\n**Author:** verl team\n\nLast updated: 01/03/2026.\n\n## Overview\nverl adopts a Hybrid-Controller a"
  },
  {
    "path": "docs/conf.py",
    "chars": 3870,
    "preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\r\n#\r\n# Licensed under the Apache License, Version 2.0 (the \"License"
  },
  {
    "path": "docs/data/transfer_queue.md",
    "chars": 18128,
    "preview": "# TransferQueue Data System\n\nLast updated: 01/07/2026.\n\nThis doc introduce [TransferQueue](https://gitcode.com/Ascend/Tr"
  },
  {
    "path": "docs/examples/config.rst",
    "chars": 34061,
    "preview": ".. _config-explain-page:\n\nConfig Explanation\n===================\n\nLast updated: 06/18/2025.\n\nppo_trainer.yaml for RL FSD"
  },
  {
    "path": "docs/examples/gsm8k_example.rst",
    "chars": 6919,
    "preview": "GSM8K Example\n=============\n\nLast updated: 03/25/2025.\n\nIntroduction\n------------\n\nIn this example, we train an LLM to t"
  },
  {
    "path": "docs/examples/multi_modal_example.rst",
    "chars": 944,
    "preview": "Multi-Modal Example Architecture\n=================================\n\nLast updated: 04/28/2025.\n\nIntroduction\n------------"
  },
  {
    "path": "docs/examples/ppo_code_architecture.rst",
    "chars": 9013,
    "preview": "PPO Example Architecture\n========================\n\nLast updated: 02/17/2025.\n\nLet's start with the Proximal Policy Optim"
  },
  {
    "path": "docs/examples/sandbox_fusion_example.rst",
    "chars": 2848,
    "preview": "Sandbox Fusion Example\n============================\n\nLast updated: 06/27/2025.\n\nIntroduction\n------------\n\nSandbox Fusio"
  },
  {
    "path": "docs/examples/skypilot_examples.rst",
    "chars": 4211,
    "preview": "SkyPilot Examples\n=================\n\nLast updated: 09/04/2025.\n\nThis guide provides examples of running VERL reinforceme"
  },
  {
    "path": "docs/faq/faq.rst",
    "chars": 10846,
    "preview": "Frequently Asked Questions\n====================================\n\nLast updated: 09/24/2025.\n\nRay related\n------------\n\nHo"
  },
  {
    "path": "docs/hybrid_flow.rst",
    "chars": 15395,
    "preview": "=========================================================\nHybridFlow Programming Guide\n================================="
  },
  {
    "path": "docs/index.rst",
    "chars": 6918,
    "preview": "Welcome to verl's documentation!\n================================================\n\nverl is a flexible, efficient and pro"
  },
  {
    "path": "docs/perf/best_practices.rst",
    "chars": 16759,
    "preview": "Verl LLM Best Practices (DAPO + Qwen3-235B)\n===========================================\n\nLast updated: 11/03/2025.\n\nPurp"
  },
  {
    "path": "docs/perf/device_tuning.rst",
    "chars": 7561,
    "preview": "Hardware Resource Needed for RL\n===============================\n\nLast updated: 06/25/2025.\n\nSince RL requires more resou"
  },
  {
    "path": "docs/perf/dpsk.md",
    "chars": 5053,
    "preview": "# Training DeepSeek 671b\n\nLast updated: 08/20/2025.\n\nverl integrates Megatron to support large MoE models such as `Qwen3"
  },
  {
    "path": "docs/perf/nsight_profiling.md",
    "chars": 5667,
    "preview": "# NVIDIA Nsight Systems profiling in verl\n\nLast updated: 06/20/2025.\n\nThis guide explains how to use NVIDIA Nsight Syste"
  },
  {
    "path": "docs/perf/perf_tuning.rst",
    "chars": 12777,
    "preview": "Performance Tuning Guide\n==============================\n\nLast updated: 07/17/2025.\n\nAuthor: `Guangming Sheng <https://gi"
  },
  {
    "path": "docs/perf/perf_tuning_on_ascend.rst",
    "chars": 8962,
    "preview": "Performance Tuning Guide on Ascend\n====================================\n\nLast updated:  01/29/2026.\n\nAuthor:  `Xiaobo Hu"
  },
  {
    "path": "docs/perf/torch_profiling.md",
    "chars": 4100,
    "preview": "# PyTorch Profiling in verl\n\nLast updated: 01/13/2026.\n\nThis guide explains how to use the native [PyTorch Profiler](htt"
  },
  {
    "path": "docs/perf/verl_profiler_system.md",
    "chars": 1909,
    "preview": "# verl Profiler System\n\nLast updated: 08/18/2025.\n\n## Architecture\n\nThe architecture of verl profiler system is like bel"
  },
  {
    "path": "docs/preparation/prepare_data.rst",
    "chars": 4352,
    "preview": "Prepare Data for Post-Training\n========================================\n\nLast updated: 02/09/2025.\n\nBefore starting the "
  },
  {
    "path": "docs/preparation/reward_function.rst",
    "chars": 3816,
    "preview": "Implement Reward Function for Dataset\n======================================\n\nLast updated: 06/02/2025.\n\nFor each datase"
  },
  {
    "path": "docs/requirements-docs.txt",
    "chars": 238,
    "preview": "# markdown support\r\nrecommonmark\r\nmyst_parser\r\n# markdown table support\r\nsphinx-markdown-tables\r\n\r\n# theme default rtd\r\n"
  },
  {
    "path": "docs/sglang_multiturn/interaction_system.rst",
    "chars": 14892,
    "preview": "Interaction System for Multi-turn RL Training\n=============================================\n\nLast updated: 06/25/2025.\n\n"
  },
  {
    "path": "docs/sglang_multiturn/multiturn.rst",
    "chars": 16054,
    "preview": "Multi-turn Rollout Support\n==========================\n\nLast updated: 06/27/2025.\n\nBasic Configuration\n~~~~~~~~~~~~~~~~~~"
  },
  {
    "path": "docs/sglang_multiturn/sandbox_fusion.rst",
    "chars": 16637,
    "preview": "===============================\nSandbox Fusion Tool Integration\n===============================\n\nLast updated: 06/10/202"
  },
  {
    "path": "docs/sglang_multiturn/search_tool_example.rst",
    "chars": 7927,
    "preview": "=======================\r\nSearch Tool Integration\r\n=======================\r\n\r\nLast updated: 05/30/2025.\r\n\r\nIntroduction\r\n"
  },
  {
    "path": "docs/single_controller.rst",
    "chars": 12896,
    "preview": "The Design of ``verl.single_controller``\n==============================================\n\nLast updated: 05/21/2025.\n\n**Au"
  },
  {
    "path": "docs/start/agentic_rl.rst",
    "chars": 7414,
    "preview": "Agentic RL Training\n===================\n\nLast updated: 07/15/2025.\n\nOverview\n----------\nThe goal of Agentic RL is to imp"
  },
  {
    "path": "docs/start/install.rst",
    "chars": 11704,
    "preview": "Installation\n============\n\nRequirements\n------------\n\n- **Python**: Version >= 3.10\n- **CUDA**: Version >= 12.8\n\nverl su"
  },
  {
    "path": "docs/start/more_resources.rst",
    "chars": 295,
    "preview": "More Resources\n==============\n\nLast updated: 06/30/2025.\n\n- Introduction to verl (`Slides <https://tongyx361.github.io/b"
  },
  {
    "path": "docs/start/multinode.rst",
    "chars": 31618,
    "preview": "Multinode Training\n==================\n\nLast updated: 06/10/2025.\n\n.. _wuxibin89: https://github.com/wuxibin89\n\nAuthor: `"
  },
  {
    "path": "docs/start/quickstart.rst",
    "chars": 8485,
    "preview": ".. _quickstart:\n\n=========================================================\nQuickstart: PPO training on GSM8K dataset\n==="
  },
  {
    "path": "docs/start/ray_debug_tutorial.rst",
    "chars": 3536,
    "preview": "Ray Debug Tutorial\r\n==================\r\n\r\nLast updated: 04/23/2025\r\n\r\n\r\n.. _wuxibin89: https://github.com/wuxibin89\r\n\r\nA"
  },
  {
    "path": "docs/workers/automodel_workers.rst",
    "chars": 1953,
    "preview": "Automodel Backend\n=================\n\nLast updated: 03/07/2026.\n\nWe support the Automodel (nemo_automodel) backend by imp"
  },
  {
    "path": "docs/workers/fsdp_workers.rst",
    "chars": 3867,
    "preview": "PyTorch FSDP Backend\n======================\n\nLast updated: 12/01/2025.\n\nWe support PyTorch FSDP Backend by implementing "
  },
  {
    "path": "docs/workers/megatron_workers.rst",
    "chars": 10855,
    "preview": "Megatron-LM Backend\n===================\n\nLast updated: 12/01/2025.\n\nWe support Megatron Backend by implementing various "
  },
  {
    "path": "docs/workers/model_engine.rst",
    "chars": 5016,
    "preview": "Model Engine\n============\n\n.. _vermouth: https://github.com/vermouth1992\n\nAuthor: `Chi Zhang <https://github.com/vermout"
  },
  {
    "path": "docs/workers/ray_trainer.rst",
    "chars": 11740,
    "preview": "PPO Ray Trainer\n===============\n\nLast updated: 02/12/2025.\n\nWe implement the RayPPOTrainer, which is a trainer runs on t"
  },
  {
    "path": "docs/workers/sglang_worker.rst",
    "chars": 10820,
    "preview": "SGLang Backend\n==============\n\nLast updated: 05/31/2025.\n\n**Authored By SGLang RL Team and listed alphabetically by last"
  }
]

// ... and 928 more files (download for full content)

About this extraction

This page contains the full source code of the verl-project/verl GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 1128 files (8.0 MB), approximately 2.2M tokens, and a symbol index with 4887 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
