Showing preview only (4,067K chars total). Download the full file or copy to clipboard to get everything.
Repository: sii-research/siiRL
Branch: main
Commit: 89d8764b6133
Files: 391
Total size: 3.8 MB
Directory structure:
gitextract_ufof6x83/
├── .gitignore
├── .pre-commit-config.yaml
├── .readthedocs.yaml
├── CONTRIBUTING.md
├── LICENSE
├── README-zh.md
├── README.md
├── docker/
│ ├── Dockerfile.cu124
│ └── Dockerfile.cu126
├── docs/
│ ├── Makefile
│ ├── conf.py
│ ├── examples/
│ │ ├── config.rst
│ │ ├── cpgd_example.rst
│ │ ├── deepscaler_example.rst
│ │ ├── embodied_srpo_example.rst
│ │ ├── megatron_backend_example.rst
│ │ └── mm_eureka_example.rst
│ ├── hardware_tutorial/
│ │ ├── ascend_profiling_en.rst
│ │ ├── ascend_quickstart.rst
│ │ └── metax_quickstart.rst
│ ├── index.rst
│ ├── preparation/
│ │ ├── prepare_data.rst
│ │ └── reward_function.rst
│ ├── programming_guide/
│ │ ├── code_structure.rst
│ │ ├── siiRL_code_explained.rst
│ │ ├── siirl_architecture_guide.rst
│ │ └── srpo_code_explained.rst
│ ├── requirements-docs.txt
│ ├── start/
│ │ ├── install.rst
│ │ └── quickstart.rst
│ └── user_interface/
│ ├── filter_interface.rst
│ ├── metrics_interface.rst
│ ├── pipeline_interface.rst
│ └── reward_interface.rst
├── examples/
│ ├── cpgd_trainer/
│ │ ├── run_qwen2_5-7b.sh
│ │ ├── run_qwen2_5_vl-72b.sh
│ │ ├── run_qwen2_5_vl-7b.sh
│ │ ├── run_qwen3-1.7b.sh
│ │ └── run_qwen3-8b.sh
│ ├── custom_pipeline_example/
│ │ └── custom_grpo.py
│ ├── custom_reward/
│ │ ├── rewardfunc_gsm8k.py
│ │ └── run_qwen2_5-7b-custom_reward.sh
│ ├── dapo_trainer/
│ │ ├── run_qwen2_5-7b.sh
│ │ ├── run_qwen3-235b-megatron-gspo.sh
│ │ └── run_qwen3-8b.sh
│ ├── data_preprocess/
│ │ ├── deepscaler.py
│ │ ├── geo3k.py
│ │ ├── gsm8k.py
│ │ ├── math_dataset.py
│ │ └── mm_eureka.py
│ ├── embodied_srpo_trainer/
│ │ ├── run_openvla_oft_libero_goal.sh
│ │ ├── run_openvla_oft_libero_long.sh
│ │ ├── run_openvla_oft_libero_object.sh
│ │ └── run_openvla_oft_libero_spatial.sh
│ ├── experimental/
│ │ ├── marft/
│ │ │ ├── config/
│ │ │ │ ├── code_env.py
│ │ │ │ ├── math_env.py
│ │ │ │ ├── process.py
│ │ │ │ ├── workflow_marft.yaml
│ │ │ │ └── workflow_marft_code.yaml
│ │ │ └── run_qwen2_5-3b_marft.sh
│ │ └── multiturn_server/
│ │ └── run_qwen2_5-3b_grpo_multiturn_vllm.sh
│ ├── grpo_trainer/
│ │ ├── run_qwen2_5-32b-metax.sh
│ │ ├── run_qwen2_5-32b-npu.sh
│ │ ├── run_qwen2_5-72b-npu.sh
│ │ ├── run_qwen2_5-7b-npu-e2e_prof.sh
│ │ ├── run_qwen2_5-7b-npu-mindspeed.sh
│ │ ├── run_qwen2_5-7b-npu.sh
│ │ ├── run_qwen2_5-7b.sh
│ │ ├── run_qwen2_5_vl-72b.sh
│ │ ├── run_qwen2_5_vl-7b-npu.sh
│ │ ├── run_qwen2_5_vl-7b.sh
│ │ ├── run_qwen3-235b-megatron.sh
│ │ ├── run_qwen3-235b-npu-mindspeed.sh
│ │ ├── run_qwen3-30b-npu-mindspeed.sh
│ │ ├── run_qwen3-8b-megatron.sh
│ │ └── run_qwen3-8b.sh
│ ├── gspo_trainer/
│ │ ├── run_qwen3-1.7b.sh
│ │ ├── run_qwen3-235b-megatron.sh
│ │ └── run_qwen3-30b-gspo-megatron.sh
│ ├── multi_turn/
│ │ ├── config/
│ │ │ ├── interaction_config/
│ │ │ │ └── gsm8k_interaction_config.yaml
│ │ │ └── tool_config/
│ │ │ └── gsm8k_tool_config.yaml
│ │ └── gsm8k/
│ │ └── run_qwen2_5-3b_grpo_multiturn_sglang.sh
│ └── ppo_trainer/
│ ├── run_qwen2_5-72b.sh
│ ├── run_qwen3-8b-megatron.sh
│ └── run_qwen3-8b.sh
├── pyproject.toml
├── requirements-npu.txt
├── requirements.txt
├── setup.py
├── siirl/
│ ├── __init__.py
│ ├── dag_worker/
│ │ ├── __init__.py
│ │ ├── checkpoint_manager.py
│ │ ├── constants.py
│ │ ├── core_algos.py
│ │ ├── dag_utils.py
│ │ ├── dagworker.py
│ │ ├── data_structures.py
│ │ ├── metric_aggregator.py
│ │ ├── metrics_collector.py
│ │ └── validator.py
│ ├── data_coordinator/
│ │ ├── __init__.py
│ │ ├── data_buffer.py
│ │ ├── dataloader/
│ │ │ ├── __init__.py
│ │ │ ├── data_loader_node.py
│ │ │ ├── embodied_preprocess.py
│ │ │ ├── partitioned_dataset.py
│ │ │ └── vision_utils.py
│ │ ├── protocol.py
│ │ └── sample.py
│ ├── engine/
│ │ ├── __init__.py
│ │ ├── actor/
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ ├── dp_actor.py
│ │ │ ├── embodied_actor.py
│ │ │ └── megatron_actor.py
│ │ ├── base_worker/
│ │ │ ├── __init__.py
│ │ │ ├── base/
│ │ │ │ ├── __init__.py
│ │ │ │ └── worker.py
│ │ │ ├── megatron/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── npu_mbridge_patch.py
│ │ │ │ └── worker.py
│ │ │ ├── register_center/
│ │ │ │ ├── __init__.py
│ │ │ │ └── register_center.py
│ │ │ └── resouce_pool.py
│ │ ├── critic/
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ ├── dp_critic.py
│ │ │ └── megatron_critic.py
│ │ ├── fsdp_workers.py
│ │ ├── megatron_workers.py
│ │ ├── reward_manager/
│ │ │ ├── __init__.py
│ │ │ ├── dapo.py
│ │ │ ├── embodied.py
│ │ │ ├── naive.py
│ │ │ └── parallel.py
│ │ ├── reward_model/
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ └── megatron/
│ │ │ ├── __init__.py
│ │ │ └── reward_model.py
│ │ ├── rollout/
│ │ │ ├── __init__.py
│ │ │ ├── async_server.py
│ │ │ ├── base.py
│ │ │ ├── embodied_rollout.py
│ │ │ ├── hf_rollout.py
│ │ │ ├── schemas.py
│ │ │ ├── sglang_rollout/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── async_sglang_server.py
│ │ │ │ ├── sglang_rollout.py
│ │ │ │ └── utils.py
│ │ │ └── vllm_rollout/
│ │ │ ├── __init__.py
│ │ │ ├── vllm_async_server.py
│ │ │ └── vllm_rollout_spmd.py
│ │ └── sharding_manager/
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── fsdp_hf.py
│ │ ├── fsdp_sglang.py
│ │ ├── fsdp_ulysses.py
│ │ ├── fsdp_vllm.py
│ │ ├── megatron_sglang.py
│ │ └── megatron_vllm.py
│ ├── environment/
│ │ └── embodied/
│ │ ├── __init__.py
│ │ ├── adapters/
│ │ │ ├── __init__.py
│ │ │ └── libero.py
│ │ ├── base.py
│ │ └── venv.py
│ ├── execution/
│ │ ├── dag/
│ │ │ ├── __init__.py
│ │ │ ├── builtin_pipelines.py
│ │ │ ├── config_loader.py
│ │ │ ├── node.py
│ │ │ ├── pipeline.py
│ │ │ ├── task_graph.py
│ │ │ └── task_loader.py
│ │ ├── metric_worker/
│ │ │ ├── metric_worker.py
│ │ │ └── utils.py
│ │ ├── rollout_flow/
│ │ │ ├── multi_agent/
│ │ │ │ ├── multiagent_generate.py
│ │ │ │ └── utils.py
│ │ │ └── multiturn/
│ │ │ ├── __init__.py
│ │ │ ├── agent_loop/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── agent_loop.py
│ │ │ │ ├── single_turn_agent_loop.py
│ │ │ │ └── tool_agent_loop.py
│ │ │ ├── interactions/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── base.py
│ │ │ │ ├── gsm8k_interaction.py
│ │ │ │ └── utils/
│ │ │ │ ├── __init__.py
│ │ │ │ └── interaction_registry.py
│ │ │ └── tools/
│ │ │ ├── __init__.py
│ │ │ ├── base_tool.py
│ │ │ ├── geo3k_tool.py
│ │ │ ├── gsm8k_tool.py
│ │ │ ├── mcp_base_tool.py
│ │ │ ├── mcp_search_tool.py
│ │ │ ├── sandbox_fusion_tools.py
│ │ │ ├── schemas.py
│ │ │ ├── search_tool.py
│ │ │ └── utils/
│ │ │ ├── __init__.py
│ │ │ ├── mcp_clients/
│ │ │ │ ├── McpClientManager.py
│ │ │ │ ├── __init__.py
│ │ │ │ └── utils.py
│ │ │ ├── search_r1_like_utils.py
│ │ │ └── tool_registry.py
│ │ └── scheduler/
│ │ ├── __init__.py
│ │ ├── enums.py
│ │ ├── graph_updater.py
│ │ ├── launch.py
│ │ ├── process_group_manager.py
│ │ ├── ray_actor_manager.py
│ │ ├── resource_manager.py
│ │ ├── reward.py
│ │ └── task_scheduler.py
│ ├── main_dag.py
│ ├── models/
│ │ ├── __init__.py
│ │ ├── embodied/
│ │ │ ├── openvla/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── configuration_prismatic.py
│ │ │ │ ├── modeling_prismatic.py
│ │ │ │ └── processing_prismatic.py
│ │ │ └── openvla_oft/
│ │ │ ├── __init__.py
│ │ │ ├── configuration_prismatic.py
│ │ │ ├── constants.py
│ │ │ ├── modeling_prismatic.py
│ │ │ ├── processing_prismatic.py
│ │ │ └── train_utils.py
│ │ ├── llama/
│ │ │ ├── __init__.py
│ │ │ └── megatron/
│ │ │ ├── __init__.py
│ │ │ ├── checkpoint_utils/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── llama_loader.py
│ │ │ │ ├── llama_loader_depracated.py
│ │ │ │ └── llama_saver.py
│ │ │ ├── layers/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── parallel_attention.py
│ │ │ │ ├── parallel_decoder.py
│ │ │ │ ├── parallel_linear.py
│ │ │ │ ├── parallel_mlp.py
│ │ │ │ └── parallel_rmsnorm.py
│ │ │ └── modeling_llama_megatron.py
│ │ ├── loader.py
│ │ ├── mcore/
│ │ │ ├── __init__.py
│ │ │ ├── config_converter.py
│ │ │ ├── loader.py
│ │ │ ├── mbridge.py
│ │ │ ├── model_forward.py
│ │ │ ├── model_forward_fused.py
│ │ │ ├── model_initializer.py
│ │ │ ├── patch_v012.py
│ │ │ ├── registry.py
│ │ │ ├── saver.py
│ │ │ ├── util.py
│ │ │ └── weight_converter.py
│ │ ├── model_utils/
│ │ │ ├── __init__.py
│ │ │ └── visual.py
│ │ ├── patcher.py
│ │ ├── qwen2/
│ │ │ ├── __init__.py
│ │ │ └── megatron/
│ │ │ ├── __init__.py
│ │ │ ├── checkpoint_utils/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── qwen2_loader.py
│ │ │ │ ├── qwen2_loader_depracated.py
│ │ │ │ └── qwen2_saver.py
│ │ │ ├── layers/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── parallel_attention.py
│ │ │ │ ├── parallel_decoder.py
│ │ │ │ ├── parallel_linear.py
│ │ │ │ ├── parallel_mlp.py
│ │ │ │ └── parallel_rmsnorm.py
│ │ │ └── modeling_qwen2_megatron.py
│ │ ├── registry.py
│ │ ├── transformers/
│ │ │ ├── __init__.py
│ │ │ ├── internvl.py
│ │ │ ├── internvl_chat/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── configuration_intern_vit.py
│ │ │ │ ├── configuration_internlm2.py
│ │ │ │ ├── configuration_internvl_chat.py
│ │ │ │ ├── modeling_intern_vit.py
│ │ │ │ ├── modeling_internlm2.py
│ │ │ │ ├── modeling_internvl_chat.py
│ │ │ │ ├── tokenization_internlm2.py
│ │ │ │ └── tokenization_internlm2_fast.py
│ │ │ ├── kimi_vl.py
│ │ │ ├── llama.py
│ │ │ ├── monkey_patch.py
│ │ │ ├── npu_patch.py
│ │ │ ├── qwen2.py
│ │ │ ├── qwen2_5_vl.py
│ │ │ ├── qwen2_vl.py
│ │ │ └── transformers_compat.py
│ │ └── weight_loader_registry.py
│ ├── params/
│ │ ├── __init__.py
│ │ ├── dag_args.py
│ │ ├── data_args.py
│ │ ├── display_dict.py
│ │ ├── embodied_args.py
│ │ ├── model_args.py
│ │ ├── parser.py
│ │ ├── profiler_args.py
│ │ └── training_args.py
│ ├── third_party/
│ │ ├── __init__.py
│ │ └── sglang/
│ │ ├── __init__.py
│ │ └── parallel_state.py
│ ├── user_interface/
│ │ ├── filter_interface/
│ │ │ ├── __init__.py
│ │ │ ├── dapo.py
│ │ │ └── embodied.py
│ │ └── rewards_interface/
│ │ └── custom_gsm8k_reward.py
│ └── utils/
│ ├── __init__.py
│ ├── checkpoint/
│ │ ├── __init__.py
│ │ ├── checkpoint_manager.py
│ │ ├── fsdp_checkpoint_manager.py
│ │ └── megatron_checkpoint_manager.py
│ ├── debug/
│ │ ├── __init__.py
│ │ ├── mstx_profile.py
│ │ ├── performance.py
│ │ └── profile.py
│ ├── embodied/
│ │ ├── __init__.py
│ │ ├── libero_utils.py
│ │ ├── openvla_utils.py
│ │ └── video_emb.py
│ ├── experimental/
│ │ ├── __init__.py
│ │ └── torch_functional.py
│ ├── extras/
│ │ ├── __init__.py
│ │ ├── device.py
│ │ ├── fs.py
│ │ ├── hdfs_io.py
│ │ ├── import_utils.py
│ │ ├── misc.py
│ │ ├── net_utils.py
│ │ ├── packages.py
│ │ ├── patch.py
│ │ ├── py_functional.py
│ │ └── ray_utils.py
│ ├── import_string.py
│ ├── kernel/
│ │ ├── __init__.py
│ │ ├── kernels.py
│ │ └── linear_cross_entropy.py
│ ├── logger/
│ │ ├── __init__.py
│ │ ├── aggregate_logger.py
│ │ ├── logging_utils.py
│ │ └── tracking.py
│ ├── megatron/
│ │ ├── __init__.py
│ │ ├── dist_checkpointing.py
│ │ ├── megatron_utils.py
│ │ ├── memory.py
│ │ ├── memory_buffer.py
│ │ ├── optimizer.py
│ │ ├── pipeline_parallel.py
│ │ ├── sequence_parallel.py
│ │ └── tensor_parallel.py
│ ├── memory_utils.py
│ ├── metrics/
│ │ ├── __init__.py
│ │ └── metric_utils.py
│ ├── model_utils/
│ │ ├── __init__.py
│ │ ├── activation_offload.py
│ │ ├── attention_utils.py
│ │ ├── flops_counter.py
│ │ ├── fsdp_utils.py
│ │ ├── model.py
│ │ ├── npu_utils.py
│ │ ├── seqlen_balancing.py
│ │ ├── tensordict_utils.py
│ │ ├── torch_dtypes.py
│ │ ├── torch_functional.py
│ │ ├── ulysses.py
│ │ └── vllm_utils.py
│ └── reward_score/
│ ├── __init__.py
│ ├── embodied.py
│ ├── geo3k.py
│ ├── gsm8k.py
│ ├── math.py
│ ├── math_batch.py
│ ├── math_dapo.py
│ ├── math_verify.py
│ ├── mm_eureka.py
│ ├── prime_code/
│ │ ├── __init__.py
│ │ ├── testing_util.py
│ │ └── utils.py
│ ├── prime_math/
│ │ ├── __init__.py
│ │ ├── grader.py
│ │ └── math_normalize.py
│ ├── sandbox_fusion/
│ │ ├── __init__.py
│ │ └── utils.py
│ └── search_r1_like_qa_em.py
└── tests/
├── __init__.py
├── dag/
│ ├── test_config_loader.py
│ ├── test_node.py
│ ├── test_task_graph.py
│ └── test_task_loader.py
├── dag_worker/
│ ├── test_dag_worker.py
│ ├── test_dapo_merge.py
│ └── test_dapo_pipeline.py
├── data_buffer/
│ ├── detailed_put_performance_test.py
│ ├── performance_test_data_buffer.py
│ └── test_data_buffer.py
└── scheduler/
├── test_process_group_manager.py
└── test_task_scheduler.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
**/*.pt
**/checkpoints
**/wget-log
**/_build/
**/*.ckpt
**/outputs
**/*.tar.gz
**/playground
**/wandb
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
dataset/*
tensorflow/my_graph/*
.idea/
# C extensions
*.so
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
tmp/
*.egg-info/
.installed.cfg
*.egg
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.hypothesis/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# IPython Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# dotenv
.env
# virtualenv
venv/
.venv/
ENV/
# Spyder project settings
.spyderproject
# Rope project settings
.ropeproject
# vscode
.vscode
# Mac
.DS_Store
# vim
*.swp
# ckpt
*.lock
# data
*.parquet
# local logs
logs
log
outputs
.history
*tensorboard
tensorboard/
# version file
siirl/_version.py
================================================
FILE: .pre-commit-config.yaml
================================================
# Default list of files to exclude from checks.
# Add any other paths that should be ignored by all hooks.
exclude: |
(?x)^(
docs/.*|
build/.*
)$
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
args: [--maxkb=500]
- id: check-case-conflict
- id: check-executables-have-shebangs
- id: check-merge-conflict
- id: check-symlinks
- id: detect-private-key
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.12.6
hooks:
- id: ruff
args: ["--fix", "--show-fixes", "--output-format=full"]
- id: ruff-format
- repo: https://github.com/codespell-project/codespell
rev: v2.4.0
hooks:
- id: codespell
args:
- --skip="*.json,*.txt"
- --ignore-words-list=nd,repostory
================================================
FILE: .readthedocs.yaml
================================================
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
# Required
version: 2
# Set the OS, Python version, and other tools you might need
build:
os: ubuntu-22.04
tools:
python: "3.11"
# Build documentation in the "docs/" directory with Sphinx
sphinx:
configuration: docs/conf.py
# Optionally, but recommended,
# declare the Python requirements required to build your documentation
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
install:
- requirements: docs/requirements-docs.txt
- method: pip
path: .
================================================
FILE: CONTRIBUTING.md
================================================
# Contributing to siiRL
Thank you for considering contributing to siiRL!
We welcome contributions in various forms, including but not limited to:
- Reporting a bug
- Submitting a fix
- Discussing the current state of the code
- Proposing new features
- Becoming a maintainer
- Reviewing pull requests
- Adding or improving documentation
- ...
## Getting Started
To get started, please fork the latest branch.
### Reporting Bugs
If you find a bug, please open an issue on our GitHub repository. When you are creating a bug report, please include as many details as possible. Fill out the required template, detailed information helps us resolve issues faster.
### Suggesting Enhancements
If you have an idea for a new feature or an enhancement to an existing one, please open an issue on our GitHub repository. This allows for a discussion with the community and the project maintainers.
### Pull Requests
We actively welcome your pull requests.
1. Fork the repo and create your branch from `main`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints.
6. Issue that pull request!
## Styleguides
### Git Commit Messages
- Use the present tense ("Add feature" not "Added feature").
- Use the imperative mood ("Move A to..." not "Moves A to...").
- Limit the first line to 72 characters or less.
- Reference issues and pull requests liberally after the first line.
<!-- ### Code Style
We use XXX for code formatting and XXX for linting. Before submitting your pull request, please make sure your code is formatted and linted.
```bash
# Auto-format your code
pip install black
black .
# Lint your code
pip install ruff
ruff .
``` -->
## Any questions?
Don't hesitate to contact us if you have any questions. You can reach out to us by opening an issue on GitHub.
We are excited to see your contributions!
================================================
FILE: LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: README-zh.md
================================================
<div align="center">
<img src="asset/sii.png" width="100%"/>
<br>
</div>
<br>
<h1 align="center">
siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems
</h1>
<p align="center">
| <a href="https://arxiv.org/abs/2507.13833"><b>📄 论文</b></a> |
| <a href="https://siirl.readthedocs.io/en/latest/index.html"><b>📚 文档</b></a> |
| <a href="asset/siiRL-feishu-group.png">
<img src="asset/logo-feishu.png" alt="Feishu Group QR Code" height="15" />
<b> 飞书群</b>
</a>
| <a href="asset/siiRL-wechat-group.png">
<img src="asset/logo-wechat.png" alt="Wechat Group QR Code" height="15" />
<b> 微信群</b>
</a>
| <a href="README.md"><b> English</b></a> |
</p>
**siiRL** 是一个新型的、**完全分布式的强化学习 (RL) 框架**,旨在突破大语言模型 (LLM) 后训练中的扩展性瓶颈,并支持未来的多智能体研究,由**上海创智学院**的研究人员开发。
通过移除主流框架中的中心化数据流控制器,siiRL 实现了**近线性的扩展能力**、**显著的吞吐量提升**,通过DAG模块化的设计获得了**极大的灵活性**,为基于强化学习的 LLM 开发带来了全新的可能性。
---
## 🚀 亮点
+ **近线性扩展能力**: 多控制器模式通过将控制逻辑和数据管理分布到所有工作节点,消除了中心化瓶颈,从而实现了在数千张 GPU 上的近线性扩展。
+ **业界领先的吞吐量 (SOTA)**: 完全分布式的数据流架构最大限度地减少了通信和 I/O 开销,在数据密集型场景中实现了业界领先的吞吐量。
+ **灵活的 DAG 定义流水线**: 将您的算法逻辑与物理硬件解耦。通过 siiRL,您可以将复杂的 RL 工作流定义为一个简单的有向无环图 (DAG),从而实现快速、经济且无需编写代码的实验。
+ **跨硬件兼容性**: siiRL 现已正式支持华为昇腾 (Ascend) NPU,为在不同硬件平台上进行训练和推理提供了高性能的替代方案。
+ **经过验证的性能与稳定性**: 在 7B 到 72B 尺寸的模型上进行了广泛的基准测试,siiRL 在各种任务中均表现出卓越的性能。其优势在长上下文和多模态训练等数据密集型工作负载中尤为明显。
---
## 📰 最新动态
* **[2025/11]**: siiRL 现已支持视觉-语言-动作(VLA)模型训练,基于 [SRPO (Self-Referential Policy Optimization for Vision-Language-Action Models)](https://arxiv.org/pdf/2511.15605) 算法,实现了机器人任务的具身强化学习训练。详细使用方法请参考[文档](/docs/examples/embodied_srpo_example.rst)。
* **[2025/09]**: siiRL 现已集成 Megatron 训练后端,并支持MoE模型训练。其性能已在 Qwen3-MoE 模型(30B、235B)上得到验证。
* **[2025/09]**: siiRL通过与华为昇腾、沐曦科技、阿里云等主要厂商合作,现已支持在其GPU 集群上从 32 卡稳定扩展至 1024 卡,线性扩展效率超过 90%。
* **[2025/09]**: siiRL 支持多智能体与环境之间进行多轮交互。
* **[2025/07]**: siiRL 为 LaMAS 新增了 [MARFT](https://arxiv.org/pdf/2504.16129) 支持,可通过 Flex-POMDP 对 LLM 多智能体进行强化学习微调。
* **[2025/07]**: siiRL 现已支持 [CPGD](https://arxiv.org/pdf/2505.12504v1),这是一种通过正则化大幅度的策略更新来增强 RL 训练稳定性和性能的算法。
* **[2025/07]**: 我们很开心向开源社区发布 siiRL!欢迎查阅我们的[论文](https://arxiv.org/abs/2507.13833),深入了解其架构和评测。
---
## 💡 架构概览
siiRL 是一个为大规模集群设计的完全分布式强化学习框架。siiRL 采用多控制器模式,将所有计算和数据流均匀地分派到每个 GPU。siiRL 由三个主要组件构成:DAG Planner,DAG Workers 和 Data Coordinator.
<div align="center">
<img src="asset/overview.png" width="650px" alt="siiRL 架构概览">
<p><i>图 1. siiRL 架构概览。</i></p>
</div>
siiRL 是一个**完全分布式、多控制器的架构**。
关键组件包括:
* **DAG Planner**: 将用户定义的 DAG 转换为序列化、可供每个DAG Worker执行的流水线。
* **DAG Workers**: 核心执行单元,每个DAG Worker绑定到单个 GPU,独立运行其分配的任务。
* **Data Coordinator**: 一组分布式组件(`分布式数据加载器`和`分布式数据缓冲区`),无需中央协调器即可管理从初始加载到中间数据重分配的整个数据生命周期。
### 典型支持的模型与算法
<table style="width: 100%; table-layout: auto; border-collapse: collapse;">
<thead align="center" valign="bottom">
<tr>
<th style="min-width: 120px;">模型</th>
<th style="min-width: 120px;">算法</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td>
<b>Qwen2.5 系列</b>
<ul style="margin-left: 0; padding-left: 16px;">
<li>Qwen2.5-7B </li>
<li>Qwen2.5-72B </li>
<li>Qwen2.5-VL-7B </li>
<li>Qwen2.5-VL-72B </li>
</ul>
<b>Qwen3 系列</b>
<ul style="margin-left: 0; padding-left: 16px;">
<li>Qwen3-1.7B </li>
<li>Qwen3-30B </li>
<li>Qwen3-235B-A22B (MoE) </li>
</ul>
<b>VLA 模型</b>
<ul style="margin-left: 0; padding-left: 16px;">
<li>OpenVLA </li>
<li>OpenVLA-OFT </li>
</ul>
</td>
<td>
<b>强化学习算法</b>
<ul style="margin-left: 0; padding-left: 16px;">
<li>GRPO </li>
<li>PPO </li>
<li>DAPO </li>
<li>GSPO </li>
</ul>
</td>
</tr>
</tbody>
</table>
## 🧪 实验评测
我们对 siiRL 的性能和扩展性进行了全面评测,并与业界领先的 RL 框架 verl 进行了比较。实验表明,siiRL 在所有指标上均表现出卓越的性能。
### 端到端吞吐量
在标准的 PPO 和 GRPO 算法下,siiRL 的吞吐量全面超越了基线系统。特别是在数据密集度更高的 GRPO 算法下,siiRL 通过其完全分布式的架构有效解决了数据瓶颈,实现了高达 **2.62 倍**的性能提升。
<p align="center">
<img src="asset/ppo_performance_comparison.png" width="80%" alt="PPO 算法性能对比"/>
<br>
<em>图 2: PPO 算法下端到端性能对比</em>
</p>
<p align="center">
<img src="asset/grpo_performance_comparison.png" width="80%" alt="GRPO 算法性能对比"/>
<br>
<em>图 3: GRPO 算法下端到端性能对比</em>
</p>
### 大规模扩展性
siiRL 展示了近线性的扩展能力,可平滑扩展至 1024 张 GPU。相比之下,基线框架由于其单点数据瓶颈导致的 OOM (内存不足) 错误,在相同条件下运行失败。在基线系统所能支持的最大批量大小下,siiRL 的性能优势可高达 **7 倍**。
<p align="center">
<img src="asset/scaling_trend_new.png" width="80%" alt="siiRL 扩展性测试"/>
<br>
<em>图 4: siiRL 的扩展性测试</em>
</p>
<p align="center">
<img src="asset/batch_size_total_throughput_final.png" width="80%" alt="VLM 任务性能对比"/>
<br>
<em>图 5: 在基线系统最大负载下的性能对比</em>
</p>
### 长上下文性能
在处理长上下文任务时,数据传输开销成为主要瓶颈。siiRL 的分布式数据流设计使其性能优势随着上下文长度的增加而愈发明显,实现了高达 **2.03 倍**的吞吐量提升,并成功运行了基线系统无法处理的 72B 模型长上下文任务。
<p align="center">
<img src="asset/context_length_comparison_with_oom_label.png" width="80%" alt="长上下文性能对比"/>
<br>
<em>图 6: 长上下文场景下的性能对比</em>
</p>
### 模型收敛性
实验证实,siiRL 的性能优化并未以牺牲模型精度为代价。在超参数相同的情况下,siiRL 的奖励和熵收敛曲线与基线系统完全一致,同时将总训练时间**减少了 21%**。
<p align="center">
<img src="asset/reward_and_entropy_comparison_final.png" width="45%" alt="收敛曲线对比"/>
<br>
<em>图 7: 模型收敛曲线对比</em>
</p>
---
## 📚 相关资源
<a href="https://siirl.readthedocs.io/en/latest/index.html"><b>使用文档</b></a>
- <a href="https://siirl.readthedocs.io/en/latest/start/install.html"><b>安装指南</b></a>
- <a href="https://siirl.readthedocs.io/en/latest/start/quickstart.html"><b>快速入门: 运行 PPO/GRPO</b></a>
---
## 🗓️ 未来计划
siiRL 仍在积极开发中。我们对未来充满期待,并致力于在两个关键方向上扩展框架的功能:支持真实机器人 VLA 训练和训练推理分离。
### 具身 VLA 训练与真实世界部署
我们正在扩展视觉-语言-动作(VLA)能力,以支持**真实世界机器人部署**。
### 训练-推理分离架构
为增强部署灵活性和资源利用率,我们正在开发**解耦的训练-推理架构**。
---
## 🙏 致谢
我们首先要感谢开源 RL 框架 [verl](https://github.com/volcengine/verl),我们使用它作为评测的主要基线系统。我们特别感谢其分层的 API 设计;我们复用了 verl 中的 `3DParallelWorker` 基类来管理 siiRL 中的系统组件。
siiRL 的构建也离不开其他优秀的开源项目。我们衷心感谢 PyTorch、Ray、vLLM、vLLM-Ascend 和 SGLang 团队的杰出工作。
我们的工作解决了研究过程中发现的扩展性问题并设计了灵活的工作流设计,并希望 siiRL 能为社区的共同进步做出积极贡献。
---
## 🖋️ 如何引用
如果您在研究中发现 siiRL 对您有帮助,请考虑引用我们的论文。
```bibtex
@misc{wang2025distflowfullydistributedrl,
title={DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training},
author={Zhixin Wang and Tianyi Zhou and Liming Liu and Ao Li and Jiarui Hu and Dian Yang and Jinlong Hou and Siyuan Feng and Yuan Cheng and Yuan Qi},
year={2025},
eprint={2507.13833},
archivePrefix={arXiv},
primaryClass={cs.DC},
url={https://arxiv.org/abs/2507.13833},
}
```
================================================
FILE: README.md
================================================
<div align="center">
<img src="asset/sii.png" width="100%"/>
<br>
</div>
<br>
<h1 align="center">
siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems
</h1>
<p align="center">
| <a href="https://arxiv.org/abs/2507.13833"><b>📄 Paper</b></a>
| <a href="https://siirl.readthedocs.io/en/latest/index.html"><b>📚 Documentation</b></a>
| <a href="asset/siiRL-feishu-group.png">
<img src="asset/logo-feishu.png" alt="Feishu Group QR Code" height="15" />
<b> Feishu Group</b>
</a>
| <a href="asset/siiRL-wechat-group.png">
<img src="asset/logo-wechat.png" alt="Wechat Group QR Code" height="15" />
<b> Wechat Group</b>
</a>
| <a href="README-zh.md"><b>🇨🇳 中文</b></a> |
</p>
**siiRL** is a novel, **fully distributed reinforcement learning (RL) framework** designed to break the scaling barriers in LLM post-training. Developed by researchers from **Shanghai Innovation Institute**, siiRL tackles the critical performance bottlenecks that limit current state-of-the-art systems.
By eliminating the centralized controller common in other frameworks, siiRL delivers **near-linear scalability**, **dramatic throughput gains**, and **unprecedented flexibility** for RL-based LLM development.
---
## 🚀 Highlights
+ **Near-Linear Scalability**: The multi-controller paradigm eliminates central bottlenecks by distributing control logic and data management across all workers, enabling near-linear scalability to thousands of GPUs.
+ **SOTA Throughput**: Fully distributed dataflow architecture minimizes communication and I/O overhead, achieving SOTA throughput in data-intensive scenarios.
+ **Flexible DAG-Defined Pipeline**: Decouple your algorithmic logic from the physical hardware. With siiRL, you can define complex RL workflows as a simple Directed Acyclic Graph (DAG), enabling rapid, cost-effective, and code-free experimentation.
+ **Cross-Hardware Compatibility**: siiRL now officially supports Huawei's Ascend NPUs, providing a high-performance alternative for training and inference on different hardware platforms.
+ **Proven Performance & Stability**: Extensively benchmarked on models from 7B to 72B, siiRL delivers excellent performance across a wide range of tasks. Its advantages are particularly evident in data-intensive workloads such as long-context and multi-modal training.
---
## 📰 News
* **[2025/11]**: siiRL now supports Vision-Language-Action (VLA) model training with [SRPO (Self-Referential Policy Optimization for Vision-Language-Action Models)](https://arxiv.org/pdf/2511.15605), enabling embodied RL training on robotics tasks. See the [documentation](/docs/examples/embodied_srpo_example.rst) for usage instructions.
* **[2025/09]**: Added an explanation of the siiRL [code implementation](/docs/programming_guide/siiRL_code_explained.rst) for interested users and developers. A [Chinese version](https://zhuanlan.zhihu.com/p/1951768778875605883) is also available on Zhihu.
* **[2025/09]**: siiRL now integrates Megatron training backend with support for MoE training. Performance has been validated on Qwen3-MoE models (30B, 235B).
* **[2025/09]**: siiRL now supports stable scaling on GPU clusters from 32 GPUs up to 1024 GPUs, with over 90% linear scalability efficiency, through collaboration with major manufacturers including Huawei Ascend, MetaX, and Alibaba PPU.
* **[2025/09]**: siiRL supports multi-turn interactions among multi-agents with the environment.
* **[2025/07]**:siiRL adds [MARFT](https://arxiv.org/pdf/2504.16129) support for LaMAS, enabling RL fine-tuning of multi-LLM agents via Flex-POMDP.
* **[2025/07]**: siiRL now supports [CPGD](https://arxiv.org/pdf/2505.12504v1), a novel algorithm that enhances RL training stability and performance by regularizing large policy updates.
* **[2025/07]**: We are excited to release siiRL to the open-source community! Check out our [paper](https://arxiv.org/abs/2507.13833) for a deep dive into the architecture and evaluation.
---
## 💡 Architecture Overview
siiRL is a fully distributed RL framework designed for scalability on large-scale clusters. siiRL employs a multi-controller paradigm that uniformly dispatches all computational and data flow across each GPU. siiRL consists of three main components: a DAG Planner, DAG Workers, and a Data Coordinator.
<div align="center">
<img src="asset/overview.png" width="650px" alt="Overview of siiRL">
<p><i>Figure 1. Overview of siiRL.</i></p>
</div>
siiRL solves this problem with a **fully distributed, multi-controller architecture**.
Key components include:
* **DAG Planner**: Translates a user-defined logical workflow (DAG) into a serialized, executable pipeline for each worker.
* **DAG Workers**: The core execution units, with each worker bound to a single GPU, running its assigned tasks independently.
* **Data Coordinator**: A set of distributed components (`Distributed Dataloader` and `Distributed Databuffer`) that manage the entire data lifecycle, from initial loading to intermediate data redistribution, without a central coordinator.
### Typical Supported Models & Algorithms
<table style="width: 100%; table-layout: auto; border-collapse: collapse;">
<thead align="center" valign="bottom">
<tr>
<th style="min-width: 120px;">Models</th>
<th style="min-width: 120px;">Algorithms</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td>
<b>Qwen2.5 Series</b>
<ul style="margin-left: 0; padding-left: 16px;">
<li>Qwen2.5-7B </li>
<li>Qwen2.5-72B </li>
<li>Qwen2.5-VL-7B </li>
<li>Qwen2.5-VL-72B </li>
</ul>
<b>Qwen3 Series</b>
<ul style="margin-left: 0; padding-left: 16px;">
<li>Qwen3-1.7B </li>
<li>Qwen3-30B </li>
<li>Qwen3-235B-A22B (MoE) </li>
</ul>
<b>VLA Models</b>
<ul style="margin-left: 0; padding-left: 16px;">
<li>OpenVLA </li>
<li>OpenVLA-OFT </li>
</ul>
</td>
<td>
<b>Reinforcement Learning</b>
<ul style="margin-left: 0; padding-left: 16px;">
<li>GRPO </li>
<li>PPO </li>
<li>DAPO </li>
<li>GSPO </li>
</ul>
</td>
</tr>
</tbody>
</table>
## 🧪 Experiment
We conducted a comprehensive evaluation of siiRL's performance and scalability across various scenarios, comparing it with the SOTA RL framework, verl. The experiments demonstrate that siiRL exhibits outstanding performance across all metrics.
### End-to-End Throughput
Under the standard PPO and GRPO algorithms, siiRL's throughput comprehensively surpasses the baseline. Particularly with the more data-intensive GRPO algorithm, siiRL effectively resolves data bottlenecks through its fully distributed architecture, achieving up to a 2.62x performance improvement.
<p align="center">
<img src="asset/ppo_performance_comparison.png" width="80%" alt="PPO Algorithm Performance Comparison"/>
<br>
<em>Figure 2: End-to-end performance comparison using the PPO algorithm </em>
</p>
<p align="center">
<img src="asset/grpo_performance_comparison.png" width="80%" alt="GRPO Algorithm Performance Comparison"/>
<br>
<em>Figure 3: End-to-end performance comparison using the GRPO algorithm </em>
</p>
### Large-Scale Scalability
siiRL demonstrates near-linear scalability, smoothly extending up to 1024 GPUs. In contrast, the baseline framework fails under identical conditions due to OOM errors caused by its single-point data bottleneck. At the maximum batch size the baseline can support, siiRL's performance advantage can be as high as 7x.
<p align="center">
<img src="asset/scaling_trend_new.png" width="80%" alt="siiRL Scalability Test"/>
<br>
<em>Figure 4: Near-linear scalability of siiRL on VLM models </em>
</p>
<p align="center">
<img src="asset/batch_size_total_throughput_final.png" width="80%" alt="VLM Task Performance Comparison"/>
<br>
<em>Figure 5: VLM task performance comparison under the baseline's maximum load </em>
</p>
### Long-Context Performance
When processing long-context tasks, data transfer overhead becomes a major bottleneck. siiRL's distributed dataflow design allows its performance advantage to become more pronounced as context length increases, achieving up to a 2.03x throughput improvement and successfully running a 72B model long-context task that the baseline could not handle.
<p align="center">
<img src="asset/context_length_comparison_with_oom_label.png" width="80%" alt="Long-Context Performance Comparison"/>
<br>
<em>Figure 6: Performance comparison in long-context scenarios </em>
</p>
### Model Convergence
Experiments confirm that siiRL's performance optimizations do not come at the cost of model accuracy. With identical hyperparameters, siiRL's reward and entropy convergence curves are identical to the baseline's, while reducing the total training time by 21%.
<p align="center">
<img src="asset/reward_and_entropy_comparison_final.png" width="45%" alt="Convergence Curve Comparison"/>
<br>
<em>Figure 7: Model convergence curve comparison </em>
</p>
---
## 📚 Resources
<a href="https://siirl.readthedocs.io/en/latest/index.html"><b>Documentation</b></a>
- <a href="https://siirl.readthedocs.io/en/latest/start/install.html"><b>Installation</b></a>
- <a href="https://siirl.readthedocs.io/en/latest/start/quickstart.html"><b>Quickstart: Running PPO/GRPO</b></a>
---
## 🗓️ Future Plans
siiRL is under active development. We are excited about the future and are focused on extending the framework's capabilities in two key directions: training-inference separation and real-robot VLA training.
### Training-Inference Separation Architecture
To enhance deployment flexibility and resource utilization, we are developing a **decoupled training-inference architecture**.
### Embodied VLA Training & Real-World Deployment
We are expanding our Vision-Language-Action (VLA) capabilities to support **real-world robotics deployment**.
We welcome community contributions! Please see our [Contributing Guide](CONTRIBUTING.md) to get started.
---
## 🙏 Acknowledgement
We would first like to thank the open-source RL framework [verl](https://github.com/volcengine/verl), which we used as a primary baseline for our evaluations. We would like to directly acknowledge its hierarchical API design; we reuse the 3DParallelWorker base class from verl to manage system components in siiRL.
siiRL is also built upon a foundation of other great open-source projects. We would like to thank the teams behind PyTorch, Ray, vLLM, vLLM-Ascend and SGLang for their incredible work.
Our work aims to address the scalability challenges identified during our research, and we hope siiRL can contribute positively to the community's collective progress.
---
## 🖋️ Citation
If you find siiRL useful in your research, please consider citing our paper.
```bibtex
@misc{wang2025distflowfullydistributedrl,
title={DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training},
author={Zhixin Wang and Tianyi Zhou and Liming Liu and Ao Li and Jiarui Hu and Dian Yang and Jinlong Hou and Siyuan Feng and Yuan Cheng and Yuan Qi},
year={2025},
eprint={2507.13833},
archivePrefix={arXiv},
primaryClass={cs.DC},
url={https://arxiv.org/abs/2507.13833},
}
```
================================================
FILE: docker/Dockerfile.cu124
================================================
# Copyright 2025, Shanghai Innovation Institute. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# siiRL runtime image for CUDA 12.4: PyTorch 2.6 + vLLM 0.8.5 stack,
# with optional SGLang backend, NVIDIA Apex, TransformerEngine, and Megatron-LM.
FROM nvcr.io/nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04
LABEL maintainer="SII AI Infra Team"
# base environment: RDMA/InfiniBand userspace (for multi-node training) plus
# the Python 3 toolchain; `python` is symlinked to python3 for convenience.
RUN apt update \
    && apt install -y rdma-core ibverbs-providers ibverbs-utils \
    && apt install -y python3 python3-pip \
    && ln -sf /usr/bin/python3 /usr/bin/python \
    && python -m pip install -U pip \
    && pip install -U setuptools wheel
# dev tools needed for source builds below (Apex, flash-attn, TransformerEngine)
RUN apt install -y git cmake ninja-build vim
# python packages: torch/vision/audio pinned to the cu124 wheel index so they
# match the base image's CUDA; numpy pinned to 1.26.4 — presumably to stay on
# the pre-2.0 ABI expected by the compiled extensions (TODO: confirm).
RUN pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124 \
    && pip install flashinfer-python -i https://flashinfer.ai/whl/cu124/torch2.6/ \
    && pip install flash-attn==2.7.3 --no-build-isolation \
    && pip install vllm==0.8.5.post1 \
    && pip install accelerate codetiming datasets dill hydra-core pandas wandb loguru tensorboard qwen_vl_utils \
    && pip install 'ray[default]>=2.47.1' \
    && pip install opentelemetry-exporter-prometheus==0.47b0 \
    && pip install mbridge \
    && pip install numpy==1.26.4
# apex: built from source with C++/CUDA extensions enabled; the clone is
# removed afterwards to keep the layer small.
RUN git clone https://github.com/NVIDIA/apex.git \
    && cd apex \
    && MAX_JOBS=16 pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./ \
    && cd .. && rm -rf apex
# optional: sglang inference backend (alternative to vLLM)
RUN pip install 'sglang[all]==0.4.6.post5' \
    && pip install xgrammar==0.1.18
# Install TransformerEngine (pinned tag, PyTorch framework bindings only)
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.3
# Install Megatron-LM (pinned core release; --no-deps to avoid clobbering pins above)
RUN pip3 install --no-deps --no-cache-dir git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2
================================================
FILE: docker/Dockerfile.cu126
================================================
# Copyright 2025, Shanghai Innovation Institute. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# siiRL runtime image for CUDA 12.6: PyTorch 2.7 + vLLM 0.10 stack,
# with optional SGLang backend, NVIDIA Apex, TransformerEngine, and Megatron-LM.
FROM nvcr.io/nvidia/cuda:12.6.3-cudnn-devel-ubuntu22.04
LABEL maintainer="SII AI Infra Team"
# base environment: RDMA/InfiniBand userspace plus libnuma (NUMA-aware
# placement) and the Python 3 toolchain; `python` symlinked to python3.
RUN apt update \
    && apt install -y rdma-core ibverbs-providers ibverbs-utils libnuma-dev \
    && apt install -y python3 python3-pip \
    && ln -sf /usr/bin/python3 /usr/bin/python \
    && python -m pip install -U pip \
    && pip install -U setuptools wheel
# dev tools needed for the source builds below (Apex, flash-attn, TransformerEngine)
RUN apt install -y git cmake ninja-build vim
# python packages: default PyPI wheels (no cu124 index needed for torch 2.7.1);
# numpy pinned to 1.26.4 — presumably to stay on the pre-2.0 ABI expected by
# compiled extensions (TODO: confirm).
RUN pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \
    && pip install flash-attn==2.8.2 --no-build-isolation \
    && pip install vllm==0.10.0 \
    && pip install accelerate codetiming datasets dill hydra-core pandas wandb loguru tensorboard qwen_vl_utils \
    && pip install mbridge \
    && pip install numpy==1.26.4
# apex: built from source with C++/CUDA extensions; clone removed to keep layer small
RUN git clone https://github.com/NVIDIA/apex.git \
    && cd apex \
    && MAX_JOBS=16 pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./ \
    && cd .. && rm -rf apex
# optional: sglang inference backend (alternative to vLLM)
RUN pip install 'sglang[all]==0.4.10.post2' \
    && pip install outlines==1.2.3 xgrammar==0.1.21
# Install TransformerEngine (pinned tag, PyTorch framework bindings only)
RUN export NVTE_FRAMEWORK=pytorch && pip3 install --resume-retries 999 --no-deps --no-cache-dir --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@v2.3
# Install Megatron-LM (pinned core release; --no-deps to avoid clobbering pins above)
RUN pip3 install --no-deps --no-cache-dir git+https://github.com/NVIDIA/Megatron-LM.git@core_v0.12.2
================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
# e.g. `make html` and `make latexpdf` both fall through to sphinx-build -M.
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
================================================
FILE: docs/conf.py
================================================
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
project = "siiRL"
copyright = "2025, SII AI Infra Team"
author = "SII AI Infra Team"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
extensions = [
    "myst_parser",  # Markdown (.md) source support alongside reStructuredText
    "sphinx.ext.autodoc",  # pull API docs from docstrings
    "sphinx.ext.autosummary",  # generate summary tables for autodoc entries
    "sphinx.ext.autosectionlabel",  # allow :ref: to section titles directly
    "sphinx.ext.napoleon",  # parse Google/NumPy style docstrings
    "sphinx.ext.viewcode",  # link documented objects to highlighted source
]
# Use Google style docstrings instead of NumPy docstrings.
napoleon_google_docstring = True
napoleon_numpy_docstring = False
# Make autosectionlabel use document name as prefix to avoid duplicate label warnings
autosectionlabel_prefix_document = True
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",  # handled by myst_parser (enabled above)
}
templates_path = ["_templates"]
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = "en"
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
# "plan_*.md" excludes internal planning notes from the built docs.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store", "plan_*.md"]

# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
================================================
FILE: docs/examples/config.rst
================================================
.. _config-explain-page:
===================
Configuration Guide
===================
siiRL uses Hydra-based configuration management with dataclass parameters. All configuration parameters are defined in the ``siirl/params/`` directory and can be set via command-line arguments.
Configuration Structure
-----------------------
Parameters are organized into the following modules:
- ``DataArguments``: Data-related parameters (``siirl/params/data_args.py``)
- ``ActorRolloutRefArguments``: Actor, Rollout, and Reference model parameters (``siirl/params/model_args.py``)
- ``CriticArguments``: Critic model parameters (``siirl/params/model_args.py``)
- ``RewardModelArguments``: Reward model parameters (``siirl/params/model_args.py``)
- ``AlgorithmArguments``: RL algorithm parameters (``siirl/params/model_args.py``)
- ``TrainingArguments``: Training configuration (``siirl/params/training_args.py``)
- ``DAGArguments``: DAG workflow parameters (``siirl/params/dag_args.py``)
- ``ProfilerArguments``: Profiling parameters (``siirl/params/profiler_args.py``)
All parameters are combined into the ``SiiRLArguments`` class.
Usage
-----
Parameters are set via command-line arguments using dot notation:
.. code-block:: bash
python -m siirl.main_dag \
data.train_files=/path/to/train.parquet \
data.train_batch_size=512 \
actor_rollout_ref.model.path=/path/to/model \
algorithm.adv_estimator=grpo \
trainer.total_epochs=30
Data Parameters
---------------
Location: ``siirl/params/data_args.py``
.. code-block:: bash
data.tokenizer=null
data.train_files=/path/to/train.parquet
data.val_files=/path/to/val.parquet
data.prompt_key=prompt
data.max_prompt_length=512
data.max_response_length=512
data.train_batch_size=1024
data.return_raw_input_ids=False
data.return_raw_chat=False
data.return_full_prompt=False
data.shuffle=True
data.filter_overlong_prompts=False
data.filter_overlong_prompts_workers=1
data.truncation=error
data.image_key=images
data.trust_remote_code=True
**Key Parameters:**
- ``data.train_files``: Training data file path (Parquet format, can be list or single file)
- ``data.val_files``: Validation data file path
- ``data.prompt_key``: Field name for prompt in dataset (default: "prompt")
- ``data.max_prompt_length``: Maximum prompt length (left-padded)
- ``data.max_response_length``: Maximum response length for rollout generation
- ``data.train_batch_size``: Training batch size per iteration
- ``data.return_raw_input_ids``: Return original input_ids without chat template (for different RM chat templates)
- ``data.shuffle``: Whether to shuffle data
- ``data.truncation``: Truncation strategy ("error", "left", "right", "middle")
- ``data.trust_remote_code``: Allow remote code execution for tokenizers
Custom Dataset
~~~~~~~~~~~~~~
.. code-block:: bash
data.custom_cls.path=/path/to/custom_dataset.py
data.custom_cls.name=MyDatasetClass
- ``data.custom_cls.path``: Path to custom dataset class file
- ``data.custom_cls.name``: Name of the dataset class
Actor/Rollout/Reference Model
------------------------------
Location: ``siirl/params/model_args.py``
Model Configuration
~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
actor_rollout_ref.hybrid_engine=True
actor_rollout_ref.model.path=/path/to/model
actor_rollout_ref.model.external_lib=null
actor_rollout_ref.model.enable_gradient_checkpointing=False
actor_rollout_ref.model.enable_activation_offload=False
actor_rollout_ref.model.trust_remote_code=False
actor_rollout_ref.model.use_remove_padding=False
- ``actor_rollout_ref.model.path``: Huggingface model path (local or HDFS)
- ``actor_rollout_ref.model.external_lib``: Additional Python packages to import
- ``actor_rollout_ref.model.enable_gradient_checkpointing``: Enable gradient checkpointing
- ``actor_rollout_ref.model.enable_activation_offload``: Enable activation offloading
- ``actor_rollout_ref.model.trust_remote_code``: Allow remote code model loading
- ``actor_rollout_ref.model.use_remove_padding``: Remove padding tokens for efficiency
Actor Configuration
~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
actor_rollout_ref.actor.strategy=fsdp
actor_rollout_ref.actor.ppo_mini_batch_size=256
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8
actor_rollout_ref.actor.grad_clip=1.0
actor_rollout_ref.actor.clip_ratio=0.2
actor_rollout_ref.actor.entropy_coeff=0.0
actor_rollout_ref.actor.use_kl_loss=False
actor_rollout_ref.actor.kl_loss_coef=0.001
actor_rollout_ref.actor.ppo_epochs=1
actor_rollout_ref.actor.optim.lr=1e-6
- ``actor.strategy``: Backend strategy ("fsdp" or "megatron")
- ``actor.ppo_mini_batch_size``: Mini-batch size for PPO updates (global across GPUs)
- ``actor.ppo_micro_batch_size_per_gpu``: Micro-batch size per GPU (gradient accumulation)
- ``actor.grad_clip``: Gradient clipping threshold
- ``actor.clip_ratio``: PPO clip ratio
- ``actor.use_kl_loss``: Enable KL loss in actor
- ``actor.kl_loss_coef``: KL loss coefficient (for GRPO)
- ``actor.optim.lr``: Learning rate
Reference Model
~~~~~~~~~~~~~~~
.. code-block:: bash
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16
actor_rollout_ref.ref.fsdp_config.param_offload=False
- ``ref.log_prob_micro_batch_size_per_gpu``: Micro-batch size for reference log prob computation
- ``ref.fsdp_config.param_offload``: Enable parameter offloading (recommended for models > 7B)
Rollout Configuration
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
actor_rollout_ref.rollout.name=vllm
actor_rollout_ref.rollout.temperature=1.0
actor_rollout_ref.rollout.top_k=-1
actor_rollout_ref.rollout.top_p=1.0
actor_rollout_ref.rollout.tensor_model_parallel_size=2
actor_rollout_ref.rollout.gpu_memory_utilization=0.5
actor_rollout_ref.rollout.n=8
- ``rollout.name``: Rollout backend ("vllm", "sglang", "hf")
- ``rollout.temperature``: Sampling temperature
- ``rollout.top_k``: Top-k sampling (-1 for vLLM, 0 for HF)
- ``rollout.top_p``: Top-p sampling
- ``rollout.tensor_model_parallel_size``: Tensor parallelism size (vLLM only)
- ``rollout.gpu_memory_utilization``: GPU memory fraction for vLLM
- ``rollout.n``: Number of responses per prompt (>1 for GRPO/RLOO)
Critic Model
------------
Location: ``siirl/params/model_args.py``
.. code-block:: bash
critic.enable=True
critic.model.path=/path/to/critic_model
critic.ppo_mini_batch_size=256
critic.ppo_micro_batch_size_per_gpu=8
critic.optim.lr=1e-5
Most parameters are similar to Actor configuration.
Reward Model
------------
Location: ``siirl/params/model_args.py``
.. code-block:: bash
reward_model.enable=False
reward_model.model.path=/path/to/reward_model
reward_model.model.input_tokenizer=null
reward_model.micro_batch_size_per_gpu=16
reward_model.reward_manager=naive
- ``reward_model.enable``: Enable reward model (False = use only custom reward functions)
- ``reward_model.model.input_tokenizer``: Input tokenizer path (if different from policy)
- ``reward_model.reward_manager``: Reward manager type ("naive", "batch", "parallel", "dapo", "embodied")
Custom Reward Function
~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
custom_reward_function.path=/path/to/my_reward.py
custom_reward_function.name=compute_score
- ``custom_reward_function.path``: Path to custom reward function file
- ``custom_reward_function.name``: Function name (default: "compute_score")
See :doc:`/user_interface/reward_interface` for details.
Algorithm Parameters
--------------------
Location: ``siirl/params/model_args.py``
.. code-block:: bash
algorithm.gamma=1.0
algorithm.lam=1.0
algorithm.adv_estimator=grpo
algorithm.use_kl_in_reward=False
algorithm.kl_penalty=kl
algorithm.kl_ctrl.type=fixed
algorithm.kl_ctrl.kl_coef=0.005
algorithm.workflow_type=DEFAULT
- ``algorithm.gamma``: Discount factor
- ``algorithm.lam``: GAE lambda (bias-variance tradeoff)
- ``algorithm.adv_estimator``: Advantage estimator ("gae", "grpo", "cpgd", "gspo", "rloo")
- ``algorithm.use_kl_in_reward``: Enable KL penalty in reward
- ``algorithm.kl_penalty``: KL divergence calculation method ("kl", "abs", "mse", "low_var_kl", "full")
- ``algorithm.workflow_type``: Workflow type ("DEFAULT", "DAPO", "EMBODIED")
Training Parameters
-------------------
Location: ``siirl/params/training_args.py``
.. code-block:: bash
trainer.total_epochs=30
trainer.project_name=siirl_examples
trainer.experiment_name=gsm8k
trainer.logger=['console', 'wandb']
trainer.nnodes=1
trainer.n_gpus_per_node=8
trainer.save_freq=10
trainer.val_before_train=True
trainer.test_freq=2
- ``trainer.total_epochs``: Number of training epochs
- ``trainer.project_name``: Project name (for logging)
- ``trainer.experiment_name``: Experiment name (for logging)
- ``trainer.logger``: Logger types (["console", "wandb", "tensorboard", "mlflow"])
- ``trainer.nnodes``: Number of nodes
- ``trainer.n_gpus_per_node``: Number of GPUs per node
- ``trainer.save_freq``: Checkpoint saving frequency (by iteration)
- ``trainer.val_before_train``: Run validation before training
- ``trainer.test_freq``: Validation frequency (by iteration)
DAG Parameters
--------------
Location: ``siirl/params/dag_args.py``
.. code-block:: bash
dag.custom_pipeline_fn=null
- ``dag.custom_pipeline_fn``: Custom pipeline function path (e.g., "module:function")
See :doc:`/user_interface/pipeline_interface` for custom pipeline details.
Complete Example
----------------
GRPO Training
~~~~~~~~~~~~~
.. code-block:: bash
python -m siirl.main_dag \
algorithm.adv_estimator=grpo \
algorithm.workflow_type=DEFAULT \
data.train_files=/path/to/gsm8k/train.parquet \
data.train_batch_size=512 \
data.max_prompt_length=2048 \
data.max_response_length=4096 \
actor_rollout_ref.model.path=/path/to/model \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.actor.ppo_mini_batch_size=256 \
actor_rollout_ref.rollout.name=vllm \
actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
actor_rollout_ref.rollout.n=8 \
custom_reward_function.path=siirl/user_interface/rewards_interface/custom_gsm8k_reward.py \
custom_reward_function.name=compute_score \
trainer.total_epochs=30 \
trainer.n_gpus_per_node=8 \
trainer.save_freq=10
PPO Training
~~~~~~~~~~~~
.. code-block:: bash
python -m siirl.main_dag \
algorithm.adv_estimator=gae \
critic.enable=True \
data.train_files=/path/to/data.parquet \
actor_rollout_ref.model.path=/path/to/model \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.rollout.name=vllm \
critic.optim.lr=1e-5 \
trainer.total_epochs=30
DAPO Training
~~~~~~~~~~~~~
.. code-block:: bash
python -m siirl.main_dag \
algorithm.workflow_type=DAPO \
algorithm.adv_estimator=grpo \
algorithm.filter_groups.enable=True \
algorithm.filter_groups.metric=seq_final_reward \
data.train_files=/path/to/data.parquet \
actor_rollout_ref.model.path=/path/to/model \
trainer.total_epochs=30
Parameter Reference
-------------------
For the complete parameter definitions, see:
- ``siirl/params/data_args.py`` - Data parameters
- ``siirl/params/model_args.py`` - Model, algorithm parameters
- ``siirl/params/training_args.py`` - Training parameters
- ``siirl/params/dag_args.py`` - DAG workflow parameters
- ``siirl/params/profiler_args.py`` - Profiler parameters
================================================
FILE: docs/examples/cpgd_example.rst
================================================
DeepScaleR Example with CPGD
==============================
Introduction
------------
This example demonstrates how to fine-tune a Large Language Model for advanced mathematical reasoning on the **DeepScaleR** dataset using **Clipped Policy Gradient Optimization with Policy Drift (CPGD)**, a novel reinforcement learning algorithm designed for enhanced training stability.
**Paper:** `CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models <https://arxiv.org/abs/2505.12504>`__
**Dataset:** https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset
While algorithms like PPO and GRPO are powerful, they can sometimes suffer from instability due to their reliance on importance-sampling ratios in the loss function. CPGD is proposed to mitigate these issues by providing a more stable policy update mechanism, making it a robust choice for complex reasoning tasks.
CPGD Algorithm Overview
-----------------------
CPGD enhances training stability by making two key modifications to the standard policy gradient approach:
1. **Clipped Policy Gradient Objective**: Instead of directly using the policy ratio in the loss (which can cause high variance), CPGD uses a policy gradient objective. It then applies a clipping mechanism to the *logarithm* of the policy ratio. This prevents excessive policy updates when the ratio becomes too large, effectively keeping the optimization within a trusted region.
2. **Policy Drift Regularization**: CPGD introduces a *policy drift* term, which is a KL divergence penalty between the current policy and the old policy from the start of the training iteration. This acts as a dynamic regularizer, pulling the policy back if it strays too far, too quickly, thus preventing training collapse.
Together, these features allow CPGD to achieve consistent performance improvements while avoiding the instability often seen in other RL algorithms.
Step 1: Prepare the Dataset
---------------------------
The data preparation process is identical to other examples using this dataset. First, preprocess the DeepScaleR dataset into the required Parquet format.
.. code:: bash
cd examples/data_preprocess
python3 deepscaler.py --local_dir ~/data/deepscaler
This command downloads, processes, and saves the training and testing sets in the `~/data/deepscaler` directory.
Step 2: Download the Pre-trained Model
--------------------------------------
You need a base model to start the CPGD training. In this example, we use `Qwen2.5-7B-Instruct`.
- **Recommended: Download via CLI:** Use a tool like `huggingface-cli` to download the model to a local directory.
.. code:: bash
huggingface-cli download Qwen/Qwen2.5-7B-Instruct --local-dir ~/data/models/Qwen2.5-7B-Instruct
- **Automatic Download:** You can also specify the model name directly in the `actor_rollout_ref.model.path` field of the run script, and the framework will download it automatically.
Step 3: Perform CPGD Training
-----------------------------
With the data and model ready, you can now launch the training job using the CPGD algorithm.
**Reward Function**
For this task, we use the same rule-based reward function as in the PPO/GRPO examples. The framework's default reward mechanism performs an exact match on the final answer within the `\\boxed{...}` block. A correct answer receives a positive reward, and an incorrect one receives zero.
**Training Script**
Below is a complete training script from `examples/cpgd_trainer/run_qwen2_5-7b.sh`. It is configured to use the CPGD algorithm (`algorithm.adv_estimator=cpgd`). Note the presence of CPGD-specific parameters like `actor_rollout_ref.actor.policy_drift_coeff` and `algorithm.weight_factor_in_cpgd`.
.. literalinclude:: ../../examples/cpgd_trainer/run_qwen2_5-7b.sh
:language: bash
:caption: examples/cpgd_trainer/run_qwen2_5-7b.sh
================================================
FILE: docs/examples/deepscaler_example.rst
================================================
DeepScaleR Example with PPO
=============================
Introduction
------------
This example demonstrates how to fine-tune a Large Language Model for advanced mathematical reasoning using the **DeepScaleR** dataset.
**Paper:** https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2.
**Dataset:** https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset
The core idea is to leverage Reinforcement Learning (RL), specifically Proximal Policy Optimization (PPO), to teach the model not just to find the correct answer, but to follow a logical, step-by-step reasoning process. This is achieved by rewarding the model based on the correctness of its final answer, which is extracted from a structured output.
Dataset Overview
----------------
The DeepScaleR dataset consists of challenging mathematical problems. Each sample includes a question (`problem`), a detailed reasoning path (`solution`), and a final answer enclosed in a `\\boxed{}` block (`answer`).
**An example from DeepScaleR:**
**Prompt:**
"Let $a_n=6^{n}+8^{n}$. Determine the remainder upon dividing $a_ {83}$ by $49$."
**Solution:**
"$6^{83} + 8^{83} = (6+8)(6^{82}-6^{81}8+\\ldots-8^{81}6+8^{82})$\n Becuase $7|(6+8)$, we only consider $6^{82}-6^{81}8+\\ldots-8^{81}6+8^{82} \\pmod{7}$\n$6^{82}-6^{81}8+\\ldots-8^{81}6+8^{82} \\equiv (-1)^{82} - (-1)^{81}+ \\ldots - (-1)^1 + 1 = 83 \\equiv 6 \\pmod{7}$\n$6^{83} + 8^{83} \\equiv 14 \\cdot 6 \\equiv \\boxed{035} \\pmod{49}$"
**Answer:**
`35`
Step 1: Prepare the Dataset
---------------------------
First, preprocess the DeepScaleR dataset into the required Parquet format. Our framework includes a script for this purpose.
.. code:: bash
cd examples/data_preprocess
python3 deepscaler.py --local_dir ~/data/deepscaler
This will download the dataset from Hugging Face, process it, and save `train.parquet` and `test.parquet` files in the `~/data/deepscaler` directory.
Step 2: Download the Pre-trained Model
--------------------------------------
You need a base model to start the PPO training. In this example, we use `Qwen2.5-7B-Instruct`. There are several ways to make the model available to the trainer:
- **Recommended: Download via CLI:** Use tools like `huggingface-cli` or `modelscope` to download the model to a local directory. This gives you more control.
.. code:: bash
# For Hugging Face
huggingface-cli download Qwen/Qwen2.5-7B-Instruct --local-dir ~/data/models/Qwen2.5-7B-Instruct --local-dir-use-symlinks False
# For ModelScope
modelscope download Qwen/Qwen2.5-7B-Instruct --local_dir ~/data/models/Qwen2.5-7B-Instruct
- **Automatic Download:** You can also specify the Hugging Face model name (e.g., `Qwen/Qwen2.5-7B-Instruct`) directly in the `actor_rollout_ref.model.path` and `critic.model.path` fields of your run script. The framework will attempt to download it automatically on the first run.
Step 3: Perform PPO Training
----------------------------
With the data and model ready, you can now launch the PPO training job.
**Reward Function**
For this task, we use a simple but effective rule-based reward function. The framework's default reward mechanism will be used, which performs an exact match between the model's generated answer and the `ground_truth` from the dataset.
- The model is prompted to provide its final answer inside a `\\boxed{...}` block.
- The reward function checks if the content inside the generated `\\boxed{}` matches the ground truth answer.
- A correct match receives a positive reward (e.g., 1.0), while an incorrect match or a malformed response receives zero reward.
**Training Script**
Below is a complete training script based on `examples/ppo_trainer/run_qwen3-8b.sh`. It is configured for a single-node, multi-GPU setup. You should adapt paths like `HOME` to your environment.
.. literalinclude:: ../../examples/ppo_trainer/run_qwen3-8b.sh
:language: bash
:caption: examples/ppo_trainer/run_qwen3-8b.sh
================================================
FILE: docs/examples/embodied_srpo_example.rst
================================================
Embodied SRPO Training
======================
Introduction
------------
This guide explains how to perform Embodied AI training using the SRPO algorithm with OpenVLA-oft models on tasks such as LIBERO. Embodied AI training involves an agent interacting with an environment, where the rewards are often based on task success.
This example demonstrates how to perform RL training on an `OpenVLA-oft-7B` model using the SRPO algorithm on the `libero_long` benchmark.
Step 1: Prepare the Environment
-------------------------------
You should use the provided Docker image for Embodied AI training, which contains all necessary dependencies including EGL support for rendering.
**Docker Image**: ``siiai/siirl-vla:libero-egl-cu12.6`` (Available at `Docker Hub <https://hub.docker.com/r/siiai/siirl-vla>`_)
Ensure you have the necessary environment variables set. This includes the path to the `siiRL` repository and any other dependencies.
.. code:: bash
export SIIRL_DIR="/path/to/siiRL"
export VJEPA2_DIR="$HOME/code/vjepa2" # V-JEPA 2 code repository (https://github.com/facebookresearch/vjepa2)
export PYTHONPATH="$SIIRL_DIR:/path/to/LIBERO:$VJEPA2_DIR:$PYTHONPATH"
Step 2: Prepare the Models
--------------------------
You need the following models:
1. **SFT Model**: A Supervised Fine-Tuned (SFT) OpenVLA-oft model. You should select the model that corresponds to your specific task. For example, if you are training on `libero_long`, you should use the `Sylvest/OpenVLA-AC-PD-1traj-libero-long` model.
Here are the recommended Hugging Face models from the `Sylvest collection <https://huggingface.co/collections/Sylvest/srpo>`_:
- `Sylvest/OpenVLA-AC-PD-1traj-libero-object` (for `libero_object`)
- `Sylvest/OpenVLA-AC-PD-1traj-libero-spatial` (for `libero_spatial`)
- `Sylvest/OpenVLA-AC-PD-1traj-libero-goal` (for `libero_goal`)
- `Sylvest/OpenVLA-AC-PD-1traj-libero-long` (for `libero_long`)
2. **Visual Encoder**: The V-JEPA 2 visual encoder model is **required** for processing visual observations.
- First, clone the V-JEPA 2 code repository from GitHub (`facebookresearch/vjepa2 <https://github.com/facebookresearch/vjepa2>`_):
.. code:: bash
git clone https://github.com/facebookresearch/vjepa2.git $HOME/code/vjepa2
Make sure to add the V-JEPA 2 directory to your ``PYTHONPATH`` as shown in Step 1.
- Then, download the V-JEPA 2 model weights from Hugging Face: `Sylvest/vjepa2-vit-g <https://huggingface.co/Sylvest/vjepa2-vit-g>`_
.. code:: bash
huggingface-cli download Sylvest/vjepa2-vit-g --local-dir $HOME/models/vjepa2
Set the paths to these resources in your environment or script:
.. code:: bash
export MODEL_PATH=$HOME/models/Sylvest/OpenVLA-AC-PD-1traj-libero-long
export VJEPA_MODEL_PATH=$HOME/models/vjepa2/vitg-384.pt
.. note::
You do not need to manually prepare a dataset file. ``siiRL`` will automatically generate the task manifest (Parquet files) based on the environment configuration and save them to the path specified in ``TRAIN_DATA_PATH`` and ``TEST_DATA_PATH``.
Step 3: Configure and Run the Training Script
---------------------------------------------
Embodied AI training requires specific configurations to handle the environment interaction and action spaces.
Key Configuration Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Embodied Specifics**:
- ``actor_rollout_ref.embodied.embodied_type``: The model type (e.g., ``openvla-oft``).
- ``actor_rollout_ref.embodied.action_token_len``: The dimensionality of the action space (e.g., 7 for xyz position (3) + rotation (3) + gripper (1)).
- ``actor_rollout_ref.embodied.action_chunks_len``: The number of action steps predicted in one forward pass.
- ``actor_rollout_ref.embodied.video_embedding_model_path``: Path to the V-JEPA 2 video embedding model (e.g., ``$VJEPA_MODEL_PATH``).
**Environment Configuration**:
- ``actor_rollout_ref.embodied.env.env_type``: The environment library (e.g., ``libero``).
- ``actor_rollout_ref.embodied.env.env_name``: The specific task suite name (e.g., ``libero_long``).
- ``actor_rollout_ref.embodied.env.num_envs``: Number of parallel environments per rollout worker. Default is 16 environments per GPU, and it is not recommended to exceed 16.
- ``actor_rollout_ref.embodied.env.max_steps``: Maximum steps per episode.
**Algorithm Adjustments**:
- ``algorithm.embodied_sampling.filter_accuracy``: Enable filtering of prompts based on estimated success rate.
- ``algorithm.embodied_sampling.accuracy_lower_bound``: Lower threshold for filtering (e.g., 0.1).
- ``algorithm.embodied_sampling.accuracy_upper_bound``: Upper threshold for filtering (e.g., 0.9).
Complete Training Script
~~~~~~~~~~~~~~~~~~~~~~~~
Below is an example script `run_embodied_srpo.sh` to run SRPO training on `libero_long`.
**Note**: The siiRL repository provides ready-to-use training scripts for all four LIBERO tasks in the `examples/embodied_srpo_trainer/` directory:
- ``run_openvla_oft_libero_long.sh``
- ``run_openvla_oft_libero_goal.sh``
- ``run_openvla_oft_libero_object.sh``
- ``run_openvla_oft_libero_spatial.sh``
To train on a specific task, modify the following paths in the script to match your actual environment:
- ``SIIRL_DIR``: Path to the siiRL repository
- ``VJEPA2_DIR``: Path to the V-JEPA2 repository (for ``PYTHONPATH``)
- ``HOME_PATH``: Your home directory or base path for models and data
- ``MODEL_PATH``: Path to the corresponding SFT model for the task
- ``VJEPA_MODEL_PATH``: Path to the V-JEPA 2 model weights file
**Note**: LIBERO is pre-installed in the Docker image at ``/root/LIBERO/`` and does not need to be modified.
.. code-block:: bash
#!/usr/bin/env bash
# ===================================================================================
# === Embodied AI SRPO Training with OpenVLA-OFT on LIBERO-LONG ===
# ===================================================================================
#
set -e
# --- Environment Setup (Critical for siiRL) ---
export SIIRL_DIR="${SIIRL_DIR:-your_siirl_path}"
export PYTHONPATH="$SIIRL_DIR:/root/LIBERO/:${VJEPA2_DIR:-your_vjepa2_path}:$PYTHONPATH"
# --- Experiment and Model Definition ---
export DATASET=libero_long
export ALG=srpo
export MODEL_NAME=openvla-oft-7b
export MODEL_TYPE=openvla-oft
# --- Path Definitions (USER PROVIDED) ---
export HOME_PATH=${HOME_PATH:-your_home_path}
export TRAIN_DATA_PATH=$HOME_PATH/data/train.parquet # generated automatically
export TEST_DATA_PATH=$HOME_PATH/data/test.parquet # generated automatically
export MODEL_PATH=$HOME_PATH/models/Sylvest/OpenVLA-AC-PD-1traj-libero-long
export VJEPA_MODEL_PATH=$HOME_PATH/models/vjepa2/vitg-384.pt
# Base output paths
export BASE_CKPT_PATH=ckpts
export BASE_TENSORBOARD_PATH=tensorboard
# --- Embodied AI Specific Parameters ---
export ACTION_TOKEN_LEN=7 # 7 dimensions: xyz (3), rotation (3), gripper (1)
export ACTION_CHUNKS_LEN=8 # OpenVLA-OFT uses 8-step action chunks
export NUM_ENVS=16 # actor_rollout_ref.embodied.env.num_envs
export MAX_EPISODE_STEPS=512 # actor_rollout_ref.embodied.env.max_steps
# --- Data and Sampling Parameters ---
export VAL_BATCH_SIZE=496 # Validation batch size
export MAX_PROMPT_LENGTH=256
export MAX_RESPONSE_LENGTH=128
# --- Embodied Sampling Parameters ---
export FILTER_ACCURACY=True # Enable accuracy-based filtering
export ACCURACY_LOWER_BOUND=0.1 # Only keep prompts with success rate >= 0.1
export ACCURACY_UPPER_BOUND=0.9 # Only keep prompts with success rate <= 0.9
export FILTER_TRUNCATED=False # Filter truncated episodes (uses env.max_steps)
export OVERSAMPLE_FACTOR=1 # Oversample factor for filtering
# --- Training Hyperparameters ---
export TRAIN_BATCH_SIZE=64 # data.train_batch_size
export PPO_MINI_BATCH_SIZE=4 # actor_rollout_ref.actor.ppo_mini_batch_size
# Note: actual ppo_mini_batch_size = PPO_MINI_BATCH_SIZE * ROLLOUT_N_SAMPLES
export ROLLOUT_N_SAMPLES=8 # REUSED: Number of samples per prompt
export PPO_EPOCHS=1 # actor_rollout_ref.actor.ppo_epochs
# Algorithm parameters
export LEARNING_RATE=5e-6
export WEIGHT_DECAY=0.0 # actor_rollout_ref.actor.optim.weight_decay
export CLIP_RATIO_HIGH=0.28 # actor_rollout_ref.actor.clip_ratio_high
export CLIP_RATIO_LOW=0.2 # actor_rollout_ref.actor.clip_ratio_low
export ENTROPY_COEFF=0.0
export TEMPERATURE=1.6
export GAMMA=1.0
export LAM=1.0
export GRAD_CLIP=1.0
# --- Image/Video Processing ---
export IMG_SIZE=384 # actor_rollout_ref.embodied.img_size
export ENABLE_FP16=True # actor_rollout_ref.embodied.enable_fp16
export EMBEDDING_MODEL_OFFLOAD=False # actor_rollout_ref.embodied.embedding_model_offload
export CENTER_CROP=True # actor_rollout_ref.embodied.center_crop
export NUM_IMAGES_IN_INPUT=1
export NUM_STEPS_WAIT=10 # Environment stabilization steps
# --- Trainer Configuration ---
export SAVE_FREQ=5
export TEST_FREQ=5
export TOTAL_EPOCHS=1000 # trainer.total_epochs
export MAX_CKPT_KEEP=5 # trainer.max_actor_ckpt_to_keep
export VAL_BEFORE_TRAIN=True # trainer.val_before_train
# --- Multi-node distributed training ---
export N_GPUS_PER_NODE=${N_GPUS_PER_NODE:-8}
export NNODES=${PET_NNODES:-1}
export NODE_RANK=${PET_NODE_RANK:-0}
export MASTER_ADDR=${MASTER_ADDR:-localhost}
export MASTER_PORT=${MASTER_PORT:-29500}
# --- Environment Variables ---
export MUJOCO_GL=egl
export PYOPENGL_PLATFORM=egl
export GLOO_SOCKET_TIMEOUT=600
# --- Output Paths and Experiment Naming ---
timestamp=$(date +%Y%m%d_%H%M%S)
export CKPT_PATH=${BASE_CKPT_PATH}/${MODEL_NAME}_${ALG}_${DATASET}_${NNODES}nodes
export PROJECT_NAME=siirl_embodied_${DATASET}
export EXPERIMENT_NAME=openvla_oft_srpo_fsdp
export TENSORBOARD_DIR=${BASE_TENSORBOARD_PATH}/${MODEL_NAME}_${ALG}_${DATASET}/${timestamp}
export SIIRL_LOGGING_FILENAME=${MODEL_NAME}_${ALG}_${DATASET}_${timestamp}
# --- Define the Training Command ---
TRAINING_CMD=(
python3 -m siirl.client.main_dag
--config-name=embodied_srpo_trainer
# Data configuration
data.train_files=$TRAIN_DATA_PATH
data.val_files=$TEST_DATA_PATH
data.train_batch_size=$TRAIN_BATCH_SIZE
data.val_batch_size=$VAL_BATCH_SIZE
data.max_prompt_length=$MAX_PROMPT_LENGTH
data.max_response_length=$MAX_RESPONSE_LENGTH
# Algorithm configuration
algorithm.workflow_type=embodied
algorithm.adv_estimator=grpo
algorithm.gamma=$GAMMA
algorithm.lam=$LAM
algorithm.norm_adv_by_std_in_grpo=True
# Embodied sampling configuration (aligned with DAPO architecture)
algorithm.embodied_sampling.filter_accuracy=$FILTER_ACCURACY
algorithm.embodied_sampling.accuracy_lower_bound=$ACCURACY_LOWER_BOUND
algorithm.embodied_sampling.accuracy_upper_bound=$ACCURACY_UPPER_BOUND
algorithm.embodied_sampling.filter_truncated=$FILTER_TRUNCATED
algorithm.embodied_sampling.oversample_factor=$OVERSAMPLE_FACTOR
# Model configuration
actor_rollout_ref.model.path=$MODEL_PATH
actor_rollout_ref.model.enable_gradient_checkpointing=True
# Actor configuration
actor_rollout_ref.actor.optim.lr=$LEARNING_RATE
actor_rollout_ref.actor.optim.weight_decay=$WEIGHT_DECAY
actor_rollout_ref.actor.ppo_mini_batch_size=$PPO_MINI_BATCH_SIZE
actor_rollout_ref.actor.ppo_epochs=$PPO_EPOCHS
actor_rollout_ref.actor.grad_clip=$GRAD_CLIP
actor_rollout_ref.actor.clip_ratio_high=$CLIP_RATIO_HIGH
actor_rollout_ref.actor.clip_ratio_low=$CLIP_RATIO_LOW
actor_rollout_ref.actor.entropy_coeff=$ENTROPY_COEFF
actor_rollout_ref.actor.shuffle=True
# Actor FSDP configuration
actor_rollout_ref.actor.fsdp_config.param_offload=False
actor_rollout_ref.actor.fsdp_config.grad_offload=False
actor_rollout_ref.actor.fsdp_config.optimizer_offload=False
# Rollout configuration
actor_rollout_ref.rollout.name=hf
actor_rollout_ref.rollout.n=$ROLLOUT_N_SAMPLES
actor_rollout_ref.rollout.temperature=$TEMPERATURE
actor_rollout_ref.rollout.do_sample=True
actor_rollout_ref.rollout.response_length=512
# Embodied AI specific configuration
actor_rollout_ref.embodied.embodied_type=$MODEL_TYPE
actor_rollout_ref.embodied.action_token_len=$ACTION_TOKEN_LEN
actor_rollout_ref.embodied.action_chunks_len=$ACTION_CHUNKS_LEN
actor_rollout_ref.embodied.video_embedding_model_path=$VJEPA_MODEL_PATH
actor_rollout_ref.embodied.embedding_img_size=$IMG_SIZE
actor_rollout_ref.embodied.embedding_enable_fp16=$ENABLE_FP16
actor_rollout_ref.embodied.embedding_model_offload=$EMBEDDING_MODEL_OFFLOAD
actor_rollout_ref.embodied.center_crop=$CENTER_CROP
actor_rollout_ref.embodied.num_images_in_input=$NUM_IMAGES_IN_INPUT
actor_rollout_ref.embodied.unnorm_key=$DATASET
# Environment configuration
actor_rollout_ref.embodied.env.env_type=libero
actor_rollout_ref.embodied.env.env_name=$DATASET
actor_rollout_ref.embodied.env.num_envs=$NUM_ENVS
actor_rollout_ref.embodied.env.max_steps=$MAX_EPISODE_STEPS
actor_rollout_ref.embodied.env.num_steps_wait=$NUM_STEPS_WAIT
actor_rollout_ref.embodied.env.num_trials_per_task=50
actor_rollout_ref.embodied.env.model_family=openvla
# Critic configuration (SRPO doesn't use critic)
critic.use_critic_model=False
# Trainer configuration
trainer.total_epochs=$TOTAL_EPOCHS
trainer.save_freq=$SAVE_FREQ
trainer.test_freq=$TEST_FREQ
trainer.max_actor_ckpt_to_keep=$MAX_CKPT_KEEP
trainer.logger=['console','tensorboard']
trainer.project_name=$PROJECT_NAME
trainer.experiment_name=$EXPERIMENT_NAME
trainer.nnodes=$NNODES
trainer.n_gpus_per_node=$N_GPUS_PER_NODE
trainer.default_local_dir=$CKPT_PATH
trainer.resume_mode=auto
trainer.val_before_train=$VAL_BEFORE_TRAIN
)
# ===================================================================================
# === EXECUTION LOGIC ===
# ===================================================================================
# --- Boilerplate Setup ---
set -e
set -o pipefail
set -x
# --- Infrastructure & Boilerplate Functions ---
start_ray_cluster() {
local RAY_HEAD_WAIT_TIMEOUT=600
export RAY_RAYLET_NODE_MANAGER_CONFIG_NIC_NAME=${INTERFACE_NAME}
export RAY_GCS_SERVER_CONFIG_NIC_NAME=${INTERFACE_NAME}
export RAY_RUNTIME_ENV_AGENT_CREATION_TIMEOUT_S=1200
export RAY_GCS_RPC_CLIENT_CONNECT_TIMEOUT_S=120
local ray_start_common_opts=(
--num-gpus "$N_GPUS_PER_NODE"
--object-store-memory 100000000000
--memory 100000000000
)
if [ "$NNODES" -gt 1 ]; then
if [ "$NODE_RANK" = "0" ]; then
echo "INFO: Starting Ray head node on $(hostname)..."
export RAY_ADDRESS="$RAY_MASTER_ADDR:$RAY_MASTER_PORT"
ray start --head --port="$RAY_MASTER_PORT" --dashboard-port="$RAY_DASHBOARD_PORT" "${ray_start_common_opts[@]}" --system-config='{"gcs_server_request_timeout_seconds": 60, "gcs_rpc_server_reconnect_timeout_s": 60}'
local start_time=$(date +%s)
while ! ray health-check --address "$RAY_ADDRESS" &>/dev/null; do
if [ "$(( $(date +%s) - start_time ))" -ge "$RAY_HEAD_WAIT_TIMEOUT" ]; then echo "ERROR: Timed out waiting for head node. Exiting." >&2; ray stop --force; exit 1; fi
echo "Head node not healthy yet. Retrying in 5s..."
sleep 5
done
echo "INFO: Head node is healthy."
else
local head_node_address="$MASTER_ADDR:$RAY_MASTER_PORT"
echo "INFO: Worker node $(hostname) waiting for head at $head_node_address..."
local start_time=$(date +%s)
while ! ray health-check --address "$head_node_address" &>/dev/null; do
if [ "$(( $(date +%s) - start_time ))" -ge "$RAY_HEAD_WAIT_TIMEOUT" ]; then echo "ERROR: Timed out waiting for head. Exiting." >&2; exit 1; fi
echo "Head not healthy yet. Retrying in 5s..."
sleep 5
done
echo "INFO: Head is healthy. Worker starting..."
ray start --address="$head_node_address" "${ray_start_common_opts[@]}"
fi
else
echo "INFO: Starting Ray in single-node mode..."
ray start --head "${ray_start_common_opts[@]}"
fi
}
# --- Main Execution Function ---
main() {
local timestamp=$(date +"%Y%m%d_%H%M%S")
ray stop --force
export VLLM_USE_V1=1
export GLOO_SOCKET_TIMEOUT=600
export GLOO_TCP_TIMEOUT=600
export GLOO_LOG_LEVEL=DEBUG
export RAY_MASTER_PORT=${RAY_MASTER_PORT:-6379}
export RAY_DASHBOARD_PORT=${RAY_DASHBOARD_PORT:-8265}
export RAY_MASTER_ADDR=$MASTER_ADDR
start_ray_cluster
if [ "$NNODES" -gt 1 ] && [ "$NODE_RANK" = "0" ]; then
echo "Waiting for all $NNODES nodes to join..."
local TIMEOUT=600; local start_time=$(date +%s)
while true; do
if [ "$(( $(date +%s) - start_time ))" -ge "$TIMEOUT" ]; then echo "Error: Timeout waiting for nodes." >&2; exit 1; fi
local ready_nodes=$(ray list nodes --format=json | python3 -c "import sys, json; print(len(json.load(sys.stdin)))")
if [ "$ready_nodes" -ge "$NNODES" ]; then break; fi
echo "Waiting... ($ready_nodes / $NNODES nodes ready)"
sleep 5
done
echo "All $NNODES nodes have joined."
fi
if [ "$NODE_RANK" = "0" ]; then
echo "INFO [RANK 0]: Starting main training command."
eval "${TRAINING_CMD[@]}" "$@"
echo "INFO [RANK 0]: Training finished."
sleep 30; ray stop --force >/dev/null 2>&1
elif [ "$NNODES" -gt 1 ]; then
local head_node_address="$MASTER_ADDR:$RAY_MASTER_PORT"
echo "INFO [RANK $NODE_RANK]: Worker active. Monitoring head node at $head_node_address."
while ray health-check --address "$head_node_address" &>/dev/null; do sleep 15; done
echo "INFO [RANK $NODE_RANK]: Head node down. Exiting."
fi
echo "INFO: Script finished on rank $NODE_RANK."
}
# --- Script Entrypoint ---
main "$@"
Step 4: Checking the Results
----------------------------
1. **Logs**: Monitor the console output for training progress and environment interaction stats.
2. **TensorBoard**: Use TensorBoard to visualize rewards, success rates, and other metrics.
.. code:: bash
tensorboard --logdir ./tensorboard
3. **Checkpoints**: Trained models are saved in the ``ckpts`` directory.
================================================
FILE: docs/examples/megatron_backend_example.rst
================================================
Megatron-LM Training Backend
============================================
Introduction
------------
This guide explains how to use the Megatron-LM backend in siiRL for RL training. Megatron-LM is a powerful library for training very large transformer models, and integrating it as a backend allows for efficient 5D parallelism (DP/TP/EP/PP/CP).
This example demonstrates how to fine-tune a `Qwen3-8B` model using the GRPO algorithm with the Megatron-LM as training backend.
Step 1: Prepare the Dataset
---------------------------
First, ensure your dataset is in the required Parquet format. If you are using one of the example datasets like `gsm8k` or `deepscaler`, you can use the provided preprocessing scripts. For example, for `deepscaler`:
.. code:: bash
cd examples/data_preprocess
python3 deepscaler.py --local_dir ~/data/deepscaler
This will download and process the dataset, saving `train.parquet` and `test.parquet` in the specified directory.
Step 2: Download the Pre-trained Model
--------------------------------------
You need a base model to start training. For this example, we'll use `Qwen3-8B`. Download it from Hugging Face or ModelScope to a local directory.
.. code:: bash
# For Hugging Face
huggingface-cli download Qwen/Qwen3-8B --local-dir ~/data/models/Qwen3-8B --local-dir-use-symlinks False
# For ModelScope
modelscope download Qwen/Qwen3-8B --local_dir ~/data/models/Qwen3-8B
Step 3: Configure and Run the Training Script
---------------------------------------------
To use the Megatron-LM backend, you need to modify the training configuration in your run script.
Key Configuration Changes
~~~~~~~~~~~~~~~~~~~~~~~~~
The main change is to set the training strategy to `megatron` and configure its parallelism parameters.
1. **Set the Strategy**: e.g., in the `TRAINING_CMD` array, set `actor_rollout_ref.actor.strategy=megatron`.
2. **Configure Parallelism**: Add Megatron-specific settings for 5D parallelism. For a 8B model on a single node with 8 GPUs, you might use 2-way tensor parallelism and 4-way pipeline parallelism, with sequence parallelism enabled.
.. code-block:: text
actor_rollout_ref.actor.megatron.tensor_model_parallel_size=2
actor_rollout_ref.actor.megatron.pipeline_model_parallel_size=4
actor_rollout_ref.actor.megatron.context_parallel_size=1
actor_rollout_ref.actor.megatron.sequence_parallel=True
3. **Configure Distributed Optimizer**: Add Megatron-specific settings for distributed optimizer. This allows for memory efficient training with ZeRO-1 optimization and is recommended for large models.
.. code-block:: text
actor_rollout_ref.actor.megatron.use_distributed_optimizer=True
4. **Configure Offloading**: Add Megatron-specific settings for parameter, gradient, and optimizer offload. This allows for parameter, gradient, and optimizer offloading to CPU to save GPU memory.
.. code-block:: text
actor_rollout_ref.actor.megatron.param_offload=True
actor_rollout_ref.actor.megatron.grad_offload=True
actor_rollout_ref.actor.megatron.optimizer_offload=True
Complete Training Script
~~~~~~~~~~~~~~~~~~~~~~~~
Below is a complete example script, `run_qwen3-8b-megatron.sh`, which is adapted from the standard GRPO script to use the Megatron backend. You will need to create this script yourself or adapt an existing one.
.. code-block:: bash
#!/usr/bin/env bash
# ===================================================================================
# === USER CONFIGURATION SECTION ===
# ===================================================================================
# --- For debugging
export HYDRA_FULL_ERROR=1
export SIIRL_LOG_VERBOSITY=INFO
# --- Experiment and Model Definition ---
export DATASET=deepscaler
export ALG=grpo
export MODEL_NAME=qwen3-8b
# --- Path Definitions ---
export HOME=${HOME:-"/root"} # Set your home path
export TRAIN_DATA_PATH=$HOME/data/datasets/$DATASET/train.parquet
export TEST_DATA_PATH=$HOME/data/datasets/$DATASET/test.parquet
export MODEL_PATH=$HOME/data/models/Qwen3-8B
# Base output paths
export BASE_CKPT_PATH=$HOME/ckpts
export BASE_TENSORBOARD_PATH=$HOME/tensorboard
# --- Key Training Hyperparameters ---
export TRAIN_BATCH_SIZE_PER_NODE=128
export PPO_MINI_BATCH_SIZE_PER_NODE=16
export PPO_MICRO_BATCH_SIZE_PER_GPU=8
export MAX_PROMPT_LENGTH=1024
export MAX_RESPONSE_LENGTH=2048
export ROLLOUT_GPU_MEMORY_UTILIZATION=0.45
export ROLLOUT_N=8
export SAVE_FREQ=30
export TEST_FREQ=10
export TOTAL_EPOCHS=30
export MAX_CKPT_KEEP=5
# ---- Megatron Parallelism Configuration ----
export ACTOR_REF_TP=2
export ACTOR_REF_PP=4
export ACTOR_REF_CP=1
export ACTOR_REF_SP=True
# --- Distributed Training & Infrastructure ---
export N_GPUS_PER_NODE=${N_GPUS_PER_NODE:-8}
export NNODES=${PET_NNODES:-1}
export NODE_RANK=${PET_NODE_RANK:-0}
export MASTER_ADDR=${MASTER_ADDR:-localhost}
# --- Output Paths and Experiment Naming ---
timestamp=$(date +"%Y%m%d_%H%M%S")
export CKPT_PATH=${BASE_CKPT_PATH}/${MODEL_NAME}_${ALG}_${DATASET}_megatron_${NNODES}nodes
export PROJECT_NAME=siirl_${DATASET}_${ALG}
export EXPERIMENT_NAME=siirl_${MODEL_NAME}_${ALG}_${DATASET}_megatron_experiment
export TENSORBOARD_DIR=${BASE_TENSORBOARD_PATH}/${MODEL_NAME}_${ALG}_${DATASET}_megatron_tensorboard/dlc_${NNODES}_$timestamp
export SIIRL_LOGGING_FILENAME=${MODEL_NAME}_${ALG}_${DATASET}_megatron_${NNODES}_$timestamp
# --- Calculated Global Hyperparameters ---
export TRAIN_BATCH_SIZE=$(($TRAIN_BATCH_SIZE_PER_NODE * $NNODES))
export PPO_MINI_BATCH_SIZE=$(($PPO_MINI_BATCH_SIZE_PER_NODE * $NNODES))
# --- Define the Training Command and its Arguments ---
TRAINING_CMD=(
python3 -m siirl.main_dag
algorithm.adv_estimator=\$ALG
data.train_files=\$TRAIN_DATA_PATH
data.val_files=\$TEST_DATA_PATH
data.train_batch_size=\$TRAIN_BATCH_SIZE
data.max_prompt_length=\$MAX_PROMPT_LENGTH
data.max_response_length=\$MAX_RESPONSE_LENGTH
actor_rollout_ref.model.path=\$MODEL_PATH
actor_rollout_ref.model.enable_gradient_checkpointing=True
# --- Megatron Backend Configuration ---
actor_rollout_ref.actor.strategy=megatron
actor_rollout_ref.actor.megatron.tensor_model_parallel_size=\$ACTOR_REF_TP
actor_rollout_ref.actor.megatron.pipeline_model_parallel_size=\$ACTOR_REF_PP
actor_rollout_ref.actor.megatron.context_parallel_size=\$ACTOR_REF_CP
actor_rollout_ref.actor.megatron.sequence_parallel=\$ACTOR_REF_SP
actor_rollout_ref.actor.megatron.use_distributed_optimizer=True
actor_rollout_ref.actor.megatron.param_dtype=bfloat16
actor_rollout_ref.actor.megatron.param_offload=False
# --- PPO & Other Hyperparameters ---
actor_rollout_ref.actor.optim.lr=1e-6
actor_rollout_ref.actor.ppo_mini_batch_size=\$PPO_MINI_BATCH_SIZE
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=\$PPO_MICRO_BATCH_SIZE_PER_GPU
actor_rollout_ref.actor.grad_clip=1.0
# --- Rollout (vLLM) Configuration ---
actor_rollout_ref.rollout.tensor_model_parallel_size=\$ACTOR_REF_TP
actor_rollout_ref.rollout.name=vllm
actor_rollout_ref.rollout.gpu_memory_utilization=\$ROLLOUT_GPU_MEMORY_UTILIZATION
actor_rollout_ref.rollout.n=\$ROLLOUT_N
actor_rollout_ref.rollout.prompt_length=\$MAX_PROMPT_LENGTH
actor_rollout_ref.rollout.response_length=\$MAX_RESPONSE_LENGTH
# --- Trainer Configuration ---
trainer.logger=['console','tensorboard']
trainer.project_name=\$PROJECT_NAME
trainer.experiment_name=\$EXPERIMENT_NAME
trainer.n_gpus_per_node=\$N_GPUS_PER_NODE
trainer.nnodes=\$NNODES
trainer.save_freq=\$SAVE_FREQ
trainer.test_freq=\$TEST_FREQ
trainer.total_epochs=\$TOTAL_EPOCHS
trainer.resume_mode=auto
trainer.max_actor_ckpt_to_keep=\$MAX_CKPT_KEEP
trainer.default_local_dir=\$CKPT_PATH
trainer.val_before_train=True
)
Step 4: Checking the Results
----------------------------
During training, you can monitor the progress through several means:
1. **Console Logs**: The console will output detailed logs. Look for initialization messages from the Megatron backend to confirm it's being used. You should see logs pertaining to the setup of 5D parallelism.
2. **TensorBoard**: If you enabled the `tensorboard` logger, you can monitor training metrics in real-time.
.. code:: bash
tensorboard --logdir $HOME/tensorboard
Navigate to the TensorBoard URL in your browser to view metrics such as reward, KL divergence, and loss curves.
3. **Checkpoints**: Checkpoints will be saved in the directory specified by `CKPT_PATH`. You can use these to resume training or for inference later.
================================================
FILE: docs/examples/mm_eureka_example.rst
================================================
MM-Eureka Example with GRPO
===========================
Introduction
------------
This guide details how to fine-tune a multi-modal Large Language Model using the **Group Relative Policy Optimization (GRPO)** algorithm on the **MM-Eureka** dataset. MM-Eureka is a challenging dataset designed to test mathematical reasoning that requires interpreting both text and images.
**Paper:** https://arxiv.org/pdf/2503.07365.
**Dataset:** https://huggingface.co/datasets/FanqingM/MM-Eureka-Dataset
The goal is to enhance a model's ability to perform complex reasoning by processing visual and textual information simultaneously. We use GRPO, an advanced RL algorithm, to optimize the model's policy.
Dataset Overview
----------------
MM-Eureka problems consist of a text-based question paired with one or more images. The model must understand the content of the image to solve the problem correctly.
**An example from MM-Eureka:**
**Prompt:**
.. image:: https://github.com/sii-research/siiRL/raw/main/docs/_static/cube.jpg
:width: 50%
Question: A cube loses one vertex after a 'corner' is removed. This geometric shape is ___ (fill in the number).
**Answer:**
3
Step 1: Data Preprocessing
--------------------------
The raw MM-Eureka dataset, typically in `.jsonl` format, must be converted to Parquet. This involves not only structuring the text but also processing the associated images.
The script `examples/data_preprocess/mm_eureka.py` handles this. It performs the following actions:
- Parses each line of the input JSONL file.
- Reads the image file specified in `image_urls` and embeds its byte content directly into the Parquet file.
- Formats the user prompts to include instructions for the desired output structure (`<think>...</think><answer>...</answer>`).
- Splits the data into training and testing sets.
Run the script with your dataset file:
.. code:: bash
cd examples/data_preprocess
python3 mm_eureka.py --jsonl_file /path/to/your/mm_eureka_data.jsonl --output_dir ~/data/mm_eureka/
Step 2: Defining the Reward Score
---------------------------------
A custom reward function is crucial for multi-modal reasoning. For MM-Eureka, we use a composite score defined in `siirl/utils/reward_score/mm_eureka.py`. This function evaluates two aspects of the model's response:
1. **Accuracy Reward**: This is the primary component. It parses the mathematical expression from the model's output (often in LaTeX) and compares it against the ground truth using the `math_verify` utility. This provides a robust check for mathematical correctness.
2. **Format Reward**: A smaller, secondary reward is given if the model correctly follows the required `<think>...</think><answer>...</answer>` structure. This encourages the model to generate well-formed, interpretable reasoning chains.
The final reward is a weighted sum of these two components (e.g., `0.9 * accuracy_reward + 0.1 * format_reward`), balancing correctness with style.
Step 3: Download the Pre-trained Model
--------------------------------------
For this multi-modal task, we use a powerful vision-language model like `Qwen2.5-VL-7B-Instruct`. Ensure the model is available locally for the training script.
- **Recommended: Download via CLI:**
.. code:: bash
# For Hugging Face
huggingface-cli download Qwen/Qwen2.5-VL-7B-Instruct --local-dir ~/data/models/Qwen2.5-VL-7B-Instruct
# For ModelScope
modelscope download Qwen/Qwen2.5-VL-7B-Instruct --local_dir ~/data/models/Qwen2.5-VL-7B-Instruct
- **Automatic Download:** Alternatively, specify the model identifier directly in the run script's `actor_rollout_ref.model.path` field.
Step 4: Perform GRPO Training
-----------------------------
With the data and model prepared, you can launch the training job using the GRPO algorithm.
**Training Script**
The script `examples/grpo_trainer/run_qwen2_5_vl-7b.sh` provides a complete configuration for this task. It sets up the environment, Ray cluster, and all necessary hyperparameters for GRPO training on the MM-Eureka dataset. Adapt the `HOME` path and other variables as needed for your environment.
.. literalinclude:: ../../examples/grpo_trainer/run_qwen2_5_vl-7b.sh
:language: bash
:caption: examples/grpo_trainer/run_qwen2_5_vl-7b.sh
================================================
FILE: docs/hardware_tutorial/ascend_profiling_en.rst
================================================
Data Collection on Ascend Devices Based on the FSDP Backend
============================================================
Last updated: 08/14/2025.
This is a tutorial for using GRPO to collect data on Ascend devices based on the FSDP backend.
Configuration
-------------
- Global Collection Control: Use the configuration items in siirl/client/config/ppo_trainer.yaml to control the default collection mode.
Control collection parameters using parameters in ppo_trainer.yaml:
- enable: Whether to enable performance profiling.
- save_path: The path to save collected data.
- level: Collection level—options include level_none, level0, level1, and level2.
- level_none: Disables all level-based data collection (turns off profiler_level).
- level0: Collects high-level application data, low-level NPU data, and operator execution details on the NPU.
- level1: Adds CANN layer AscendCL data and AI Core performance metrics on the NPU based on level0.
- level2: Adds CANN layer Runtime data and AI CPU metrics based on level1.
- with_memory: Enables memory analysis (defaults to True).
- record_shapes: Enables recording of tensor shapes (defaults to False).
- with_npu: Enables collection of device-side performance data (defaults to True).
- with_cpu: Enables collection of host-side performance data (defaults to True).
- with_module: Enables recording of framework-level Python call stack information.
- with_stack: Enables recording of operator call stack information.
- analysis: Enables automatic data analysis.
- discrete: Enables discrete mode, collecting performance data for each stage separately (defaults to False).
- roles: Collection stage - used in conjunction with the discrete parameter. Options include:
generate, compute_reward, compute_old_log_prob, compute_ref_log_prob, compute_value, compute_advantage,
train_critic, train_actor
- all_ranks: Whether to collect data from all ranks.
- ranks: List of ranks for which to collect data. If empty, no data is collected.
- profile_steps: List of collection steps. For example, [2, 4] indicates that steps 2 and 4 will be collected. If set to null, no data is collected.
Example
-------
Disable collection
~~~~~~~~~~~~~~~~~~~~
.. code:: yaml
profiler:
enable: False # disable profile
End-to-end collection
~~~~~~~~~~~~~~~~~~~~~
.. code:: yaml
profiler:
steps: [1, 2, 5]
discrete: False
The run_qwen2_5-7b-npu-e2e_prof.sh script is provided in examples/grpo_trainer for reference.
Discrete mode collection
~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: yaml
profiler:
discrete: True
roles: ['generate', 'train_actor']
The discrete mode acquisition script run_qwen2_5-7b-npu-discrete_prof.sh is provided in examples/grpo_trainer for reference.
Visualization
-------------
The acquired data is stored in the user-defined save_path and can be visualized using the MindStudio Insight tool,
you can refer to <https://www.hiascend.com/document/detail/zh/mindstudio/80RC1/GUI_baseddevelopmenttool/msascendinsightug/Insight_userguide_0002.html>.
If the analysis parameter is set to False, offline analysis is required after collection:
.. code:: python
import argparse
from torch_npu.profiler.profiler import analyse
parser = argparse.ArgumentParser()
parser.add_argument("--path", type=str, default="facebook/opt-125m")
if __name__ == "__main__":
args = parser.parse_args()
path = args.path
================================================
FILE: docs/hardware_tutorial/ascend_quickstart.rst
================================================
Ascend NPU
==========
SiiRL also supports Huawei's Ascend NPU devices. This guide has been tested with the following hardware:
- Atlas 200T A2 Box16
Installation Process
--------------------
Core Environment Requirements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Ensure your environment meets these core software version requirements:
+---------------------+------------+
| Software | Version |
+---------------------+------------+
| Python | == 3.10 |
+---------------------+------------+
| CANN | == 8.1.RC1 |
+---------------------+------------+
| PyTorch | == 2.5.1 |
+---------------------+------------+
| torch_npu | == 2.5.1 |
+---------------------+------------+
| mindspeed(Optional) | == 0.12.1 |
+---------------------+------------+
Recommended Base Image
^^^^^^^^^^^^^^^^^^^^^^
For a smoother setup, we strongly recommend using our pre-built Docker image, which includes all necessary dependencies. Please note this pre-built docker image contains torch, torch-npu, vLLM and vLLM-Ascend packages, after pulling it you only need to install siiRL framework from source.
.. code-block:: bash
docker pull crispig/verl_npu:cann8.1rc1-py3.10-torch2.5.1-vllm-ascend0.7.3.post1-250616
Compiling vLLM and vllm-ascend [Optional]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Proper integration of vLLM within siiRL requires compiling both `vllm` and `vllm-ascend` from source. Follow the steps below, paying close attention to the instructions specific to your hardware.
.. note::
We recommend using the latest version of vllm v0.9.2 and vllm-ascend v0.9.0rc2, which support setting use_remove_padding=True.
.. code-block:: bash
# vllm
git clone -b v0.9.2 --depth 1 https://github.com/vllm-project/vllm.git
cd vllm
pip install -r requirements-build.txt
# For Atlas 200T A2 Box16
VLLM_TARGET_DEVICE=empty pip install -e . --extra-index-url https://download.pytorch.org/whl/cpu/
.. code-block:: bash
# vllm-ascend
git clone -b v0.9.0rc2 --depth 1 https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
export COMPILE_CUSTOM_KERNELS=1
python setup.py install
SiiRL Installation
^^^^^^^^^^^^^^^^^^
Finally, install the siiRL framework itself. DO NOT install siiRL from PyPI with ``pip install siirl``, as it will cause dependency conflicts; install it from source in editable mode instead.
.. code-block:: bash
git clone https://github.com/sii-research/siiRL.git
cd siirl
pip install -e .
Third-Party Library Considerations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Please be aware of the following specific requirements and limitations for certain libraries on Ascend hardware:
+--------------+---------------+
| Software | Description |
+--------------+---------------+
| transformers | v4.52.4 |
+--------------+---------------+
| flash_attn | not supported |
+--------------+---------------+
| liger-kernel | not supported |
+--------------+---------------+
| tensordict | 0.8.3 (ARM) |
+--------------+---------------+
1. Using `--flash_attention_2` through `transformers` is supported (requires `transformers` version >= 4.52.0).
2. Flash Attention acceleration via the `flash_attn` package is not supported.
3. `liger-kernel` is not supported.
4. For ARM servers, `tensordict` version 0.8.3 is required. You can manually install it after the main dependencies are installed.
5. For x86 servers, the CPU version of `torchvision` must be installed.
.. code-block:: bash
pip install torchvision==0.20.1+cpu --index-url https://download.pytorch.org/whl/cpu
Verification with a Quick Start Example
---------------------------------------
To ensure your setup is correct, we recommend performing a quick test run. The following example trains a Qwen2.5-0.5B model on the GSM8k dataset using the GRPO algorithm.
1. **Prepare the Dataset**
First, download and preprocess the GSM8k dataset. The provided script will convert it to the Parquet format required by the framework.
.. code-block:: bash
python3 examples/data_preprocess/gsm8k.py --local_dir ~/data/gsm8k
2. **Run the Training Job**
Next, execute the training command below. Ensure you have set the `VLLM_ATTENTION_BACKEND` environment variable.
.. code-block:: bash
set -x
python3 -m siirl.main_dag \
algorithm.adv_estimator=grpo \
data.train_files=/datasets/gsm8k/train.parquet \
data.val_files=/datasets/gsm8k/test.parquet \
data.train_batch_size=1024 \
data.max_prompt_length=1024 \
data.max_response_length=1024 \
data.filter_overlong_prompts=True \
data.truncation='error' \
actor_rollout_ref.model.path=/models/Qwen2.5-0.5B-Instruct \
actor_rollout_ref.actor.optim.lr=5e-8 \
actor_rollout_ref.model.use_remove_padding=False \
actor_rollout_ref.actor.ppo_mini_batch_size=32 \
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2 \
actor_rollout_ref.actor.use_kl_loss=True \
actor_rollout_ref.actor.entropy_coeff=0 \
actor_rollout_ref.actor.kl_loss_coef=0.001 \
actor_rollout_ref.actor.kl_loss_type=low_var_kl \
actor_rollout_ref.model.enable_gradient_checkpointing=True \
actor_rollout_ref.actor.fsdp_config.param_offload=False \
actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=2 \
actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
actor_rollout_ref.rollout.name=vllm \
actor_rollout_ref.rollout.gpu_memory_utilization=0.3 \
actor_rollout_ref.rollout.n=5 \
actor_rollout_ref.rollout.enable_chunked_prefill=False \
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=2 \
actor_rollout_ref.ref.fsdp_config.param_offload=True \
algorithm.use_kl_in_reward=False \
trainer.critic_warmup=0 \
trainer.logger=['console'] \
trainer.project_name='siirl_grpo_example_gsm8k' \
trainer.experiment_name='qwen2_05b_function_rm' \
trainer.n_gpus_per_node=16 \
trainer.nnodes=$NNODES \
trainer.save_freq=-1 \
trainer.test_freq=5 \
trainer.total_epochs=300 \
trainer.device=npu $@
(Optional) Setting Up MindSpeed Training Backend Guide
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Refer to the `MindSpeed README <https://gitee.com/ascend/MindSpeed>`_ for instructions on installing the MindSpeed acceleration library; recommended versions: MindSpeed Core 0.12.1, Megatron-LM 0.12.2.
.. warning::
Please be sure to install **megatron-core** via ``pip install``.
Using ``PYTHONPATH`` to point to megatron will crash the program.
Enable siirl worker model ``strategy`` and set it to ``megatron``. For example: ``actor_rollout_ref.actor.strategy=megatron``.
Custom MindSpeed parameters can be passed through the override_transformer_config option. For instance, to enable FA for the actor model, you can use:
``+actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True``.
MindSpeed provides the same support for siiRL and verl. For more feature details, please refer to the `MindSpeed+verl documentation <https://gitee.com/ascend/MindSpeed/blob/master/docs/user-guide/verl.md>`_.
================================================
FILE: docs/hardware_tutorial/metax_quickstart.rst
================================================
MetaX(沐曦) GPU
===============
SiiRL also supports MetaX's GPU devices. This guide has been tested with the following hardware:
- 曦云 series GPU
Installation Process
--------------------
Recommended Base Image
^^^^^^^^^^^^^^^^^^^^^^
For a smoother setup, we strongly recommend using our pre-built Docker image, which includes all necessary dependencies. Please refer to MetaX developer website: https://developer.metax-tech.com/softnova/docker, after pulling it you only need to install siiRL framework from source.
.. code-block:: bash
docker pull siiai/siirl-metax:maca.ai3.1.0.1-torch2.6-py310-ubuntu22.04-amd64
Start docker container
^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash
docker run -d -t --net=host --uts=host --ipc=host --privileged=true --group-add video \
--shm-size 100gb --ulimit memlock=-1 --security-opt seccomp=unconfined \
--security-opt apparmor=unconfined --device=/dev/dri --device=/dev/mxcd --device=/dev/infiniband \
-v /data/:/data/ \
--name siirl \
siiai/siirl-metax:maca.ai3.1.0.1-torch2.6-py310-ubuntu22.04-amd64 bash
SiiRL Installation
^^^^^^^^^^^^^^^^^^
Finally, install the siiRL framework itself. DO NOT install siiRL from PyPI with ``pip install siirl``, as it will cause dependency conflicts; install it from source in editable mode instead.
.. code-block:: bash
git clone https://github.com/sii-research/siiRL.git
cd siirl
# You need to comment out the libraries adapted for MetaX, such as ray and vllm, to prevent them from being overwritten.
# vllm>=0.8.5.post1
# ray[default]>=2.47.1
pip install -r requirements.txt
pip install -e .
Add environment variables for MetaX
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash
# mx gpu env
export MACA_PATH=/opt/maca
export CUCC_PATH=${MACA_PATH}/tools/cu-bridge
export CUDA_PATH=${CUCC_PATH}
export MACA_CLANG_PATH=$MACA_PATH/mxgpu_llvm/bin
export PATH=${CUDA_PATH}/bin:${MACA_CLANG_PATH}:${PATH}
export LD_LIBRARY_PATH=${MACA_PATH}/tools/cu-bridge/lib/:${MACA_PATH}/lib:${MACA_PATH}/ompi/lib:${MACA_PATH}/mxgpu_llvm/lib:${LD_LIBRARY_PATH}
export PYTORCH_ENABLE_SAME_RAND_A100=1
export MCPYTORCH_DISABLE_PRINT=1
export MAX_JOBS=20
export VLLM_USE_V1=0
export MCCL_ENABLE_FC=0
export MCCL_MAX_NCHANNELS=8
export PYTHONUNBUFFERED=1
export MCCL_IB_HCA=mlx5
export MCCL_SOCKET_IFNAME=ens1
export GLOO_SOCKET_IFNAME=ens1
export SOCKET_NIC=ens1
Verification with a Quick Start Example
---------------------------------------
To ensure your setup is correct, we recommend performing a quick test run. The following example trains a Qwen2.5-0.5B model on the GSM8k dataset using the GRPO algorithm.
1. **Prepare the Dataset**
First, download and preprocess the GSM8k dataset. The provided script will convert it to the Parquet format required by the framework.
.. code-block:: bash
python3 examples/data_preprocess/gsm8k.py --local_dir ~/data/gsm8k
2. **Run the Training Job**
Next, execute the training command below. Ensure you have set the `VLLM_ATTENTION_BACKEND` environment variable.
.. code-block:: bash
# --- Experiment and Model Definition ---
export DATASET=gsm8k
export ALG=grpo
export MODEL_NAME=qwen2.5-05b
# --- Path Definitions ---
export HOME=/data/
export TRAIN_DATA_PATH=$HOME/$DATASET/train.parquet
export TEST_DATA_PATH=$HOME/$DATASET/test.parquet
export MODEL_PATH=$HOME/Qwen2.5-0.5B-Instruct
# Base output paths
export BASE_CKPT_PATH=ckpts
export BASE_TENSORBOARD_PATH=tensorboard
# --- Key Training Hyperparameters ---
export TRAIN_BATCH_SIZE_PER_NODE=512
export PPO_MINI_BATCH_SIZE_PER_NODE=256
export PPO_MICRO_BATCH_SIZE_PER_GPU=8
export MAX_PROMPT_LENGTH=1024
export MAX_RESPONSE_LENGTH=2048
export ROLLOUT_GPU_MEMORY_UTILIZATION=0.4
export ROLLOUT_TP=2
export ROLLOUT_N=8
export SAVE_FREQ=30
export TEST_FREQ=10
export TOTAL_EPOCHS=30
export MAX_CKPT_KEEP=5
# --- Multi-node (Multi-machine) distributed training environments ---
# Uncomment the following line and set the correct network interface if needed for distributed backend
# --- Distributed Training & Infrastructure ---
export N_GPUS_PER_NODE=${N_GPUS_PER_NODE:-8}
export NNODES=${PET_NNODES:-1}
export NODE_RANK=${PET_NODE_RANK:-0}
export MASTER_ADDR=${MASTER_ADDR:-localhost}
# --- Output Paths and Experiment Naming ---
timestamp=$(date +"%Y%m%d_%H%M%S")
export CKPT_PATH=${BASE_CKPT_PATH}/${MODEL_NAME}_${ALG}_${DATASET}_hybrid_${NNODES}nodes
export PROJECT_NAME=siirl_${DATASET}_${ALG}
export EXPERIMENT_NAME=siirl_${MODEL_NAME}_${ALG}_${DATASET}_experiment
export TENSORBOARD_DIR=${BASE_TENSORBOARD_PATH}/${MODEL_NAME}_${ALG}_${DATASET}_hybrid_tensorboard/dlc_${NNODES}_$timestamp
export SIIRL_LOGGING_FILENAME=${MODEL_NAME}_${ALG}_${DATASET}_hybrid_${NNODES}_$timestamp
# --- Calculated Global Hyperparameters ---
export TRAIN_BATCH_SIZE=$(($TRAIN_BATCH_SIZE_PER_NODE * $NNODES))
export PPO_MINI_BATCH_SIZE=$(($PPO_MINI_BATCH_SIZE_PER_NODE * $NNODES))
# mx gpu env
export MACA_PATH=/opt/maca
export CUCC_PATH=${MACA_PATH}/tools/cu-bridge
export CUDA_PATH=${CUCC_PATH}
export MACA_CLANG_PATH=$MACA_PATH/mxgpu_llvm/bin
export PATH=${CUDA_PATH}/bin:${MACA_CLANG_PATH}:${PATH}
export LD_LIBRARY_PATH=${MACA_PATH}/tools/cu-bridge/lib/:${MACA_PATH}/lib:${MACA_PATH}/ompi/lib:${MACA_PATH}/mxgpu_llvm/lib:${LD_LIBRARY_PATH}
export PYTORCH_ENABLE_SAME_RAND_A100=1
export MCPYTORCH_DISABLE_PRINT=1
export MAX_JOBS=20
export VLLM_USE_V1=0
export MCCL_ENABLE_FC=0
export MCCL_MAX_NCHANNELS=8
export PYTHONUNBUFFERED=1
export MCCL_IB_HCA=mlx5
export MCCL_SOCKET_IFNAME=ens1
export GLOO_SOCKET_IFNAME=ens1
export SOCKET_NIC=ens1
# --- Define the Training Command and its Arguments ---
TRAINING_CMD=(
python3 -m siirl.main_dag
algorithm.adv_estimator=\$ALG
data.train_files=\$TRAIN_DATA_PATH
data.val_files=\$TEST_DATA_PATH
data.train_batch_size=\$TRAIN_BATCH_SIZE
data.max_prompt_length=\$MAX_PROMPT_LENGTH
data.max_response_length=\$MAX_RESPONSE_LENGTH
data.filter_overlong_prompts=True
data.truncation='error'
data.shuffle=False
actor_rollout_ref.model.path=\$MODEL_PATH
actor_rollout_ref.actor.optim.lr=1e-6
actor_rollout_ref.model.use_remove_padding=True
actor_rollout_ref.model.use_fused_kernels=False
actor_rollout_ref.actor.policy_drift_coeff=0.001
actor_rollout_ref.actor.ppo_mini_batch_size=\$PPO_MINI_BATCH_SIZE
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=\$PPO_MICRO_BATCH_SIZE_PER_GPU
actor_rollout_ref.actor.use_kl_loss=True
actor_rollout_ref.actor.grad_clip=0.5
actor_rollout_ref.actor.clip_ratio=0.2
actor_rollout_ref.actor.kl_loss_coef=0.01
actor_rollout_ref.actor.kl_loss_type=low_var_kl
actor_rollout_ref.model.enable_gradient_checkpointing=True
actor_rollout_ref.actor.fsdp_config.param_offload=True
actor_rollout_ref.actor.fsdp_config.optimizer_offload=True
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=\$PPO_MICRO_BATCH_SIZE_PER_GPU
actor_rollout_ref.rollout.tensor_model_parallel_size=\$ROLLOUT_TP
actor_rollout_ref.rollout.name=vllm
actor_rollout_ref.rollout.gpu_memory_utilization=\$ROLLOUT_GPU_MEMORY_UTILIZATION
actor_rollout_ref.rollout.max_model_len=\$MAX_RESPONSE_LENGTH
actor_rollout_ref.rollout.enable_chunked_prefill=False
actor_rollout_ref.rollout.enforce_eager=False
actor_rollout_ref.rollout.free_cache_engine=False
actor_rollout_ref.rollout.n=\$ROLLOUT_N
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=\$PPO_MICRO_BATCH_SIZE_PER_GPU
actor_rollout_ref.ref.fsdp_config.param_offload=True
algorithm.weight_factor_in_cpgd='STD_weight'
algorithm.kl_ctrl.kl_coef=0.001
trainer.critic_warmup=0
trainer.logger=['console','tensorboard']
trainer.project_name=\$PROJECT_NAME
trainer.experiment_name=\$EXPERIMENT_NAME
trainer.n_gpus_per_node=\$N_GPUS_PER_NODE
trainer.nnodes=\$NNODES
trainer.save_freq=\$SAVE_FREQ
trainer.test_freq=\$TEST_FREQ
trainer.total_epochs=\$TOTAL_EPOCHS
trainer.resume_mode=auto
trainer.max_actor_ckpt_to_keep=\$MAX_CKPT_KEEP
trainer.default_local_dir=\$CKPT_PATH
trainer.val_before_train=False
)
# ===================================================================================
# === MAIN EXECUTION LOGIC & INFRASTRUCTURE ===
# ===================================================================================
# --- Boilerplate Setup ---
set -e
set -o pipefail
set -x
# --- Infrastructure & Boilerplate Functions ---
# Bring up the Ray cluster for this job.
# Multi-node: rank 0 starts the head and waits until it reports healthy;
# other ranks wait for the head, then join as workers.
# Single-node: start a standalone head with no address.
# Relies on env vars set by the caller: NNODES, NODE_RANK, N_GPUS_PER_NODE,
# MASTER_ADDR, RAY_MASTER_ADDR, RAY_MASTER_PORT, RAY_DASHBOARD_PORT,
# and optionally INTERFACE_NAME for NIC pinning.
start_ray_cluster() {
# Max seconds to wait for the head node to become healthy before giving up.
local RAY_HEAD_WAIT_TIMEOUT=600
# Pin Ray's raylet/GCS networking to the configured NIC and extend
# agent-creation / GCS-connect timeouts for slow multi-node startups.
export RAY_RAYLET_NODE_MANAGER_CONFIG_NIC_NAME=${INTERFACE_NAME}
export RAY_GCS_SERVER_CONFIG_NIC_NAME=${INTERFACE_NAME}
export RAY_RUNTIME_ENV_AGENT_CREATION_TIMEOUT_S=1200
export RAY_GCS_RPC_CLIENT_CONNECT_TIMEOUT_S=120
# Options shared by both head and worker `ray start` invocations.
local ray_start_common_opts=(
--num-gpus "$N_GPUS_PER_NODE"
--object-store-memory 100000000000
--memory 100000000000
)
if [ "$NNODES" -gt 1 ]; then
if [ "$NODE_RANK" = "0" ]; then
# Head node: start Ray head, then poll health-check until ready.
echo "INFO: Starting Ray head node on $(hostname)..."
export RAY_ADDRESS="$RAY_MASTER_ADDR:$RAY_MASTER_PORT"
ray start --head --port="$RAY_MASTER_PORT" --dashboard-port="$RAY_DASHBOARD_PORT" "${ray_start_common_opts[@]}" --system-config='{"gcs_server_request_timeout_seconds": 60, "gcs_rpc_server_reconnect_timeout_s": 60}'
local start_time=$(date +%s)
while ! ray health-check --address "$RAY_ADDRESS" &>/dev/null; do
# Abort (and tear down Ray) if the head never becomes healthy.
if [ "$(( $(date +%s) - start_time ))" -ge "$RAY_HEAD_WAIT_TIMEOUT" ]; then echo "ERROR: Timed out waiting for head node. Exiting." >&2; ray stop --force; exit 1; fi
echo "Head node not healthy yet. Retrying in 5s..."
sleep 5
done
echo "INFO: Head node is healthy."
else
# Worker node: block until the head answers health checks, then join.
local head_node_address="$MASTER_ADDR:$RAY_MASTER_PORT"
echo "INFO: Worker node $(hostname) waiting for head at $head_node_address..."
local start_time=$(date +%s)
while ! ray health-check --address "$head_node_address" &>/dev/null; do
if [ "$(( $(date +%s) - start_time ))" -ge "$RAY_HEAD_WAIT_TIMEOUT" ]; then echo "ERROR: Timed out waiting for head. Exiting." >&2; exit 1; fi
echo "Head not healthy yet. Retrying in 5s..."
sleep 5
done
echo "INFO: Head is healthy. Worker starting..."
ray start --address="$head_node_address" "${ray_start_common_opts[@]}"
fi
else
# Single-node run: no address coordination needed.
echo "INFO: Starting Ray in single-node mode..."
ray start --head "${ray_start_common_opts[@]}"
fi
}
# --- Main Execution Function ---
# Entrypoint: set up networking/timeout env, start the Ray cluster, and then
# either launch training (rank 0) or idle while monitoring the head (workers).
# Extra CLI arguments ("$@") are appended to the training command.
main() {
# NOTE(review): this local timestamp is unused here; output paths above use
# a shell-level $timestamp — confirm it is defined before those exports run.
local timestamp=$(date +"%Y%m%d_%H%M%S")
# Clear any stale Ray processes from a previous run.
ray stop --force
# export VLLM_USE_V1=0
# Generous Gloo timeouts for slow multi-node rendezvous; DEBUG aids triage.
export GLOO_SOCKET_TIMEOUT=600
export GLOO_TCP_TIMEOUT=600
export GLOO_LOG_LEVEL=DEBUG
export RAY_MASTER_PORT=${RAY_MASTER_PORT:-6379}
export RAY_DASHBOARD_PORT=${RAY_DASHBOARD_PORT:-8265}
export RAY_MASTER_ADDR=$MASTER_ADDR
start_ray_cluster
# Rank 0 of a multi-node job waits until every node has registered with Ray.
if [ "$NNODES" -gt 1 ] && [ "$NODE_RANK" = "0" ]; then
echo "Waiting for all $NNODES nodes to join..."
local TIMEOUT=600; local start_time=$(date +%s)
while true; do
if [ "$(( $(date +%s) - start_time ))" -ge "$TIMEOUT" ]; then echo "Error: Timeout waiting for nodes." >&2; exit 1; fi
# Count registered nodes via `ray list nodes` JSON output.
local ready_nodes=$(ray list nodes --format=json | python3 -c "import sys, json; print(len(json.load(sys.stdin)))")
if [ "$ready_nodes" -ge "$NNODES" ]; then break; fi
echo "Waiting... ($ready_nodes / $NNODES nodes ready)"
sleep 5
done
echo "All $NNODES nodes have joined."
fi
if [ "$NODE_RANK" = "0" ]; then
# Rank 0 runs the training command (TRAINING_CMD array defined above),
# forwarding any extra script arguments, then tears down the cluster.
echo "INFO [RANK 0]: Starting main training command."
eval "${TRAINING_CMD[@]}" "$@"
echo "INFO [RANK 0]: Training finished."
sleep 30; ray stop --force >/dev/null 2>&1
elif [ "$NNODES" -gt 1 ]; then
# Workers stay alive only as long as the head answers health checks.
local head_node_address="$MASTER_ADDR:$RAY_MASTER_PORT"
echo "INFO [RANK $NODE_RANK]: Worker active. Monitoring head node at $head_node_address."
while ray health-check --address "$head_node_address" &>/dev/null; do sleep 15; done
echo "INFO [RANK $NODE_RANK]: Head node down. Exiting."
fi
echo "INFO: Script finished on rank $NODE_RANK."
}
# --- Script Entrypoint ---
main "$@"
#!/usr/bin/env bash
================================================
FILE: docs/index.rst
================================================
.. siiRL documentation master file, created by
sphinx-quickstart on Wed Jul 9 15:26:45 2025.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
siiRL documentation
===================
.. toctree::
:maxdepth: 2
:caption: Quickstart
start/install
start/quickstart
.. toctree::
:maxdepth: 2
:caption: Programming guide
programming_guide/siirl_architecture_guide
programming_guide/code_structure
programming_guide/siiRL_code_explained
programming_guide/srpo_code_explained
.. toctree::
:maxdepth: 1
:caption: Data Preparation
preparation/prepare_data
preparation/reward_function
.. toctree::
:maxdepth: 2
:caption: User Define Interface
user_interface/filter_interface
user_interface/reward_interface
user_interface/pipeline_interface
user_interface/metrics_interface
.. toctree::
:maxdepth: 2
:caption: Configurations
examples/config
.. toctree::
:maxdepth: 1
:caption: Example
examples/deepscaler_example
examples/mm_eureka_example
examples/cpgd_example
examples/megatron_backend_example
examples/embodied_srpo_example
.. toctree::
:maxdepth: 1
:caption: Hardware Support
hardware_tutorial/ascend_quickstart
hardware_tutorial/ascend_profiling_en
hardware_tutorial/metax_quickstart
================================================
FILE: docs/preparation/prepare_data.rst
================================================
Prepare Data for Post-Training
========================================
Before starting the post-training job, we need to prepare the data for policy training. The data should be preprocessed and stored in Parquet format, which facilitates efficient distributed data loading and processing.
We provide several data preprocessing scripts for popular datasets under the ``examples/data_preprocess/`` directory, such as ``gsm8k.py``, ``math_dataset.py``, and ``deepscaler.py``. To support a new custom dataset, you will need to create a similar script.
This document uses the ``DeepScaleR`` dataset as an example to detail the data preparation process and its specifications.
General Data Preprocessing Workflow
-----------------------------------
A typical data preprocessing script involves the following steps:
1. **Load Raw Data**: Use a library like Hugging Face's ``datasets`` to load the original dataset from the Hub or local files.
2. **Define Processing Logic**: Implement a core mapping function (which we often name ``make_map_fn``) to convert each sample from the original dataset into the specific format required by our framework.
3. **Apply Transformation and Save**: Use the ``datasets.map()`` method to apply this function to the entire dataset. Then, save the processed result in Parquet format locally, with an option to upload it to a distributed file system like HDFS.
Here is a simplified framework of the process:
.. code:: python
import argparse
import os
import datasets
from siirl.utils.extras.hdfs_io import copy, makedirs
def make_map_fn(split_name):
# ... Define your data processing logic here ...
def process_fn(example, idx):
# ... Transform each data sample ...
return transformed_data
return process_fn
if __name__ == '__main__':
parser = argparse.ArgumentParser()
# ... Define arguments ...
args = parser.parse_args()
# 1. Load data
raw_dataset = datasets.load_dataset(...)
# 2. Apply transformation
processed_dataset = raw_dataset.map(function=make_map_fn('train'), with_indices=True)
# 3. Save as Parquet
local_dir = args.local_dir
processed_dataset.to_parquet(os.path.join(local_dir, "train.parquet"))
# (Optional) Upload to HDFS
if args.hdfs_dir:
makedirs(args.hdfs_dir)
copy(src=local_dir, dst=args.hdfs_dir)
DeepScaleR Dataset Processing in Practice
-------------------------------------------
Let's take ``examples/data_preprocess/deepscaler.py`` as a concrete example to demonstrate how to process the ``agentica-org/DeepScaleR-Preview-Dataset``.
The core task is to implement the ``make_map_fn`` function, which maps original fields (like ``problem``, ``answer``, and ``solution``) to the standard format required by the training framework.
.. code:: python
data_source = "agentica-org/DeepScaleR-Preview-Dataset"
instruction_following = 'Let\'s think step by step and output the final answer within \\boxed{}.'
def make_map_fn(split_name):
def process_fn(example, idx):
question_raw = example.pop("problem")
answer_raw = example.pop("answer")
question = question_raw + " " + instruction_following
solution = example.pop("solution")
data = {
"data_source": data_source,
"prompt": [
{
"role": "user",
"content": question,
}
],
"ability": "math",
"reward_model": {"style": "rule", "ground_truth": answer_raw},
"extra_info": {
"split": split_name,
"index": idx,
"answer": solution,
"question": question_raw,
},
}
return data
return process_fn
Data Format Specification
-------------------------
To ensure the framework can correctly parse and utilize the data, each sample processed by ``make_map_fn`` must contain the following five key fields:
1. ``data_source``: A string indicating the source or name of the dataset. This field is used to dynamically select the corresponding reward function during training.
- Example: ``"agentica-org/DeepScaleR-Preview-Dataset"``
2. ``prompt``: A list used to construct the model's input, formatted to be compatible with Hugging Face's Chat Template. The data loader will automatically apply the template and tokenize the input.
- Example: ``[{"role": "user", "content": "What is 2+2? Let's think step by step..."}]``
3. ``ability``: A string defining the domain or capability of the current task, such as ``"math"``, ``"coding"``, or ``"general"``.
4. ``reward_model``: A dictionary containing information needed to calculate the reward. Currently, the ``ground_truth`` field is primarily used during evaluation.
- **Note**: The ``ground_truth`` you provide must align with the logic of the corresponding reward function you implement. For a math problem, it might be the standard answer; for code generation, it could be a set of unit tests.
- Example: ``{"style": "rule", "ground_truth": "\\boxed{4}"}``
5. ``extra_info``: A dictionary for storing additional metadata, such as the original dataset split (train/test) or sample index. This information is not used directly in training but is useful for debugging and data traceability.
By following these specifications, you can prepare your dataset to be used smoothly within the SiiRL post-training pipeline.
================================================
FILE: docs/preparation/reward_function.rst
================================================
Implementing Reward Functions for Datasets
===========================================
In Reinforcement Learning for LLMs, the reward function is a critical component that guides the model's learning process. It quantitatively evaluates the quality of a generated response, signaling what constitutes a "good" or "bad" output. Our framework provides a flexible system for defining these rewards, supporting both pre-implemented logic for common datasets and fully customized functions for specific tasks.
The RewardManager
-----------------
The ``RewardManager`` is the central hub for reward computation. As defined in `siirl/scheduler/reward.py`, its primary role is to orchestrate the scoring of generated responses by invoking a specified scoring function. Different managers, like `NaiveRewardManager` or `BatchRewardManager`, offer different strategies for handling this process. This design is consistent with the `verl` framework's architecture. [1]_
The typical workflow is as follows:
1. The manager receives a `DataProto` object, which is a batch containing all necessary information.
2. It extracts relevant fields, such as the model's generated text (`solution_strs`) and the reference answer (`ground_truth`).
3. It passes this data to a designated scoring function (`compute_score_fn`) to calculate the reward for each item in the batch.
This design allows the core training loop to remain agnostic to the specifics of reward calculation, which are neatly encapsulated within the manager and its scoring function.
Reward Function Implementations
-------------------------------
You can define reward logic in two ways: by using our pre-built functions or by creating your own.
Pre-implemented Functions
~~~~~~~~~~~~~~~~~~~~~~~~~
For standard benchmarks, we provide ready-to-use reward functions in the `siirl/utils/reward_score/` directory. These cover datasets like `GSM8K` and `MATH`, implementing their standard evaluation logic. For instance, the `GSM8K` scorer extracts the final numerical answer and compares it to the ground truth.
Customized Functions
~~~~~~~~~~~~~~~~~~~~
For novel tasks or custom evaluation criteria, you can supply your own reward function. This is configured via two parameters: `custom_reward_function.path` and `custom_reward_function.name`.
Let's consider a practical example from the `run_qwen2_5-7b-custom_reward.sh` script, which uses a batch-processing reward function for efficiency.
**1. Configuration in the script:**
The script specifies the path to the custom code, the function to use, and selects the `BatchRewardManager` to execute it.
.. code-block:: bash
# ... other configurations ...
python3 -m siirl.main_dag \
# ...
custom_reward_function.path=$HOME/rl/rewardfunc_gsm8k.py \
custom_reward_function.name=compute_score \
reward_model.reward_manager=batch \
# ...
**2. Implementation of the reward function:**
The corresponding `rewardfunc_gsm8k.py` file implements the `compute_score` function. This function is designed to process an entire batch of solutions at once, which is significantly more efficient than processing them one by one.
.. code:: python
import re
def extract_solution(solution_str, method="strict"):
# ... (logic to extract the final answer from text)
# For example, finds the number after "####"
if method == "strict":
solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str)
if solution is None: return None
final_answer = solution.group(0).split("#### ")[1].replace(",", "")
return final_answer
# ... other extraction logic ...
def compute_score(data_sources, solution_strs, ground_truths, extra_infos, method="strict", score=1.0, **kwargs):
"""
Computes scores for a batch of solutions.
"""
scores = []
for solution_str, ground_truth in zip(solution_strs, ground_truths):
answer = extract_solution(solution_str=solution_str, method=method)
if answer is not None and answer == ground_truth:
scores.append(score)
else:
scores.append(0.0)
return scores
The function signature should accept lists of `solution_strs` and `ground_truths`. You can also pass custom parameters from your configuration, like `method` or `score`, by defining them under `custom_reward_function.reward_kwargs`. This allows you to easily experiment with different reward schemes without changing the code.
.. [1] https://verl.readthedocs.io/en/latest/preparation/reward_function.html
================================================
FILE: docs/programming_guide/code_structure.rst
================================================
===============
Code Structure
===============
This document describes the code structure and architecture of siiRL.
Directory Structure
-------------------
.. code-block:: text
siirl/
├── main_dag.py # Main entry point
├── dag_worker/ # DAG Worker implementation
├── execution/ # Execution engine
├── engine/ # Model engine
├── data_coordinator/ # Data coordination
├── params/ # Configuration parameters
├── environment/ # Environment abstraction
└── user_interface/ # User interface
Core Modules
------------
dag_worker/
~~~~~~~~~~~
DAG execution unit, one worker per GPU.
.. code-block:: text
dag_worker/
├── dagworker.py # Core Worker class (~1320 lines)
├── core_algos.py # RL algorithm implementations
├── dag_utils.py # Utility functions
├── checkpoint_manager.py # Checkpoint management
├── metrics_collector.py # Metrics collection
├── metric_aggregator.py # Metrics aggregation
├── validator.py # Validation logic
├── constants.py # Constants
└── data_structures.py # Data structures
**Responsibilities:**
- Execute TaskGraph nodes
- Manage model Workers (Actor/Critic/Rollout/Reference/Reward)
- Data flow and caching
- Metrics collection and reporting
- Checkpoint saving and loading
execution/
~~~~~~~~~~
Execution engine for DAG definition, scheduling, and metrics aggregation.
.. code-block:: text
execution/
├── dag/ # DAG definition
│ ├── task_graph.py # TaskGraph class
│ ├── node.py # Node class
│ ├── builtin_pipelines.py # Built-in Pipelines
│ ├── pipeline.py # Pipeline Builder API
│ ├── config_loader.py # Configuration loader
│ └── task_loader.py # Task loader
├── scheduler/ # Task scheduling
│ ├── task_scheduler.py # Task scheduler
│ ├── launch.py # Ray launcher
│ ├── process_group_manager.py # Process group manager
│ ├── graph_updater.py # Graph updater
│ ├── reward.py # Reward scheduler
│ ├── enums.py # Enum definitions
│ └── resource_manager.py # Resource manager
├── metric_worker/ # Distributed metrics aggregation
│ ├── metric_worker.py # MetricWorker
│ └── utils.py
└── rollout_flow/ # Rollout flow
├── multi_agent/ # Multi-agent support
└── multiturn/ # Multi-turn interaction
**Responsibilities:**
- DAG definition and validation
- Task scheduling and resource allocation
- Distributed metrics collection
- Multi-agent/multi-turn interaction flow
engine/
~~~~~~~
Model execution engine containing all model workers.
.. code-block:: text
engine/
├── actor/ # Actor models
│ ├── base.py
│ ├── dp_actor.py # FSDP Actor
│ ├── megatron_actor.py # Megatron Actor
│ └── embodied_actor.py # Embodied Actor
├── critic/ # Critic models
│ ├── base.py
│ ├── dp_critic.py
│ └── megatron_critic.py
├── rollout/ # Rollout engine
│ ├── base.py
│ ├── vllm_rollout/ # vLLM backend
│ ├── sglang_rollout/ # SGLang backend
│ ├── hf_rollout.py # HuggingFace backend
│ └── embodied_rollout.py # Embodied Rollout
├── reward_model/ # Reward models
├── reward_manager/ # Reward managers
│ ├── naive.py # Simple reward
│ ├── batch.py # Batch Reward Model
│ ├── parallel.py # Parallel Reward Model
│ ├── dapo.py # DAPO Reward
│ └── embodied.py # Embodied Reward
├── sharding_manager/ # Sharding management
├── base_worker/ # Worker base classes
├── fsdp_workers.py # FSDP Worker
└── megatron_workers.py # Megatron Worker
**Responsibilities:**
- Training and inference for Actor/Critic/Rollout/Reference/Reward models
- Support for FSDP and Megatron backends
- Support for vLLM/SGLang/HuggingFace inference backends
data_coordinator/
~~~~~~~~~~~~~~~~~
Data coordinator for distributed data management.
.. code-block:: text
data_coordinator/
├── data_buffer.py # Distributed data buffer
├── dataloader/ # Data loading
│ ├── data_loader_node.py
│ ├── partitioned_dataset.py
│ ├── embodied_preprocess.py
│ └── vision_utils.py
├── protocol.py # Data protocol
└── sample.py # Sampling logic
**Responsibilities:**
- Distributed data buffering (per-server)
- Data loading (per-GPU)
- Data redistribution and load balancing
params/
~~~~~~~
Parameter configuration using Hydra.
.. code-block:: text
params/
├── __init__.py # SiiRLArguments
├── parser.py # Configuration parser
├── data_args.py # Data parameters
├── model_args.py # Model parameters
├── training_args.py # Training parameters
├── dag_args.py # DAG parameters
├── embodied_args.py # Embodied parameters
└── profiler_args.py # Profiler parameters
environment/
~~~~~~~~~~~~
Environment abstraction for Embodied AI and multi-agent systems.
.. code-block:: text
environment/
└── embodied/
├── base.py # Environment base class
├── venv.py # Vectorized environment
└── adapters/ # Environment adapters
└── libero.py # Libero adapter
user_interface/
~~~~~~~~~~~~~~~
User-defined interfaces.
.. code-block:: text
user_interface/
├── filter_interface/
│ ├── dapo.py # DAPO dynamic sampling
│ └── embodied.py # Embodied data filtering
└── rewards_interface/
└── custom_gsm8k_reward.py # Custom reward example
**Purpose:** Provides interfaces for user-defined node functions.
Data Structures
---------------
NodeOutput
~~~~~~~~~~
Return value from node execution.
.. code-block:: python
@dataclass
class NodeOutput:
batch: Any # Data batch
metrics: Dict = None # Metrics
info: Dict = None # Additional info
Node
~~~~
DAG node definition.
.. code-block:: python
@dataclass
class Node:
node_id: str # Node ID
node_type: NodeType # Node type
node_role: NodeRole # Node role
dependencies: List[str] # Dependency nodes
executable: Callable # Executable function
executable_ref: str # Function path
only_forward_compute: bool # Forward only
Enumerations
~~~~~~~~~~~~
**NodeType:**
.. code-block:: python
class NodeType(Enum):
MODEL_INFERENCE = "model_inference"
MODEL_TRAIN = "model_train"
COMPUTE = "compute"
DATA_LOAD = "data_load"
**NodeRole:**
.. code-block:: python
class NodeRole(Enum):
ROLLOUT = "rollout"
ACTOR = "actor"
CRITIC = "critic"
REFERENCE = "reference"
REWARD = "reward"
ADVANTAGE = "advantage"
DYNAMIC_SAMPLING = "dynamic_sampling"
DEFAULT = "default"
**AdvantageEstimator:**
.. code-block:: python
class AdvantageEstimator(Enum):
GRPO = "grpo"
GAE = "gae"
CPGD = "cpgd"
GSPO = "gspo"
**WorkflowType:**
.. code-block:: python
class WorkflowType(Enum):
DEFAULT = "DEFAULT"
DAPO = "DAPO"
EMBODIED = "EMBODIED"
Execution Flow
--------------
Startup Flow (main_dag.py)
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: text
1. Parse configuration (parse_config)
2. Load Pipeline (load_pipeline)
3. Initialize DataBuffer (init_data_coordinator)
4. Initialize MetricWorker
5. Task scheduling (TaskScheduler)
6. Launch Ray cluster (RayTrainer)
7. Create DAGWorker (one per GPU)
8. Execute training (DAGWorker.execute_task_graph)
DAGWorker Execution Flow
~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: text
1. Initialize Workers (Actor/Critic/Rollout/Reference/Reward)
2. Initialize DataLoader
3. Initialize Validator
4. Load Checkpoint (if exists)
5. Training loop:
- Load data
- Execute nodes in topological order
- Collect metrics
- Save Checkpoint
- Validate (if needed)
Node Execution Flow
~~~~~~~~~~~~~~~~~~~
.. code-block:: text
1. DAGWorker gets node's executable function
2. Call function with current batch
3. Function processes data, returns NodeOutput
4. Update batch, pass to next node
5. Collect node metrics
Key Concepts
------------
TaskGraph
~~~~~~~~~
Directed Acyclic Graph representing training workflow.
**Core Methods:**
- ``add_node()``: Add node
- ``build_adjacency_lists()``: Build adjacency lists
- ``validate_graph()``: Validate DAG
- ``get_execution_order()``: Get topological sort
Pipeline
~~~~~~~~
Declarative API for building TaskGraph.
**Core Methods:**
- ``add_node()``: Add node (supports chaining)
- ``build()``: Build and validate TaskGraph
DAGWorker Class
~~~~~~~~~~~~~~~
Execution unit per GPU.
**Core Methods:**
- ``generate()``: Rollout generation
- ``compute_reward()``: Compute reward
- ``compute_advantage()``: Compute advantage
- ``compute_old_log_prob()``: Old policy log prob
- ``compute_ref_log_prob()``: Reference model log prob
- ``compute_value()``: Value function (PPO)
- ``train_actor()``: Train actor
- ``train_critic()``: Train critic (PPO)
Configuration Parameters
------------------------
Main Configuration Groups
~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: yaml
algorithm:
adv_estimator: grpo # grpo/gae/cpgd/gspo
workflow_type: DEFAULT # DEFAULT/DAPO/EMBODIED
data:
train_files: /path/to/train.parquet
train_batch_size: 512
max_prompt_length: 2048
max_response_length: 4096
actor_rollout_ref:
model:
path: /path/to/model
actor:
optim:
lr: 1e-6
ppo_mini_batch_size: 256
rollout:
name: vllm # vllm/sglang/hf
tensor_model_parallel_size: 2
n: 8 # GRPO group size
trainer:
n_gpus_per_node: 8
nnodes: 1
total_epochs: 30
save_freq: 10
dag:
custom_pipeline_fn: null # Custom Pipeline
Extension Points
----------------
Custom Pipeline
~~~~~~~~~~~~~~~
Add new functions in ``siirl/execution/dag/builtin_pipelines.py``.
Custom Node Functions
~~~~~~~~~~~~~~~~~~~~~
Implement functions following the signature:
.. code-block:: python
def my_node(batch, config=None, **kwargs) -> NodeOutput:
return NodeOutput(batch=batch, metrics={})
Custom Reward Manager
~~~~~~~~~~~~~~~~~~~~~
Add new classes in ``siirl/engine/reward_manager/``.
Custom Environment
~~~~~~~~~~~~~~~~~~
Add new environment classes in ``siirl/environment/``.
================================================
FILE: docs/programming_guide/siiRL_code_explained.rst
================================================
siiRL's Implementation Explained
================================
siiRL is under active development with an extensive roadmap for future enhancements. We strongly encourage community participation in this endeavor. Contributions in any form are highly valued, including but not limited to: filing issues, proposing new features, enhancing documentation, and providing suggestions for improvement.
Overall Implementation
----------------------
RL training itself has clear workflow characteristics, and DAG is the mainstream tool for describing workflows. Therefore, the source code of siiRL adopts a DAG-based design pattern. In terms of specific implementation, siiRL abstracts the entire RL training task into a TaskGraph composed of multiple Nodes, each of which implements the ``node.run()`` method to support the abstract orchestration of the top-level TaskGraph. The constructed TaskGraph is submitted to a set of DAGWorkers for execution.
In the context of multi-agent RL training, different DAGWorkers can process different TaskGraphs in parallel, and the data that different TaskGraphs depend on and process may also vary. Therefore, from a structural perspective, siiRL belongs to the MPMD paradigm.
In terms of user usage, in addition to the configurations related to Data/Trainer/Model/RL Algorithm used by mainstream RL frameworks, siiRL also provides DAG config, which supports users to customize workflows. The system will parse the DAG configuration when the training starts and correspondingly construct a TaskGraph instance.
Complex task workflows pose higher requirements for resource scheduling. To achieve fine-grained allocation of GPUs, siiRL implements a TaskScheduler, which is responsible for making globally optimal scheduling decisions, such as: how much computing resource to allocate to each TaskGraph, and specifically which GPU devices on which servers to use. Finally, the allocation plan generated by the TaskScheduler is handed over to the underlying Ray framework for execution, making full use of Ray's distributed computing capabilities.
.. figure:: ../../asset/code_explained/siirl_arch.png
:width: 60%
:align: center
:alt: Overall Architecture of siiRL's Code Implementation
Figure 1: Overall Architecture of siiRL
We will first provide an overview diagram of the siiRL source code implementation, and then, in the following text, we will introduce each part of the diagram in detail according to the actual execution process.
.. figure:: ../../asset/code_explained/overview_diagram.png
:width: 100%
:align: center
:alt: Diagram of Source Code Implementation
Figure 2: Diagram of Source Code Implementation
Environment Abstraction
-----------------------
During the initial RL stage of LLMs, the environment typically refers to the datasets used in post-training. siiRL abstracts the concept of an environment to uniformly support RL tasks across different application areas, such as MCP calls and SandBox Servers in agentic training scenarios, simulators in the embodied AI domain, and real physical environments for agent interaction.
Similar to OpenAI Gym, siiRL defines two core asynchronous methods:
- ``reset()``: Resets the environment to its initial state and returns the initial observation. This function marks the start of a new episode.
- ``step(actions)``: Receives actions from one or multiple agents, executes these actions, updates the environment state, and returns a tuple containing (observation at the next time step, reward, information). This is the main loop for agent-environment interaction.
Taking the MathEnv for mathematical tasks as an example, the environment natively supports multiple agents. The ``step`` function receives a batch of actions (one per agent), and the returned observations are likewise an array with one entry prepared for each agent.
.. code-block:: python
class MathEnv(BaseEnvironment):
async def reset(self, dp_rank: int, ddp_world_size: int, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None):
# ...
obs = np.array([self.current_state for _ in range(self.n_agents)], dtype=np.object_)
self.step_count = 0
return obs
async def step(self, actions):
# ...
return next_obs, rewards, infos
Control Flow: Pipeline
----------------------
The main pipeline of the siiRL control flow is shown in the figure below. First, load the configuration of the interactive environment, then sequentially complete the initialization of DataBuffer, the loading and parsing of DAG configuration, and the construction of TaskGraph. After the TaskGraph is constructed, the TaskScheduler schedules (makes decisions on) tasks, determining how many GPUs to allocate to each task and calculating the specific allocation topology. Then, use Ray to construct a distributed process group and initialize RayTrainer. Finally, initialize DAGWorker (Ray's Actor) and start the training task.
.. figure:: ../../asset/code_explained/pipeline.png
:width: 40%
:align: center
:alt: Pipeline of Control Flow
Figure 3: Pipeline of Control Flow
DataLoader and DataBuffer
-------------------------
DataLoader is a wrapper for torch's StatefulDataLoader, which, in combination with the custom PartitionedRLHFDataset, is responsible for tasks such as loading, preprocessing, and batching of training data. Different from other RL open-source frameworks, DataLoader in siiRL is also abstracted as a Node (DataLoaderNode) and embedded into the TaskGraph for execution. Under normal cluster scale and RL tasks, siiRL launches a data_loader process for each GPU rank, which is responsible for loading the data shard corresponding to the DAGWorker on the current rank.
.. code-block:: python
class DataLoaderNode(Node):
"""
Represents a data loader node in the DAG.
This version uses the PartitionedRLHFDataset for efficient, memory-safe
distributed data loading. Each rank only loads and processes its own data slice.
"""
def run(self, epoch: Optional[int] = None, is_validation_step: bool = False, **kwargs: Any) -> Any:
"""
Executes the data loading process for a given step or validation.
"""
try:
# for validation
if is_validation_step:
try:
batch = next(self._current_val_iter)
# for training
else:
try:
batch = next(self._current_train_iter)
return batch
DataBuffer is essentially a distributed KV Store, maintained by an independent Ray Actor process. Typically, DataLoader is per-gpu, while DataBuffer is per-server. In static batching mode, siiRL checks the load balance when creating DataBuffer, as shown in the figure below. For example, if the training batch size is 128, it needs to be divisible by the number of servers to ensure that a global batch can be evenly distributed among servers. Similarly, the batch size allocated to a server, after being replicated ``n`` times (the group size in GRPO, or ``n = 1`` if it is PPO), also needs to be divisible by 8 to ensure that it can be evenly distributed among GPUs on the same server.
.. figure:: ../../asset/code_explained/data_loader.png
:width: 95%
:align: center
:alt: Diagram of Source Code Implementation
Figure 4: DataLoader, DataBuffer and Load Balance
TaskGraph Scheduling
--------------------
The core of TaskGraph is a dictionary composed of Nodes, and TaskGraph uses adjacency lists and reverse adjacency lists to represent the connection relationships between these Nodes. Among them, the reverse adjacency list is mainly used for dependency checking, such as Actor's training depending on rollout's generation. Meanwhile, TaskGraph provides a series of graph operation methods, such as adding, deleting, modifying, and querying nodes, DAG verification, copying, and displaying the graph, to implement the management of TaskGraph.
.. code-block:: python
class Node:
"""
Represents a node (task unit) in the DAG.
"""
class TaskGraph:
"""
Represents a Directed Acyclic Graph (DAG) of tasks,
composed of multiple Node objects and their dependencies.
"""
def __init__(self, graph_id: str):
"""
Initialize a task graph.
Parameters:
graph_id (str): The unique identifier of the graph.
"""
self.graph_id: str = graph_id
self.nodes: Dict[str, Node] = {}
self.adj: Dict[str, List[str]] = {}
self.rev_adj: Dict[str, List[str]] = {}
The scheduling of TaskGraph includes four key steps:
1. **TaskGraph Splitting**: When a user-defined workflow contains parallel paths—as seen in multi-agent training where agents use both shared and specific Nodes—siiRL splits the original TaskGraph into multiple subgraphs for sequential execution. While this approach may not be the most efficient, it significantly simplifies resource scheduling.
2. **SubGraph Sorting**: To allocate resources reasonably, siiRL sorts all SubGraphs. The sorting is mainly based on two points. First, the size of the SubGraph, where this size refers to the parameter scale of the model to be trained on the current SubGraph (7B, 32B, 671B, etc.), with priority given to resource allocation for SubGraphs with larger parameter scales. Second, the number of Nodes on the SubGraph; the more Nodes, i.e., the "longer the chain" of the SubGraph, the earlier it is allocated.
3. **GPU Quota Allocation**: Based on the sorting results from Step 2, allocate the number of GPUs to each SubGraph. There are two allocation strategies: even and param_aware. In the even mode, the total number of GPUs is evenly distributed among SubGraphs as much as possible; in the param_aware mode, on the premise that each subgraph is allocated at least one GPU, subgraphs with larger sizes are allocated more GPUs as much as possible.
4. **GPU Topology Allocation**: Given the number of GPUs allocated to each SubGraph in Step 3, this step performs topology allocation. Suppose there are three SubGraphs, denoted sg1, sg2, sg3, the training cluster consists of 2 machines with 16 GPUs in total, and the allocation result from Step 3 is (6, 5, 5); this step then determines specifically which 6 GPUs are allocated to sg1, which 5 to sg2, and which 5 to sg3. siiRL makes this decision through a scoring mechanism:
``(cohesion_score(+), node_load_score(-), rank_preference_score(-))``
Where: ``cohesion_score`` is the cohesion score: place a subgraph within the same server as much as possible to reduce communication; ``node_load_score`` is the load penalty: balance placement among servers as much as possible; ``rank_preference_score`` represents the rank partial order: place tasks on GPUs with smaller rank numbers as much as possible to make the scheduling behavior more predictable.
.. figure:: ../../asset/code_explained/taskgraph_sched.png
:width: 95%
:align: center
:alt: TaskGraph Scheduling
Figure 5: TaskGraph Scheduling
Build the Distributed Process Group
-----------------------------------
After task scheduling is completed, the distributed process group of Ray can be constructed. According to the topology determined by the scheduling above, siiRL constructs the associated process group for each Node of the TaskGraph.
For example, actor's training (described as ``NodeRole=Actor, NodeType=Train`` in siiRL), if the assigned ranks are ``[0, 1, 2, 3, 4, 5]``, then use Python's Tuple as the key and a unique string as the value for naming: ``(0,1,2,3,4,5): "process_group_1"``
.. figure:: ../../asset/code_explained/dist_pg.png
:width: 95%
:align: center
:alt: Distributed Process Group
Figure 6: Distributed Process Group
Ray Trainer
-----------
After constructing the process group, initialize RayTrainer. This part is similar to the practices of other mainstream frameworks, with the core being the instantiation of Ray's resource pool management, i.e., resource_manager. Finally, collectively validate the configurations of all Nodes (Actor/Rollout/Reward, etc.).
.. figure:: ../../asset/code_explained/ray_trainer.png
:width: 95%
:align: center
:alt: Ray Trainer
Figure 7: Ray Trainer
DAGWorker
---------
Through a series of abstractions regarding DAG and TaskGraph, siiRL encapsulates and hides the training job flow beneath the control flow. The call logic related to training backend, inference backend, sharding manager, etc., which is directly visible in the control flow of veRL, is all encapsulated into DAGWorker in siiRL and is almost invisible in the control flow. In terms of programming mode, this hiding provides a higher level of abstraction, offering more convenient modular reuse and more flexible extensibility compared to other mainstream frameworks, but it may additionally increase the complexity of bug localization.
In terms of source code implementation, DAGWorker uses mixin classes for modularization. There are five core mixin classes, responsible for initialization, pipeline execution, execution of specific Nodes, training validation, and utility functions, respectively, as shown below.
.. figure:: ../../asset/code_explained/dag_worker.png
:width: 70%
:align: left
:alt: DAG Worker
When initializing DAGWorker, first call resource_manager (the one created during RayTrainer initialization) to create ResourcePool, then create RayActorManager to manage the lifecycle of all distributed DAGWorkers. Finally, call the method defined in the InitializationMixin mixin class to complete the initialization of DAGWorker.
.. figure:: ../../asset/code_explained/dag_init.png
:width: 80%
:align: center
:alt: Initialization of DAG Worker
Figure 8: Initialization of DAG Worker
When setting up the communication group, siiRL adopts the following strategy: if the total number of ranks is less than 256, it uses the pure NCCL backend; otherwise, it uses the GLOO+NCCL hybrid backend. In the hybrid backend mode, GLOO is mainly used for aggregated communication of data such as logs and metrics.
Training Initiation
-------------------
The main pipeline initiates training in the final step. Here, it primarily calls the ``execute_task_graph`` method in the ExecutionMixin mixin class. This method encapsulates the outer loop of epochs and the inner loop of batches within each epoch (i.e., a training step).
.. figure:: ../../asset/code_explained/train_init.png
:width: 70%
:align: center
:alt: Training Job Initialization
Figure 9: Training Job Initialization
Each training step is no longer "concrete and expanded", as in mainstream frameworks such as veRL, but rather "abstract and cyclic": traverse all Nodes in the Graph, for each Node, execute the run method, and write the resulting data to the DataBuffer, where the key is the node_id of the next node and the value is the output of the run method.
.. figure:: ../../asset/code_explained/data_buffer_loop.png
:width: 70%
:align: center
:alt: Loop of TaskGraph Computation based on DataBuffer
Figure 10: Loop of TaskGraph Computation based on DataBuffer
================================================
FILE: docs/programming_guide/siirl_architecture_guide.rst
================================================
=======================================
siiRL Complete Architecture Guide
=======================================
.. note::
**Target Audience**: This document assumes no prior knowledge of siiRL, but expects basic familiarity with Python, PyTorch, and reinforcement learning concepts.
We will systematically explain siiRL's design philosophy, architecture implementation, and core algorithms from the ground up.
Table of Contents
=================
- :ref:`sec1_overview`
- :ref:`sec2_design_philosophy`
- :ref:`sec3_main_entry`
- :ref:`sec4_dag_planner`
- :ref:`sec5_dag_worker`
- :ref:`sec6_data_coordinator`
- :ref:`sec7_engine`
- :ref:`sec8_core_algorithms`
- :ref:`sec9_execution_flow`
- :ref:`sec10_configuration`
- :ref:`sec11_extension_guide`
----
.. _sec1_overview:
1. siiRL Architecture Overview
==============================
1.1 What is siiRL?
------------------
**siiRL** (Shanghai Innovation Institute RL Framework) is a novel **fully distributed reinforcement learning framework** designed to break the scaling barriers in LLM post-training. By eliminating the centralized controller common in other frameworks, siiRL achieves:
- **Near-Linear Scalability**: The multi-controller paradigm eliminates central bottlenecks by distributing control logic and data management across all workers
- **SOTA Throughput**: Fully distributed dataflow architecture minimizes communication and I/O overhead
- **Flexible DAG-Defined Pipeline**: Decouples algorithmic logic from physical hardware, enabling rapid experimentation
1.2 System Architecture and Data Flow
-------------------------------------
**System Architecture Diagram**:
.. figure:: https://github.com/sii-research/siiRL/raw/main/asset/overview.png
:width: 100%
:alt: siiRL Architecture Overview
:align: center
**Figure 1.1**: siiRL System Architecture showing the three core components: DAG Planner, DAG Workers, and Data Coordinator
**Complete Training Step Sequence Diagram**:
The following sequence diagram shows the complete data flow for a single GRPO training step:
::
User MainRunner DAGWorker DataCoordinator Engine
(YAML) (Planner) (per GPU) (Singleton) Workers
| | | | |
============================================================================
| INITIALIZATION PHASE |
============================================================================
| | | | |
| 1. Config | | | |
|-------------->| | | |
| | | | |
| | 2. load_pipeline() + TaskScheduler.schedule() |
| |------------------------------------------------>|
| | | | |
| | 3. Create DAGWorkers (one per GPU) |
| |-------------->| | |
| | | | |
| | | 4. init_graph() | |
| | | Load models | |
| | |-------------------------------->|
| | | | |
============================================================================
| TRAINING LOOP (per step) |
============================================================================
| | | | |
| | | 5. DataLoader | |
| | | .run() | |
| | |<----------------| |
| | | batch (prompts) | |
| | | | |
| | | 6. Node: rollout_actor |
| | |-------------------------------->|
| | | Rollout.generate_sequences()|
| | |<--------------------------------|
| | | batch + responses |
| | | | |
| | | 7. Node: function_reward |
| | | compute_reward() |
| | |---------------->| |
| | | batch + scores | |
| | | | |
| | | 8. Node: calculate_advantages |
| | | compute_advantage() |
| | | (GRPO group normalization) |
| | | | |
| | | 9. put_data_to_buffers() |
| | | (if DP size changes) |
| | |---------------->| |
| | | | ray.put() |
| | | | |
| | | 10. get_data_from_buffers() |
| | |<----------------| |
| | | redistributed batch |
| | | | |
| | | 11. Node: actor_old_log_prob |
| | |-------------------------------->|
| | | Actor.compute_log_prob() |
| | |<--------------------------------|
| | | batch + old_log_probs |
| | | | |
| | | 12. Node: reference_log_prob |
| | |-------------------------------->|
| | | Reference.compute_ref_log_prob|
| | |<--------------------------------|
| | | batch + ref_log_probs |
| | | | |
| | | 13. Node: actor_train |
| | |-------------------------------->|
| | | Actor.update_actor() |
| | | - Forward pass |
| | | - Compute policy loss |
| | | - Backward pass |
| | | - Optimizer step |
| | |<--------------------------------|
| | | metrics |
| | | | |
| | | 14. sync_weights_actor_to_rollout
| | |-------------------------------->|
| | | ShardingManager.sync() |
| | | | |
| | | 15. Log metrics + checkpoint |
| | | | |
============================================================================
| REPEAT for next training step |
============================================================================
**Data Flow Summary**:
::
GRPO Single Step Data Flow
==============================================================================
DataLoader
|
| batch: {prompts, attention_mask, index}
v
+---------------------+
| rollout_actor | DAGWorker.generate()
| (MODEL_INFERENCE) | -> Rollout.generate_sequences()
+----------+----------+
| + {responses, response_ids, response_mask}
v
+---------------------+
| function_reward | DAGWorker.compute_reward()
| (COMPUTE) | -> RewardManager.compute_reward()
+----------+----------+
| + {token_level_scores, token_level_rewards}
v
+---------------------+
| calculate_advantages| DAGWorker.compute_advantage()
| (COMPUTE) | -> compute_grpo_outcome_advantage()
+----------+----------+ Group by prompt -> Normalize (score - mean)/std
| + {advantages}
v
+---------------------+
| actor_old_log_prob | DAGWorker.compute_old_log_prob()
| (MODEL_TRAIN) | -> Actor.compute_log_prob()
| only_forward=True |
+----------+----------+
| + {old_log_probs}
v
+---------------------+
| reference_log_prob | DAGWorker.compute_ref_log_prob()
| (MODEL_TRAIN) | -> Reference.compute_ref_log_prob()
+----------+----------+
| + {ref_log_prob}
v
+---------------------+
| actor_train | DAGWorker.train_actor()
| (MODEL_TRAIN) | -> Actor.update_actor()
+----------+----------+ policy_loss = -advantages * clip(ratio)
|
| metrics: {loss, clipfrac, kl, lr, ...}
v
+---------------------+
| sync_weights | ShardingManager.sync_weights_actor_to_rollout()
+---------------------+
1.3 Core Component Responsibilities
-----------------------------------
.. list-table:: siiRL Core Components
:header-rows: 1
:widths: 20 20 60
* - Component
- Process/Actor
- Core Responsibilities
* - **DAG Planner**
- MainRunner Actor
- Parse user-defined DAG workflows, generate execution plans, assign tasks to workers
* - **DAG Worker**
- One Actor per GPU
- Core execution unit responsible for model initialization, task execution, data flow management
* - **Data Coordinator**
- Global Singleton Actor
- Manage distributed data lifecycle including data loading and intermediate data redistribution
* - **TaskScheduler**
- Inside MainRunner
- Split and assign TaskGraph to each DAG Worker
* - **ProcessGroupManager**
- Inside MainRunner
- Manage creation and configuration of distributed communication groups (TP/PP/DP)
* - **MetricWorker**
- Standalone Actor
- Distributed metrics collection and aggregation
1.4 Why is siiRL Different?
---------------------------
**Problems with Traditional Frameworks**:
1. **Single-Controller Bottleneck**: All data flows through a single node, causing I/O and communication overhead
2. **Rigid Algorithm Pipelines**: Modifying workflows requires deep modifications to framework source code
**siiRL's Solutions**:
.. list-table:: siiRL Design Advantages
:header-rows: 1
:widths: 25 35 40
* - Traditional Frameworks
- siiRL DistFlow
- Advantage
* - Centralized Controller
- Multi-Controller Paradigm
- Eliminates single-point bottleneck, near-linear scaling
* - Hard-coded Workflows
- DAG-Defined Pipeline
- Declarative configuration, no code modification needed
* - Centralized Data Management
- Distributed Data Coordinator
- Avoids OOM, parallelizes data loading
----
.. _sec2_design_philosophy:
2. DistFlow Design Philosophy
=============================
2.1 Fully Distributed Architecture
----------------------------------
The core idea of DistFlow is **"no central coordinator"**. Each DAG Worker is an independent execution unit with its own:
- Data loader (partitioned dataset)
- Model instances (Actor/Critic/Rollout/Reference/Reward)
- Task execution graph (subgraph of TaskGraph)
- Local data cache
2.2 Three-Layer Architecture Design
-----------------------------------
::
┌─────────────────────────────────────────────────────────────────┐
│ User Configuration Layer (YAML/Python) │
│ - workflow_grpo.yaml: Define algorithm DAG │
│ - config.yaml: Model, data, training parameters │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Execution Scheduling Layer (DAG Planner) │
│ - TaskScheduler: Task assignment │
│ - ProcessGroupManager: Communication group management │
│ - GraphUpdater: Configuration injection │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Distributed Execution Layer (DAG Workers) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Worker 0 │ │Worker 1 │ │Worker 2 │ │Worker N │ │
│ │ (GPU 0) │ │ (GPU 1) │ │ (GPU 2) │ │ (GPU N) │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Data Coordination Layer (Data Coordinator) │
│ - Distributed DataLoader: Partitioned data loading │
│ - Distributed DataBuffer: Intermediate data redistribution │
└─────────────────────────────────────────────────────────────────┘
2.3 Core Design Principles
--------------------------
.. list-table:: DistFlow Design Principles
:header-rows: 1
:widths: 25 75
* - Principle
- Description
* - **Worker Autonomy**
- Each DAG Worker is a fully independent execution unit, not dependent on central coordination
* - **Data Locality**
- Data is processed locally as much as possible, reducing cross-node transfers
* - **Declarative Workflows**
- Algorithm logic is declared via DAG, decoupled from execution engine
* - **Unified Sample Protocol**
- All intermediate data uses Sample/SampleInfo protocol, supporting flexible routing
* - **Late Binding**
- Configuration is injected into nodes at runtime, supporting dynamic adjustment
----
.. _sec3_main_entry:
3. Program Entry and Startup Flow
=================================
3.1 main_dag.py Explained
-------------------------
``main_dag.py`` is the entry point of siiRL, but unlike traditional frameworks, its role is a **launcher** rather than an executor.
.. code-block:: python
:caption: siirl/main_dag.py Core Structure
def main() -> None:
"""Main entry: Initialize Ray cluster, parse config, start MainRunner"""
# 1. Initialize Ray cluster
if not ray.is_initialized():
ray.init(runtime_env={"env_vars": RAY_RUNTIME_ENV_VARS})
# 2. Parse configuration
siirl_args = parse_config()
# 3. Start main orchestration Actor
runner = MainRunner.remote()
ray.get(runner.run.remote(siirl_args))
3.2 MainRunner Actor
--------------------
``MainRunner`` is the "brain" of the system, responsible for orchestrating the entire training workflow:
.. code-block:: python
:caption: MainRunner.run() Core Flow
@ray.remote(num_cpus=MAIN_RUNNER_CPU_RESERVATION)
class MainRunner:
def run(self, siirl_args: SiiRLArguments) -> None:
# 1. Initialize DataCoordinator
data_coordinator_handle = init_data_coordinator(
num_buffers=siirl_args.trainer.nnodes,
ppo_mini_batch_size=siirl_args.actor_rollout_ref.actor.ppo_mini_batch_size,
world_size=siirl_args.trainer.nnodes * siirl_args.trainer.n_gpus_per_node
)
# 2. Load and configure workflow DAG
workflow_taskgraph = load_pipeline(siirl_args)
update_task_graph_node_configs(workflow_taskgraph, siirl_args)
# 3. Schedule tasks to each worker
task_scheduler = TaskScheduler(siirl_args.trainer.nnodes,
siirl_args.trainer.n_gpus_per_node)
rank_taskgraph_mapping = task_scheduler.schedule_and_assign_tasks([workflow_taskgraph])
# 4. Create process groups
process_group_manager = ProcessGroupManager(total_workers, rank_taskgraph_mapping)
# 5. Create metric worker
metric_worker_handle = MetricWorker.remote()
# 6. Initialize and start DAG Workers
trainer = RayTrainer(config=siirl_args, ...)
trainer.init_workers()
trainer.start_workers()
3.3 Startup Flow Sequence Diagram
---------------------------------
::
main()
│
├── ray.init() ← Initialize Ray cluster
│
├── parse_config() ← Parse YAML configuration
│
└── MainRunner.run()
│
├── init_data_coordinator() ← Create global DataCoordinator
│
├── load_pipeline() ← Load DAG definition
│ │
│ └── grpo_pipeline() ← Return TaskGraph
│
├── TaskScheduler.schedule() ← Assign tasks to each rank
│
├── ProcessGroupManager() ← Create communication group specs
│
├── RayTrainer.init_workers() ← Create DAG Worker Actors
│ │
│ └── DAGWorker.__init__() × N_workers
│
└── RayTrainer.start_workers() ← Start training loop
│
└── DAGWorker.execute_task_graph() × N_workers
----
.. _sec4_dag_planner:
4. DAG Planner Deep Dive
========================
The DAG Planner is siiRL's "scheduling brain", responsible for converting user-defined high-level workflows into executable distributed tasks.
**Pipeline Architecture Overview**:
The following diagram shows how the core data structures relate to each other and how a Pipeline is built and executed:
::
Pipeline Data Structure Relationships
==============================================================================
+------------------+
| Pipeline |
| (Builder) |
+------------------+
| - pipeline_id |
| - description |
| - _nodes: Dict |
+--------+---------+
|
| .build()
v
+------------------+
| TaskGraph |
| (DAG) |
+------------------+
| - graph_id |
| - nodes: Dict |
| - adj: Dict |
| - rev_adj: Dict |
+--------+---------+
|
| contains multiple
v
+----------------+ +----------------+ +----------------+
| Node | | Node | | Node | ...
+----------------+ +----------------+ +----------------+
| - node_id | | - node_id | | - node_id |
| - node_type | | - node_type | | - node_type |
| - node_role | | - node_role | | - node_role |
| - dependencies | | - dependencies | | - dependencies |
| - executable | | - executable | | - executable |
| - config | | - config | | - config |
+----------------+ +----------------+ +----------------+
==============================================================================
NodeType (from node.py) NodeRole (from node.py)
+------------------------+ +------------------------+
| COMPUTE | | DEFAULT |
| DATA_LOAD | | ACTOR |
| ENV_INTERACT | | ADVANTAGE |
| MODEL_INFERENCE | | CRITIC |
| MODEL_TRAIN | | ROLLOUT |
| PUT_TO_BUFFER | | REFERENCE |
| GET_FROM_BUFFER | | REWARD |
| BARRIER_SYNC | | DYNAMIC_SAMPLING |
| CUSTOM | +------------------------+
+------------------------+
**Pipeline Building Flow**:
::
How Pipeline is Built and Executed
================================================================================
Step 1: User Defines Pipeline (Python Code)
--------------------------------------------
pipeline = Pipeline("grpo_training_pipeline")
pipeline.add_node("rollout_actor", func="...:DAGWorker.generate", deps=[])
.add_node("function_reward", func="...:DAGWorker.compute_reward", ...)
.add_node("calculate_advantages", func="...:DAGWorker.compute_advantage", ...)
.add_node("actor_old_log_prob", func="...:DAGWorker.compute_old_log_prob", ...)
.add_node("reference_log_prob", func="...:DAGWorker.compute_ref_log_prob", ...)
.add_node("actor_train", func="...:DAGWorker.train_actor", ...)
|
| pipeline.build()
v
Step 2: Build TaskGraph (Validation + Adjacency Lists)
------------------------------------------------------
TaskGraph Adjacency Lists (adj)
+--------------------+ +------------------------------------------+
| graph_id: "grpo.." | | rollout_actor -> [function_reward] |
| | | function_reward -> [calculate_adv.] |
| nodes: { | | calculate_adv. -> [actor_old_log] |
| "rollout_actor", | | actor_old_log -> [reference_log] |
| "function_reward"| | reference_log -> [actor_train] |
| "calculate_adv.",| | actor_train -> [] |
| ... | +------------------------------------------+
| } |
+--------------------+
|
| TaskScheduler.schedule()
v
Step 3: TaskScheduler Assigns to Workers
----------------------------------------
+------------------------------------------------------------------------+
| TaskScheduler |
| |
| Input: TaskGraph + num_workers |
| |
| 1. discover_and_split_parallel_paths(graph) -> Split parallel branches|
| 2. Apportion workers to subgraphs (param_aware / even) |
| 3. Assign each worker a TaskGraph copy |
| |
| Output: Dict[rank, TaskGraph] (rank_taskgraph_mapping) |
+------------------------------------------------------------------------+
+-------------------------------------------+
| rank_taskgraph_mapping |
+-------------------------------------------+
| rank 0 -> TaskGraph (copy) |
| rank 1 -> TaskGraph (copy) |
| rank 2 -> TaskGraph (copy) |
| ... -> ... |
| rank N -> TaskGraph (copy) |
+-------------------------------------------+
|
| DAGWorker receives TaskGraph
v
Step 4: DAGWorker Executes TaskGraph
------------------------------------
+------------------------------------------------------------------------+
| DAGWorker.execute_task_graph() |
| |
| for each training step: |
| 1. batch = DataLoader.run() |
| 2. entry_nodes = taskgraph.get_entry_nodes() # [rollout_actor] |
| 3. node_queue = entry_nodes |
| |
| while node_queue: |
| cur_node = node_queue.pop(0) |
| |
| # Execute node's function |
| output = cur_node.run(batch=batch, _dag_worker_instance=self) |
| |
| # Resolves executable_ref to actual function: |
| # "siirl.dag_worker.dagworker:DAGWorker.generate" |
| # -> DAGWorker.generate(self, batch, ...) |
| |
| # Get downstream nodes and add to queue |
| next_nodes = taskgraph.get_downstream_nodes(cur_node.node_id) |
| node_queue.extend(next_nodes) |
| |
| # If DP size changes between nodes, use DataCoordinator |
| put_data_to_buffers() / get_data_from_buffers() |
+------------------------------------------------------------------------+
**Execution Order Example (GRPO)**:
::
GRPO Pipeline Execution Order
================================================================================
Topological Order:
+------------------+ +------------------+ +---------------------+
| rollout_actor |----->| function_reward |----->|calculate_advantages |
| (Inference) | | (Compute) | | (Compute) |
| | | | | |
| NodeRole: | | NodeRole: | | NodeRole: |
| ROLLOUT | | REWARD | | ADVANTAGE |
+------------------+ +------------------+ +----------+----------+
|
+----------------------------------------------------------+
|
v
+---------------------+ +---------------------+ +------------------+
| actor_old_log_prob |----->| reference_log_prob |----->| actor_train |
| (Forward Only) | | (Forward Only) | | (Train) |
| | | | | |
| NodeRole: ACTOR | | NodeRole: REFERENCE| | NodeRole: ACTOR |
| only_forward=True | | | | |
+---------------------+ +---------------------+ +------------------+
Data flows through each node, accumulating fields in the batch:
batch: {prompts}
|
v rollout_actor
batch: {prompts, responses, response_ids, response_mask}
|
v function_reward
batch: {..., token_level_scores, token_level_rewards}
|
v calculate_advantages
batch: {..., advantages}
|
v actor_old_log_prob
batch: {..., old_log_probs}
|
v reference_log_prob
batch: {..., ref_log_prob}
|
v actor_train
metrics: {loss, clipfrac, kl, ...}
4.1 Pipeline API
----------------
siiRL provides a clean Pipeline API for users to define training pipelines directly in Python:
.. code-block:: python
:caption: siirl/execution/dag/pipeline.py
class Pipeline:
"""Declarative Pipeline Builder"""
def __init__(self, pipeline_id: str, description: str = ""):
self.pipeline_id = pipeline_id
self._nodes: Dict[str, Dict[str, Any]] = {}
def add_node(
self,
node_id: str,
func: Union[str, Callable], # Function path or direct Callable
deps: Optional[List[str]] = None,
**kwargs
) -> "Pipeline":
"""Add node with method chaining support"""
self._nodes[node_id] = {
"func": func,
"deps": deps or [],
"kwargs": kwargs
}
return self # Support method chaining
def build(self) -> TaskGraph:
"""Build and validate TaskGraph"""
task_graph = TaskGraph(graph_id=self.pipeline_id)
# ... create nodes, build adjacency lists, validate DAG
return task_graph
4.2 Built-in Pipeline Definitions
---------------------------------
siiRL provides four built-in pipeline definitions in ``siirl/execution/dag/builtin_pipelines.py``:
**4.2.1 GRPO Pipeline (grpo_pipeline)**
Standard GRPO (Group Relative Policy Optimization) training workflow:
.. code-block:: python
:caption: siirl/execution/dag/builtin_pipelines.py - GRPO Pipeline
def grpo_pipeline() -> TaskGraph:
"""
Standard GRPO (Group Relative Policy Optimization) pipeline.
Workflow:
1. rollout_actor: Generate sequences using the policy model
2. function_reward: Compute rewards for generated sequences
3. calculate_advantages: Calculate advantage estimates
4. actor_old_log_prob: Compute log probabilities with old policy (forward only)
5. reference_log_prob: Compute log probabilities with reference model
6. actor_train: Train the actor model
"""
pipeline = Pipeline("grpo_training_pipeline", "Standard GRPO workflow")
pipeline.add_node(
"rollout_actor",
func="siirl.dag_worker.dagworker:DAGWorker.generate",
deps=[],
node_type=NodeType.MODEL_INFERENCE,
node_role=NodeRole.ROLLOUT
).add_node(
"function_reward",
func="siirl.dag_worker.dagworker:DAGWorker.compute_reward",
deps=["rollout_actor"],
node_type=NodeType.COMPUTE,
node_role=NodeRole.REWARD
).add_node(
"calculate_advantages",
func="siirl.dag_worker.dagworker:DAGWorker.compute_advantage",
deps=["function_reward"],
node_type=NodeType.COMPUTE,
node_role=NodeRole.ADVANTAGE
).add_node(
"actor_old_log_prob",
func="siirl.dag_worker.dagworker:DAGWorker.compute_old_log_prob",
deps=["calculate_advantages"],
node_type=NodeType.MODEL_TRAIN,
node_role=NodeRole.ACTOR,
only_forward_compute=True
).add_node(
"reference_log_prob",
func="siirl.dag_worker.dagworker:DAGWorker.compute_ref_log_prob",
deps=["actor_old_log_prob"],
node_type=NodeType.MODEL_TRAIN,
node_role=NodeRole.REFERENCE
).add_node(
"actor_train",
func="siirl.dag_worker.dagworker:DAGWorker.train_actor",
deps=["reference_log_prob"],
node_type=NodeType.MODEL_TRAIN,
node_role=NodeRole.ACTOR
)
return pipeline.build()
**4.2.2 PPO Pipeline (ppo_pipeline)**
Standard PPO with Critic model and GAE advantage estimation:
.. code-block:: python
:caption: siirl/execution/dag/builtin_pipelines.py - PPO Pipeline
def ppo_pipeline() -> TaskGraph:
"""
Standard PPO (Proximal Policy Optimization) pipeline.
Workflow:
1. rollout_actor: Generate sequences using the policy model
2. function_reward: Compute rewards for generated sequences
3. compute_value: Compute value function estimates (forward only)
4. calculate_advantages: Calculate GAE (Generalized Advantage Estimation)
5. actor_old_log_prob: Compute log probabilities with old policy (forward only)
6. reference_log_prob: Compute log probabilities with reference model
7. actor_train: Train the actor model
8. critic_train: Train the critic (value) model
"""
pipeline = Pipeline("ppo_training_pipeline", "Standard PPO workflow")
pipeline.add_node(
"rollout_actor",
func="siirl.dag_worker.dagworker:DAGWorker.generate",
deps=[],
node_type=NodeType.MODEL_INFERENCE,
node_role=NodeRole.ROLLOUT
).add_node(
"function_reward",
func="siirl.dag_worker.dagworker:DAGWorker.compute_reward",
deps=["rollout_actor"],
node_type=NodeType.COMPUTE,
node_role=NodeRole.REWARD
).add_node(
"compute_value",
func="siirl.dag_worker.dagworker:DAGWorker.compute_value",
deps=["function_reward"],
node_type=NodeType.MODEL_TRAIN,
node_role=NodeRole.CRITIC,
only_forward_compute=True
).add_node(
"calculate_advantages",
func="siirl.dag_worker.dagworker:DAGWorker.compute_advantage",
deps=["compute_value"],
node_type=NodeType.COMPUTE,
node_role=NodeRole.ADVANTAGE
).add_node(
"actor_old_log_prob",
func="siirl.dag_worker.dagworker:DAGWorker.compute_old_log_prob",
deps=["calculate_advantages"],
node_type=NodeType.MODEL_TRAIN,
node_role=NodeRole.ACTOR,
only_forward_compute=True
).add_node(
"reference_log_prob",
func="siirl.dag_worker.dagworker:DAGWorker.compute_ref_log_prob",
deps=["actor_old_log_prob"],
node_type=NodeType.MODEL_TRAIN,
node_role=NodeRole.REFERENCE
).add_node(
"actor_train",
func="siirl.dag_worker.dagworker:DAGWorker.train_actor",
deps=["reference_log_prob"],
node_type=NodeType.MODEL_TRAIN,
node_role=NodeRole.ACTOR
).add_node(
"critic_train",
func="siirl.dag_worker.dagworker:DAGWorker.train_critic",
deps=["actor_train"],
node_type=NodeType.MODEL_TRAIN,
node_role=NodeRole.CRITIC
)
return pipeline.build()
**4.2.3 DAPO Pipeline (dapo_pipeline)**
DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) with dynamic sampling filtering:
.. code-block:: python
:caption: siirl/execution/dag/builtin_pipelines.py - DAPO Pipeline
def dapo_pipeline() -> TaskGraph:
"""
    DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization) pipeline.
DAPO is a variant of GRPO with dynamic sampling filtering based on metric variance.
The key difference is that after computing rewards, we filter out trajectory groups
with zero variance (all correct or all incorrect) as they provide no learning signal.
Workflow:
1. rollout_actor: Generate sequences using the policy model
2. function_reward: Compute rewards for generated sequences
3. dynamic_sampling: DAPO-specific filtering based on metric variance
4. calculate_advantages: Calculate advantage estimates
5. actor_old_log_prob: Compute log probabilities with old policy (forward only)
6. reference_log_prob: Compute log probabilities with reference model
7. actor_train: Train the actor model
"""
pipeline = Pipeline("dapo_training_pipeline", "DAPO workflow")
pipeline.add_node(
"rollout_actor",
func="siirl.dag_worker.dagworker:DAGWorker.generate",
deps=[],
node_type=NodeType.MODEL_INFERENCE,
node_role=NodeRole.ROLLOUT
).add_node(
"function_reward",
func="siirl.dag_worker.dagworker:DAGWorker.compute_reward",
deps=["rollout_actor"],
node_type=NodeType.COMPUTE,
node_role=NodeRole.REWARD
).add_node(
"dynamic_sampling",
func="siirl.user_interface.filter_interface.dapo.dynamic_sampling",
deps=["function_reward"],
node_type=NodeType.COMPUTE,
node_role=NodeRole.DYNAMIC_SAMPLING
).add_node(
"calculate_advantages",
func="siirl.dag_worker.dagworker:DAGWorker.compute_advantage",
deps=["dynamic_sampling"],
node_type=NodeType.COMPUTE,
node_role=NodeRole.ADVANTAGE
).add_node(
"actor_old_log_prob",
func="siirl.dag_worker.dagworker:DAGWorker.compute_old_log_prob",
deps=["calculate_advantages"],
node_type=NodeType.MODEL_TRAIN,
node_role=NodeRole.ACTOR,
only_forward_compute=True
).add_node(
"reference_log_prob",
func="siirl.dag_worker.dagworker:DAGWorker.compute_ref_log_prob",
deps=["actor_old_log_prob"],
nod
gitextract_ufof6x83/
├── .gitignore
├── .pre-commit-config.yaml
├── .readthedocs.yaml
├── CONTRIBUTING.md
├── LICENSE
├── README-zh.md
├── README.md
├── docker/
│ ├── Dockerfile.cu124
│ └── Dockerfile.cu126
├── docs/
│ ├── Makefile
│ ├── conf.py
│ ├── examples/
│ │ ├── config.rst
│ │ ├── cpgd_example.rst
│ │ ├── deepscaler_example.rst
│ │ ├── embodied_srpo_example.rst
│ │ ├── megatron_backend_example.rst
│ │ └── mm_eureka_example.rst
│ ├── hardware_tutorial/
│ │ ├── ascend_profiling_en.rst
│ │ ├── ascend_quickstart.rst
│ │ └── metax_quickstart.rst
│ ├── index.rst
│ ├── preparation/
│ │ ├── prepare_data.rst
│ │ └── reward_function.rst
│ ├── programming_guide/
│ │ ├── code_structure.rst
│ │ ├── siiRL_code_explained.rst
│ │ ├── siirl_architecture_guide.rst
│ │ └── srpo_code_explained.rst
│ ├── requirements-docs.txt
│ ├── start/
│ │ ├── install.rst
│ │ └── quickstart.rst
│ └── user_interface/
│ ├── filter_interface.rst
│ ├── metrics_interface.rst
│ ├── pipeline_interface.rst
│ └── reward_interface.rst
├── examples/
│ ├── cpgd_trainer/
│ │ ├── run_qwen2_5-7b.sh
│ │ ├── run_qwen2_5_vl-72b.sh
│ │ ├── run_qwen2_5_vl-7b.sh
│ │ ├── run_qwen3-1.7b.sh
│ │ └── run_qwen3-8b.sh
│ ├── custom_pipeline_example/
│ │ └── custom_grpo.py
│ ├── custom_reward/
│ │ ├── rewardfunc_gsm8k.py
│ │ └── run_qwen2_5-7b-custom_reward.sh
│ ├── dapo_trainer/
│ │ ├── run_qwen2_5-7b.sh
│ │ ├── run_qwen3-235b-megatron-gspo.sh
│ │ └── run_qwen3-8b.sh
│ ├── data_preprocess/
│ │ ├── deepscaler.py
│ │ ├── geo3k.py
│ │ ├── gsm8k.py
│ │ ├── math_dataset.py
│ │ └── mm_eureka.py
│ ├── embodied_srpo_trainer/
│ │ ├── run_openvla_oft_libero_goal.sh
│ │ ├── run_openvla_oft_libero_long.sh
│ │ ├── run_openvla_oft_libero_object.sh
│ │ └── run_openvla_oft_libero_spatial.sh
│ ├── experimental/
│ │ ├── marft/
│ │ │ ├── config/
│ │ │ │ ├── code_env.py
│ │ │ │ ├── math_env.py
│ │ │ │ ├── process.py
│ │ │ │ ├── workflow_marft.yaml
│ │ │ │ └── workflow_marft_code.yaml
│ │ │ └── run_qwen2_5-3b_marft.sh
│ │ └── multiturn_server/
│ │ └── run_qwen2_5-3b_grpo_multiturn_vllm.sh
│ ├── grpo_trainer/
│ │ ├── run_qwen2_5-32b-metax.sh
│ │ ├── run_qwen2_5-32b-npu.sh
│ │ ├── run_qwen2_5-72b-npu.sh
│ │ ├── run_qwen2_5-7b-npu-e2e_prof.sh
│ │ ├── run_qwen2_5-7b-npu-mindspeed.sh
│ │ ├── run_qwen2_5-7b-npu.sh
│ │ ├── run_qwen2_5-7b.sh
│ │ ├── run_qwen2_5_vl-72b.sh
│ │ ├── run_qwen2_5_vl-7b-npu.sh
│ │ ├── run_qwen2_5_vl-7b.sh
│ │ ├── run_qwen3-235b-megatron.sh
│ │ ├── run_qwen3-235b-npu-mindspeed.sh
│ │ ├── run_qwen3-30b-npu-mindspeed.sh
│ │ ├── run_qwen3-8b-megatron.sh
│ │ └── run_qwen3-8b.sh
│ ├── gspo_trainer/
│ │ ├── run_qwen3-1.7b.sh
│ │ ├── run_qwen3-235b-megatron.sh
│ │ └── run_qwen3-30b-gspo-megatron.sh
│ ├── multi_turn/
│ │ ├── config/
│ │ │ ├── interaction_config/
│ │ │ │ └── gsm8k_interaction_config.yaml
│ │ │ └── tool_config/
│ │ │ └── gsm8k_tool_config.yaml
│ │ └── gsm8k/
│ │ └── run_qwen2_5-3b_grpo_multiturn_sglang.sh
│ └── ppo_trainer/
│ ├── run_qwen2_5-72b.sh
│ ├── run_qwen3-8b-megatron.sh
│ └── run_qwen3-8b.sh
├── pyproject.toml
├── requirements-npu.txt
├── requirements.txt
├── setup.py
├── siirl/
│ ├── __init__.py
│ ├── dag_worker/
│ │ ├── __init__.py
│ │ ├── checkpoint_manager.py
│ │ ├── constants.py
│ │ ├── core_algos.py
│ │ ├── dag_utils.py
│ │ ├── dagworker.py
│ │ ├── data_structures.py
│ │ ├── metric_aggregator.py
│ │ ├── metrics_collector.py
│ │ └── validator.py
│ ├── data_coordinator/
│ │ ├── __init__.py
│ │ ├── data_buffer.py
│ │ ├── dataloader/
│ │ │ ├── __init__.py
│ │ │ ├── data_loader_node.py
│ │ │ ├── embodied_preprocess.py
│ │ │ ├── partitioned_dataset.py
│ │ │ └── vision_utils.py
│ │ ├── protocol.py
│ │ └── sample.py
│ ├── engine/
│ │ ├── __init__.py
│ │ ├── actor/
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ ├── dp_actor.py
│ │ │ ├── embodied_actor.py
│ │ │ └── megatron_actor.py
│ │ ├── base_worker/
│ │ │ ├── __init__.py
│ │ │ ├── base/
│ │ │ │ ├── __init__.py
│ │ │ │ └── worker.py
│ │ │ ├── megatron/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── npu_mbridge_patch.py
│ │ │ │ └── worker.py
│ │ │ ├── register_center/
│ │ │ │ ├── __init__.py
│ │ │ │ └── register_center.py
│ │ │ └── resouce_pool.py
│ │ ├── critic/
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ ├── dp_critic.py
│ │ │ └── megatron_critic.py
│ │ ├── fsdp_workers.py
│ │ ├── megatron_workers.py
│ │ ├── reward_manager/
│ │ │ ├── __init__.py
│ │ │ ├── dapo.py
│ │ │ ├── embodied.py
│ │ │ ├── naive.py
│ │ │ └── parallel.py
│ │ ├── reward_model/
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ └── megatron/
│ │ │ ├── __init__.py
│ │ │ └── reward_model.py
│ │ ├── rollout/
│ │ │ ├── __init__.py
│ │ │ ├── async_server.py
│ │ │ ├── base.py
│ │ │ ├── embodied_rollout.py
│ │ │ ├── hf_rollout.py
│ │ │ ├── schemas.py
│ │ │ ├── sglang_rollout/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── async_sglang_server.py
│ │ │ │ ├── sglang_rollout.py
│ │ │ │ └── utils.py
│ │ │ └── vllm_rollout/
│ │ │ ├── __init__.py
│ │ │ ├── vllm_async_server.py
│ │ │ └── vllm_rollout_spmd.py
│ │ └── sharding_manager/
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── fsdp_hf.py
│ │ ├── fsdp_sglang.py
│ │ ├── fsdp_ulysses.py
│ │ ├── fsdp_vllm.py
│ │ ├── megatron_sglang.py
│ │ └── megatron_vllm.py
│ ├── environment/
│ │ └── embodied/
│ │ ├── __init__.py
│ │ ├── adapters/
│ │ │ ├── __init__.py
│ │ │ └── libero.py
│ │ ├── base.py
│ │ └── venv.py
│ ├── execution/
│ │ ├── dag/
│ │ │ ├── __init__.py
│ │ │ ├── builtin_pipelines.py
│ │ │ ├── config_loader.py
│ │ │ ├── node.py
│ │ │ ├── pipeline.py
│ │ │ ├── task_graph.py
│ │ │ └── task_loader.py
│ │ ├── metric_worker/
│ │ │ ├── metric_worker.py
│ │ │ └── utils.py
│ │ ├── rollout_flow/
│ │ │ ├── multi_agent/
│ │ │ │ ├── multiagent_generate.py
│ │ │ │ └── utils.py
│ │ │ └── multiturn/
│ │ │ ├── __init__.py
│ │ │ ├── agent_loop/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── agent_loop.py
│ │ │ │ ├── single_turn_agent_loop.py
│ │ │ │ └── tool_agent_loop.py
│ │ │ ├── interactions/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── base.py
│ │ │ │ ├── gsm8k_interaction.py
│ │ │ │ └── utils/
│ │ │ │ ├── __init__.py
│ │ │ │ └── interaction_registry.py
│ │ │ └── tools/
│ │ │ ├── __init__.py
│ │ │ ├── base_tool.py
│ │ │ ├── geo3k_tool.py
│ │ │ ├── gsm8k_tool.py
│ │ │ ├── mcp_base_tool.py
│ │ │ ├── mcp_search_tool.py
│ │ │ ├── sandbox_fusion_tools.py
│ │ │ ├── schemas.py
│ │ │ ├── search_tool.py
│ │ │ └── utils/
│ │ │ ├── __init__.py
│ │ │ ├── mcp_clients/
│ │ │ │ ├── McpClientManager.py
│ │ │ │ ├── __init__.py
│ │ │ │ └── utils.py
│ │ │ ├── search_r1_like_utils.py
│ │ │ └── tool_registry.py
│ │ └── scheduler/
│ │ ├── __init__.py
│ │ ├── enums.py
│ │ ├── graph_updater.py
│ │ ├── launch.py
│ │ ├── process_group_manager.py
│ │ ├── ray_actor_manager.py
│ │ ├── resource_manager.py
│ │ ├── reward.py
│ │ └── task_scheduler.py
│ ├── main_dag.py
│ ├── models/
│ │ ├── __init__.py
│ │ ├── embodied/
│ │ │ ├── openvla/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── configuration_prismatic.py
│ │ │ │ ├── modeling_prismatic.py
│ │ │ │ └── processing_prismatic.py
│ │ │ └── openvla_oft/
│ │ │ ├── __init__.py
│ │ │ ├── configuration_prismatic.py
│ │ │ ├── constants.py
│ │ │ ├── modeling_prismatic.py
│ │ │ ├── processing_prismatic.py
│ │ │ └── train_utils.py
│ │ ├── llama/
│ │ │ ├── __init__.py
│ │ │ └── megatron/
│ │ │ ├── __init__.py
│ │ │ ├── checkpoint_utils/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── llama_loader.py
│ │ │ │ ├── llama_loader_depracated.py
│ │ │ │ └── llama_saver.py
│ │ │ ├── layers/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── parallel_attention.py
│ │ │ │ ├── parallel_decoder.py
│ │ │ │ ├── parallel_linear.py
│ │ │ │ ├── parallel_mlp.py
│ │ │ │ └── parallel_rmsnorm.py
│ │ │ └── modeling_llama_megatron.py
│ │ ├── loader.py
│ │ ├── mcore/
│ │ │ ├── __init__.py
│ │ │ ├── config_converter.py
│ │ │ ├── loader.py
│ │ │ ├── mbridge.py
│ │ │ ├── model_forward.py
│ │ │ ├── model_forward_fused.py
│ │ │ ├── model_initializer.py
│ │ │ ├── patch_v012.py
│ │ │ ├── registry.py
│ │ │ ├── saver.py
│ │ │ ├── util.py
│ │ │ └── weight_converter.py
│ │ ├── model_utils/
│ │ │ ├── __init__.py
│ │ │ └── visual.py
│ │ ├── patcher.py
│ │ ├── qwen2/
│ │ │ ├── __init__.py
│ │ │ └── megatron/
│ │ │ ├── __init__.py
│ │ │ ├── checkpoint_utils/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── qwen2_loader.py
│ │ │ │ ├── qwen2_loader_depracated.py
│ │ │ │ └── qwen2_saver.py
│ │ │ ├── layers/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── parallel_attention.py
│ │ │ │ ├── parallel_decoder.py
│ │ │ │ ├── parallel_linear.py
│ │ │ │ ├── parallel_mlp.py
│ │ │ │ └── parallel_rmsnorm.py
│ │ │ └── modeling_qwen2_megatron.py
│ │ ├── registry.py
│ │ ├── transformers/
│ │ │ ├── __init__.py
│ │ │ ├── internvl.py
│ │ │ ├── internvl_chat/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── configuration_intern_vit.py
│ │ │ │ ├── configuration_internlm2.py
│ │ │ │ ├── configuration_internvl_chat.py
│ │ │ │ ├── modeling_intern_vit.py
│ │ │ │ ├── modeling_internlm2.py
│ │ │ │ ├── modeling_internvl_chat.py
│ │ │ │ ├── tokenization_internlm2.py
│ │ │ │ └── tokenization_internlm2_fast.py
│ │ │ ├── kimi_vl.py
│ │ │ ├── llama.py
│ │ │ ├── monkey_patch.py
│ │ │ ├── npu_patch.py
│ │ │ ├── qwen2.py
│ │ │ ├── qwen2_5_vl.py
│ │ │ ├── qwen2_vl.py
│ │ │ └── transformers_compat.py
│ │ └── weight_loader_registry.py
│ ├── params/
│ │ ├── __init__.py
│ │ ├── dag_args.py
│ │ ├── data_args.py
│ │ ├── display_dict.py
│ │ ├── embodied_args.py
│ │ ├── model_args.py
│ │ ├── parser.py
│ │ ├── profiler_args.py
│ │ └── training_args.py
│ ├── third_party/
│ │ ├── __init__.py
│ │ └── sglang/
│ │ ├── __init__.py
│ │ └── parallel_state.py
│ ├── user_interface/
│ │ ├── filter_interface/
│ │ │ ├── __init__.py
│ │ │ ├── dapo.py
│ │ │ └── embodied.py
│ │ └── rewards_interface/
│ │ └── custom_gsm8k_reward.py
│ └── utils/
│ ├── __init__.py
│ ├── checkpoint/
│ │ ├── __init__.py
│ │ ├── checkpoint_manager.py
│ │ ├── fsdp_checkpoint_manager.py
│ │ └── megatron_checkpoint_manager.py
│ ├── debug/
│ │ ├── __init__.py
│ │ ├── mstx_profile.py
│ │ ├── performance.py
│ │ └── profile.py
│ ├── embodied/
│ │ ├── __init__.py
│ │ ├── libero_utils.py
│ │ ├── openvla_utils.py
│ │ └── video_emb.py
│ ├── experimental/
│ │ ├── __init__.py
│ │ └── torch_functional.py
│ ├── extras/
│ │ ├── __init__.py
│ │ ├── device.py
│ │ ├── fs.py
│ │ ├── hdfs_io.py
│ │ ├── import_utils.py
│ │ ├── misc.py
│ │ ├── net_utils.py
│ │ ├── packages.py
│ │ ├── patch.py
│ │ ├── py_functional.py
│ │ └── ray_utils.py
│ ├── import_string.py
│ ├── kernel/
│ │ ├── __init__.py
│ │ ├── kernels.py
│ │ └── linear_cross_entropy.py
│ ├── logger/
│ │ ├── __init__.py
│ │ ├── aggregate_logger.py
│ │ ├── logging_utils.py
│ │ └── tracking.py
│ ├── megatron/
│ │ ├── __init__.py
│ │ ├── dist_checkpointing.py
│ │ ├── megatron_utils.py
│ │ ├── memory.py
│ │ ├── memory_buffer.py
│ │ ├── optimizer.py
│ │ ├── pipeline_parallel.py
│ │ ├── sequence_parallel.py
│ │ └── tensor_parallel.py
│ ├── memory_utils.py
│ ├── metrics/
│ │ ├── __init__.py
│ │ └── metric_utils.py
│ ├── model_utils/
│ │ ├── __init__.py
│ │ ├── activation_offload.py
│ │ ├── attention_utils.py
│ │ ├── flops_counter.py
│ │ ├── fsdp_utils.py
│ │ ├── model.py
│ │ ├── npu_utils.py
│ │ ├── seqlen_balancing.py
│ │ ├── tensordict_utils.py
│ │ ├── torch_dtypes.py
│ │ ├── torch_functional.py
│ │ ├── ulysses.py
│ │ └── vllm_utils.py
│ └── reward_score/
│ ├── __init__.py
│ ├── embodied.py
│ ├── geo3k.py
│ ├── gsm8k.py
│ ├── math.py
│ ├── math_batch.py
│ ├── math_dapo.py
│ ├── math_verify.py
│ ├── mm_eureka.py
│ ├── prime_code/
│ │ ├── __init__.py
│ │ ├── testing_util.py
│ │ └── utils.py
│ ├── prime_math/
│ │ ├── __init__.py
│ │ ├── grader.py
│ │ └── math_normalize.py
│ ├── sandbox_fusion/
│ │ ├── __init__.py
│ │ └── utils.py
│ └── search_r1_like_qa_em.py
└── tests/
├── __init__.py
├── dag/
│ ├── test_config_loader.py
│ ├── test_node.py
│ ├── test_task_graph.py
│ └── test_task_loader.py
├── dag_worker/
│ ├── test_dag_worker.py
│ ├── test_dapo_merge.py
│ └── test_dapo_pipeline.py
├── data_buffer/
│ ├── detailed_put_performance_test.py
│ ├── performance_test_data_buffer.py
│ └── test_data_buffer.py
└── scheduler/
├── test_process_group_manager.py
└── test_task_scheduler.py
Showing preview only (239K chars total). Download the full file or copy to clipboard to get everything.
SYMBOL INDEX (2598 symbols across 253 files)
FILE: examples/custom_pipeline_example/custom_grpo.py
function example_builtin_grpo (line 33) | def example_builtin_grpo() -> TaskGraph:
function grpo_with_custom_reward (line 48) | def grpo_with_custom_reward() -> TaskGraph:
function my_custom_reward_fn (line 97) | def my_custom_reward_fn(batch: TensorDict, **kwargs) -> NodeOutput:
FILE: examples/custom_reward/rewardfunc_gsm8k.py
function extract_solution (line 4) | def extract_solution(solution_str, method="strict"):
function compute_score (line 30) | def compute_score(data_sources, solution_strs, ground_truths, extra_info...
FILE: examples/data_preprocess/deepscaler.py
function load_json (line 28) | def load_json(file_path):
function make_map_fn (line 55) | def make_map_fn(split_name):
FILE: examples/data_preprocess/geo3k.py
function make_map_fn (line 45) | def make_map_fn(split):
FILE: examples/data_preprocess/gsm8k.py
function extract_solution (line 27) | def extract_solution(solution_str):
function make_map_fn (line 52) | def make_map_fn(split):
FILE: examples/data_preprocess/math_dataset.py
function extract_solution (line 27) | def extract_solution(solution_str):
function make_map_fn (line 50) | def make_map_fn(split):
FILE: examples/data_preprocess/mm_eureka.py
function make_map_fn (line 53) | def make_map_fn(split):
FILE: examples/experimental/marft/config/code_env.py
class CodeEnv (line 18) | class CodeEnv():
method __init__ (line 19) | def __init__(self):
method reset (line 21) | def reset(self) -> Any:
method step (line 23) | async def step(self, actions, ground_truth):
FILE: examples/experimental/marft/config/math_env.py
class MathEnv (line 18) | class MathEnv():
method __init__ (line 19) | def __init__(self):
method reset (line 21) | def reset(self) -> Any:
method step (line 23) | async def step(self, actions, ground_truth):
FILE: examples/experimental/marft/config/process.py
function pre_process (line 16) | def pre_process(tokenizer, prompt_id, obs, **kwargs):
function post_process (line 26) | def post_process(tokenizer, prompt_id, response_id, **kwargs):
FILE: siirl/dag_worker/checkpoint_manager.py
class CheckpointManager (line 30) | class CheckpointManager:
method __init__ (line 33) | def __init__(
method save_checkpoint (line 53) | def save_checkpoint(self, global_steps: int) -> None:
method _save_model_states (line 73) | def _save_model_states(self, global_steps: int, step_dir: str) -> None:
method _save_dataloader_state (line 111) | def _save_dataloader_state(self, step_dir: str) -> None:
method _commit_checkpoint (line 124) | def _commit_checkpoint(self, global_steps: int) -> None:
method load_checkpoint (line 134) | def load_checkpoint(self) -> int:
method _determine_checkpoint_path (line 172) | def _determine_checkpoint_path(self) -> Optional[str]:
method _load_model_states (line 199) | def _load_model_states(self, global_step_folder: str) -> None:
method _load_dataloader_state (line 237) | def _load_dataloader_state(self, global_step_folder: str) -> None:
FILE: siirl/dag_worker/constants.py
class DAGInitializationError (line 19) | class DAGInitializationError(Exception):
class DAGConstants (line 25) | class DAGConstants:
FILE: siirl/dag_worker/core_algos.py
function register_policy_loss (line 54) | def register_policy_loss(name: str) -> Callable[[PolicyLossFn], PolicyLo...
function get_policy_loss_fn (line 71) | def get_policy_loss_fn(name):
function compute_response_mask (line 89) | def compute_response_mask(data: TensorDict):
function register_adv_est (line 130) | def register_adv_est(name_or_enum: str | AdvantageEstimator) -> Any:
function get_adv_estimator_fn (line 151) | def get_adv_estimator_fn(name_or_enum):
class AdaptiveKLController (line 167) | class AdaptiveKLController:
method __init__ (line 173) | def __init__(self, init_kl_coef, target_kl, horizon):
method update (line 178) | def update(self, current_kl, n_steps):
class FixedKLController (line 191) | class FixedKLController:
method __init__ (line 194) | def __init__(self, kl_coef):
method update (line 197) | def update(self, current_kl, n_steps):
function get_kl_controller (line 207) | def get_kl_controller(kl_ctrl):
function compute_gae_advantage_return (line 230) | def compute_gae_advantage_return(
function compute_grpo_outcome_advantage (line 280) | def compute_grpo_outcome_advantage(
function compute_marft_gae_advantage_return (line 356) | def compute_marft_gae_advantage_return(
function compute_cpgd_outcome_advantage (line 466) | def compute_cpgd_outcome_advantage(
function compute_rewards (line 525) | def compute_rewards(token_level_scores, old_log_prob, ref_log_prob, kl_r...
function agg_loss (line 541) | def agg_loss(loss_mat: torch.Tensor, loss_mask: torch.Tensor, loss_agg_m...
function compute_policy_loss_cpgd (line 578) | def compute_policy_loss_cpgd(
function compute_policy_loss (line 636) | def compute_policy_loss(
function compute_policy_loss_vanilla (line 716) | def compute_policy_loss_vanilla(
function compute_policy_loss_gspo (line 801) | def compute_policy_loss_gspo(
function compute_policy_loss_gpg (line 871) | def compute_policy_loss_gpg(
function compute_policy_loss_clip_cov (line 904) | def compute_policy_loss_clip_cov(
function compute_policy_loss_kl_cov (line 1004) | def compute_policy_loss_kl_cov(
function compute_policy_loss_geo_mean (line 1080) | def compute_policy_loss_geo_mean(
function compute_entropy_loss (line 1161) | def compute_entropy_loss(logits, response_mask, loss_agg_mode: str = "to...
function compute_value_loss (line 1178) | def compute_value_loss(
function kl_penalty (line 1220) | def kl_penalty(logprob: torch.FloatTensor, ref_logprob: torch.FloatTenso...
function kl_penalty_forward (line 1246) | def kl_penalty_forward(logprob: torch.FloatTensor, ref_logprob: torch.Fl...
function compute_pf_ppo_reweight_data (line 1284) | def compute_pf_ppo_reweight_data(
function apply_kl_penalty (line 1363) | def apply_kl_penalty(data: TensorDict, kl_ctrl: AdaptiveKLController, kl...
function compute_advantage (line 1405) | def compute_advantage(data: TensorDict, adv_estimator, gamma=1.0, lam=1....
FILE: siirl/dag_worker/dag_utils.py
function timer (line 57) | def timer(enable_perf: bool, name: str, timing_dict: dict):
function add_prefix_to_dataproto (line 69) | def add_prefix_to_dataproto(tensordict: TensorDict, node: Node):
function remove_prefix_from_dataproto (line 95) | def remove_prefix_from_dataproto(tensordict, node: Node):
function add_prefix_to_metrics (line 121) | def add_prefix_to_metrics(metrics: dict, node: Node) -> dict:
function get_and_validate_rank (line 141) | def get_and_validate_rank() -> int:
function get_taskgraph_for_rank (line 152) | def get_taskgraph_for_rank(rank: int, taskgraph_mapping: Dict[int, TaskG...
function log_ray_actor_info (line 164) | def log_ray_actor_info(rank: int):
function log_role_worker_mapping (line 176) | def log_role_worker_mapping(role_worker_mapping: Dict[NodeRole, Type[Wor...
function find_first_non_compute_ancestor (line 195) | def find_first_non_compute_ancestor(taskgraph: TaskGraph, start_node_id:...
function should_create_worker (line 228) | def should_create_worker(role_worker_mapping: Dict[NodeRole, Type[Worker...
function generate_node_worker_key (line 236) | def generate_node_worker_key(node: Node) -> str:
function setup_sharding_manager (line 241) | def setup_sharding_manager(
function get_worker_classes (line 370) | def get_worker_classes(config, strategy: str) -> Dict[NodeRole, Type[Wor...
function get_parallelism_config (line 414) | def get_parallelism_config(reference_node: Node) -> tuple[int, int]:
function prepare_generation_batch (line 450) | def prepare_generation_batch(batch: TensorDict) -> TensorDict:
function prepare_local_batch_metrics (line 465) | def prepare_local_batch_metrics(batch: TensorDict, use_critic: bool = Tr...
function whether_put_data (line 515) | def whether_put_data(rank, is_current_last_pp_tp_rank0, next_dp_size, cu...
function reduce_and_broadcast_metrics (line 542) | def reduce_and_broadcast_metrics(
function format_metrics_by_group (line 580) | def format_metrics_by_group(metrics: Dict[str, Any], group_order: List[s...
function log_metrics_to_console (line 629) | def log_metrics_to_console(rank: int, ordered_metrics: List[Tuple[str, A...
function dump_validation_generations (line 638) | def dump_validation_generations(
function aggregate_and_write_performance_metrics (line 684) | def aggregate_and_write_performance_metrics(
function log_core_performance_metrics (line 775) | def log_core_performance_metrics(rank: int, enable_perf: bool, metrics: ...
function get_time_now (line 884) | def get_time_now(time_zone: str = "Asia/Shanghai") -> datetime:
function consistent_hash (line 889) | def consistent_hash(s: str) -> int:
FILE: siirl/dag_worker/dagworker.py
class DAGWorker (line 81) | class DAGWorker(Worker):
method __init__ (line 87) | def __init__(
method execute_task_graph (line 150) | def execute_task_graph(self):
method _run_training_loop (line 174) | def _run_training_loop(self):
method _cleanup_step_buffers (line 256) | def _cleanup_step_buffers(self, timing_raw: dict) -> None:
method _run_training_step (line 272) | def _run_training_step(self, epoch: int, batch_idx: int) -> Optional[L...
method generate_sync_mode (line 469) | def generate_sync_mode(self, agent_group, batch: TensorDict) -> NodeOu...
method generate_async_mode (line 478) | def generate_async_mode(self, batch: TensorDict) -> NodeOutput:
method generate_multi_agent_mode (line 490) | def generate_multi_agent_mode(self, config, batch: TensorDict) -> Node...
method generate_embodied_mode (line 508) | def generate_embodied_mode(self, agent_group, batch: TensorDict, **kwa...
method generate (line 557) | def generate(self, config, batch: TensorDict, **kwargs) -> NodeOutput:
method compute_reward (line 575) | def compute_reward(self, config, batch: TensorDict, **kwargs) -> NodeO...
method compute_old_log_prob (line 604) | def compute_old_log_prob(self, config, batch: TensorDict, **kwargs) ->...
method compute_ref_log_prob (line 624) | def compute_ref_log_prob(self, config, batch: TensorDict, **kwargs) ->...
method compute_value (line 632) | def compute_value(self, config, batch: TensorDict, **kwargs) -> NodeOu...
method compute_multi_agent_advantage (line 639) | def compute_multi_agent_advantage(self, config, batch: TensorDict, **k...
method compute_advantage (line 686) | def compute_advantage(self, config, batch: TensorDict, **kwargs) -> No...
method train_critic (line 708) | def train_critic(self, config, batch: TensorDict, **kwargs) -> NodeOut...
method train_actor (line 716) | def train_actor(self, config, batch: TensorDict, **kwargs) -> NodeOutput:
method _initialize_worker (line 732) | def _initialize_worker(self):
method _setup_distributed_environment (line 767) | def _setup_distributed_environment(self):
method _setup_tokenizers (line 817) | def _setup_tokenizers(self):
method _setup_dataloader (line 843) | def _setup_dataloader(self):
method _setup_reward_managers (line 877) | def _setup_reward_managers(self):
method _setup_role_worker_mapping (line 893) | def _setup_role_worker_mapping(self):
method _initialize_node_workers (line 913) | def _initialize_node_workers(self):
method init_graph (line 965) | def init_graph(self):
method _load_model_weights (line 996) | def _load_model_weights(self):
method _setup_sharding_manager (line 1018) | def _setup_sharding_manager(self):
method _setup_async_rollout (line 1037) | def _setup_async_rollout(self):
method _setup_multi_agent_loop (line 1052) | def _setup_multi_agent_loop(self):
method _init_validator (line 1071) | def _init_validator(self):
method _init_metrics_collector (line 1095) | def _init_metrics_collector(self):
method _init_checkpoint_manager (line 1112) | def _init_checkpoint_manager(self):
method init_async_server (line 1126) | def init_async_server(self, node:Node, node_worker):
method put_data_to_buffers (line 1150) | def put_data_to_buffers(
method get_data_from_buffers (line 1221) | def get_data_from_buffers(
method reset_data_buffer (line 1307) | def reset_data_buffer(self):
method _get_node_process_group (line 1317) | def _get_node_process_group(self, node: Node) -> ProcessGroup:
method _get_node_dp_info (line 1328) | def _get_node_dp_info(self, node: Node) -> tuple[int, int, int, int, i...
method get_zeromq_address (line 1376) | def get_zeromq_address(self):
method multi_agent_put_log (line 1379) | def multi_agent_put_log(self, key: str, data: TensorDict, agent_group:...
method check_mode (line 1384) | def check_mode(self):
FILE: siirl/dag_worker/data_structures.py
class ValidationResult (line 24) | class ValidationResult:
class ValidationPayload (line 36) | class ValidationPayload:
class NodeOutput (line 46) | class NodeOutput:
FILE: siirl/dag_worker/metric_aggregator.py
class _ReduceOp (line 25) | class _ReduceOp(Enum):
class DistributedMetricAggregator (line 54) | class DistributedMetricAggregator:
method __init__ (line 60) | def __init__(
method _bucket_local_metrics (line 78) | def _bucket_local_metrics(self, metrics: Dict, expected_keys: set = No...
method aggregate_and_get_results (line 173) | def aggregate_and_get_results(self) -> Dict[str, float]:
FILE: siirl/dag_worker/metrics_collector.py
class MetricsCollector (line 50) | class MetricsCollector:
method __init__ (line 63) | def __init__(
method collect_final_metrics (line 96) | def collect_final_metrics(self, batch: TensorDict, timing_raw: dict) -...
method collect_multi_agent_final_metrics (line 240) | def collect_multi_agent_final_metrics(self, batch: TensorDict, ordered...
FILE: siirl/dag_worker/validator.py
class Validator (line 43) | class Validator:
method __init__ (line 57) | def __init__(
method validate (line 122) | def validate(self, global_step: int) -> Dict[str, float]:
method _validate_embodied (line 131) | def _validate_embodied(self, global_step) -> Dict[str, float]:
method _validate_text_generation (line 200) | def _validate_text_generation(self, global_step: int) -> Dict[str, flo...
method _generate_for_validation (line 264) | def _generate_for_validation(self, batch: TensorDict) -> TensorDict:
method _score_and_package_results (line 306) | def _score_and_package_results(self, generated_proto: TensorDict) -> L...
method _aggregate_and_log_validation_metrics (line 359) | def _aggregate_and_log_validation_metrics(self, all_payloads: List[Val...
method _aggregate_validation_results (line 395) | def _aggregate_validation_results(self, all_payloads: List[ValidationP...
method _prepare_embodied_validation_batch (line 459) | def _prepare_embodied_validation_batch(self) -> TensorDict:
method _generate_for_embodied_validation (line 475) | def _generate_for_embodied_validation(self, batch: TensorDict, global_...
method _score_embodied_results (line 519) | def _score_embodied_results(self, generated_proto: TensorDict) -> List...
method _aggregate_and_log_embodied_metrics (line 580) | def _aggregate_and_log_embodied_metrics(self, all_payloads: List[Valid...
method _aggregate_embodied_results (line 622) | def _aggregate_embodied_results(self, all_payloads: List[ValidationPay...
FILE: siirl/data_coordinator/data_buffer.py
class DataCoordinator (line 29) | class DataCoordinator:
method __init__ (line 36) | def __init__(self, nnodes: int, ppo_mini_batch_size: int, world_size: ...
method put (line 56) | async def put(self, sample_info: SampleInfo, sample_ref: Any, caller_n...
method put_batch (line 99) | async def put_batch(self, sample_infos: List[SampleInfo], sample_refs:...
method get_batch (line 137) | async def get_batch(
method _log_accumulation_progress (line 219) | def _log_accumulation_progress(self, current_samples: int, target_samp...
method _log_dispatch_stats (line 242) | def _log_dispatch_stats(self, dispatched_samples: int):
method _apply_length_balancing (line 268) | def _apply_length_balancing(
method _apply_length_balancing_single_sample (line 353) | def _apply_length_balancing_single_sample(
method get_all_by_filter (line 416) | async def get_all_by_filter(self, filter_plugin: Callable[[SampleInfo]...
method get_valid_size (line 438) | async def get_valid_size(self) -> int:
method peek_source_dp_size (line 443) | async def peek_source_dp_size(self, filter_plugin: Callable[[SampleInf...
method reset_cache (line 461) | def reset_cache(self):
method __repr__ (line 468) | def __repr__(self) -> str:
function init_data_coordinator (line 476) | def init_data_coordinator(num_buffers: int, ppo_mini_batch_size: int, wo...
FILE: siirl/data_coordinator/dataloader/data_loader_node.py
class RepeatDataset (line 29) | class RepeatDataset(torch.utils.data.Dataset):
method __init__ (line 54) | def __init__(self, base_dataset, repeat_factor):
method __len__ (line 59) | def __len__(self):
method __getitem__ (line 62) | def __getitem__(self, idx):
class DataLoaderNode (line 66) | class DataLoaderNode(Node):
method __init__ (line 73) | def __init__(
method _create_dataloader (line 135) | def _create_dataloader(self):
method _reinit_dataloader_sampler (line 246) | def _reinit_dataloader_sampler(self):
method get_train_dataloader (line 274) | def get_train_dataloader(self):
method get_val_dataloader (line 283) | def get_val_dataloader(self):
method run (line 292) | def run(self, epoch: Optional[int] = None, is_validation_step: bool = ...
method state_dict (line 407) | def state_dict(self) -> Dict[str, Any]:
method load_state_dict (line 418) | def load_state_dict(self, state_dict: Dict[str, Any]):
FILE: siirl/data_coordinator/dataloader/embodied_preprocess.py
function prepare_libero_train_valid_datasets (line 23) | def prepare_libero_train_valid_datasets(
FILE: siirl/data_coordinator/dataloader/partitioned_dataset.py
function collate_fn (line 37) | def collate_fn(data_list: list[dict]) -> dict:
class PartitionedRLHFDataset (line 73) | class PartitionedRLHFDataset(Dataset):
method __init__ (line 97) | def __init__(
method _load_partitioned_raw_data (line 211) | def _load_partitioned_raw_data(self, dataset_files: Sequence[str]) -> ...
method _filter_overlong_prompts (line 297) | def _filter_overlong_prompts(self, raw_dataframe: datasets.Dataset) ->...
method __len__ (line 310) | def __len__(self) -> int:
method _build_messages (line 313) | def _build_messages(self, example: dict) -> list:
method _preprocess_function (line 333) | def _preprocess_function(self, row_dict: Dict) -> Dict:
method __getitem__ (line 444) | def __getitem__(self, item: int) -> Dict:
FILE: siirl/data_coordinator/dataloader/vision_utils.py
function process_image (line 23) | def process_image(image: Union[dict, Image.Image], max_pixels: int, min_...
function process_video (line 74) | def process_video(
function process_multi_modal_inputs_for_minicpmo (line 113) | def process_multi_modal_inputs_for_minicpmo(input_ids, attention_mask, p...
FILE: siirl/data_coordinator/protocol.py
class _TensorDictConfigMeta (line 49) | class _TensorDictConfigMeta(type):
method auto_padding (line 55) | def auto_padding(cls):
method auto_padding (line 60) | def auto_padding(cls, enabled: bool):
class TensorDictConfig (line 65) | class TensorDictConfig(metaclass=_TensorDictConfigMeta):
function union_tensor_dict (line 72) | def union_tensor_dict(tensor_dict1: TensorDict, tensor_dict2: TensorDict...
function union_numpy_dict (line 88) | def union_numpy_dict(tensor_dict1: dict[str, np.ndarray], tensor_dict2: ...
function list_of_dict_to_dict_of_list (line 102) | def list_of_dict_to_dict_of_list(list_of_dict: list[dict]):
function all_gather_data_proto (line 114) | def all_gather_data_proto(data: TensorDict, process_group):
function select_idxs (line 125) | def select_idxs(batch: TensorDict, idxs):
FILE: siirl/data_coordinator/sample.py
class SampleInfo (line 17) | class SampleInfo(BaseModel):
class Sample (line 27) | class Sample(BaseModel):
class Config (line 103) | class Config:
class SampleManager (line 107) | class SampleManager(BaseModel):
class Config (line 111) | class Config:
function preprocess_dataloader (line 116) | def preprocess_dataloader(data:Dict, n:int = 1):
function Dict2Samples (line 144) | def Dict2Samples(data:TensorDict)-> List[SampleManager]:
function Samples2Dict (line 184) | def Samples2Dict(samples: List[Sample]) -> TensorDict:
function filter_tensordict (line 251) | def filter_tensordict(batch: TensorDict, indices: List[int]) -> TensorDict:
FILE: siirl/engine/actor/base.py
class BasePPOActor (line 28) | class BasePPOActor(ABC):
method __init__ (line 29) | def __init__(self, config):
method compute_log_prob (line 40) | def compute_log_prob(self, data: TensorDict) -> torch.Tensor:
method update_policy (line 55) | def update_policy(self, data: TensorDict) -> Dict:
FILE: siirl/engine/actor/dp_actor.py
class DataParallelPPOActor (line 58) | class DataParallelPPOActor(BasePPOActor):
method __init__ (line 59) | def __init__(
method _forward_micro_batch (line 92) | def _forward_micro_batch(
method _optimizer_step (line 270) | def _optimizer_step(self):
method compute_log_prob (line 291) | def compute_log_prob(self, data: TensorDict, calculate_entropy=False) ...
method update_policy (line 354) | def update_policy(self, data: TensorDict):
FILE: siirl/engine/actor/embodied_actor.py
class RobDataParallelPPOActor (line 40) | class RobDataParallelPPOActor(BasePPOActor):
method __init__ (line 42) | def __init__(
method process_tensor (line 57) | def process_tensor(self, tensor, pad_id):
method generate_traj_mask (line 65) | def generate_traj_mask(self, end_step, traj_len):
method apply_mask_with_grad_control (line 78) | def apply_mask_with_grad_control(self, log_probs, entropy, mask):
method _forward_micro_batch (line 104) | def _forward_micro_batch(self, micro_batch, temperature) -> Tuple[torc...
method _forward_micro_batch_update (line 192) | def _forward_micro_batch_update(self, input_ids, attention_mask, pixel...
method _forward_micro_batch_entropy (line 248) | def _forward_micro_batch_entropy(self, micro_batch, temperature) -> Tu...
method _optimizer_step (line 318) | def _optimizer_step(self):
method compute_log_prob (line 328) | def compute_log_prob(self, data: TensorDict, calculate_entropy=False) ...
method update_policy (line 380) | def update_policy(self, data: TensorDict):
method compute_entropy (line 504) | def compute_entropy(self, bacth_data: TensorDict):
FILE: siirl/engine/actor/megatron_actor.py
class MegatronPPOActor (line 53) | class MegatronPPOActor(BasePPOActor):
method __init__ (line 54) | def __init__(
method _validate_config (line 138) | def _validate_config(self, config) -> None:
method compute_log_prob (line 149) | def compute_log_prob(self, data: TensorDict, calculate_entropy=False) ...
method compute_ppo_loss (line 253) | def compute_ppo_loss(self, model_output, data):
method forward_backward_batch (line 307) | def forward_backward_batch(
method update_policy (line 513) | def update_policy(self, data:TensorDict) -> dict:
FILE: siirl/engine/base_worker/base/worker.py
class DistRankInfo (line 29) | class DistRankInfo:
class DistGlobalInfo (line 37) | class DistGlobalInfo:
class WorkerHelper (line 44) | class WorkerHelper:
method _get_node_ip (line 45) | def _get_node_ip(self):
method _get_free_port (line 62) | def _get_free_port(self):
method get_availale_master_addr_port (line 67) | def get_availale_master_addr_port(self):
method _get_pid (line 70) | def _get_pid(self):
class Worker (line 75) | class Worker(WorkerHelper):
method __new__ (line 85) | def __new__(cls, *args, **kwargs):
method _configure_before_init (line 103) | def _configure_before_init(self, register_center_name: str, rank: int):
method env_keys (line 134) | def env_keys(cls):
method __init__ (line 138) | def __init__(self, cuda_visible_devices=None) -> None:
method get_fused_worker_by_name (line 176) | def get_fused_worker_by_name(self, worker_name: str):
method _setup_env_cuda_visible_devices (line 185) | def _setup_env_cuda_visible_devices(self):
method _configure_with_store (line 232) | def _configure_with_store(self, store: Dict):
method get_master_addr_port (line 246) | def get_master_addr_port(self):
method get_cuda_visible_devices (line 250) | def get_cuda_visible_devices(self):
method world_size (line 258) | def world_size(self):
method rank (line 263) | def rank(self):
method execute_with_func_generator (line 267) | def execute_with_func_generator(self, func, *args, **kwargs):
method execute_func_rank_zero (line 281) | def execute_func_rank_zero(self, func, *args, **kwargs):
FILE: siirl/engine/base_worker/megatron/npu_mbridge_patch.py
function load_weights_patch (line 20) | def load_weights_patch(
function _weight_name_mapping_mcore_local_to_global_patch (line 131) | def _weight_name_mapping_mcore_local_to_global_patch(
FILE: siirl/engine/base_worker/megatron/worker.py
class MegatronWorker (line 20) | class MegatronWorker(Worker):
method __init__ (line 21) | def __init__(self, cuda_visible_devices=None) -> None:
method get_megatron_global_info (line 24) | def get_megatron_global_info(self):
method get_megatron_rank_info (line 34) | def get_megatron_rank_info(self):
method _init_hf_config_and_tf_config (line 45) | def _init_hf_config_and_tf_config(
FILE: siirl/engine/base_worker/register_center/register_center.py
class WorkerGroupRegisterCenter (line 21) | class WorkerGroupRegisterCenter:
method __init__ (line 22) | def __init__(self, rank_zero_info):
method get_rank_zero_info (line 27) | def get_rank_zero_info(self):
method set_worker_info (line 30) | def set_worker_info(self, rank, node_id) -> None:
method get_worker_info (line 33) | def get_worker_info(self) -> Dict[int, str]:
function create_worker_group_register_center (line 37) | def create_worker_group_register_center(name, info):
FILE: siirl/engine/base_worker/resouce_pool.py
class ResourcePool (line 24) | class ResourcePool:
method __init__ (line 31) | def __init__(self, process_on_nodes=None, max_colocate_count: int = 10...
method add_node (line 45) | def add_node(self, process_count):
method world_size (line 49) | def world_size(self):
method __call__ (line 53) | def __call__(self) -> Any:
method store (line 57) | def store(self):
method local_world_size_list (line 60) | def local_world_size_list(self) -> List[int]:
method local_rank_list (line 65) | def local_rank_list(self) -> List[int]:
class ClassWithInitArgs (line 71) | class ClassWithInitArgs:
method __init__ (line 78) | def __init__(self, cls, *args, **kwargs) -> None:
method __call__ (line 92) | def __call__(self) -> Any:
class WorkerGroup (line 97) | class WorkerGroup:
method __init__ (line 105) | def __init__(self, resource_pool: ResourcePool, **kwargs) -> None:
method _is_worker_alive (line 124) | def _is_worker_alive(self, worker):
method world_size (line 129) | def world_size(self):
function get_random_string (line 134) | def get_random_string(length: int) -> str:
function sort_placement_group_by_node_ip (line 142) | def sort_placement_group_by_node_ip(pgs: List[PlacementGroup]) -> List[P...
class RayResourcePool (line 162) | class RayResourcePool(ResourcePool):
method __init__ (line 163) | def __init__(
method get_placement_groups (line 180) | def get_placement_groups(self, strategy="STRICT_PACK", name=None, devi...
function extract_pg_from_exist (line 208) | def extract_pg_from_exist(resource_pools: Dict[str, RayResourcePool], sr...
function merge_resource_pool (line 225) | def merge_resource_pool(rp1: RayResourcePool, rp2: RayResourcePool) -> R...
class RayClassWithInitArgs (line 239) | class RayClassWithInitArgs(ClassWithInitArgs):
method __init__ (line 247) | def __init__(self, cls, *args, **kwargs) -> None:
method set_additional_resource (line 253) | def set_additional_resource(self, additional_resource):
method update_options (line 261) | def update_options(self, options: Dict):
method __call__ (line 269) | def __call__(self, placement_group, placement_group_bundle_idx, use_gp...
FILE: siirl/engine/critic/base.py
class BasePPOCritic (line 27) | class BasePPOCritic(ABC):
method __init__ (line 28) | def __init__(self, config):
method compute_values (line 33) | def compute_values(self, data: TensorDict) -> torch.Tensor:
method update_critic (line 38) | def update_critic(self, data: TensorDict):
FILE: siirl/engine/critic/dp_critic.py
class DataParallelPPOCritic (line 41) | class DataParallelPPOCritic(BasePPOCritic):
method __init__ (line 42) | def __init__(self, config: CriticArguments, critic_module: nn.Module, ...
method _forward_micro_batch (line 52) | def _forward_micro_batch(self, micro_batch):
method _optimizer_step (line 111) | def _optimizer_step(self):
method compute_values (line 130) | def compute_values(self, data: TensorDict) -> torch.Tensor:
method update_critic (line 169) | def update_critic(self, data: TensorDict):
FILE: siirl/engine/critic/megatron_critic.py
class MegatronPPOCritic (line 41) | class MegatronPPOCritic(BasePPOCritic):
method __init__ (line 42) | def __init__(
method _validate_config (line 77) | def _validate_config(self, config) -> None:
method compute_values (line 88) | def compute_values(self, data: TensorDict) -> TensorDict:
method forward_backward_batch (line 135) | def forward_backward_batch(self, data: TensorDict, forward_only=False,...
method update_critic (line 264) | def update_critic(self, data: TensorDict):
FILE: siirl/engine/fsdp_workers.py
function create_device_mesh_from_group (line 76) | def create_device_mesh_from_group(
function get_sharding_strategy (line 154) | def get_sharding_strategy(device_mesh):
class ActorRolloutRefWorker (line 166) | class ActorRolloutRefWorker(Worker):
method __init__ (line 172) | def __init__(self, config: ActorRolloutRefArguments, role: str, proces...
method _build_model_optimizer (line 241) | def _build_model_optimizer(
method _build_model_optimizer_legacy (line 328) | def _build_model_optimizer_legacy(
method _prepare_and_load_model (line 635) | def _prepare_and_load_model(
method _get_model_class (line 750) | def _get_model_class(self, model_config, is_embodied: bool):
method _register_embodied_model (line 763) | def _register_embodied_model(self, model_path: str, trust_remote_code:...
method _setup_openvla_oft_model (line 806) | def _setup_openvla_oft_model(self, model: torch.nn.Module, model_path:...
method _apply_model_modifications (line 828) | def _apply_model_modifications(
method _setup_fsdp_wrapper (line 901) | def _setup_fsdp_wrapper(
method _create_optimizer_and_scheduler (line 1026) | def _create_optimizer_and_scheduler(
method _build_rollout (line 1094) | def _build_rollout(self, trust_remote_code=False):
method init_model (line 1156) | def init_model(self):
method update_actor (line 1273) | def update_actor(self, data: TensorDict):
method generate_sequences (line 1313) | def generate_sequences(self, prompts: TensorDict):
method compute_log_prob (line 1357) | def compute_log_prob(self, data: TensorDict):
method compute_ref_log_prob (line 1410) | def compute_ref_log_prob(self, data: TensorDict):
method save_checkpoint (line 1452) | def save_checkpoint(self, local_path, hdfs_path=None, global_step=0, m...
method load_checkpoint (line 1493) | def load_checkpoint(self, local_path, hdfs_path=None, del_local_after_...
method set_rollout_sharding_manager (line 1505) | def set_rollout_sharding_manager(self, sharding_manager):
class CriticWorker (line 1510) | class CriticWorker(Worker):
method __init__ (line 1511) | def __init__(self, config: CriticArguments, process_group: ProcessGroup):
method _build_critic_model_optimizer (line 1546) | def _build_critic_model_optimizer(self, config):
method init_model (line 1717) | def init_model(self):
method compute_values (line 1737) | def compute_values(self, data: TensorDict):
method update_critic (line 1759) | def update_critic(self, data: TensorDict):
method save_checkpoint (line 1794) | def save_checkpoint(self, local_path, hdfs_path=None, global_step=0, m...
method load_checkpoint (line 1806) | def load_checkpoint(self, local_path, hdfs_path=None, del_local_after_...
class RewardModelWorker (line 1823) | class RewardModelWorker(Worker):
method __init__ (line 1828) | def __init__(self, config: RewardModelArguments, process_group: Proces...
method _build_model (line 1855) | def _build_model(self, config: RewardModelArguments):
method init_model (line 1935) | def init_model(self):
method _forward_micro_batch (line 1940) | def _forward_micro_batch(self, micro_batch):
method _expand_to_token_level (line 1986) | def _expand_to_token_level(self, data: TensorDict, scores: torch.Tensor):
method _switch_chat_template (line 2001) | def _switch_chat_template(self, data: TensorDict):
method compute_rm_score (line 2062) | def compute_rm_score(self, data: TensorDict):
class AsyncActorRolloutRefWorker (line 2123) | class AsyncActorRolloutRefWorker(ActorRolloutRefWorker):
method _build_rollout (line 2124) | def _build_rollout(self, trust_remote_code=False):
method generate_sequences (line 2139) | def generate_sequences(self, prompts: TensorDict):
method execute_method (line 2142) | def execute_method(self, method: Union[str, bytes], *args, **kwargs):
method chat_completion (line 2148) | async def chat_completion(self, json_request):
method wake_up (line 2152) | async def wake_up(self):
method sleep (line 2157) | async def sleep(self):
method set_rollout_sharding_manager (line 2162) | def set_rollout_sharding_manager(self, sharding_manager):
method get_zeromq_address (line 2166) | def get_zeromq_address(self):
FILE: siirl/engine/megatron_workers.py
function set_random_seed (line 67) | def set_random_seed(seed):
class ActorRolloutRefWorker (line 87) | class ActorRolloutRefWorker(MegatronWorker):
method __init__ (line 93) | def __init__(self, config: DictConfig, role: str, process_group=None):
method _build_model_optimizer (line 161) | def _build_model_optimizer(self, model_path, optim_config, override_mo...
method _build_rollout (line 228) | def _build_rollout(self, trust_remote_code=False):
method init_model (line 301) | def init_model(self):
method update_actor (line 393) | def update_actor(self, data: TensorDict):
method generate_sequences (line 427) | def generate_sequences(self, prompts: TensorDict):
method load_checkpoint (line 463) | def load_checkpoint(self, checkpoint_path, hdfs_path=None, del_local_a...
method load_pretrained_model (line 472) | def load_pretrained_model(self, checkpoint_path, del_local_after_load=...
method save_checkpoint (line 475) | def save_checkpoint(self, checkpoint_path, hdfs_path=None, global_step...
class AsyncActorRolloutRefWorker (line 484) | class AsyncActorRolloutRefWorker(ActorRolloutRefWorker):
method _build_rollout (line 485) | def _build_rollout(self, trust_remote_code=False):
method execute_method (line 500) | def execute_method(self, method: Union[str, bytes], *args, **kwargs):
method chat_completion (line 506) | async def chat_completion(self, json_request):
method wake_up (line 510) | async def wake_up(self):
method sleep (line 515) | async def sleep(self):
class CriticWorker (line 521) | class CriticWorker(MegatronWorker):
method __init__ (line 522) | def __init__(self, config, process_group=None):
method _build_critic_model_optimizer (line 569) | def _build_critic_model_optimizer(self, model_path, optim_config, over...
method init_model (line 633) | def init_model(self):
method compute_values (line 690) | def compute_values(self, data: TensorDict):
method update_critic (line 705) | def update_critic(self, data: TensorDict):
method load_checkpoint (line 729) | def load_checkpoint(self, local_path, hdfs_path=None, del_local_after_...
method save_checkpoint (line 738) | def save_checkpoint(self, local_path, hdfs_path=None, global_step=0, m...
class RewardModelWorker (line 746) | class RewardModelWorker(MegatronWorker):
method __init__ (line 751) | def __init__(self, config, process_group=None):
method _build_rm_model (line 788) | def _build_rm_model(self, model_path, tokenizer, override_model_config...
method init_model (line 830) | def init_model(self):
method compute_rm_score (line 871) | def compute_rm_score(self, data: TensorDict):
function global_initialize_model_parallel (line 885) | def global_initialize_model_parallel(config: ActorRolloutRefArguments):
function global_mindspeed_repatch (line 923) | def global_mindspeed_repatch(config):
class ActorWorker (line 935) | class ActorWorker(MegatronWorker):
method __init__ (line 940) | def __init__(self, config: DictConfig, process_group=None):
method _build_actor_model_optimizer (line 959) | def _build_actor_model_optimizer(self, model_path, optim_config, overr...
method init_model (line 1018) | def init_model(self):
method update_actor (line 1086) | def update_actor(self, data: TensorDict):
method compute_log_prob (line 1122) | def compute_log_prob(self, data: TensorDict):
method load_checkpoint (line 1156) | def load_checkpoint(self, local_path, hdfs_path=None, del_local_after_...
method load_pretrained_model (line 1165) | def load_pretrained_model(self, checkpoint_path, del_local_after_load=...
method save_checkpoint (line 1168) | def save_checkpoint(self, local_path, hdfs_path=None, global_step=0, m...
class RolloutWorker (line 1177) | class RolloutWorker(MegatronWorker):
method __init__ (line 1182) | def __init__(self, config: DictConfig, process_group=None):
method _build_rollout (line 1199) | def _build_rollout(self, trust_remote_code=False):
method init_model (line 1250) | def init_model(self):
method generate_sequences (line 1284) | def generate_sequences(self, prompts: TensorDict):
method set_rollout_sharding_manager (line 1304) | def set_rollout_sharding_manager(self, sharding_manager):
class ReferenceWorker (line 1308) | class ReferenceWorker(MegatronWorker):
method __init__ (line 1313) | def __init__(self, config: DictConfig, process_group=None):
method _build_ref_model (line 1331) | def _build_ref_model(self, model_path, override_model_config, override...
method init_model (line 1373) | def init_model(self):
method compute_ref_log_prob (line 1409) | def compute_ref_log_prob(self, data: TensorDict):
class AsyncRolloutWorker (line 1439) | class AsyncRolloutWorker(RolloutWorker):
method _build_rollout (line 1440) | def _build_rollout(self, trust_remote_code=False):
method execute_method (line 1455) | def execute_method(self, method: Union[str, bytes], *args, **kwargs):
method chat_completion (line 1461) | async def chat_completion(self, json_request):
method wake_up (line 1465) | async def wake_up(self):
method sleep (line 1470) | async def sleep(self):
method set_rollout_sharding_manager (line 1475) | def set_rollout_sharding_manager(self, sharding_manager):
FILE: siirl/engine/reward_manager/__init__.py
function __getattr__ (line 20) | def __getattr__(name):
FILE: siirl/engine/reward_manager/dapo.py
class DAPORewardManager (line 25) | class DAPORewardManager:
method __init__ (line 28) | def __init__(
method __call__ (line 48) | def __call__(self, data: TensorDict, return_dict: bool = False):
FILE: siirl/engine/reward_manager/embodied.py
class EmbodiedRewardManager (line 25) | class EmbodiedRewardManager:
method __init__ (line 34) | def __init__(
method __call__ (line 79) | def __call__(self, data: TensorDict, return_dict: bool = False) -> Uni...
method verify (line 165) | def verify(self, data: TensorDict) -> Tuple[List[float], Dict[str, flo...
FILE: siirl/engine/reward_manager/naive.py
class NaiveRewardManager (line 25) | class NaiveRewardManager:
method __init__ (line 28) | def __init__(self, tokenizer, num_examine, compute_score=None, reward_...
method __call__ (line 35) | def __call__(self, data: TensorDict, return_dict=False):
FILE: siirl/engine/reward_manager/parallel.py
class ParallelRewardManager (line 26) | class ParallelRewardManager:
method __init__ (line 29) | def __init__(self, tokenizer, num_examine, compute_score=None, reward_...
method _process_single_item (line 35) | def _process_single_item(self, data_item):
method _compute_score (line 47) | def _compute_score(self, item):
method verify (line 56) | def verify(self, data):
method __call__ (line 63) | def __call__(self, data: TensorDict, return_dict: bool = True):
FILE: siirl/engine/reward_model/base.py
class BasePPORewardModel (line 23) | class BasePPORewardModel(ABC):
method __init__ (line 24) | def __init__(self, config):
method compute_reward (line 28) | def compute_reward(self, data: TensorDict) -> TensorDict:
FILE: siirl/engine/reward_model/megatron/reward_model.py
class MegatronRewardModel (line 33) | class MegatronRewardModel(BasePPORewardModel):
method __init__ (line 34) | def __init__(
method re_encode_by_rm_tokenizer (line 59) | def re_encode_by_rm_tokenizer(self, data: TensorDict) -> TensorDict:
method compute_reward (line 124) | def compute_reward(self, data: TensorDict) -> TensorDict:
method forward_batch (line 203) | def forward_batch(self, data: TensorDict, use_dynamic_bsz=False, micro...
method offload_params_to_cpu (line 304) | def offload_params_to_cpu(self):
method load_params_to_cuda (line 312) | def load_params_to_cuda(self):
FILE: siirl/engine/rollout/async_server.py
function _get_free_port (line 28) | def _get_free_port():
class AsyncServerBase (line 34) | class AsyncServerBase(ABC):
method __init__ (line 37) | def __init__(self):
method chat_completion (line 42) | async def chat_completion(self, raw_request: Request):
method generate (line 50) | async def generate(self, prompt_ids: List[int], sampling_params: Dict[...
method init_engine (line 64) | async def init_engine(self):
method wake_up (line 69) | async def wake_up(self):
method sleep (line 74) | async def sleep(self):
function async_server_class (line 79) | def async_server_class(
FILE: siirl/engine/rollout/base.py
class BaseRollout (line 22) | class BaseRollout(ABC):
method generate_sequences (line 26) | def generate_sequences(self, prompts: TensorDict) -> TensorDict:
FILE: siirl/engine/rollout/embodied_rollout.py
function _timer (line 57) | def _timer(name: str, timing_dict: dict):
function crop_and_resize (line 65) | def crop_and_resize(image, crop_scale, batch_size):
function center_crop_image (line 111) | def center_crop_image(image):
class EmbodiedHFRollout (line 135) | class EmbodiedHFRollout(BaseRollout):
method __init__ (line 137) | def __init__(self, module: nn.Module, config: ActorRolloutRefArguments):
method close (line 177) | def close(self):
method __del__ (line 184) | def __del__(self):
method embodied_preprocess (line 188) | def embodied_preprocess(self):
method generate_sequences (line 202) | def generate_sequences(self, prompts):
method process_input (line 247) | def process_input(self,inputs:list, task_descriptions:list):
method _generate_chunk_rollout (line 312) | def _generate_chunk_rollout(self, prompts):
method _generate_one_step (line 498) | def _generate_one_step(self, prompts: dict):
method _obs_to_input (line 660) | def _obs_to_input(self, obs):
FILE: siirl/engine/rollout/hf_rollout.py
class HFRollout (line 38) | class HFRollout(BaseRollout):
method __init__ (line 39) | def __init__(self, module: nn.Module, config):
method generate_sequences (line 44) | def generate_sequences(self, prompts: TensorDict) -> TensorDict:
method _generate_minibatch (line 53) | def _generate_minibatch(self, prompts: TensorDict) -> TensorDict:
FILE: siirl/engine/rollout/schemas.py
class FinishReasonTypeEnum (line 36) | class FinishReasonTypeEnum(str, Enum):
method from_str (line 44) | def from_str(cls, value: str) -> "FinishReasonTypeEnum":
class Message (line 55) | class Message(BaseModel):
class AsyncRolloutRequestStateEnum (line 61) | class AsyncRolloutRequestStateEnum(str, Enum):
class TokenizationSanityCheckModeEnum (line 72) | class TokenizationSanityCheckModeEnum(str, Enum):
class AsyncRolloutRequest (line 80) | class AsyncRolloutRequest(BaseModel):
method initialize_request (line 122) | def initialize_request(cls, values):
method _handle_apply_chat_template (line 219) | def _handle_apply_chat_template(
method _get_position_ids (line 255) | def _get_position_ids(
method _update_input_ids (line 293) | def _update_input_ids(
method _update_multi_modal_inputs (line 330) | def _update_multi_modal_inputs(self, new_multi_modal_inputs: Dict[str,...
method get_generation_prompt_ids (line 342) | def get_generation_prompt_ids(
method add_user_message (line 373) | def add_user_message(
method add_assistant_message (line 389) | def add_assistant_message(
method add_tool_response_messages (line 407) | def add_tool_response_messages(
method update_metrics (line 484) | def update_metrics(self, metrics: Any, tool_id: str) -> None:
method _get_prompt_diffs (line 492) | def _get_prompt_diffs(
method finalize (line 549) | def finalize(
method truncate_output_ids (line 660) | def truncate_output_ids(
FILE: siirl/engine/rollout/sglang_rollout/async_sglang_server.py
class AsyncSglangServer (line 32) | class AsyncSglangServer(AsyncServerBase):
method __init__ (line 33) | def __init__(self, config: ActorRolloutRefArguments, spmd_engine: SGLa...
method init_engine (line 41) | async def init_engine(self):
method chat_completion (line 48) | async def chat_completion(self, raw_request: Request):
method generate (line 57) | async def generate(self, prompt_ids: List[int], sampling_params: Dict[...
method wake_up (line 60) | async def wake_up(self):
method sleep (line 70) | async def sleep(self):
FILE: siirl/engine/rollout/sglang_rollout/sglang_rollout.py
function _set_envs_and_config (line 93) | def _set_envs_and_config(server_args: ServerArgs):
class AsyncEngine (line 137) | class AsyncEngine(sglang.srt.entrypoints.engine.Engine):
method __init__ (line 138) | def __init__(self, **kwargs):
method release_memory_occupation (line 143) | async def release_memory_occupation(self, tags: Optional[list[str]] = ...
method resume_memory_occupation (line 151) | async def resume_memory_occupation(self, tags: Optional[list[str]] = N...
method update_weights_from_tensor (line 166) | async def update_weights_from_tensor(
method flush_cache (line 183) | async def flush_cache(self):
function _pre_process_inputs (line 189) | def _pre_process_inputs(
function _post_process_outputs (line 199) | def _post_process_outputs(processing_class, output):
function get_tool_call_parser_type (line 228) | def get_tool_call_parser_type(
class SGLangRollout (line 252) | class SGLangRollout(BaseRollout):
method __init__ (line 253) | def __init__(
method _init_distributed_env (line 329) | def _init_distributed_env(self, device_mesh_cpu, **kwargs):
method _verify_config (line 374) | def _verify_config(self, model_hf_config):
method _init_inference_engine (line 420) | def _init_inference_engine(self, trust_remote_code, actor_module, port):
method _init_sampling_params (line 479) | def _init_sampling_params(self, **kwargs):
method _initialize_tools (line 495) | def _initialize_tools(self, config, processing_class):
method _initialize_interactions (line 541) | def _initialize_interactions(self, config):
method generate_sequences (line 558) | def generate_sequences(self, prompts: TensorDict, **kwargs) -> TensorD...
method _batch_level_generate_sequences (line 585) | def _batch_level_generate_sequences(self, prompts: TensorDict, **kwarg...
method _async_rollout_a_request (line 775) | async def _async_rollout_a_request(
method _handle_engine_call (line 982) | async def _handle_engine_call(
method _handle_engine_generate (line 988) | async def _handle_engine_generate(
method _handle_pending_state (line 1003) | async def _handle_pending_state(self, _req: AsyncRolloutRequest) -> As...
method generate_sequences_with_tools (line 1026) | def generate_sequences_with_tools(self, prompts: TensorDict, **kwargs)...
method _req_level_generate_sequences (line 1036) | def _req_level_generate_sequences(self, prompts: TensorDict, **kwargs)...
method _preprocess_prompt_to_async_rollout_requests (line 1208) | def _preprocess_prompt_to_async_rollout_requests(self, prompts: Tensor...
method chat_completion (line 1277) | async def chat_completion(self, json_request):
method generate (line 1352) | async def generate(
method wake_up (line 1360) | async def wake_up(self):
method sleep (line 1367) | async def sleep(self):
method _init_zeromq (line 1375) | def _init_zeromq(self) -> str:
method _get_free_port (line 1398) | def _get_free_port(self):
method _loop_forever (line 1405) | def _loop_forever(self):
method get_zeromq_address (line 1411) | def get_zeromq_address(self):
method execute_method (line 1414) | def execute_method(self, method: Union[str, bytes], *args, **kwargs):
method get_device_mesh (line 1426) | def get_device_mesh(self):
function ensure_event_loop (line 1429) | def ensure_event_loop():
FILE: siirl/engine/rollout/sglang_rollout/utils.py
function broadcast_pyobj (line 26) | def broadcast_pyobj(
FILE: siirl/engine/rollout/vllm_rollout/__init__.py
function get_version (line 20) | def get_version(pkg):
FILE: siirl/engine/rollout/vllm_rollout/vllm_async_server.py
class ExternalZeroMQDistributedExecutor (line 41) | class ExternalZeroMQDistributedExecutor(Executor):
method _init_executor (line 46) | def _init_executor(self) -> None:
method collective_rpc (line 67) | def collective_rpc(
method check_health (line 89) | def check_health(self):
class AsyncvLLMServer (line 92) | class AsyncvLLMServer(AsyncServerBase):
method __init__ (line 108) | def __init__(self, config: ActorRolloutRefArguments, spmd_engine: Any...
method init_engine (line 122) | def init_engine(self):
method _create_engine_config (line 179) | def _create_engine_config(self, engine_args: AsyncEngineArgs, zmq_addr...
method generate_sequences (line 192) | def generate_sequences(self, prompts: TensorDict, **kwargs) -> TensorD...
method chat_completion (line 195) | async def chat_completion(self, raw_request: Request):
method generate (line 212) | async def generate(self, prompt_ids: List[int], sampling_params: Dict[...
method wake_up (line 226) | async def wake_up(self):
method sleep (line 230) | async def sleep(self):
FILE: siirl/engine/rollout/vllm_rollout/vllm_rollout_spmd.py
function _pre_process_inputs (line 83) | def _pre_process_inputs(pad_token_id, prompt_token_ids: torch.Tensor) ->...
function _repeat_interleave (line 92) | def _repeat_interleave(value: Union[torch.Tensor, np.ndarray], repeats: ...
class vLLMRollout (line 99) | class vLLMRollout(BaseRollout):
method __init__ (line 100) | def __init__(self, model_path: str, config: RolloutArguments, tokenize...
method update_sampling_params (line 248) | def update_sampling_params(self, **kwargs):
method generate_sequences (line 265) | def generate_sequences(self, prompts: TensorDict, **kwargs) -> TensorD...
function _monkey_patch_compute_logits (line 426) | def _monkey_patch_compute_logits(model, vocab_size: int):
class vLLMAsyncRollout (line 440) | class vLLMAsyncRollout:
method __init__ (line 444) | def __init__(self, model_path: str, config: DictConfig, tokenizer, mod...
method _init_zeromq (line 454) | def _init_zeromq(self) -> str:
method _get_free_port (line 477) | def _get_free_port(self):
method _loop_forever (line 484) | def _loop_forever(self):
method get_zeromq_address (line 491) | def get_zeromq_address(self):
method init_worker (line 494) | def init_worker(self, all_kwargs: List[Dict[str, Any]]):
method load_model (line 504) | def load_model(self, *args, **kwargs):
method sleep (line 512) | def sleep(self, *args, **kwargs):
method wake_up (line 519) | def wake_up(self, *args, **kwargs):
method execute_method (line 526) | def execute_method(self, method: Union[str, bytes], *args, **kwargs):
FILE: siirl/engine/sharding_manager/base.py
class BaseShardingManager (line 21) | class BaseShardingManager:
method __enter__ (line 22) | def __enter__(self):
method __exit__ (line 25) | def __exit__(self, exc_type, exc_value, traceback):
method preprocess_data (line 28) | def preprocess_data(self, data: TensorDict) -> TensorDict:
method postprocess_data (line 31) | def postprocess_data(self, data: TensorDict) -> TensorDict:
FILE: siirl/engine/sharding_manager/fsdp_hf.py
class FSDPHFShardingManager (line 28) | class FSDPHFShardingManager(BaseShardingManager):
method __init__ (line 40) | def __init__(
method __enter__ (line 69) | def __enter__(self):
method __exit__ (line 88) | def __exit__(self, exc_type, exc_value, traceback):
FILE: siirl/engine/sharding_manager/fsdp_sglang.py
function _preprocess_tensor_for_update_weights (line 58) | def _preprocess_tensor_for_update_weights(tensor: torch.Tensor):
class MultiAgentFSDPSGLangShardingManager (line 64) | class MultiAgentFSDPSGLangShardingManager(BaseShardingManager):
method __init__ (line 66) | def __init__(
method __enter__ (line 108) | def __enter__(self):
method __exit__ (line 133) | def __exit__(self, exc_type, exc_value, traceback):
method update_weights (line 149) | async def update_weights(self, params):
method release_memory (line 182) | async def release_memory(self):
method wake_up (line 186) | async def wake_up(self):
method sleep (line 210) | async def sleep(self):
method preprocess_data (line 225) | def preprocess_data(self, data: TensorDict) -> TensorDict:
method postprocess_data (line 236) | def postprocess_data(self, data: TensorDict) -> TensorDict:
FILE: siirl/engine/sharding_manager/fsdp_ulysses.py
class FSDPUlyssesShardingManager (line 26) | class FSDPUlyssesShardingManager(BaseShardingManager):
method __init__ (line 31) | def __init__(self, device_mesh: DeviceMesh):
method __enter__ (line 36) | def __enter__(self):
method __exit__ (line 44) | def __exit__(self, exc_type, exc_value, traceback):
method preprocess_data (line 51) | def preprocess_data(self, data: TensorDict) -> TensorDict:
method postprocess_data (line 63) | def postprocess_data(self, data: TensorDict) -> TensorDict:
FILE: siirl/engine/sharding_manager/fsdp_vllm.py
class MultiAgentFSDPVLLMShardingManager (line 42) | class MultiAgentFSDPVLLMShardingManager(BaseShardingManager):
method __init__ (line 44) | def __init__(self, module: FSDP, inference_engine: LLM, model_config, ...
method __enter__ (line 86) | def __enter__(self):
method __exit__ (line 180) | def __exit__(self, exc_type, exc_value, traceback):
method update_params (line 193) | def update_params(self, updated_params, peft_config=None):
FILE: siirl/engine/sharding_manager/megatron_sglang.py
function _preprocess_tensor_for_update_weights (line 54) | def _preprocess_tensor_for_update_weights(tensor: torch.Tensor):
class MultiAgentMegatronSGLangShardingManager (line 59) | class MultiAgentMegatronSGLangShardingManager(BaseShardingManager):
method __init__ (line 60) | def __init__(
method __enter__ (line 101) | def __enter__(self):
method __exit__ (line 106) | def __exit__(self, exc_type, exc_value, traceback):
method update_weights (line 110) | async def update_weights(self, params):
method release_memory (line 157) | async def release_memory(self):
method wake_up (line 162) | async def wake_up(self):
method sleep (line 190) | async def sleep(self):
FILE: siirl/engine/sharding_manager/megatron_vllm.py
class AllGatherPPModel (line 55) | class AllGatherPPModel:
method __init__ (line 56) | def __init__(self, model_provider, use_distributed_optimizer=True) -> ...
method _build_param_buffer (line 105) | def _build_param_buffer(self, pp_rank):
method _build_param_references (line 121) | def _build_param_references(self, pp_rank, maintain_weight=False):
method _load_params_to_cuda (line 127) | def _load_params_to_cuda(self, pp_rank, to_empty=False):
method _offload_params_to_cpu (line 137) | def _offload_params_to_cpu(self, pp_rank, to_empty=False):
method load_params_to_cuda (line 147) | def load_params_to_cuda(self, to_empty=False):
method allgather_params (line 153) | def allgather_params(self):
method forward (line 163) | def forward(self, *inputs, **kwargs):
method __call__ (line 182) | def __call__(self, *inputs, **kwargs):
method eval (line 185) | def eval(self):
method train (line 189) | def train(self):
method offload_params_to_cpu (line 193) | def offload_params_to_cpu(self, to_empty=False):
method get_all_params (line 199) | def get_all_params(self):
method update_this_rank_models (line 222) | def update_this_rank_models(self, new_models):
method this_rank_models (line 227) | def this_rank_models(self):
method pp_size (line 231) | def pp_size(self):
method pp_rank (line 235) | def pp_rank(self):
method pp_group (line 239) | def pp_group(self):
method pp_models (line 243) | def pp_models(self):
class MultiAgentMegatronVLLMShardingManager (line 265) | class MultiAgentMegatronVLLMShardingManager(BaseShardingManager):
method __init__ (line 267) | def __init__(
method __enter__ (line 322) | def __enter__(self):
method __exit__ (line 376) | def __exit__(self, exc_type, exc_value, traceback):
FILE: siirl/environment/embodied/adapters/libero.py
class LIBEROAdapter (line 38) | class LIBEROAdapter(BaseVLAEnvironment):
method __init__ (line 51) | def __init__(self,
method _blocking_reset (line 87) | def _blocking_reset(self, task_ids: Optional[List[int]] = None, trial_...
method reset (line 205) | async def reset(self, task_ids: Optional[List[int]] = None, trial_ids:...
method _blocking_step (line 209) | def _blocking_step(self, action: Dict[str, Any]) -> List[Dict[str, Any]]:
method step (line 284) | async def step(self, action: Dict[str, Any]) -> List[Dict[str, Any]]:
method close (line 291) | def close(self):
method _get_libero_env (line 298) | def _get_libero_env(task, gpu_id, resolution=256):
method _get_dummy_action (line 313) | def _get_dummy_action(self) -> List[float]:
method _normalize_gripper_action (line 317) | def _normalize_gripper_action(self, action: np.ndarray, binarize: bool...
method _invert_gripper_action (line 342) | def _invert_gripper_action(self, action: np.ndarray) -> np.ndarray:
FILE: siirl/environment/embodied/base.py
class BaseVLAEnvironment (line 24) | class BaseVLAEnvironment(ABC):
method reset (line 32) | async def reset(self) -> Dict[str, Any]:
method step (line 43) | async def step(self, action: Dict[str, Any]) -> Tuple[Dict[str, Any], ...
FILE: siirl/environment/embodied/venv.py
function deprecation (line 44) | def deprecation(msg: str) -> None:
class CloudpickleWrapper (line 49) | class CloudpickleWrapper(object):
method __init__ (line 52) | def __init__(self, data: Any) -> None:
method __getstate__ (line 55) | def __getstate__(self) -> str:
method __setstate__ (line 58) | def __setstate__(self, data: str) -> None:
class EnvWorker (line 78) | class EnvWorker(ABC):
method __init__ (line 81) | def __init__(self, env_fn: Callable[[], gym.Env]) -> None:
method get_env_attr (line 94) | def get_env_attr(self, key: str) -> Any:
method set_env_attr (line 98) | def set_env_attr(self, key: str, value: Any) -> None:
method send (line 101) | def send(self, action: Optional[np.ndarray]) -> None:
method recv (line 120) | def recv(
method reset (line 145) | def reset(self, **kwargs: Any) -> Union[np.ndarray, Tuple[np.ndarray, ...
method step (line 148) | def step(
method wait (line 161) | def wait(
method seed (line 167) | def seed(self, seed: Optional[int] = None) -> Optional[List[int]]:
method render (line 172) | def render(self, **kwargs: Any) -> Any:
method close_env (line 177) | def close_env(self) -> None:
method close (line 180) | def close(self) -> None:
class ShArray (line 187) | class ShArray:
method __init__ (line 190) | def __init__(self, dtype: np.generic, shape: Tuple[int]) -> None:
method save (line 195) | def save(self, ndarray: np.ndarray) -> None:
method get (line 203) | def get(self) -> np.ndarray:
function _setup_buf (line 208) | def _setup_buf(space: gym.Space) -> Union[dict, tuple, ShArray]:
function _worker (line 219) | def _worker(
class DummyEnvWorker (line 307) | class DummyEnvWorker(EnvWorker):
method __init__ (line 310) | def __init__(self, env_fn: Callable[[], gym.Env]) -> None:
method get_env_attr (line 314) | def get_env_attr(self, key: str) -> Any:
method set_env_attr (line 317) | def set_env_attr(self, key: str, value: Any) -> None:
method reset (line 320) | def reset(self, **kwargs: Any) -> Union[np.ndarray, Tuple[np.ndarray, ...
method wait (line 326) | def wait( # type: ignore
method send (line 332) | def send(self, action: Optional[np.ndarray], **kwargs: Any) -> None:
method seed (line 338) | def seed(self, seed: Optional[int] = None) -> Optional[List[int]]:
method render (line 346) | def render(self, **kwargs: Any) -> Any:
method close_env (line 349) | def close_env(self) -> None:
method check_success (line 352) | def check_success(self):
method get_segmentation_of_interest (line 355) | def get_segmentation_of_interest(self, segmentation_image):
method get_sim_state (line 358) | def get_sim_state(self):
method set_init_state (line 361) | def set_init_state(self, init_state):
class SubprocEnvWorker (line 365) | class SubprocEnvWorker(EnvWorker):
method __init__ (line 368) | def __init__(
method get_env_attr (line 391) | def get_env_attr(self, key: str) -> Any:
method set_env_attr (line 395) | def set_env_attr(self, key: str, value: Any) -> None:
method _decode_obs (line 398) | def _decode_obs(self) -> Union[dict, tuple, np.ndarray]:
method wait (line 414) | def wait( # type: ignore
method send_reinit (line 433) | def send_reinit(self, env_fn: Callable[[], gym.Env]) -> None:
method recv_reinit (line 440) | def recv_reinit(self) -> bool:
method send (line 449) | def send(self, action: Optional[np.ndarray], **kwargs: Any) -> None:
method recv (line 457) | def recv(
method reset (line 482) | def reset(self, **kwargs: Any) -> Union[np.ndarray, Tuple[np.ndarray, ...
method seed (line 499) | def seed(self, seed: Optional[int] = None) -> Optional[List[int]]:
method render (line 505) | def render(self, **kwargs: Any) -> Any:
method close_env (line 509) | def close_env(self) -> None:
method check_success (line 520) | def check_success(self):
method get_segmentation_of_interest (line 524) | def get_segmentation_of_interest(self, segmentation_image):
method get_sim_state (line 528) | def get_sim_state(self):
method set_init_state (line 532) | def set_init_state(self, init_state):
class BaseVectorEnv (line 547) | class BaseVectorEnv(object):
method __init__ (line 598) | def __init__(
method _assert_is_not_closed (line 633) | def _assert_is_not_closed(self) -> None:
method __len__ (line 638) | def __len__(self) -> int:
method __getattribute__ (line 642) | def __getattribute__(self, key: str) -> Any:
method get_env_attr (line 654) | def get_env_attr(
method set_env_attr (line 678) | def set_env_attr(
method _wrap_id (line 701) | def _wrap_id(
method _assert_id (line 709) | def _assert_id(self, id: Union[List[int], np.ndarray]) -> None:
method reset (line 718) | def reset(
method step (line 765) | def step(
method seed (line 859) | def seed(
method render (line 882) | def render(self, **kwargs: Any) -> List[Any]:
method close (line 892) | def close(self) -> None:
class DummyVectorEnv (line 904) | class DummyVectorEnv(BaseVectorEnv):
method __init__ (line 912) | def __init__(self, env_fns: List[Callable[[], gym.Env]], **kwargs: Any...
method check_success (line 915) | def check_success(self):
method get_segmentation_of_interest (line 918) | def get_segmentation_of_interest(self, segmentation_images):
method get_sim_state (line 924) | def get_sim_state(self):
method set_init_state (line 927) | def set_init_state(
class SubprocVectorEnv (line 952) | class SubprocVectorEnv(BaseVectorEnv):
method __init__ (line 960) | def __init__(self, env_fns: List[Callable[[], gym.Env]], **kwargs: Any...
method reinit_envs (line 966) | def reinit_envs(self, env_fns: List[Callable[[], gym.Env]], id: Option...
method check_success (line 993) | def check_success(self):
method get_segmentation_of_interest (line 996) | def get_segmentation_of_interest(self, segmentation_images):
method get_sim_state (line 1002) | def get_sim_state(self):
method set_init_state (line 1005) | def set_init_state(
FILE: siirl/execution/dag/builtin_pipelines.py
function grpo_pipeline (line 27) | def grpo_pipeline() -> TaskGraph:
function ppo_pipeline (line 87) | def ppo_pipeline() -> TaskGraph:
function dapo_pipeline (line 161) | def dapo_pipeline() -> TaskGraph:
function embodied_srpo_pipeline (line 230) | def embodied_srpo_pipeline() -> TaskGraph:
FILE: siirl/execution/dag/config_loader.py
class Ref (line 25) | class Ref:
method __init__ (line 30) | def __init__(self, path):
method __repr__ (line 33) | def __repr__(self):
function ref_constructor (line 37) | def ref_constructor(loader, node):
function resolve_refs (line 54) | def resolve_refs(config_item: Any, global_config: Dict[str, Any]) -> Any:
class DAGConfigLoader (line 92) | class DAGConfigLoader:
method __init__ (line 97) | def __init__(self):
method _parse_raw_config (line 101) | def _parse_raw_config(raw_dag_config: Dict[str, Any], file_path: str) ...
method load_from_file (line 206) | def load_from_file(file_path: str, file_type: str = "yaml") -> TaskGraph:
FILE: siirl/execution/dag/node.py
function dynamic_load_function (line 28) | def dynamic_load_function(func_path: str):
class NodeType (line 44) | class NodeType(Enum):
class NodeRole (line 60) | class NodeRole(Enum):
class NodeStatus (line 77) | class NodeStatus(Enum):
class AgentProcess (line 90) | class AgentProcess:
method __init__ (line 91) | def __init__(self, agent_options: AgentArguments, node_config):
method load_attr (line 119) | def load_attr(self, file_path, attr_name):
method init_env_class (line 134) | def init_env_class(self):
method _init_process_handle (line 141) | def _init_process_handle(self, process_path):
method apply_pre_process (line 148) | def apply_pre_process(self, prompt: Optional[Tuple[str, List]], obs: O...
method apply_post_process (line 176) | def apply_post_process(self, oridinal_prompt, templated_prompt, respon...
class Node (line 211) | class Node:
method __init__ (line 216) | def __init__(
method _resolve_executable (line 289) | def _resolve_executable(self) -> None:
method executable (line 323) | def executable(self) -> Optional[Callable]:
method executable (line 328) | def executable(self, execute: Optional[Callable]):
method add_dependency (line 332) | def add_dependency(self, dependency_id: str) -> None:
method remove_dependency (line 341) | def remove_dependency(self, dependency_id: str) -> None:
method is_ready (line 350) | def is_ready(self, completed_node_ids: Set[str]) -> bool:
method update_status (line 362) | def update_status(self, new_status: NodeStatus, error_info: Optional[s...
method update_config (line 372) | def update_config(self, new_config_items: Dict[str, Any], overwrite: b...
method can_retry (line 400) | def can_retry(self) -> bool:
method increment_retry_count (line 404) | def increment_retry_count(self) -> None:
method run (line 408) | def run(self, **kwargs: Any) -> Any:
method __repr__ (line 475) | def __repr__(self) -> str:
method copy (line 482) | def copy(self) -> "Node":
FILE: siirl/execution/dag/pipeline.py
class NodeConfig (line 31) | class NodeConfig:
class Pipeline (line 37) | class Pipeline:
method __init__ (line 59) | def __init__(self, pipeline_id: str, description: str = ""):
method add_node (line 71) | def add_node(
method build (line 113) | def build(self) -> TaskGraph:
method visualize (line 165) | def visualize(self, output_path: str = None, directory: str = "./"):
FILE: siirl/execution/dag/task_graph.py
class TaskGraph (line 23) | class TaskGraph:
method __init__ (line 28) | def __init__(self, graph_id: str):
method add_node (line 42) | def add_node(self, node: Node) -> None:
method add_nodes (line 59) | def add_nodes(self, nodes: List[Node]) -> None:
method _update_adj_for_node (line 70) | def _update_adj_for_node(self, node: Node) -> None:
method build_adjacency_lists (line 89) | def build_adjacency_lists(self) -> None:
method get_node (line 113) | def get_node(self, node_id: str) -> Optional[Node]:
method get_dependencies (line 123) | def get_dependencies(self, node_id: str) -> List[Node]:
method get_dependents (line 140) | def get_dependents(self, node_id: str) -> List[Node]:
method get_downstream_nodes (line 155) | def get_downstream_nodes(self, node_id: str) -> List[Node]:
method get_entry_nodes (line 159) | def get_entry_nodes(self) -> List[Node]:
method get_exit_nodes (line 169) | def get_exit_nodes(self) -> List[Node]:
method validate_graph (line 179) | def validate_graph(self) -> Tuple[bool, Optional[str]]:
method get_topological_sort (line 200) | def get_topological_sort(self) -> List[str]:
method reset_nodes_status (line 247) | def reset_nodes_status(self) -> None:
method load_from_config (line 256) | def load_from_config(cls, graph_id: str, config_data: List[Dict[str, A...
method __repr__ (line 307) | def __repr__(self) -> str:
method copy (line 359) | def copy(self) -> "TaskGraph":
method save_dag_pic (line 366) | def save_dag_pic(self, filename: str = "task_graph", directory: Option...
method get_nodes_by_type (line 440) | def get_nodes_by_type(self, node_types: List[NodeType]) -> List[Node]:
method get_nodes_by_role (line 452) | def get_nodes_by_role(self, node_role: NodeRole) -> List[Node]:
FILE: siirl/execution/dag/task_loader.py
function generate_structural_signature (line 24) | def generate_structural_signature(graph: TaskGraph) -> str:
function get_all_downstream_nodes_recursive (line 64) | def get_all_downstream_nodes_recursive(src_task_graph: TaskGraph, start_...
function get_all_ancestors (line 93) | def get_all_ancestors(graph: TaskGraph, node_id: str) -> Set[str]:
function find_all_paths_dfs (line 129) | def find_all_paths_dfs(src_task_graph: TaskGraph, current_node_id: str, ...
function find_all_paths (line 157) | def find_all_paths(src_task_graph: TaskGraph, start_node_id: str, end_no...
function split_single_structure (line 180) | def split_single_structure(src_task_graph: TaskGraph, parallel_branch_no...
function split_by_fan_out_to_exits (line 263) | def split_by_fan_out_to_exits(src_task_graph: TaskGraph, naming_prefix_i...
function split_by_reconverging_paths (line 357) | def split_by_reconverging_paths(src_task_graph: TaskGraph, naming_prefix...
function discover_and_split_parallel_paths (line 464) | def discover_and_split_parallel_paths(src_task_graph: TaskGraph) -> List...
FILE: siirl/execution/metric_worker/metric_worker.py
class MetricClient (line 34) | class MetricClient():
method __init__ (line 40) | def __init__(self, metric_worker: ActorHandle):
method stop (line 49) | def stop(self):
method submit_metric (line 54) | def submit_metric(self, metrics: dict, world_size):
method wait_submit (line 64) | def wait_submit(self):
method wait_final_res (line 69) | def wait_final_res(self):
method compute_local_data_metric (line 78) | def compute_local_data_metric(self, data: TensorDict, world_size: int):
method compute_local_throughout_metrics (line 98) | def compute_local_throughout_metrics(self, data: TensorDict, timing_ra...
method compute_local_timing_metrics (line 113) | def compute_local_timing_metrics(self, data: TensorDict, timing_raw: d...
method process_local_validation_metrics (line 125) | def process_local_validation_metrics(self, data_sources: list[str], sa...
class MetricWorker (line 144) | class MetricWorker:
method __init__ (line 150) | def __init__(self) -> None:
method start (line 158) | async def start(self):
method submit_metric (line 170) | async def submit_metric(self, metric: dict, world_size: int):
method stop (line 181) | async def stop(self):
method compute_metric (line 191) | async def compute_metric(self, metric_name, metrics):
method parse_metric (line 211) | async def parse_metric(self, metric_data: tuple):
method _process_metrics_loop (line 236) | async def _process_metrics_loop(self):
method wait_final_res (line 246) | async def wait_final_res(self):
FILE: siirl/execution/metric_worker/utils.py
class Metric (line 24) | class Metric:
function MetricFunc (line 29) | def MetricFunc(name: str):
function SumMetric (line 39) | def SumMetric(metrics: List[Metric]):
function MeanMetric (line 45) | def MeanMetric(metrics: List[Metric]):
function MaxMetric (line 52) | def MaxMetric(metrics: List[Metric]):
function MinMetric (line 58) | def MinMetric(metrics: List[Metric]):
FILE: siirl/execution/rollout_flow/multi_agent/multiagent_generate.py
class MultiAgentLoop (line 39) | class MultiAgentLoop():
method __init__ (line 40) | def __init__(self, dag, config: ActorRolloutRefArguments, node_workers...
method _parse_graph (line 60) | def _parse_graph(self, graph:TaskGraph):
method _generate_node_worker_key (line 78) | def _generate_node_worker_key(self, node: Node) -> str:
method node_if_local (line 82) | def node_if_local(self, node):
method _preprocess (line 86) | def _preprocess(self, batch:TensorDict) -> List[str]:
method _generate_key (line 99) | def _generate_key(self, cur_node: Node, next_node: Node, batch_id: int...
method async_put_data (line 127) | async def async_put_data(self, key: str, value: Tuple[AgentOutput, str...
method async_get_envdata (line 154) | async def async_get_envdata(self, key: str, timing_raw: Dict[str, floa...
method async_get_data (line 177) | async def async_get_data(self, key: str, timing_raw: Dict[str, float]):
method spread_task (line 207) | async def spread_task(self, cur_node, node_idx, batch_idx):
method generate_spread (line 231) | async def generate_spread(self):
method check_colocate_running (line 250) | async def check_colocate_running(self, finished_res: Dict, visited_age...
method colocate_task (line 277) | async def colocate_task(self, agent_output:AgentOutput, agent_res:Dict...
method generate_colocate (line 396) | async def generate_colocate(self, bs, sampling_params: Dict[str, Any],...
method _postprocess (line 486) | def _postprocess(self, agent_outputs: Dict[str, List[AgentOutput]], me...
method generate_sequence (line 658) | def generate_sequence(self, batch:TensorDict, timing_raw: Dict[str, fl...
FILE: siirl/execution/rollout_flow/multi_agent/utils.py
class AgentOutputStatus (line 4) | class AgentOutputStatus:
class AgentOutput (line 10) | class AgentOutput(BaseModel):
FILE: siirl/execution/rollout_flow/multiturn/agent_loop/agent_loop.py
function get_ddp_world_size_rank (line 43) | async def get_ddp_world_size_rank(local_world_size, local_rank, local_pa...
class AsyncLLMServerManager (line 51) | class AsyncLLMServerManager:
method __init__ (line 58) | def __init__(self, config: DictConfig, server, max_cache_size: int = 1...
method generate (line 69) | async def generate(
class AgentLoopMetrics (line 95) | class AgentLoopMetrics(BaseModel):
class AgentLoopOutput (line 102) | class AgentLoopOutput(BaseModel):
class AgentLoopBase (line 112) | class AgentLoopBase(ABC):
method __init__ (line 118) | def __init__(self, config: DictConfig, server_manager: AsyncLLMServerM...
method init_class (line 133) | def init_class(cls, config: DictConfig, tokenizer: AutoTokenizer):
method run (line 140) | async def run(self, messages: List[Dict[str, Any]], sampling_params: D...
class AgentLoopWorker (line 153) | class AgentLoopWorker:
method __init__ (line 156) | def __init__(self, config: DictConfig, server_handles: List[ray.actor....
method generate_sequences (line 171) | async def generate_sequences(self, batch: TensorDict) -> TensorDict:
method _run_agent_loop (line 220) | async def _run_agent_loop(
method get_agent_loop_class (line 228) | def get_agent_loop_class(self, agent_name: str) -> Type[AgentLoopBase]:
method _postprocess (line 239) | def _postprocess(self, inputs: List[AgentLoopOutput]) -> TensorDict:
class AgentLoopManager (line 305) | class AgentLoopManager:
method __init__ (line 308) | def __init__(self, config: DictConfig, cur_dp_rank, name_prefix, engin...
method _initialize_llm_servers (line 323) | def _initialize_llm_servers(self):
method _init_agent_loop_workers (line 339) | def _init_agent_loop_workers(self):
method generate_sequences (line 343) | async def generate_sequences(self, prompts: TensorDict) -> TensorDict:
method _performance_metrics (line 365) | def _performance_metrics(self, metrics: List[List[Dict[str, str]]], ou...
method wake_up (line 387) | async def wake_up(self):
method sleep (line 391) | async def sleep(self):
FILE: siirl/execution/rollout_flow/multiturn/agent_loop/single_turn_agent_loop.py
function _timer (line 27) | def _timer(name: str, timing_dict: dict):
class SingleTurnAgentLoop (line 39) | class SingleTurnAgentLoop(AgentLoopBase):
method __init__ (line 42) | def __init__(self, config, server_manager, tokenizer):
method run (line 47) | async def run(self, messages: List[Dict[str, Any]], sampling_params: D...
FILE: siirl/execution/rollout_flow/multiturn/agent_loop/tool_agent_loop.py
function _timer (line 34) | def _timer(name: str, timing_dict: dict):
class FunctionCall (line 46) | class FunctionCall(BaseModel):
class ToolParser (line 59) | class ToolParser(ABC):
method extract_tool_calls (line 61) | async def extract_tool_calls(self, responses_ids: List[int], prompt_id...
class HermesToolParser (line 73) | class HermesToolParser(ToolParser):
method __init__ (line 76) | def __init__(self, tokenizer) -> None:
method extract_tool_calls (line 83) | async def extract_tool_calls(self, responses_ids: List[int], prompt_id...
class ToolAgentLoop (line 101) | class ToolAgentLoop(AgentLoopBase):
method __init__ (line 102) | def __init__(self, config, server_manager, tokenizer):
method init_class (line 106) | def init_class(cls, config, tokenizer):
method run (line 130) | async def run(self, messages: List[Dict[str, Any]], sampling_params: D...
method _call_tool (line 208) | async def _call_tool(self, tool_call: FunctionCall) -> Dict[str, str]:
method get_tool_parser (line 241) | def get_tool_parser(cls, name: str) -> ToolParser:
FILE: siirl/execution/rollout_flow/multiturn/interactions/base.py
class BaseInteraction (line 20) | class BaseInteraction:
method __init__ (line 21) | def __init__(self, config: Dict[str, Any]):
method start_interaction (line 25) | async def start_interaction(self, instance_id: Optional[str] = None, *...
method generate_response (line 39) | async def generate_response(
method calculate_score (line 56) | async def calculate_score(self) -> float: # More clear score calculat...
method finalize_interaction (line 66) | async def finalize_interaction(self) -> None: # More clear interactio...
FILE: siirl/execution/rollout_flow/multiturn/interactions/gsm8k_interaction.py
class Gsm8kInteraction (line 28) | class Gsm8kInteraction(BaseInteraction):
method __init__ (line 37) | def __init__(self, config: dict):
method start_interaction (line 41) | async def start_interaction(
method generate_response (line 53) | async def generate_response(
method calculate_score (line 78) | async def calculate_score(self, instance_id: str, **kwargs) -> float:
method finalize_interaction (line 87) | async def finalize_interaction(self, instance_id: str, **kwargs) -> None:
FILE: siirl/execution/rollout_flow/multiturn/interactions/utils/interaction_registry.py
function get_interaction_class (line 25) | def get_interaction_class(cls_name):
function initialize_interactions_from_config (line 40) | def initialize_interactions_from_config(interaction_config_file):
FILE: siirl/execution/rollout_flow/multiturn/tools/base_tool.py
class BaseTool (line 22) | class BaseTool:
method __init__ (line 34) | def __init__(self, config: dict, tool_schema: OpenAIFunctionToolSchema):
method get_openai_tool_schema (line 41) | def get_openai_tool_schema(self) -> OpenAIFunctionToolSchema:
method create (line 44) | async def create(self, instance_id: Optional[str] = None, **kwargs) ->...
method execute (line 58) | async def execute(self, instance_id: str, parameters: dict[str, Any], ...
method calc_reward (line 72) | async def calc_reward(self, instance_id: str, **kwargs) -> float:
method release (line 83) | async def release(self, instance_id: str, **kwargs) -> None:
FILE: siirl/execution/rollout_flow/multiturn/tools/geo3k_tool.py
class Geo3kTool (line 28) | class Geo3kTool(BaseTool):
method __init__ (line 37) | def __init__(self, config: dict, tool_schema: OpenAIFunctionToolSchema):
method get_openai_tool_schema (line 60) | def get_openai_tool_schema(self) -> OpenAIFunctionToolSchema:
method create (line 63) | async def create(self, instance_id: Optional[str] = None, ground_truth...
method execute (line 73) | async def execute(self, instance_id: str, parameters: dict[str, Any], ...
method calc_reward (line 87) | async def calc_reward(self, instance_id: str, **kwargs) -> float:
method release (line 95) | async def release(self, instance_id: str, **kwargs) -> None:
FILE: siirl/execution/rollout_flow/multiturn/tools/gsm8k_tool.py
class Gsm8kTool (line 29) | class Gsm8kTool(BaseTool):
method __init__ (line 39) | def __init__(self, config: dict, tool_schema: OpenAIFunctionToolSchema):
method get_openai_tool_schema (line 62) | def get_openai_tool_schema(self) -> OpenAIFunctionToolSchema:
method create (line 65) | async def create(self, instance_id: Optional[str] = None, ground_truth...
method execute (line 75) | async def execute(self, instance_id: str, parameters: dict[str, Any], ...
method calc_reward (line 93) | async def calc_reward(self, instance_id: str, **kwargs) -> float:
method release (line 102) | async def release(self, instance_id: str, **kwargs) -> None:
FILE: siirl/execution/rollout_flow/multiturn/tools/mcp_base_tool.py
class MCPBaseTool (line 29) | class MCPBaseTool(BaseTool):
method __init__ (line 30) | def __init__(self, config: dict, tool_schema: OpenAIFunctionToolSchema):
method get_openai_tool_schema (line 38) | def get_openai_tool_schema(self) -> OpenAIFunctionToolSchema:
method create (line 42) | async def create(self, instance_id: Optional[str] = None, **kwargs) ->...
method _call_tool (line 59) | async def _call_tool(self, instance_id, parameters) -> Tuple[str, dict]:
method execute (line 75) | async def execute(self, instance_id: str, parameters: dict[str, Any], ...
method calc_reward (line 102) | async def calc_reward(self, instance_id: str, **kwargs) -> str:
method release (line 105) | async def release(self, instance_id: str, **kwargs) -> None:
method _parse_tool_result (line 109) | def _parse_tool_result(self, content: list) -> Tuple[str, dict]:
FILE: siirl/execution/rollout_flow/multiturn/tools/mcp_search_tool.py
class MCPSearchTool (line 26) | class MCPSearchTool(MCPBaseTool):
method __init__ (line 27) | def __init__(self, config: dict, tool_schema: OpenAIFunctionToolSchema):
method _parse_tool_result (line 30) | def _parse_tool_result(self, content: list) -> Tuple[str, dict]:
FILE: siirl/execution/rollout_flow/multiturn/tools/sandbox_fusion_tools.py
class PoolMode (line 37) | class PoolMode(Enum):
class TokenBucketWorker (line 43) | class TokenBucketWorker:
method __init__ (line 44) | def __init__(self, rate_limit: int):
method acquire (line 51) | def acquire(self):
method release (line 56) | def release(self):
method get_current_count (line 60) | def get_current_count(self):
class ExecutionWorker (line 64) | class ExecutionWorker:
method __init__ (line 65) | def __init__(self, enable_global_rate_limit=True, rate_limit=10):
method _init_rate_limit (line 68) | def _init_rate_limit(self, rate_limit):
method ping (line 73) | def ping(self):
method execute (line 76) | def execute(self, fn: Callable[..., T], *fn_args, **fn_kwargs) -> T:
function init_execution_pool (line 87) | def init_execution_pool(
class SandboxFusionTool (line 101) | class SandboxFusionTool(BaseTool):
method __init__ (line 111) | def __init__(self, config: dict, tool_schema: OpenAIFunctionToolSchema):
method get_openai_tool_schema (line 152) | def get_openai_tool_schema(self) -> OpenAIFunctionToolSchema:
method create (line 155) | async def create(self, instance_id: Optional[str] = None, ground_truth...
method execute (line 165) | async def execute(self, instance_id: str, parameters: dict[str, Any], ...
method execute_code (line 176) | def execute_code(self, instance_id, code, timeout=30, language="python"):
method calc_reward (line 188) | async def calc_reward(self, instance_id: str, **kwargs) -> str:
method release (line 191) | async def release(self, instance_id: str, **kwargs) -> None:
FILE: siirl/execution/rollout_flow/multiturn/tools/schemas.py
class OpenAIFunctionPropertySchema (line 21) | class OpenAIFunctionPropertySchema(BaseModel):
class OpenAIFunctionParametersSchema (line 29) | class OpenAIFunctionParametersSchema(BaseModel):
class OpenAIFunctionSchema (line 37) | class OpenAIFunctionSchema(BaseModel):
class OpenAIFunctionToolSchema (line 46) | class OpenAIFunctionToolSchema(BaseModel):
class OpenAIFunctionParsedSchema (line 53) | class OpenAIFunctionParsedSchema(BaseModel):
class OpenAIFunctionCallSchema (line 60) | class OpenAIFunctionCallSchema(BaseModel):
method from_openai_function_parsed_schema (line 67) | def from_openai_function_parsed_schema(
class OpenAIFunctionToolCall (line 84) | class OpenAIFunctionToolCall(BaseModel):
FILE: siirl/execution/rollout_flow/multiturn/tools/search_tool.py
class PoolMode (line 39) | class PoolMode(Enum):
class TokenBucketWorker (line 47) | class TokenBucketWorker:
method __init__ (line 50) | def __init__(self, rate_limit: int):
method acquire (line 56) | def acquire(self):
method release (line 62) | def release(self):
method get_current_count (line 67) | def get_current_count(self):
class SearchExecutionWorker (line 72) | class SearchExecutionWorker:
method __init__ (line 75) | def __init__(self, enable_global_rate_limit=True, rate_limit=10):
method _init_rate_limit (line 78) | def _init_rate_limit(self, rate_limit):
method ping (line 82) | def ping(self):
method execute (line 86) | def execute(self, fn: Callable[..., T], *fn_args, **fn_kwargs) -> T:
function init_search_execution_pool (line 101) | def init_search_execution_pool(
class SearchTool (line 115) | class SearchTool(BaseTool):
method __init__ (line 130) | def __init__(self, config: dict, tool_schema: OpenAIFunctionToolSchema):
method get_openai_tool_schema (line 182) | def get_openai_tool_schema(self) -> OpenAIFunctionToolSchema:
method create (line 186) | async def create(self, instance_id: Optional[str] = None, **kwargs) ->...
method execute_search (line 203) | def execute_search(self, instance_id: str, query_list: list, retrieval...
method execute (line 226) | async def execute(self, instance_id: str, parameters: dict[str, Any], ...
method calc_reward (line 270) | async def calc_reward(self, instance_id: str, **kwargs) -> str:
method release (line 273) | async def release(self, instance_id: str, **kwargs) -> None:
FILE: siirl/execution/rollout_flow/multiturn/tools/utils/mcp_clients/McpClientManager.py
class MCPClientManager (line 28) | class MCPClientManager:
method initialize (line 35) | async def initialize(self, config_path, rate_limit: float = 10.0):
method call_tool (line 58) | async def call_tool(self, tool_name, parameters, timeout):
method fetch_tool_schemas (line 67) | async def fetch_tool_schemas(self, tool_selected_list: list[str]) -> l...
method get_client_with_tool_name (line 82) | def get_client_with_tool_name(self, tool_name: str):
method _load_config (line 85) | def _load_config(self, file: str) -> dict[str, Any]:
FILE: siirl/execution/rollout_flow/multiturn/tools/utils/mcp_clients/utils.py
class TokenBucket (line 24) | class TokenBucket:
method __init__ (line 25) | def __init__(self, rate_limit: float):
method acquire (line 31) | def acquire(self) -> bool:
function mcp2openai (line 45) | def mcp2openai(mcp_tool: Tool) -> dict:
FILE: siirl/execution/rollout_flow/multiturn/tools/utils/search_r1_like_utils.py
function call_search_api (line 34) | def call_search_api(
function _passages2string (line 130) | def _passages2string(retrieval_result):
function perform_single_search_batch (line 141) | def perform_single_search_batch(
FILE: siirl/execution/rollout_flow/multiturn/tools/utils/tool_registry.py
class ToolType (line 29) | class ToolType(Enum):
function initialize_mcp_tool (line 34) | async def initialize_mcp_tool(tool_cls, tool_config) -> list:
function get_tool_class (line 66) | def get_tool_class(cls_name):
function initialize_tools_from_config (line 80) | def initialize_tools_from_config(tools_config_file):
FILE: siirl/execution/scheduler/enums.py
class AdvantageEstimator (line 18) | class AdvantageEstimator(str, Enum):
class WorkflowType (line 29) | class WorkflowType(str, Enum):
class Role (line 35) | class Role(Enum):
class AlgorithmType (line 49) | class AlgorithmType(Enum):
FILE: siirl/execution/scheduler/graph_updater.py
function unflatten_dict_with_omegaconf (line 35) | def unflatten_dict_with_omegaconf(flat_dict: Dict[str, Any]) -> Dict[str...
function update_task_graph_node_configs (line 57) | def update_task_graph_node_configs(workerflow_taskgraph: TaskGraph, basi...
function display_node_config (line 131) | def display_node_config(workerflow_taskgraph: TaskGraph) -> None:
FILE: siirl/execution/scheduler/launch.py
class RayTrainer (line 31) | class RayTrainer:
method __init__ (line 42) | def __init__(self, config: SiiRLArguments, process_group_manager: Proc...
method _check_mutually_exclusive (line 94) | def _check_mutually_exclusive(self, component_config: Dict[str, Any], ...
method validate_actor_config (line 118) | def validate_actor_config(self, node: Node, actor_conf: ActorArguments...
method validate_reference_config (line 154) | def validate_reference_config(self, node: Node, reference_conf: RefArg...
method validate_rollout_config (line 169) | def validate_rollout_config(self, node: Node, rollout_conf: RolloutArg...
method validate_critic_config (line 194) | def validate_critic_config(self, node: Node, critic_conf: CriticArgume...
method validate_reward_model_config (line 219) | def validate_reward_model_config(self, node: Node, reward_model_conf: ...
method validate_configurations_for_task_graph (line 227) | def validate_configurations_for_task_graph(self, task_graph: TaskGraph...
method _validate_config (line 305) | def _validate_config(self):
method init_workers (line 310) | def init_workers(self):
method start_workers (line 328) | def start_workers(self):
FILE: siirl/execution/scheduler/process_group_manager.py
class ProcessGroupManager (line 22) | class ProcessGroupManager:
method __init__ (line 43) | def __init__(
method _clear_internal_mappings (line 91) | def _clear_internal_mappings(self):
method _collect_initial_topology_info (line 99) | def _collect_initial_topology_info(
method _aggregate_ranks_for_nodes (line 145) | def _aggregate_ranks_for_nodes(
method _populate_node_rank_mappings (line 165) | def _populate_node_rank_mappings(self, node_id_to_final_ranks: Dict[st...
method _define_process_groups (line 170) | def _define_process_groups(self) -> Dict[Tuple[int, ...], str]:
method _populate_final_node_and_type_assignments (line 188) | def _populate_final_node_and_type_assignments(
method _populate_subgraph_node_type_process_group_mapping (line 205) | def _populate_subgraph_node_type_process_group_mapping(
method _compute_group_configurations (line 219) | def _compute_group_configurations(self):
method get_group_spec (line 252) | def get_group_spec(self, group_name: str) -> Optional[Dict[str, Any]]:
method get_all_specs (line 257) | def get_all_specs(self) -> Dict[str, Dict[str, Any]]:
method get_node_assignment (line 261) | def get_node_assignment(self, node_id: str) -> Optional[Dict[str, Any]]:
method get_process_groups_for_node_type (line 270) | def get_process_groups_for_node_type(self, node_type_value: str) -> Se...
method get_process_group_for_node_type_in_subgraph (line 277) | def get_process_group_for_node_type_in_subgraph(self, graph_id: str, n...
function _format_ranks_for_logging (line 290) | def _format_ranks_for_logging(ranks: Optional[List[int]], detailed_print...
function _log_group_specs_report (line 313) | def _log_group_specs_report(pgm: ProcessGroupManager, detailed_printing:...
function _log_node_assignments_report (line 328) | def _log_node_assignments_report(
function _log_global_type_mappings_report (line 364) | def _log_global_type_mappings_report(pgm: ProcessGroupManager, node_type...
function _log_subgraph_mappings_report (line 393) | def _log_subgraph_mappings_report(
function log_process_group_manager_details (line 434) | def log_process_group_manager_details(
FILE: siirl/execution/scheduler/ray_actor_manager.py
class DistributedEnv (line 41) | class DistributedEnv(Enum):
class RayActorManager (line 62) | class RayActorManager(WorkerGroup):
method __init__ (line 79) | def __init__(
method _initialize_workers (line 140) | def _initialize_workers(self, resource_pool: RayResourcePool, bin_pack...
method _get_register_center_and_master_info (line 182) | def _get_register_center_and_master_info(self) -> Tuple[str, str]:
method _create_worker_actor (line 225) | def _create_worker_actor(
method _is_worker_alive (line 288) | def _is_worker_alive(self, worker: ActorHandle) -> bool:
method _invoke_on_worker (line 306) | def _invoke_on_worker(self, worker: ActorHandle, method_name: str, *ar...
method map_sync (line 311) | def map_sync(self, method_name: str, *args: Any, **kwargs: Any) -> Lis...
method map_async (line 327) | def map_async(self, method_name: str, *args: Any, **kwargs: Any) -> Li...
method map (line 362) | def map(self, method_name: str, *args: Any, **kwargs: Any) -> List[ray...
method worker_names (line 369) | def worker_names(self) -> List[str]:
method master_address (line 374) | def master_address(self) -> Optional[str]:
method master_port (line 379) | def master_port(self) -> Optional[str]:
method workers (line 384) | def workers(self) -> List[ActorHandle]:
method world_size (line 389) | def world_size(self) -> int:
FILE: siirl/execution/scheduler/resource_manager.py
class ResourcePoolManager (line 32) | class ResourcePoolManager:
method create_resource_pool (line 40) | def create_resource_pool(self):
method get_resource_pool (line 51) | def get_resource_pool(self, resource_pool_name: str) -> RayResourcePool:
method get_n_gpus (line 54) | def get_n_gpus(self) -> int:
method _check_resource_available (line 58) | def _check_resource_available(self, timeout=60, interval=1):
method _verify_placement_possible (line 105) | def _verify_placement_possible(self, available_gpus_per_node: dict):
FILE: siirl/execution/scheduler/reward.py
function load_custom_reward_function (line 44) | def load_custom_reward_function(config: SiiRLArguments) -> Optional[Call...
function create_reward_manager (line 101) | def create_reward_manager(
function compute_reward (line 192) | def compute_reward(data: TensorDict, reward_fn: Callable) -> Tuple[Rewar...
FILE: siirl/execution/scheduler/task_scheduler.py
function _parse_model_params_string (line 24) | def _parse_model_params_string(params_value: any) -> float:
function estimate_graph_model_params (line 58) | def estimate_graph_model_params(task_graph: TaskGraph) -> float:
class TaskScheduler (line 101) | class TaskScheduler:
method __init__ (line 108) | def __init__(self, num_physical_nodes: int, gpus_per_node: int):
method _reset_scheduler_state (line 134) | def _reset_scheduler_state(self):
method _get_original_graph_id (line 147) | def _get_original_graph_id(self, task_graph: TaskGraph) -> str:
method _apportion_workers_to_tasks (line 167) | def _apportion_workers_to_tasks(self, task_graphs_with_estimated_sizes...
method schedule_and_assign_tasks (line 253) | def schedule_and_assign_tasks(
method get_unique_assigned_task_graphs (line 424) | def get_unique_assigned_task_graphs(self) -> Dict[str, TaskGraph]:
function _format_ranks_for_logging (line 447) | def _format_ranks_for_logging(ranks: Optional[List[int]], detailed_rank_...
function log_schedule_assignments (line 468) | def log_schedule_assignments(
function create_dummy_graph (line 572) | def create_dummy_graph(graph_id: str, num_nodes: int, model_params: any ...
FILE: siirl/main_dag.py
function _maybe_prepare_embodied_manifest (line 45) | def _maybe_prepare_embodied_manifest(siirl_args: SiiRLArguments) -> None:
function load_pipeline (line 85) | def load_pipeline(siirl_args: SiiRLArguments) -> TaskGraph:
class MainRunner (line 186) | class MainRunner:
method run (line 196) | def run(self, siirl_args: SiiRLArguments) -> None:
function main (line 266) | def main() -> None:
FILE: siirl/models/embodied/openvla/configuration_prismatic.py
class PrismaticConfig (line 72) | class PrismaticConfig(PretrainedConfig):
method __init__ (line 76) | def __init__(
class OpenVLAConfig (line 129) | class OpenVLAConfig(PrismaticConfig):
method __init__ (line 132) | def __init__(
FILE: siirl/models/embodied/openvla/modeling_prismatic.py
function unpack_tuple (line 41) | def unpack_tuple(fn: Callable[[Any], Tuple[Any]]) -> Callable[[Any], Any]:
function _ls_new_forward (line 52) | def _ls_new_forward(self, x: torch.Tensor) -> torch.Tensor:
function ls_apply_patch (line 56) | def ls_apply_patch(ls_module: LayerScale):
class PrismaticVisionBackbone (line 63) | class PrismaticVisionBackbone(nn.Module):
method __init__ (line 64) | def __init__(
method forward (line 114) | def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
class PrismaticProjector (line 127) | class PrismaticProjector(nn.Module):
method __init__ (line 128) | def __init__(self, use_fused_vision_backbone: bool, vision_dim: int, l...
method forward (line 146) | def forward(self, img_patches: torch.Tensor) -> torch.Tensor:
class PrismaticCausalLMOutputWithPast (line 163) | class PrismaticCausalLMOutputWithPast(ModelOutput):
class PrismaticPreTrainedModel (line 176) | class PrismaticPreTrainedModel(PreTrainedModel):
method _init_weights (line 185) | def _init_weights(self, module: nn.Module) -> None:
method _supports_sdpa (line 208) | def _supports_sdpa(self) -> bool:
class PrismaticForConditionalGeneration (line 213) | class PrismaticForConditionalGeneration(PrismaticPreTrainedModel):
method __init__ (line 214) | def __init__(self, config: PrismaticConfig) -> None:
method get_input_embeddings (line 258) | def get_input_embeddings(self) -> nn.Module:
method set_input_embeddings (line 261) | def set_input_embeddings(self, value: nn.Module) -> None:
method get_output_embeddings (line 264) | def get_output_embeddings(self) -> nn.Module:
method set_output_embeddings (line 267) | def set_output_embeddings(self, new_embeddings: nn.Module) -> None:
method get_decoder (line 270) | def get_decoder(self) -> nn.Module:
method set_decoder (line 273) | def set_decoder(self, decoder: nn.Module) -> None:
method tie_weights (line 276) | def tie_weights(self) -> None:
method resize_token_embeddings (line 279) | def resize_token_embeddings(
method forward (line 291) | def forward(
method prepare_inputs_for_generation (line 450) | def prepare_inputs_for_generation(
method _reorder_cache (line 488) | def _reorder_cache(self, *args, **kwargs) -> Any:
class OpenVLAForActionPrediction (line 492) | class OpenVLAForActionPrediction(PrismaticForConditionalGeneration):
method __init__ (line 495) | def __init__(self, config: OpenVLAConfig) -> None:
method predict_action (line 506) | def predict_action(
method _check_unnorm_key (line 539) | def _check_unnorm_key(norm_stats: Dict[str, Dict[str, Any]], unnorm_ke...
method get_action_dim (line 554) | def get_action_dim(self, unnorm_key: Optional[str] = None) -> int:
method get_action_stats (line 559) | def get_action_stats(self, unnorm_key: Optional[str] = None) -> Dict[s...
FILE: siirl/models/embodied/openvla/processing_prismatic.py
function letterbox_pad_transform (line 23) | def letterbox_pad_transform(image: Image.Image, padding_fill_value: Tupl...
class PrismaticImageProcessor (line 32) | class PrismaticImageProcessor(ImageProcessingMixin):
method __init__ (line 35) | def __init__(
method apply_transform (line 129) | def apply_transform(self, img: Image.Image) -> torch.Tensor:
method preprocess (line 148) | def preprocess(
method __call__ (line 172) | def __call__(self, images: Union[Image.Image, List[Image.Image]], **kw...
class PrismaticProcessor (line 178) | class PrismaticProcessor(ProcessorMixin):
method __init__ (line 183) | def __init__(
method __call__ (line 190) | def __call__(
method batch_decode (line 224) | def batch_decode(
method decode (line 238) | def decode(
method model_input_names (line 253) | def model_input_names(self) -> List[str]:
FILE: siirl/models/embodied/openvla_oft/configuration_prismatic.py
class PrismaticConfig (line 72) | class PrismaticConfig(PretrainedConfig):
method __init__ (line 76) | def __init__(
class OpenVLAConfig (line 129) | class OpenVLAConfig(PrismaticConfig):
method __init__ (line 132) | def __init__(
FILE: siirl/models/embodied/openvla_oft/constants.py
class NormalizationType (line 17) | class NormalizationType(str, Enum):
function detect_robot_platform (line 49) | def detect_robot_platform():
FILE: siirl/models/embodied/openvla_oft/modeling_prismatic.py
function unpack_tuple (line 45) | def unpack_tuple(fn: Callable[[Any], Tuple[Any]]) -> Callable[[Any], Any]:
function _ls_new_forward (line 56) | def _ls_new_forward(self, x: torch.Tensor) -> torch.Tensor:
function ls_apply_patch (line 60) | def ls_apply_patch(ls_module: LayerScale):
class PrismaticVisionBackbone (line 67) | class PrismaticVisionBackbone(nn.Module):
method __init__ (line 75) | def __init__(
method _create_featurizer (line 115) | def _create_featurizer(self, model_id: str, img_size: int, act_layer: ...
method _patch_layer_scales (line 141) | def _patch_layer_scales(self) -> None:
method get_num_patches (line 159) | def get_num_patches(self) -> int:
method get_num_images_in_input (line 168) | def get_num_images_in_input(self) -> int:
method set_num_images_in_input (line 177) | def set_num_images_in_input(self, num_images_in_input: int) -> None:
method forward (line 186) | def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
class PrismaticProjector (line 231) | class PrismaticProjector(nn.Module):
method __init__ (line 232) | def __init__(self, use_fused_vision_backbone: bool, vision_dim: int, l...
method forward (line 250) | def forward(self, img_patches: torch.Tensor) -> torch.Tensor:
class PrismaticCausalLMOutputWithPast (line 267) | class PrismaticCausalLMOutputWithPast(ModelOutput):
class PrismaticPreTrainedModel (line 280) | class PrismaticPreTrainedModel(PreTrainedModel):
method _init_weights (line 289) | def _init_weights(self, module: nn.Module) -> None:
method _supports_sdpa (line 312) | def _supports_sdpa(self) -> bool:
class PrismaticForConditionalGeneration (line 317) | class PrismaticForConditionalGeneration(PrismaticPreTrainedModel):
method __init__ (line 318) | def __init__(self, config: PrismaticConfig) -> None:
method get_input_embeddings (line 363) | def get_input_embeddings(self) -> nn.Module:
method set_input_embeddings (line 366) | def set_input_embeddings(self, value: nn.Module) -> None:
method get_output_embeddings (line 369) | def get_output_embeddings(self) -> nn.Module:
method set_output_embeddings (line 372) | def set_output_embeddings(self, new_embeddings: nn.Module) -> None:
method get_decoder (line 375) | def get_decoder(self) -> nn.Module:
method set_decoder (line 378) | def set_decoder(self, decoder: nn.Module) -> None:
method tie_weights (line 381) | def tie_weights(self) -> None:
method resize_token_embeddings (line 384) | def resize_token_embeddings(
method _replace_input_embeddings (line 395) | def _replace_input_embeddings(self, input_embeddings, all_actions_mask...
method _process_action_masks (line 431) | def _process_action_masks(self, labels):
method _process_vision_features (line 438) | def _process_vision_features(self, pixel_values, language_embeddings=N...
method _process_proprio_features (line 449) | def _process_proprio_features(self, projected_patch_embeddings, propri...
method _build_multimodal_attention (line 461) | def _build_multimodal_attention(self, input_embeddings, projected_patc...
method _build_multimodal_labels (line 486) | def _build_multimodal_labels(self, labels, projected_patch_embeddings):
method prepare_inputs_for_generation (line 682) | def prepare_inputs_for_generation(
method _reorder_cache (line 720) | def _reorder_cache(self, *args, **kwargs) -> Any:
method _prepare_input_for_action_prediction_verl (line 723) | def _prepare_input_for_action_prediction_verl(self, input_ids, attenti...
method _prepare_labels_for_action_prediction_verl (line 746) | def _prepare_labels_for_action_prediction_verl(self, labels, input_ids):
method _verl_discrete_compute_logits (line 761) | def _verl_discrete_compute_logits(
method forward (line 1077) | def forward(
class OpenVLAForActionPrediction (line 1326) | class OpenVLAForActionPrediction(PrismaticForConditionalGeneration):
method __init__ (line 1329) | def __init__(self, config: OpenVLAConfig) -> None:
method _prepare_input_for_action_prediction (line 1340) | def _prepare_input_for_action_prediction(self, input_ids, attention_ma...
method _prepare_labels_for_action_prediction (line 1363) | def _prepare_labels_for_action_prediction(self, labels, input_ids):
method _unnormalize_actions (line 1378) | def _unnormalize_actions(self, normalized_actions, unnorm_key=None):
method _run_diffusion_prediction (line 1399) | def _run_diffusion_prediction(
method _regression_or_discrete_prediction (line 1485) | def _regression_or_discrete_prediction(
method _verl_discrete_prediction (line 1552) | def _verl_discrete_prediction(
method predict_action (line 1711) | def predict_action(
method generate_action_verl (line 1827) | def generate_action_verl(
method _check_unnorm_key (line 1972) | def _check_unnorm_key(norm_stats: Dict[str, Dict[str, Any]], unnorm_ke...
method get_action_dim (line 1988) | def get_action_dim(self, unnorm_key: Optional[str] = None) -> int:
method get_action_stats (line 1993) | def get_action_stats(self, unnorm_key: Optional[str] = None) -> Dict[s...
FILE: siirl/models/embodied/openvla_oft/processing_prismatic.py
function letterbox_pad_transform (line 23) | def letterbox_pad_transform(image: Image.Image, padding_fill_value: Tupl...
class PrismaticImageProcessor (line 32) | class PrismaticImageProcessor(ImageProcessingMixin):
method __init__ (line 35) | def __init__(
method apply_transform (line 128) | def apply_transform(self, img: Image.Image) -> torch.Tensor:
method preprocess (line 147) | def preprocess(
method __call__ (line 169) | def __call__(self, images: Union[Image.Image, List[Image.Image]], **kw...
class PrismaticProcessor (line 175) | class PrismaticProcessor(ProcessorMixin):
method __init__ (line 180) | def __init__(
method __call__ (line 187) | def __call__(
method batch_decode (line 219) | def batch_decode(
method decode (line 233) | def decode(
method model_input_names (line 248) | def model_input_names(self) -> List[str]:
FILE: siirl/models/embodied/openvla_oft/train_utils.py
function get_current_action_mask (line 8) | def get_current_action_mask(token_ids):
function get_next_actions_mask (line 25) | def get_next_actions_mask(token_ids):
function compute_token_accuracy (line 42) | def compute_token_accuracy(predicted_token_ids, ground_truth_token_ids, ...
function compute_actions_l1_loss (line 48) | def compute_actions_l1_loss(action_tokenizer, predicted_token_ids, groun...
FILE: siirl/models/llama/megatron/checkpoint_utils/llama_loader.py
function _megatron_calc_layer_map (line 23) | def _megatron_calc_layer_map(config):
function load_state_dict_to_megatron_llama (line 53) | def load_state_dict_to_megatron_llama(state_dict, wrapped_models, config...
FILE: siirl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py
function _megatron_calc_layer_map (line 23) | def _megatron_calc_layer_map(config):
function load_state_dict_to_megatron_llama (line 53) | def load_state_dict_to_megatron_llama(state_dict, wrapped_models, config...
FILE: siirl/models/llama/megatron/checkpoint_utils/llama_saver.py
function _megatron_calc_global_rank (line 28) | def _megatron_calc_global_rank(tp_rank: int = 0, dp_rank: int = 0, pp_ra...
function _megatron_calc_layer_map (line 39) | def _megatron_calc_layer_map(config):
function merge_megatron_ckpt_llama (line 67) | def merge_megatron_ckpt_llama(wrapped_models, config, dtype, is_value_mo...
FILE: siirl/models/llama/megatron/layers/parallel_attention.py
class LlamaRotaryEmbedding (line 38) | class LlamaRotaryEmbedding(nn.Module):
method __init__ (line 39) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
method _set_cos_sin_cache (line 51) | def _set_cos_sin_cache(self, seq_len, device, dtype):
method forward (line 61) | def forward(self, x, seq_len=None):
class LlamaLinearScalingRotaryEmbedding (line 72) | class LlamaLinearScalingRotaryEmbedding(LlamaRotaryEmbedding):
method __init__ (line 75) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
method _set_cos_sin_cache (line 79) | def _set_cos_sin_cache(self, seq_len, device, dtype):
class LlamaDynamicNTKScalingRotaryEmbedding (line 91) | class LlamaDynamicNTKScalingRotaryEmbedding(LlamaRotaryEmbedding):
method __init__ (line 94) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
method _set_cos_sin_cache (line 98) | def _set_cos_sin_cache(self, seq_len, device, dtype):
class LlamaLlama3ScalingRotaryEmbedding (line 115) | class LlamaLlama3ScalingRotaryEmbedding(LlamaRotaryEmbedding):
method __init__ (line 116) | def __init__(self, dim, config, max_position_embeddings=2048, base=100...
function rotate_half (line 142) | def rotate_half(x):
function apply_rotary_pos_emb (line 149) | def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
function repeat_kv (line 157) | def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
class ParallelLlamaAttention (line 169) | class ParallelLlamaAttention(nn.Module):
method __init__ (line 172) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
method _init_rope (line 232) | def _init_rope(self):
method _shape (line 267) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
method forward (line 270) | def forward(
function apply_rotary_pos_emb_rmpad (line 326) | def apply_rotary_pos_emb_rmpad(q, k, cos, sin, position_ids, indices, se...
function apply_rotary_pos_emb_rmpad_flash (line 344) | def apply_rotary_pos_emb_rmpad_flash(q, k, cos, sin, cu_seqlens, max_seq...
class ParallelLlamaAttentionRmPad (line 350) | class ParallelLlamaAttentionRmPad(ParallelLlamaAttention):
method forward (line 351) | def forward(
FILE: siirl/models/llama/megatron/layers/parallel_decoder.py
class ParallelLlamaDecoderLayer (line 35) | class ParallelLlamaDecoderLayer(nn.Module):
method __init__ (line 36) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
method forward (line 47) | def forward(
class ParallelLlamaDecoderLayerRmPad (line 102) | class ParallelLlamaDecoderLayerRmPad(nn.Module):
method __init__ (line 103) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
method forward (line 114) | def forward(
FILE: siirl/models/llama/megatron/layers/parallel_linear.py
class QKVParallelLinear (line 20) | class QKVParallelLinear(tensor_parallel.ColumnParallelLinear):
method __init__ (line 21) | def __init__(
class MergedColumnParallelLinear (line 54) | class MergedColumnParallelLinear(tensor_parallel.ColumnParallelLinear):
method __init__ (line 55) | def __init__(
class LinearForLastLayer (line 82) | class LinearForLastLayer(torch.nn.Linear):
method __init__ (line 83) | def __init__(
method forward (line 96) | def forward(
FILE: siirl/models/llama/megatron/layers/parallel_mlp.py
class ParallelLlamaMLP (line 30) | class ParallelLlamaMLP(nn.Module):
method __init__ (line 31) | def __init__(self, config, megatron_config: ModelParallelConfig = None...
method forward (line 71) | def forward(self, x):
FILE: siirl/models/llama/megatron/layers/parallel_rmsnorm.py
class ParallelLlamaRMSNorm (line 26) | class ParallelLlamaRMSNorm(nn.Module):
method __init__ (line 27) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
method forward (line 41) | def forward(self, hidden_states):
FILE: siirl/models/llama/megatron/modeling_llama_megatron.py
function _make_causal_mask (line 47) | def _make_causal_mask(input_ids_shape: torch.Size, dtype: torch.dtype, d...
function _expand_mask (line 60) | def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Option...
class ParallelLlamaModel (line 74) | class ParallelLlamaModel(nn.Module):
method __init__ (line 82) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
method _prepare_decoder_attention_mask (line 97) | def _prepare_decoder_attention_mask(self, attention_mask, input_shape,...
method forward (line 115) | def forward(
class ParallelLlamaForCausalLM (line 153) | class ParallelLlamaForCausalLM(nn.Module):
method __init__ (line 154) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
method forward (line 174) | def forward(
class ParallelLlamaModelRmPad (line 215) | class ParallelLlamaModelRmPad(nn.Module):
method __init__ (line 223) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
method forward (line 238) | def forward(
class ParallelLlamaForCausalLMRmPad (line 281) | class ParallelLlamaForCausalLMRmPad(nn.Module):
method __init__ (line 282) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
method _init_head (line 290) | def _init_head(self, config):
method _forward_head (line 304) | def _forward_head(self, hidden_states):
method forward (line 311) | def forward(
class ParallelLlamaForValueRmPad (line 369) | class ParallelLlamaForValueRmPad(ParallelLlamaForCausalLMRmPad):
method _init_head (line 370) | def _init_head(self, config):
method _forward_head (line 379) | def _forward_head(self, hidden_states):
method forward (line 386) | def forward(
class ParallelLlamaModelRmPadPP (line 402) | class ParallelLlamaModelRmPadPP(nn.Module):
method __init__ (line 412) | def __init__(self, config: LlamaConfig, megatron_config: ModelParallel...
method set_input_tensor (line 454) | def set_input_tensor(self, input_tensor):
method forward (line 464) | def forward(
class ParallelLlamaForCausalLMRmPadPP (line 515) | class ParallelLlamaForCausalLMRmPadPP(nn.Module):
method __init__ (line 516) | def __init__(
method set_input_tensor (line 536) | def set_input_tensor(self, input_tensor):
method _init_head (line 547) | def _init_head(self, config):
method _forward_head (line 561) | def _forward_head(self, hidden_states):
method forward (line 569) | def forward(
class ParallelLlamaForValueRmPadPP (line 633) | class ParallelLlamaForValueRmPadPP(ParallelLlamaForCausalLMRmPadPP):
method _init_head (line 634) | def _init_head(self, config):
method _forward_head (line 643) | def _forward_head(self, hidden_states):
method forward (line 650) | def forward(
FILE: siirl/models/loader.py
class TokenizerModule (line 21) | class TokenizerModule(TypedDict):
function _get_init_kwargs (line 26) | def _get_init_kwargs(model_args: "ModelArguments") -> Dict[str, Any]:
function set_pad_token_id (line 41) | def set_pad_token_id(tokenizer):
function load_tokenizer (line 56) | def load_tokenizer(
FILE: siirl/models/mcore/config_converter.py
function _get_base_transformer_config (line 26) | def _get_base_transformer_config(hf_config: PretrainedConfig, dtype: tor...
function hf_to_mcore_config_dense (line 88) | def hf_to_mcore_config_dense(hf_config: PretrainedConfig, dtype: torch.d...
function hf_to_mcore_config_qwen2moe (line 97) | def hf_to_mcore_config_qwen2moe(hf_config: PretrainedConfig, dtype: torc...
function hf_to_mcore_config_mixtral (line 128) | def hf_to_mcore_config_mixtral(hf_config: PretrainedConfig, dtype: torch...
function hf_to_mcore_config_qwen3moe (line 158) | def hf_to_mcore_config_qwen3moe(hf_config: PretrainedConfig, dtype: torc...
function hf_to_mcore_config_dpskv3 (line 187) | def hf_to_mcore_config_dpskv3(hf_config: PretrainedConfig, dtype: torch....
function hf_to_mcore_config_qwen2_5_vl (line 277) | def hf_to_mcore_config_qwen2_5_vl(hf_config: PretrainedConfig, dtype: to...
function hf_to_mcore_config_llama4 (line 282) | def hf_to_mcore_config_llama4(hf_config: PretrainedConfig, dtype: torch....
FILE: siirl/models/mcore/loader.py
function _megatron_calc_layer_map (line 26) | def _megatron_calc_layer_map(config):
function load_state_dict_to_megatron_gptmodel (line 54) | def load_state_dict_to_megatron_gptmodel(state_dict, wrapped_models, con...
FILE: siirl/models/mcore/model_forward.py
function gptmodel_forward (line 22) | def gptmodel_forward(
FILE: siirl/models/mcore/model_forward_fused.py
function patch_fused_forward (line 36) | def patch_fused_forward(model: torch.nn.Module):
function unpatch_fused_forward (line 47) | def unpatch_fused_forward(model: torch.nn.Module):
function fused_forward_gptmodel (line 57) | def fused_forward_gptmodel(
function _fused_GPTModel_forward (line 103) | def _fused_GPTModel_forward(
FILE: siirl/models/mcore/model_initializer.py
class BaseModelInitializer (line 26) | class BaseModelInitializer(ABC):
method __init__ (line 29) | def __init__(self, tfconfig: TransformerConfig, hf_config: PretrainedC...
method get_transformer_layer_spec (line 34) | def get_transformer_layer_spec(self):
method get_rope_scaling_args (line 39) | def get_rope_scaling_args(self) -> dict:
method initialize (line 48) | def initialize(
class DenseModel (line 93) | class DenseModel(BaseModelInitializer):
method get_transformer_layer_spec (line 96) | def get_transformer_layer_spec(self):
class Qwen2MoEModel (line 101) | class Qwen2MoEModel(BaseModelInitializer):
method get_transformer_layer_spec (line 104) | def get_transformer_layer_spec(self):
method initialize (line 114) | def initialize(self, **kwargs):
class MixtralModel (line 124) | class MixtralModel(BaseModelInitializer):
method get_transformer_layer_spec (line 127) | def get_transformer_layer_spec(self):
method initialize (line 132) | def initialize(self, **kwargs):
class Qwen3MoEModel (line 141) | class Qwen3MoEModel(BaseModelInitializer):
method get_transformer_layer_spec (line 144) | def get_transformer_layer_spec(self):
method initialize (line 149) | def initialize(self, **kwargs):
class DeepseekV3Model (line 159) | class DeepseekV3Model(BaseModelInitializer):
method get_transformer_layer_spec (line 162) | def get_transformer_layer_spec(self):
method get_rope_scaling_args (line 166) | def get_rope_scaling_args(self) -> dict:
method initialize (line 171) | def initialize(
class Qwen25VLModel (line 192) | class Qwen25VLModel(BaseModelInitializer):
method get_transformer_layer_spec (line 195) | def get_transformer_layer_spec(self):
FILE: siirl/models/mcore/patch_v012.py
function apply_patch (line 20) | def apply_patch():
FILE: siirl/models/mcore/registry.py
class SupportedModel (line 60) | class SupportedModel(Enum):
function get_supported_model (line 137) | def get_supported_model(model_type: str) -> SupportedModel:
function hf_to_mcore_config (line 145) | def hf_to_mcore_config(hf_config: PretrainedConfig, dtype: torch.dtype, ...
function init_mcore_model (line 151) | def init_mcore_model(
function get_mcore_forward_fn (line 183) | def get_mcore_forward_fn(hf_config: PretrainedConfig) -> Callable:
function get_mcore_forward_fused_fn (line 191) | def get_mcore_forward_fused_fn(hf_config: PretrainedConfig) -> Callable:
function get_mcore_weight_converter (line 199) | def get_mcore_weight_converter(hf_config: PretrainedConfig, dtype: torch...
FILE: siirl/models/mcore/saver.py
function _megatron_calc_global_rank (line 29) | def _megatron_calc_global_rank(tp_rank: int = 0, dp_rank: int = 0, pp_ra...
function _megatron_calc_layer_map (line 48) | def _megatron_calc_layer_map(config):
function merge_megatron_ckpt_gptmodel (line 76) | def merge_megatron_ckpt_gptmodel(wrapped_models, config, dtype, is_value...
function merge_megatron_ckpt_gptmodel_qwen_moe (line 467) | def merge_megatron_ckpt_gptmodel_qwen_moe(wrapped_models, config, dtype,...
function merge_megatron_ckpt_gptmodel_dpskv3 (line 471) | def merge_megatron_ckpt_gptmodel_dpskv3(wrapped_models, config, dtype, i...
function merge_megatron_ckpt_gptmodel_mixtral (line 475) | def merge_megatron_ckpt_gptmodel_mixtral(wrapped_models, config, dtype, ...
FILE: siirl/models/mcore/util.py
function preprocess_packed_seqs (line 23) | def preprocess_packed_seqs(
function postprocess_packed_seqs (line 105) | def postprocess_packed_seqs(
function remove_left_padding (line 165) | def remove_left_padding(
function recover_left_padding (line 206) | def recover_left_padding(
function postprocess_packed_seqs_for_dict_output (line 228) | def postprocess_packed_seqs_for_dict_output(
FILE: siirl/models/mcore/weight_converter.py
class McoreToHFWeightConverterBase (line 25) | class McoreToHFWeightConverterBase:
method __init__ (line 26) | def __init__(self, hf_config: PretrainedConfig, mcore_config: Transfor...
method convert_param (line 30) | def convert_param(self, name: str, params_one_group: list[torch.Tensor...
class McoreToHFWeightConverterDense (line 34) | class McoreToHFWeightConverterDense(McoreToHFWeightConverterBase):
method _convert_attention_param (line 35) | def _convert_attention_param(self, name: str, params: list[torch.Tenso...
method _convert_mlp_param (line 65) | def _convert_mlp_param(self, name: str, params: list[torch.Tensor]) ->...
method convert_param (line 86) | def convert_param(self, name: str, params_one_group: list[torch.Tensor...
class McoreToHFWeightConverterQwen2Moe (line 103) | class McoreToHFWeightConverterQwen2Moe(McoreToHFWeightConverterDense):
method _convert_mlp_param (line 104) | def _convert_mlp_param(self, name: str, params: list[torch.Tensor]) ->...
class McoreToHFWeightConverterDpskv3 (line 150) | class McoreToHFWeightConverterDpskv3(McoreToHFWeightConverterBase):
method _convert_attention_param (line 151) | def _convert_attention_param(self, name: str, params: list[torch.Tenso...
method _convert_mlp_param (line 190) | def _convert_mlp_param(self, name: str, params: list[torch.Tensor]) ->...
method _convert_mtp_param (line 260) | def _convert_mtp_param(self, name: str, params: list[torch.Tensor]) ->...
method convert_param (line 277) | def convert_param(self, name: str, params_one_group: list[torch.Tensor...
class McoreToHFWeightConverterMixtral (line 295) | class McoreToHFWeightConverterMixtral(McoreToHFWeightConverterDense):
method _convert_mlp_param (line 296) | def _convert_mlp_param(self, name: str, params: list[torch.Tensor]) ->...
class McoreToHFWeightConverterQwen3Moe (line 319) | class McoreToHFWeightConverterQwen3Moe(McoreToHFWeightConverterDense):
method _convert_mlp_param (line 320) | def _convert_mlp_param(self, name: str, params: list[torch.Tensor]) ->...
FILE: siirl/models/model_utils/visual.py
class CompositeModel (line 25) | class CompositeModel:
method get_projector (line 32) | def get_projector(self, module: "torch.nn.Module") -> "torch.nn.Module":
function _register_composite_model (line 42) | def _register_composite_model(
class LlavaMultiModalProjectorForYiVL (line 58) | class LlavaMultiModalProjectorForYiVL(torch.nn.Module):
method __init__ (line 59) | def __init__(self, config: "LlavaConfig") -> None:
method forward (line 72) | def forward(self, image_features: "torch.Tensor") -> "torch.Tensor":
class LlavaMultiModalProjectorForYiVLForVLLM (line 92) | class LlavaMultiModalProjectorForYiVLForVLLM(LlavaMultiModalProjectorFor...
method __init__ (line 93) | def __init__(self, vision_hidden_size: int, text_hidden_size: int, pro...
function autocast_projector_dtype (line 103) | def autocast_projector_dtype(model: "PreTrainedModel", model_args: "Mode...
function configure_visual_model (line 122) | def configure_visual_model(config: "PretrainedConfig") -> None:
function get_forbidden_modules (line 135) | def get_forbidden_modules(config: "PretrainedConfig", finetuning_args: "...
function get_image_seqlen (line 160) | def get_image_seqlen(config: "PretrainedConfig") -> int:
function get_patch_size (line 177) | def get_patch_size(config: "PretrainedConfig", processor: "ProcessorMixi...
function get_vision_feature_select_strategy (line 185) | def get_vision_feature_select_strategy(config: "PretrainedConfig", proce...
function patch_target_modules (line 197) | def patch_target_modules(
FILE: siirl/models/patcher.py
function patch_tokenizer (line 39) | def patch_tokenizer(tokenizer: "PreTrainedTokenizer", model_args: "Model...
function patch_processor (line 66) | def patch_processor(
FILE: siirl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py
function _megatron_calc_layer_map (line 22) | def _megatron_calc_layer_map(config):
function load_state_dict_to_megatron_qwen2 (line 50) | def load_state_dict_to_megatron_qwen2(state_dict, wrapped_models, config...
FILE: siirl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py
function _megatron_calc_layer_map (line 23) | def _megatron_calc_layer_map(config):
function load_state_dict_to_megatron_qwen2 (line 51) | def load_state_dict_to_megatron_qwen2(state_dict, wrapped_models, config...
FILE: siirl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py
function _megatron_calc_global_rank (line 28) | def _megatron_calc_global_rank(tp_rank: int = 0, dp_rank: int = 0, pp_ra...
function _megatron_calc_layer_map (line 39) | def _megatron_calc_layer_map(config):
function merge_megatron_ckpt_qwen2 (line 67) | def merge_megatron_ckpt_qwen2(wrapped_models, config, dtype, is_value_mo...
FILE: siirl/models/qwen2/megatron/layers/parallel_attention.py
class Qwen2RotaryEmbedding (line 42) | class Qwen2RotaryEmbedding(nn.Module):
method __init__ (line 43) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
method _set_cos_sin_cache (line 55) | def _set_cos_sin_cache(self, seq_len, device, dtype):
method forward (line 65) | def forward(self, x, seq_len=None):
class Qwen2LinearScalingRotaryEmbedding (line 76) | class Qwen2LinearScalingRotaryEmbedding(Qwen2RotaryEmbedding):
method __init__ (line 79) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
method _set_cos_sin_cache (line 83) | def _set_cos_sin_cache(self, seq_len, device, dtype):
class Qwen2DynamicNTKScalingRotaryEmbedding (line 95) | class Qwen2DynamicNTKScalingRotaryEmbedding(Qwen2RotaryEmbedding):
method __init__ (line 98) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
method _set_cos_sin_cache (line 102) | def _set_cos_sin_cache(self, seq_len, device, dtype):
function rotate_half (line 119) | def rotate_half(x):
function apply_rotary_pos_emb (line 126) | def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
function repeat_kv (line 134) | def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
class ParallelQwen2Attention (line 146) | class ParallelQwen2Attention(nn.Module):
method __init__ (line 149) | def __init__(self, config: Qwen2Config, megatron_config: ModelParallel...
method _init_rope (line 211) | def _init_rope(self):
method _shape (line 218) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
method forward (line 221) | def forward(
function apply_rotary_pos_emb_rmpad (line 272) | def apply_rotary_pos_emb_rmpad(q, k, cos, sin, position_ids, indices, se...
function apply_rotary_pos_emb_rmpad_flash (line 290) | def apply_rotary_pos_emb_rmpad_flash(q, k, cos, sin, cu_seqlens, max_seq...
class ParallelQwen2AttentionRmPad (line 296) | class ParallelQwen2AttentionRmPad(ParallelQwen2Attention):
method forward (line 297) | def forward(
FILE: siirl/models/qwen2/megatron/layers/parallel_decoder.py
class ParallelQwen2DecoderLayer (line 35) | class ParallelQwen2DecoderLayer(nn.Module):
method __init__ (line 36) | def __init__(self, config: Qwen2Config, megatron_config: ModelParallel...
method forward (line 47) | def forward(
class ParallelQwen2DecoderLayerRmPad (line 102) | class ParallelQwen2DecoderLayerRmPad(nn.Module):
method __init__ (line 103) | def __init__(self, config: Qwen2Config, megatron_config: ModelParallel...
method forward (line 114) | def forward(
FILE: siirl/models/qwen2/megatron/layers/parallel_linear.py
class QKVParallelLinear (line 20) | class QKVParallelLinear(tensor_parallel.ColumnParallelLinear):
method __init__ (line 21) | def __init__(
class MergedColumnParallelLinear (line 54) | class MergedColumnParallelLinear(tensor_parallel.ColumnParallelLinear):
method __init__ (line 55) | def __init__(
FILE: siirl/models/qwen2/megatron/layers/parallel_mlp.py
class ParallelQwen2MLP (line 30) | class ParallelQwen2MLP(nn.Module):
method __init__ (line 31) | def __init__(self, config, megatron_config: ModelParallelConfig = None...
method forward (line 71) | def forward(self, x):
FILE: siirl/models/qwen2/megatron/layers/parallel_rmsnorm.py
class ParallelQwen2RMSNorm (line 26) | class ParallelQwen2RMSNorm(nn.Module):
method __init__ (line 27) | def __init__(self, config: Qwen2Config, megatron_config: ModelParallel...
method forward (line 41) | def forward(self, hidden_states):
FILE: siirl/models/qwen2/megatron/modeling_qwen2_megatron.py
function _make_causal_mask (line 48) | def _make_causal_mask(input_ids_shape: torch.Size, dtype: torch.dtype, d...
function _expand_mask (line 61) | def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Option...
class ParallelQwen2Model (line 75) | class ParallelQwen2Model(nn.Module):
method __init__ (line 83) | def __init__(self, config: Qwen2Config, megatron_config: ModelParallel...
method _prepare_decoder_attention_mask (line 98) | def _prepare_decoder_attention_mask(self, attention_mask, input_shape,...
method forward (line 116) | def forward(
class ParallelQwen2ForCausalLM (line 154) | class ParallelQwen2ForCausalLM(nn.Module):
method __init__ (line 155) | def __init__(self, config: Qwen2Config, megatron_config: ModelParallel...
method forward (line 175) | def forward(
class ParallelQwen2ModelRmPad (line 216) | class ParallelQwen2ModelRmPad(nn.Module):
method __init__ (line 224) | def __init__(self, config: Qwen2Config, megatron_config: ModelParallel...
method forward (line 239) | def forward(
class ParallelQwen2ForCausalLMRmPad (line 282) | class ParallelQwen2ForCausalLMRmPad(nn.Module):
method __init__ (line 283) | def __init__(self, config: Qwen2Config, megatron_config: ModelParallel...
method _init_head (line 291) | def _init_head(self, config: Qwen2Config):
method _forward_head (line 305) | def _forward_head(self, hidden_states):
method forward (line 312) | def forward(
class ParallelQwen2ForValueRmPad (line 370) | class ParallelQwen2ForValueRmPad(ParallelQwen2ForCausalLMRmPad):
method _init_head (line 371) | def _init_head(self, config):
method _forward_head (line 380) | def _forward_head(self, hidden_states):
method forward (line 387) | def forward(
class ParallelQwen2ModelRmPadPP (line 403) | class ParallelQwen2ModelRmPadPP(nn.Module):
method __init__ (line 413) | def __init__(self, config: Qwen2Config, megatron_config: ModelParallel...
method set_input_tensor (line 454) | def set_input_tensor(self, input_tensor):
method forward (line 464) | def forward(
class ParallelQwen2ForCausalLMRmPadPP (line 515) | class ParallelQwen2ForCausalLMRmPadPP(nn.Module):
method __init__ (line 516) | def __init__(
method set_input_tensor (line 537) | def set_input_tensor(self, input_tensor):
method _init_head (line 548) | def _init_head(self, config):
method setup_embeddings_and_output_layer (line 563) | def setup_embeddings_and_output_layer(self) -> None:
method shared_embedding_or_output_weight (line 602) | def shared_embedding_or_output_weight(self) -> torch.Tensor:
method _forward_head (line 609) | def _forward_head(self, hidden_states):
method forward (line 620) | def forward(
class ParallelQwen2ForValueRmPadPP (line 683) | class ParallelQwen2ForValueRmPadPP(ParallelQwen2ForCausalLMRmPadPP):
method _init_head (line 684) | def _init_head(self, config):
method _forward_head (line 693) | def _forward_head(self, hidden_states):
method forward (line 700) | def forward(
FILE: siirl/models/registry.py
class ModelRegistry (line 39) | class ModelRegistry:
method load_model_cls (line 41) | def load_model_cls(model_arch: str, value=False) -> Optional[Type[nn.M...
method get_supported_archs (line 57) | def get_supported_archs() -> List[str]:
FILE: siirl/models/transformers/internvl.py
class SeparatorStyle (line 55) | class SeparatorStyle(IntEnum):
class Conversation (line 79) | class Conversation:
method get_prompt (line 103) | def get_prompt(self) -> str:
method set_system_message (line 289) | def set_system_message(self, system_message: str):
method append_message (line 293) | def append_message(self, role: str, message: str):
method update_last_message (line 297) | def update_last_message(self, message: str):
method to_gradio_chatbot (line 305) | def to_gradio_chatbot(self):
method to_openai_api_messages (line 315) | def to_openai_api_messages(self):
method copy (line 327) | def copy(self):
method dict (line 342) | def dict(self):
function register_conv_template (line 355) | def register_conv_template(template: Conversation, override: bool = False):
function get_conv_template (line 363) | def get_conv_template(name: str) -> Conversation:
function calculate_ngram_repetition (line 388) | def calculate_ngram_repetition(text, n):
function check_conversations_repetition (line 397) | def check_conversations_repetition(conversations, repeat_threshold=0.4, ...
function get_frame_indices (line 406) | def get_frame_indices(num_frames, vlen, sample="rand", fix_start=None, i...
function read_frames_gif (line 447) | def read_frames_gif(video_path, num_frames, sample="rand", fix_start=Non...
function read_frames_decord (line 467) | def read_frames_decord(video_path, num_frames, sample="rand", fix_start=...
function extract_frame_number (line 495) | def extract_frame_number(filename):
function sort_frames (line 501) | def sort_frames(frame_paths):
function read_frames_folder (line 506) | def read_frames_folder(video_path, num_frames, sample="rand", fix_start=...
class WeightedConcatDataset (line 531) | class WeightedConcatDataset(ConcatDataset):
method __init__ (line 532) | def __init__(self, datasets, weights):
method __iter__ (line 538) | def __iter__(self):
method __len__ (line 541) | def __len__(self):
function pil_loader (line 545) | def pil_loader(img_str):
class TCSLoader (line 551) | class TCSLoader(object):
method __init__ (line 552) | def __init__(self, conf_path, sc_config_key="sensecore"):
method __call__ (line 559) | def __call__(self, fn, image_type="image", max_num_frames=-1, min_num_...
function expand2square (line 575) | def expand2square(pil_img, background_color):
function simulate_jpeg_degradation (line 589) | def simulate_jpeg_degradation(quality):
function build_transform (line 605) | def build_transform(is_train, input_size, pad2square=False, normalize_ty...
function preprocess (line 629) | def preprocess(template_name, sources, tokenizer: transformers.PreTraine...
function preprocess_mpt (line 721) | def preprocess_mpt(template_name, sources, tokenizer: transformers.PreTr...
function preprocess_phi3 (line 802) | def preprocess_phi3(template_name, sources, tokenizer: transformers.PreT...
function preprocess_internlm (line 898) | def preprocess_internlm(template_name, sources, tokenizer: transformers....
function preprocess_internvl2_5 (line 978) | def preprocess_internvl2_5(template_name, sources, tokenizer: transforme...
function find_closest_aspect_ratio (line 1070) | def find_closest_aspect_ratio(aspect_ratio, target_ratios, width, height...
function dynamic_preprocess (line 1087) | def dynamic_preprocess(image, min_num=1, max_num=6, image_size=448, use_...
function preprocess_internvl2_5_siirl (line 1118) | def preprocess_internvl2_5_siirl(sources, tokenizer: transformers.PreTra...
function find_closest_aspect_ratio (line 1205) | def find_closest_aspect_ratio(aspect_ratio, target_ratios, width, height...
function dynamic_preprocess (line 1222) | def dynamic_preprocess(image, min_num=1, max_num=6, image_size=448, use_...
FILE: siirl/models/transformers/internvl_chat/configuration_intern_vit.py
class InternVisionConfig (line 16) | class InternVisionConfig(PretrainedConfig):
method __init__ (line 64) | def __init__(
method from_pretrained (line 108) | def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os....
FILE: siirl/models/transformers/internvl_chat/configuration_internlm2.py
class InternLM2Config (line 27) | class InternLM2Config(PretrainedConfig):
method __init__ (line 78) | def __init__( # pylint: disable=W0102
method _rope_scaling_validation (line 132) | def _rope_scaling_validation(self):
FILE: siirl/models/transformers/internvl_chat/configuration_internvl_chat.py
class InternVLChatConfig (line 19) | class InternVLChatConfig(PretrainedConfig):
method __init__ (line 23) | def __init__(self, vision_config=None, llm_config=None, use_backbone_l...
method to_dict (line 58) | def to_dict(self):
FILE: siirl/models/transformers/internvl_chat/modeling_intern_vit.py
class FlashAttention (line 34) | class FlashAttention(nn.Module):
method __init__ (line 45) | def __init__(self, softmax_scale=None, attention_dropout=0.0, device=N...
method forward (line 50) | def forward(self, qkv, key_padding_mask=None, causal=False, cu_seqlens...
class InternRMSNorm (line 85) | class InternRMSNorm(nn.Module):
method __init__ (line 86) | def __init__(self, hidden_size, eps=1e-6):
method forward (line 91) | def forward(self, hidden_states):
class InternVisionEmbeddings (line 119) | class InternVisionEmbeddings(nn.Module):
method __init__ (line 120) | def __init__(self, config: InternVisionConfig):
method _get_pos_embed (line 138) | def _get_pos_embed(self, pos_embed, H, W):
method forward (line 144) | def forward(self, pixel_values: torch.FloatTensor) -> torch.Tensor:
class InternAttention (line 156) | class InternAttention(nn.Module):
method __init__ (line 159) | def __init__(self, config: InternVisionConfig):
method _naive_attn (line 186) | def _naive_attn(self, x):
method _flash_attn (line 205) | def _flash_attn(self, x, key_padding_mask=None, need_weights=False):
method forward (line 220) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
class InternMLP (line 225) | class InternMLP(nn.Module):
method __init__ (line 226) | def __init__(self, config: InternVisionConfig):
method forward (line 233) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
class InternVisionEncoderLayer (line 240) | class InternVisionEncoderLayer(nn.Module):
method __init__ (line 241) | def __init__(self, config: InternVisionConfig, drop_path_rate: float):
method forward (line 257) | def forward(
class InternVisionEncoder (line 272) | class InternVisionEncoder(nn.Module):
method __init__ (line 282) | def __init__(self, config: InternVisionConfig):
method forward (line 290) | def forward(
class InternVisionModel (line 331) | class InternVisionModel(PreTrainedModel):
method __init__ (line 337) | def __init__(self, config: InternVisionConfig):
method resize_pos_embeddings (line 344) | def resize_pos_embeddings(self, old_size, new_size, patch_size):
method get_input_embeddings (line 356) | def get_input_embeddings(self):
method forward (line 359) | def forward(
FILE: siirl/models/transformers/internvl_chat/modeling_internlm2.py
function _import_flash_attn (line 62) | def _import_flash_attn():
function _get_unpad_data (line 79) | def _get_unpad_data(attention_mask):
function _make_causal_mask (line 92) | def _make_causal_mask(input_ids_shape: torch.Size, dtype: torch.dtype, d...
function _expand_mask (line 108) | def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Option...
class InternLM2RMSNorm (line 123) | class InternLM2RMSNorm(nn.Module):
method __init__ (line 124) | def __init__(self, hidden_size, eps=1e-6):
method forward (line 132) | def forward(self, hidden_states):
class InternLM2RotaryEmbedding (line 141) | class InternLM2RotaryEmbedding(nn.Module):
method __init__ (line 142) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
method _set_cos_sin_cache (line 154) | def _set_cos_sin_cache(self, seq_len, device, dtype):
method forward (line 164) | def forward(self, x, seq_len=None):
class InternLM2LinearScalingRotaryEmbedding (line 176) | class InternLM2LinearScalingRotaryEmbedding(InternLM2RotaryEmbedding):
method __init__ (line 179) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
method _set_cos_sin_cache (line 183) | def _set_cos_sin_cache(self, seq_len, device, dtype):
class InternLM2DynamicNTKScalingRotaryEmbedding (line 196) | class InternLM2DynamicNTKScalingRotaryEmbedding(InternLM2RotaryEmbedding):
method __init__ (line 201) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
method _set_cos_sin_cache (line 205) | def _set_cos_sin_cache(self, seq_len, device, dtype):
function rotate_half (line 223) | def rotate_half(x):
function apply_rotary_pos_emb (line 231) | def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
class InternLM2MLP (line 240) | class InternLM2MLP(nn.Module):
method __init__ (line 241) | def __init__(self, config):
method forward (line 251) | def forward(self, x):
function repeat_kv (line 258) | def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
class InternLM2Attention (line 271) | class InternLM2Attention(nn.Module):
method __init__ (line 274) | def __init__(self, config: InternLM2Config):
method _init_rope (line 297) | def _init_rope(self):
method _shape (line 325) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
method forward (line 328) | def forward(
class InternLM2FlashAttention2 (line 406) | class InternLM2FlashAttention2(InternLM2Attention):
method forward (line 413) | def forward(
method _flash_attention_forward (line 480) | def _flash_attention_forward(self, query_states, key_states, value_sta...
method _unpad_input (line 528) | def _unpad_input(self, query_layer, key_layer, value_layer, attention_...
class InternLM2DecoderLayer (line 567) | class InternLM2DecoderLayer(nn.Module):
method __init__ (line 568) | def __init__(self, config: InternLM2Config):
method forward (line 578) | def forward(
class InternLM2PreTrainedModel (line 660) | class InternLM2PreTrainedModel(PreTrainedModel):
method _init_weights (line 668) | def _init_weights(self, module):
class InternLM2Model (line 750) | class InternLM2Model(InternLM2PreTrainedModel):
method __init__ (line 760) | def __init__(self, config: InternLM2Config):
method get_input_embeddings (line 778) | def get_input_embeddings(self):
method set_input_embeddings (line 781) | def set_input_embeddings(self, value):
method _prepare_decoder_attention_mask (line 784) | def _prepare_decoder_attention_mask(self, attention_mask, input_shape,...
method forward (line 804) | def forward(
class InternLM2ForCausalLM (line 928) | class InternLM2ForCausalLM(InternLM2PreTrainedModel):
method __init__ (line 933) | def __init__(self, config):
method get_input_embeddings (line 942) | def get_input_embeddings(self):
method set_input_embeddings (line 945) | def set_input_embeddings(self, value):
method get_output_embeddings (line 948) | def get_output_embeddings(self):
method set_output_embeddings (line 951) | def set_output_embeddings(self, new_embeddings):
method set_decoder (line 954) | def set_decoder(self, decoder):
method get_decoder (line 957) | def get_decoder(self):
method forward (line 962) | def forward(
method prepare_inputs_for_generation (line 1050) | def prepare_inputs_for_generation(self, input_ids, past_key_values=Non...
method _reorder_cache (line 1088) | def _reorder_cache(past_key_values, beam_idx):
method build_inputs (line 1094) | def build_inputs(self, tokenizer, query: str, history: List[Tuple[str,...
method chat (line 1107) | def chat(
method stream_chat (line 1143) | def stream_chat(
class InternLM2ForSequenceClassification (line 1242) | class InternLM2ForSequenceClassification(InternLM2PreTrainedModel):
method __init__ (line 1243) | def __init__(self, config):
method get_input_embeddings (line 1252) | def get_input_embeddings(self):
method set_input_embeddings (line 1255) | def set_input_embeddings(self, value):
method forward (line 1259) | def forward(
FILE: siirl/models/transformers/internvl_chat/modeling_internvl_chat.py
function version_cmp (line 28) | def version_cmp(v1, v2, op="eq"):
class InternVLChatModel (line 37) | class InternVLChatModel(PreTrainedModel):
method __init__ (line 45) | def __init__(self, config: InternVLChatConfig, vision_model=None, lang...
method forward (line 86) | def forward(
method pixel_shuffle (line 168) | def pixel_shuffle(self, x, scale_factor=0.5):
method extract_feature (line 182) | def extract_feature(self, pixel_values):
method batch_chat (line 196) | def batch_chat(self, tokenizer, pixel_values, questions, generation_co...
method chat (line 238) | def chat(self, tokenizer, pixel_values, question, generation_config, h...
method generate (line 287) | def generate(
FILE: siirl/models/transformers/internvl_chat/tokenization_internlm2.py
class InternLM2Tokenizer (line 35) | class InternLM2Tokenizer(PreTrainedTokenizer):
method __init__ (line 49) | def __init__(
method no_prefix_space_tokens (line 81) | def no_prefix_space_tokens(self):
method vocab_size (line 88) | def vocab_size(self):
method bos_token_id (line 93) | def bos_token_id(self) -> Optional[int]:
method eos_token_id (line 97) | def eos_token_id(self) -> Optional[int]:
method get_vocab (line 100) | def get_vocab(self):
method _tokenize (line 106) | def _tokenize(self, text):
method _convert_token_to_id (line 110) | def _convert_token_to_id(self, token):
method _convert_id_to_token (line 114) | def _convert_id_to_token(self, index):
method _maybe_add_prefix_space (line 119) | def _maybe_add_prefix_space(self, tokens, decoded):
method convert_tokens_to_string (line 125) | def convert_tokens_to_string(self, tokens):
method save_vocabulary (line 146) | def save_vocabulary(self, save_directory, filename_prefix: Optional[st...
method build_inputs_with_special_tokens (line 171) | def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=No...
method get_special_tokens_mask (line 187) | def get_special_tokens_mask(self, token_ids_0: List[int], token_ids_1:...
method create_token_type_ids_from_sequences (line 210) | def create_token_type_ids_from_sequences(self, token_ids_0: List[int],...
FILE: siirl/models/transformers/internvl_chat/tokenization_internlm2_fast.py
class InternLM2Converter (line 37) | class InternLM2Converter(SpmConverter):
method vocab (line 40) | def vocab(self, proto):
method unk_id (line 49) | def unk_id(self, proto):
method decoder (line 53) | def decoder(self, replacement, add_prefix_space):
method tokenizer (line 63) | def tokenizer(self, proto):
method normalizer (line 85) | def normalizer(self, proto):
method pre_tokenizer (line 92) | def pre_tokenizer(self, replacement, add_prefix_space):
class InternLM2TokenizerFast (line 100) | class InternLM2TokenizerFast(PreTrainedTokenizerFast):
method __init__ (line 107) | def __init__(
method can_save_slow_tokenizer (line 140) | def can_save_slow_tokenizer(self) -> bool:
method update_post_processor (line 143) | def update_post_processor(self):
method add_eos_token (line 168) | def add_eos_token(self):
method add_bos_token (line 172) | def add_bos_token(self):
method add_eos_token (line 176) | def add_eos_token(self, value):
method add_bos_token (line 181) | def add_bos_token(self, value):
method save_vocabulary (line 185) | def save_vocabulary(self, save_directory: str, filename_prefix: Option...
FILE: siirl/models/transformers/kimi_vl.py
function _merge_with_image_features (line 25) | def _merge_with_image_features(
function rotate_half (line 70) | def rotate_half(x):
function apply_rotary_pos_emb (line 78) | def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
function repeat_kv (line 114) | def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
function _ulysses_flash_attn_forward (line 126) | def _ulysses_flash_attn_forward(
FILE: siirl/models/transformers/llama.py
function llama_flash_attn_forward (line 42) | def llama_flash_attn_forward(
function llama_attn_forward (line 165) | def llama_attn_forward(
class CausalLMOutputForPPO (line 236) | class CausalLMOutputForPPO(CausalLMOutputWithPast):
function forward_for_ppo (line 241) | def forward_for_ppo(
FILE: siirl/models/transformers/monkey_patch.py
function repeat_kv (line 39) | def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
function _ulysses_flash_attention_forward (line 51) | def _ulysses_flash_attention_forward(
function patch_vlm_for_ulysses_input_slicing (line 112) | def patch_vlm_for_ulysses_input_slicing(model_class: type):
function apply_monkey_patch (line 143) | def apply_monkey_patch(
function is_transformers_version_in_range (line 249) | def is_transformers_version_in_range(min_version: Optional[str] = None, ...
FILE: siirl/models/transformers/npu_patch.py
function apply_rotary_pos_emb_flashatt_npu (line 29) | def apply_rotary_pos_emb_flashatt_npu(q: torch.Tensor, k: torch.Tensor, ...
function apply_rotary_pos_emb_npu (line 39) | def apply_rotary_pos_emb_npu(q, k, cos, sin, position_ids=None, unsqueez...
function rms_norm_forward (line 48) | def rms_norm_forward(self, x):
FILE: siirl/models/transformers/qwen2.py
function qwen2_flash_attn_forward (line 33) | def qwen2_flash_attn_forward(
function qwen2_attn_forward (line 149) | def qwen2_attn_forward(
FILE: siirl/models/transformers/qwen2_5_vl.py
class Qwen2_5_VLCausalLMOutputForPPO (line 26) | class Qwen2_5_VLCausalLMOutputForPPO(Qwen2_5_VLCausalLMOutputWithPast):
function forward_for_ppo (line 31) | def forward_for_ppo(
FILE: siirl/models/transformers/qwen2_vl.py
function get_rope_index (line 54) | def get_rope_index(
function prepare_fa2_from_position_ids (line 154) | def prepare_fa2_from_position_ids(query: torch.Tensor, key: torch.Tensor...
function flash_attention_forward (line 170) | def flash_attention_forward(
function ulysses_flash_attn_forward (line 233) | def ulysses_flash_attn_forward(
class Qwen2VLCausalLMOutputForPPO (line 310) | class Qwen2VLCausalLMOutputForPPO(Qwen2VLCausalLMOutputWithPast):
function forward_for_ppo (line 315) | def forward_for_ppo(
FILE: siirl/models/transformers/transformers_compat.py
function flash_attn_supports_top_left_mask (line 26) | def flash_attn_supports_top_left_mask():
FILE: siirl/models/weight_loader_registry.py
function get_weight_loader (line 16) | def get_weight_loader(arch: str):
function get_weight_saver (line 29) | def get_weight_saver(arch: str):
FILE: siirl/params/dag_args.py
class DagArguments (line 20) | class DagArguments:
FILE: siirl/params/data_args.py
class DataArguments (line 23) | class DataArguments:
method __post_init__ (line 128) | def __post_init__(self):
method to_dict (line 149) | def to_dict(self) -> Dict[str, Any]:
FILE: siirl/params/display_dict.py
function _render_dict_recursively_util (line 27) | def _render_dict_recursively_util(current_dict_to_render: Dict[str, Any]...
function log_dict_formatted (line 71) | def log_dict_formatted(data_dict: Dict[str, Any], title: Optional[str] =...
FILE: siirl/params/embodied_args.py
class EnvironmentArgs (line 20) | class EnvironmentArgs:
class EmbodiedArguments (line 53) | class EmbodiedArguments:
method to_dict (line 117) | def to_dict(self) -> Dict[str, Any]:
class EmbodiedSamplingConfig (line 121) | class EmbodiedSamplingConfig:
method to_dict (line 145) | def to_dict(self) -> Dict[str, Any]:
FILE: siirl/params/model_args.py
class MixedPrecisionArguments (line 22) | class MixedPrecisionArguments:
class FSDPArguments (line 43) | class FSDPArguments:
method to_dict (line 60) | def to_dict(self) -> Dict[str, Any]:
class MegatronArguments (line 65) | class MegatronArguments:
method to_dict (line 93) | def to_dict(self) -> Dict[str, Any]:
class OptimizerArguments (line 98) | class OptimizerArguments:
method to_dict (line 124) | def to_dict(self) -> Dict[str, Any]:
class ProcessorArguments (line 129) | class ProcessorArguments:
class ModelArguments (line 161) | class ModelArguments(ProcessorArguments):
method __post_init__ (line 228) | def __post_init__(self):
method to_dict (line 238) | def to_dict(self) -> Dict[str, Any]:
class CheckpointArguments (line 243) | class CheckpointArguments:
class PolicyLossArguments (line 260) | class PolicyLossArguments:
class ActorArguments (line 274) | class ActorArguments:
method to_dict (line 339) | def to_dict(self) -> Dict[str, Any]:
class EvalSamplingArguments (line 344) | class EvalSamplingArguments:
class LayerNameMapArguments (line 353) | class LayerNameMapArguments:
method to_dict (line 357) | def to_dict(self) -> Dict[str, Any]:
class MultiTurnArguments (line 362) | class MultiTurnArguments:
class CustomAsyncServer (line 410) | class CustomAsyncServer:
class AgentArguments (line 418) | class AgentArguments:
class EngineArguments (line 436) | class EngineArguments:
class RolloutArguments (line 442) | class RolloutArguments:
method to_dict (line 493) | def to_dict(self) -> Dict[str, Any]:
class RefArguments (line 498) | class RefArguments:
method to_dict (line 533) | def to_dict(self) -> Dict[str, Any]:
class ActorRolloutRefArguments (line 538) | class ActorRolloutRefArguments:
method to_dict (line 546) | def to_dict(self) -> Dict[str, Any]:
class CriticArguments (line 551) | class CriticArguments:
method to_dict (line 583) | def to_dict(self) -> Dict[str, Any]:
class OverlongBufferArguments (line 588) | class OverlongBufferArguments:
method to_dict (line 596) | def to_dict(self) -> Dict[str, Any]:
class RewardModelArguments (line 601) | class RewardModelArguments:
method to_dict (line 635) | def to_dict(self) -> Dict[str, Any]:
class KLCtrlArguments (line 640) | class KLCtrlArguments:
class FilterGroupsArguments (line 648) | class FilterGroupsArguments:
method to_dict (line 657) | def to_dict(self) -> Dict[str, Any]:
class AlgorithmArguments (line 662) | class AlgorithmArguments:
method to_dict (line 689) | def to_dict(self) -> Dict[str, Any]:
FILE: siirl/params/parser.py
function _set_transformers_logging (line 26) | def _set_transformers_logging() -> None:
function parse_config (line 33) | def parse_config() -> SiiRLArguments:
FILE: siirl/params/profiler_args.py
class ProfilerArguments (line 20) | class ProfilerArguments:
method to_dict (line 39) | def to_dict(self) -> Dict[str, Any]:
FILE: siirl/params/training_args.py
class TrainingArguments (line 29) | class TrainingArguments:
method to_dict (line 78) | def to_dict(self) -> Dict[str, Any]:
class CustomRewardArguments (line 83) | class CustomRewardArguments:
class SiiRLArguments (line 90) | class SiiRLArguments:
method to_dict (line 101) | def to_dict(self) -> Dict[str, Any]:
FILE: siirl/third_party/sglang/parallel_state.py
function initialize_parallel_state (line 38) | def initialize_parallel_state(
function ensure_model_parallel_initialized (line 77) | def ensure_model_parallel_initialized(
function model_parallel_is_initialized (line 105) | def model_parallel_is_initialized():
function initialize_model_parallel_for_sglang (line 111) | def initialize_model_parallel_for_sglang(
function initialize_model_parallel (line 204) | def initialize_model_parallel(
function get_device_mesh (line 292) | def get_device_mesh():
function get_tensor_model_parallel_group (line 306) | def get_tensor_model_parallel_group():
function get_tensor_model_parallel_world_size (line 313) | def get_tensor_model_parallel_world_size():
function get_tensor_model_parallel_rank (line 318) | def get_tensor_model_parallel_rank():
function get_tensor_model_parallel_src_rank (line 323) | def get_tensor_model_parallel_src_rank():
FILE: siirl/user_interface/filter_interface/dapo.py
function dynamic_sampling (line 24) | def dynamic_sampling(config: SiiRLArguments, batch: TensorDict, **kwargs...
FILE: siirl/user_interface/filter_interface/embodied.py
function verify (line 28) | def verify(
function _filter_batch (line 74) | def _filter_batch(batch: TensorDict, n_samples: int, config: SiiRLArgume...
function _compute_embodied_verification_metrics (line 145) | def _compute_embodied_verification_metrics(
function embodied_local_rank_sampling (line 192) | def embodied_local_rank_sampling(
FILE: siirl/user_interface/rewards_interface/custom_gsm8k_reward.py
function extract_solution (line 18) | def extract_solution(solution_str, method="strict"):
function compute_score (line 43) | def compute_score(data_source, solution_str, ground_truth, extra_info):
FILE: siirl/utils/checkpoint/checkpoint_manager.py
class BaseCheckpointManager (line 33) | class BaseCheckpointManager:
method __init__ (line 48) | def __init__(
method should_save_model (line 88) | def should_save_model(self) -> bool:
method should_save_optimizer (line 95) | def should_save_optimizer(self) -> bool:
method should_save_extra (line 102) | def should_save_extra(self) -> bool:
method should_save_hf_model (line 109) | def should_save_hf_model(self) -> bool:
method should_load_model (line 117) | def should_load_model(self) -> bool:
method should_load_optimizer (line 124) | def should_load_optimizer(self) -> bool:
method should_load_extra (line 131) | def should_load_extra(self) -> bool:
method load_checkpoint (line 137) | def load_checkpoint(self, local_path: str, hdfs_path: str = None, del_...
method save_checkpoint (line 140) | def save_checkpoint(
method checkpath (line 146) | def checkpath(local_path: str, hdfs_path: str):
method remove_previous_save_local_path (line 150) | def remove_previous_save_local_path(self, path):
method local_mkdir (line 165) | def local_mkdir(path):
method get_rng_state (line 186) | def get_rng_state():
method load_rng_state (line 201) | def load_rng_state(rng_state):
function find_latest_ckpt_path (line 212) | def find_latest_ckpt_path(path, directory_format="global_step_{}"):
function get_checkpoint_tracker_filename (line 244) | def get_checkpoint_tracker_filename(root_path: str):
FILE: siirl/utils/checkpoint/fsdp_checkpoint_manager.py
class FSDPCheckpointManager (line 33) | class FSDPCheckpointManager(BaseCheckpointManager):
method __init__ (line 51) | def __init__(
method load_checkpoint (line 76) | def load_checkpoint(self, local_path: str, hdfs_path: str = None, del_...
method save_checkpoint (line 129) | def save_checkpoint(self, local_path: str, hdfs_path: str = None, glob...
FILE: siirl/utils/checkpoint/megatron_checkpoint_manager.py
class MegatronCheckpointManager (line 44) | class MegatronCheckpointManager(BaseCheckpointManager):
method __init__ (line 98) | def __init__(
method get_rng_state (line 147) | def get_rng_state(self, use_dist_ckpt: bool = True, data_parallel_rand...
method get_checkpoint_name (line 181) | def get_checkpoint_name(
method generate_state_dict (line 227) | def generate_state_dict(
method load_rng_states (line 267) | def load_rng_states(self, rng_states, data_parallel_random_init=False,...
method load_checkpoint (line 285) | def load_checkpoint(self, local_path: str, hdfs_path: str = None, del_...
method save_checkpoint (line 356) | def save_checkpoint(self, local_path: str, hdfs_path: str = None, glob...
FILE: siirl/utils/debug/mstx_profile.py
function mark_start_range (line 30) | def mark_start_range(message: Optional[str] = None) -> None:
function mark_end_range (line 40) | def mark_end_range(range_id: str) -> None:
function mark_annotate (line 50) | def mark_annotate(message: Optional[str] = None) -> Callable:
function marked_timer (line 66) | def marked_timer(name: str, timing_raw: dict[str, float], *args: Any, **...
function get_npu_profiler (line 90) | def get_npu_profiler(config: ProfilerArguments, role: Optional[str] = No...
class NPUProfiler (line 136) | class NPUProfiler(DistProfiler):
method __init__ (line 143) | def __init__(self, rank: int, config: ProfilerArguments, **kwargs):
method start (line 149) | def start(self, **kwargs):
method stop (line 159) | def stop(self):
method annotate (line 168) | def annotate(message: Optional[str] = None, role: Optional[str] = None...
FILE: siirl/utils/debug/performance.py
function _get_current_mem_info (line 26) | def _get_current_mem_info(unit: str = "GB", precision: int = 2) -> Tuple...
function log_gpu_memory_usage (line 44) | def log_gpu_memory_usage(head: str, logger=None, level="DEBUG", rank: in...
class GPUMemoryLogger (line 55) | class GPUMemoryLogger(DecoratorLoggerBase):
method __init__ (line 65) | def __init__(self, role: str, logger: logging.Logger = None, level=log...
method __call__ (line 72) | def __call__(self, decorated_function: callable):
method log (line 78) | def log(self, func, *args, **kwargs):
function log_print (line 93) | def log_print(ctn: Any):
FILE: siirl/utils/debug/profile.py
class Profiler (line 26) | class Profiler:
method __init__ (line 27) | def __init__(self, config):
method _validate (line 54) | def _validate(self):
method check (line 63) | def check(self):
method start (line 66) | def start(self):
method step (line 71) | def step(self):
method stop (line 75) | def stop(self):
method save (line 80) | def save(self):
method stop_and_save (line 90) | def stop_and_save(self):
method stop_trace (line 95) | def stop_trace(self):
function mark_start_range (line 101) | def mark_start_range(
function mark_end_range (line 118) | def mark_end_range(range_id: str) -> None:
function mark_annotate (line 127) | def mark_annotate(
class DistProfiler (line 151) | class DistProfiler:
method __init__ (line 153) | def __init__(self, rank: int, config: ProfilerArguments, **kwargs):
method start (line 159) | def start(self, **kwargs):
method stop (line 162) | def stop(self):
method annotate (line 166) | def annotate(
FILE: siirl/utils/embodied/libero_utils.py
function get_libero_env (line 18) | def get_libero_env(task, model_family, gpu_id=-1, resolution=256):
function get_libero_dummy_action (line 33) | def get_libero_dummy_action(model_family: str):
function resize_image (line 38) | def resize_image(img, resize_size):
function get_libero_image (line 56) | def get_libero_image(obs, resize_size):
function get_libero_wrist_image (line 67) | def get_libero_wrist_image(obs, resize_size):
function quat2axisangle (line 93) | def quat2axisangle(quat):
function get_image_resize_size (line 119) | def get_image_resize_size(cfg):
function normalize_gripper_action (line 150) | def normalize_gripper_action(action: np.ndarray, binarize: bool = True) ...
function invert_gripper_action (line 190) | def invert_gripper_action(action: np.ndarray) -> np.ndarray:
function save_rollout_video (line 211) | def save_rollout_video(rollout_images, exp_name, task_name, step_idx, su...
FILE: siirl/utils/embodied/openvla_utils.py
function update_auto_map (line 27) | def update_auto_map(pretrained_checkpoint: str) -> None:
function check_identical_files (line 128) | def check_identical_files(path1: Union[str, Path], path2: Union[str, Pat...
function _handle_file_sync (line 149) | def _handle_file_sync(curr_filepath: str, checkpoint_filepath: str, file...
function check_model_logic_mismatch (line 197) | def check_model_logic_mismatch(pretrained_checkpoint: str) -> None:
FILE: siirl/utils/embodied/video_emb.py
class VideoEmbeddingModel (line 29) | class VideoEmbeddingModel:
method __init__ (line 34) | def __init__(self, model_path: str, img_size: int = 384, device_id: in...
method _build_pt_video_transform (line 46) | def _build_pt_video_transform(self):
method _load_pretrained_vjepa_pt_weights (line 59) | def _load_pretrained_vjepa_pt_weights(self, model, pretrained_weights):
method _create_model_instance (line 71) | def _create_model_instance(self):
method offload_to_host (line 81) | def offload_to_host(self):
method load_to_device (line 87) | def load_to_device(self):
method extract_video_embedding (line 92) | def extract_video_embedding(self, video_tensor):
method extract_video_embedding_batch (line 109) | def extract_video_embedding_batch(self, video_tensor_list):
method get_embeddings (line 124) | def get_embeddings(self, batch_names, batch_frames):
FILE: siirl/utils/experimental/torch_functional.py
function _fused_linear_for_ppo_fwd (line 20) | def _fused_linear_for_ppo_fwd(hidden_states: torch.FloatTensor, vocab_we...
function _fused_linear_for_ppo_bwd (line 35) | def _fused_linear_for_ppo_bwd(
class FusedLinearForPPOFunction (line 70) | class FusedLinearForPPOFunction(torch.autograd.Function):
method forward (line 72) | def forward(
method backward (line 127) | def backward(ctx, dlog_probs: Optional[torch.FloatTensor], dentropy: O...
class FusedLinearForPPO (line 191) | class FusedLinearForPPO(torch.nn.Module):
method __init__ (line 192) | def __init__(self, chunk_size: int = 512):
method forward (line 197) | def forward(
FILE: siirl/utils/extras/device.py
function is_torch_npu_available (line 18) | def is_torch_npu_available() -> bool:
function get_device_name (line 32) | def get_device_name() -> str:
function get_torch_device (line 47) | def get_torch_device() -> any:
function get_device_id (line 60) | def get_device_id() -> int:
function get_nccl_backend (line 67) | def get_nccl_backend() -> str:
function device_synchronize (line 80) | def device_synchronize():
function set_expandable_segments (line 94) | def set_expandable_segments(enable: bool) -> None:
FILE: siirl/utils/extras/fs.py
function is_non_local (line 34) | def is_non_local(path):
function md5_encode (line 46) | def md5_encode(path: str) -> str:
function get_local_temp_path (line 61) | def get_local_temp_path(hdfs_path: str, cache_dir: str) -> str:
function verify_copy (line 82) | def verify_copy(src: str, dest: str) -> bool:
function copy_to_shm (line 141) | def copy_to_shm(src: str):
function _record_directory_structure (line 161) | def _record_directory_structure(folder_path):
function _check_directory_structure (line 175) | def _check_directory_structure(folder_path, record_file):
function copy_to_local (line 192) | def copy_to_local(src: str, cache_dir=None, filelock=".file.lock", verbo...
function copy_local_path_from_hdfs (line 214) | def copy_local_path_from_hdfs(src: str, cache_dir=None, filelock=".file....
function local_mkdir_safe (line 256) | def local_mkdir_safe(path):
FILE: siirl/utils/extras/hdfs_io.py
function exists (line 25) | def exists(path: str, **kwargs) -> bool:
function _exists (line 41) | def _exists(file_path: str):
function makedirs (line 48) | def makedirs(name, mode=0o777, exist_ok=False, **kwargs) -> None:
function _mkdir (line 73) | def _mkdir(file_path: str) -> bool:
function copy (line 82) | def copy(src: str, dst: str, **kwargs) -> bool:
function _copy (line 111) | def _copy(from_path: str, to_path: str, timeout: int = None) -> bool:
function _run_cmd (line 138) | def _run_cmd(cmd: str, timeout=None):
function _hdfs_cmd (line 142) | def _hdfs_cmd(cmd: str) -> str:
function _is_non_local (line 146) | def _is_non_local(path: str):
FILE: siirl/utils/extras/import_utils.py
function is_megatron_core_available (line 25) | def is_megatron_core_available():
function is_vllm_available (line 34) | def is_vllm_available():
function is_sglang_available (line 43) | def is_sglang_available():
function is_nvtx_available (line 52) | def is_nvtx_available():
function import_external_libs (line 60) | def import_external_libs(external_libs=None):
function load_extern_type (line 71) | def load_extern_type(file_path: Optional[str], type_name: Optional[str]):
function _get_qualified_name (line 95) | def _get_qualified_name(func):
function deprecated (line 102) | def deprecated(replacement: str = ""):
FILE: siirl/utils/extras/misc.py
class AverageMeter (line 35) | class AverageMeter:
method __init__ (line 40) | def __init__(self):
method reset (line 43) | def reset(self):
method update (line 49) | def update(self, val, n=1):
function check_version (line 56) | def check_version(requirement: str, mandatory: bool = False) -> None:
function check_dependencies (line 72) | def check_dependencies() -> None:
function calculate_tps (line 85) | def calculate_tps(
function count_parameters (line 104) | def count_parameters(model: "torch.nn.Module") -> Tuple[int, int]:
function get_current_device (line 133) | def get_current_device() -> "torch.device":
function get_device_count (line 151) | def get_device_count() -> int:
function get_logits_processor (line 165) | def get_logits_processor() -> "LogitsProcessorList":
function get_peak_memory (line 174) | def get_peak_memory() -> Tuple[int, int]:
function has_tokenized_data (line 186) | def has_tokenized_data(path: "os.PathLike") -> bool:
function infer_optim_dtype (line 193) | def infer_optim_dtype(model_dtype: "torch.dtype") -> "torch.dtype":
function is_gpu_or_npu_available (line 205) | def is_gpu_or_npu_available() -> bool:
function is_env_enabled (line 212) | def is_env_enabled(env_var: str, default: str = "0") -> bool:
function numpify (line 219) | def numpify(inputs: Union["NDArray", "torch.Tensor"]) -> "NDArray":
function skip_check_imports (line 233) | def skip_check_imports() -> None:
function torch_gc (line 241) | def torch_gc() -> None:
function try_download_model_from_other_hub (line 256) | def try_download_model_from_other_hub(model_args: "ModelArguments") -> str:
function use_modelscope (line 282) | def use_modelscope() -> bool:
function use_openmind (line 286) | def use_openmind() -> bool:
function use_ray (line 290) | def use_ray() -> bool:
FILE: siirl/utils/extras/net_utils.py
function is_ipv4 (line 30) | def is_ipv4(ip_str: str) -> bool:
function is_ipv6 (line 47) | def is_ipv6(ip_str: str) -> bool:
FILE: siirl/utils/extras/packages.py
function _is_package_available (line 12) | def _is_package_available(name: str) -> bool:
function _get_package_version (line 16) | def _get_package_version(name: str) -> "Version":
function is_pyav_available (line 23) | def is_pyav_available():
function is_librosa_available (line 27) | def is_librosa_available():
function is_fastapi_available (line 31) | def is_fastapi_available():
function is_galore_available (line 35) | def is_galore_available():
function is_apollo_available (line 39) | def is_apollo_available():
function is_gradio_available (line 43) | def is_gradio_available():
function is_matplotlib_available (line 47) | def is_matplotlib_available():
function is_pillow_available (line 51) | def is_pillow_available():
function is_ray_available (line 55) | def is_ray_available():
function is_requests_available (line 59) | def is_requests_available():
function is_rouge_available (line 63) | def is_rouge_available():
function is_starlette_available (line 67) | def is_starlette_available():
function is_transformers_version_greater_than (line 72) | def is_transformers_version_greater_than(content: str):
function is_uvicorn_available (line 76) | def is_uvicorn_available():
function is_vllm_available (line 80) | def is_vllm_available():
FILE: siirl/utils/extras/patch.py
function verify (line 24) | def verify(
FILE: siirl/utils/extras/py_functional.py
function _mp_target_wrapper (line 30) | def _mp_target_wrapper(target_func: Callable, mp_queue: multiprocessing....
function timeout_limit (line 51) | def timeout_limit(seconds: float, use_signals: bool = False):
function union_two_dict (line 141) | def union_two_dict(dict1: Dict, dict2: Dict):
function append_to_dict (line 159) | def append_to_dict(data: Dict, new_data: Dict):
class NestedNamespace (line 178) | class NestedNamespace(SimpleNamespace):
method __init__ (line 194) | def __init__(self, dictionary, **kwargs):
class DynamicEnumMeta (line 203) | class DynamicEnumMeta(type):
method __iter__ (line 204) | def __iter__(cls) -> Iterator[Any]:
method __contains__ (line 207) | def __contains__(cls, item: Any) -> bool:
method __getitem__ (line 213) | def __getitem__(cls, name: str) -> Any:
method __reduce_ex__ (line 216) | def __reduce_ex__(cls, protocol):
method names (line 220) | def names(cls):
method values (line 223) | def values(cls):
class DynamicEnum (line 227) | class DynamicEnum(metaclass=DynamicEnumMeta):
method __init__ (line 231) | def __init__(self, name: str, value: int):
method __repr__ (line 235) | def __repr__(self):
method __reduce_ex__ (line 238) | def __reduce_ex__(self, protocol):
method register (line 248) | def register(cls, name: str) -> "DynamicEnum":
method remove (line 259) | def remove(cls, name: str):
method from_name (line 266) | def from_name(cls, name: str) -> Optional["DynamicEnum"]:
function convert_to_regular_types (line 270) | def convert_to_regular_types(obj):
FILE: siirl/utils/extras/ray_utils.py
function ray_noset_visible_devices (line 25) | def ray_noset_visible_devices(env_vars=os.environ):
function parallel_put (line 48) | def parallel_put(data_list: List[Any], max_workers: Optional[int] = None):
FILE: siirl/utils/import_string.py
function import_string (line 18) | def import_string(import_name: str):
FILE: siirl/utils/kernel/kernels.py
function null_decorator (line 56) | def null_decorator(*args, **kwargs):
class EntropyReductionEnum (line 73) | class EntropyReductionEnum:
function get_entropy_reduction_enum_number (line 83) | def get_entropy_reduction_enum_number(reduction: str) -> int:
function get_entropy_reduction_enum (line 99) | def get_entropy_reduction_enum(ce_reduction: int) -> EntropyReductionEnum:
class BackwardEnum (line 116) | class BackwardEnum:
class Config (line 130) | class Config:
function set_backward_method (line 145) | def set_backward_method(backward_method: BackwardEnum):
function efficient_entropy_kernel_general_mainloop (line 158) | def efficient_entropy_kernel_general_mainloop(
function efficient_entropy_triton_kernel_epilogue (line 292) | def efficient_entropy_triton_kernel_epilogue(
function efficient_entropy_triton_kernel_epilogue_tp (line 384) | def efficient_entropy_triton_kernel_epilogue_tp(
function efficient_entropy_triton_epilogue_tp_update (line 460) | def efficient_entropy_triton_epilogue_tp_update(
function efficient_entropy_forward (line 507) | def efficient_entropy_forward(
function efficient_entropy_backward_kernel_general_mainloop_MN (line 711) | def efficient_entropy_backward_kernel_general_mainloop_MN(
function efficient_entropy_backward_kernel_d_hidden (line 888) | def efficient_entropy_backward_kernel_d_hidden(
function efficient_entropy_backward_kernel_d_weight (line 1015) | def efficient_entropy_backward_kernel_d_weight(
function efficient_entropy_backward_kernel_general_d_logits (line 1135) | def efficient_entropy_backward_kernel_general_d_logits(
function efficient_entropy_backward_kernel_general_d_logits_split_N (line 1274) | def efficient_entropy_backward_kernel_general_d_logits_split_N(
function efficient_entropy_backward (line 1378) | def efficient_entropy_backward(
FILE: siirl/utils/kernel/linear_cross_entropy.py
class LinearCrossEntropy (line 40) | class LinearCrossEntropy(torch.autograd.Function):
method forward (line 42) | def forward(
method backward (line 90) | def backward(ctx, dlogprobs: torch.Tensor, dentropy: torch.Tensor) -> ...
FILE: siirl/utils/logger/aggregate_logger.py
function concat_dict_to_str (line 24) | def concat_dict_to_str(dict: Dict, step):
class LocalLogger (line 33) | class LocalLogger:
method __init__ (line 34) | def __init__(self, remote_logger=None, enable_wandb=False, print_to_co...
method flush (line 37) | def flush(self):
method log (line 40) | def log(self, data, step):
class DecoratorLoggerBase (line 45) | class DecoratorLoggerBase:
method __init__ (line 46) | def __init__(self, role: str, logger: logging.Logger = None, level=log...
method log_by_print (line 56) | def log_by_print(self, log_str):
method log_by_logging (line 60) | def log_by_logging(self, log_str):
function log_with_rank (line 67) | def log_with_rank(message: str, rank, logger: logging.Logger, level=logg...
FILE: siirl/utils/logger/logging_utils.py
function set_basic_config (line 26) | def set_basic_config():
FILE: siirl/utils/logger/tracking.py
class Tracking (line 27) | class Tracking:
method __init__ (line 40) | def __init__(self, project_name, experiment_name, default_backend: Uni...
method log (line 132) | def log(self, data, step, backend=None):
method __del__ (line 137) | def __del__(self):
class ClearMLLogger (line 151) | class ClearMLLogger:
method __init__ (line 152) | def __init__(self, project_name: str, experiment_name: str, config):
method _get_logger (line 167) | def _get_logger(self):
method log (line 170) | def log(self, data, step):
method finish (line 196) | def finish(self):
class _TensorboardAdapter (line 200) | class _TensorboardAdapter:
method __init__ (line 201) | def __init__(self):
method log (line 211) | def log(self, data, step):
method finish (line 215) | def finish(self):
class _MlflowLoggingAdapter (line 219) | class _MlflowLoggingAdapter:
method log (line 220) | def log(self, data, step):
function _compute_mlflow_params_from_objects (line 227) | def _compute_mlflow_params_from_objects(params) -> Dict[str, Any]:
function _transform_params_to_json_serializable (line 234) | def _transform_params_to_json_serializable(x, convert_list_to_dict: bool):
function _flatten_dict (line 254) | def _flatten_dict(raw: Dict[str, Any], *, sep: str) -> Dict[str, Any]:
class ValidationGenerationsLogger (line 263) | class ValidationGenerationsLogger:
method log (line 264) | def log(self, loggers, samples, step):
method log_generations_to_wandb (line 277) | def log_generations_to_wandb(self, samples, step):
method log_generations_to_swanlab (line 304) | def log_generations_to_swanlab(self, samples, step):
method log_generations_to_mlflow (line 326) | def log_generations_to_mlflow(self, samples, step):
method log_generations_to_clearml (line 348) | def log_generations_to_clearml(self, samples, step):
method log_generations_to_tensorboard (line 376) | def log_generations_to_tensorboard(self, samples, step):
FILE: siirl/utils/megatron/dist_checkpointing.py
function save_dist_checkpointing (line 32) | def save_dist_checkpointing(sharded_state_dict, ckpt_path, async_save=Fa...
function load_dist_checkpointing (line 52) | def load_dist_checkpointing(sharded_state_dict, ckpt_dir):
FILE: siirl/utils/megatron/megatron_utils.py
function get_model_config (line 45) | def get_model_config(model):
function get_model (line 49) | def get_model(
class McoreModuleWrapperConfig (line 162) | class McoreModuleWrapperConfig:
function make_megatron_module (line 171) | def make_megatron_module(
function unwrap_model (line 222) | def unwrap_model(model, module_instances=ALL_MODULE_WRAPPER_CLASSNAMES):
function convert_config (line 237) | def convert_config(hf_config: PretrainedConfig, megatron_config) -> Tran...
function init_megatron_optim_config (line 279) | def init_megatron_optim_config(optim_config: Dict) -> OptimizerConfig:
function mcore_model_parallel_config (line 302) | def mcore_model_parallel_config(
function offload_megatron_model_to_cpu (line 328) | def offload_megatron_model_to_cpu(models):
function load_megatron_model_to_gpu (line 364) | def load_megatron_model_to_gpu(models, load_grad=True):
function offload_megatron_copy_params (line 393) | def offload_megatron_copy_params(optimizers):
function load_megatron_copy_params (line 434) | def load_megatron_copy_params(optimizers):
function offload_megatron_optimizer (line 475) | def offload_megatron_optimizer(optimizers):
function load_megatron_optimizer (line 494) | def load_megatron_optimizer(optimizers):
function print_rank_0 (line 512) | def print_rank_0(message):
function get_dist_checkpoint_path (line 521) | def get_dist_checkpoint_path(checkpoint_path):
function get_hf_model_checkpoint_path (line 527) | def get_hf_model_checkpoint_path(checkpoint_path):
function get_transformer_config_checkpoint_path (line 533) | def get_transformer_config_checkpoint_path(checkpoint_path):
function get_model_checkpoint_path (line 538) | def get_model_checkpoint_path(checkpoint_path):
function get_hf_config_and_tokenizer_checkpoint_path (line 543) | def get_hf_config_and_tokenizer_checkpoint_path(checkpoint_path):
function get_optimizer_checkpoint_path (line 548) | def get_optimizer_checkpoint_path(checkpoint_path, use_distributed_optim...
function get_rng_states_checkpoint_path (line 560) | def get_rng_states_checkpoint_path(checkpoint_path, only_rank0_save=True):
function convert_megatron_model_to_transformers_model (line 572) | def convert_megatron_model_to_transformers_model(
function broadcast_from_megatron_pp (line 712) | def broadcast_from_megatron_pp(tensor: torch.Tensor):
function broadcast_str_from_megatron_pp (line 747) | def broadcast_str_from_megatron_pp(obj: Any):
function default_tp_concat_fn (line 771) | def default_tp_concat_fn(layer_name_mapping, name, train_params, infer_p...
function per_tensor_generator (line 829) | def per_tensor_generator(actor_module, model_config, weight_converter, t...
function get_transformer_layer_offset (line 947) | def get_transformer_layer_offset(pipeline_rank, vp_rank, config: Transfo...
FILE: siirl/utils/megatron/memory.py
class MemoryBuffer (line 19) | class MemoryBuffer:
method __init__ (line 20) | def __init__(self, numel, numel_padded, dtype):
method zero (line 26) | def zero(self):
method get (line 30) | def get(self, shape, start_index):
FILE: siirl/utils/megatron/memory_buffer.py
class MemoryBuffer (line 26) | class MemoryBuffer:
method __init__ (line 32) | def __init__(self, numel: int, numel_padded: int, dtype: torch.dtype, ...
method zero (line 41) | def zero(self):
method get (line 45) | def get(self, shape, start_index):
function calc_padded_numel (line 55) | def calc_padded_numel(shape: torch.Size, dtype: torch.dtype):
function get_weight_buffer_meta_from_module (line 62) | def get_weight_buffer_meta_from_module(module: nn.Module) -> Dict[str, D...
function build_memory_buffer (line 72) | def build_memory_buffer(weight_buffer_meta: Dict[str, Dict]) -> Dict[tor...
function build_memory_reference_from_module (line 101) | def build_memory_reference_from_module(module: torch.nn.Module, memory_b...
function build_memory_reference (line 115) | def build_memory_reference(weight_buffer_meta: Dict[str, Dict], memory_b...
class MemoryBufferModuleWrapper (line 142) | class MemoryBufferModuleWrapper:
method __init__ (line 148) | def __init__(self, module: nn.Module):
method get_memory_buffers (line 155) | def get_memory_buffers(self):
method get_weight_buffer_meta (line 158) | def get_weight_buffer_meta(self):
class MegatronMemoryBufferForRollout (line 162) | class MegatronMemoryBufferForRollout:
method __init__ (line 177) | def __init__(self, transform_memory_param_fn):
method initialize_weight_buffer (line 183) | def initialize_weight_buffer(self, weight_buffer_meta_pp: List[Dict[st...
method build_memory_reference (line 201) | def build_memory_reference(self):
method named_parameters (line 207) | def named_parameters(self):
method weight_buffers (line 211) | def weight_buffers(self):
method memory_buffers (line 215) | def memory_buffers(self):
FILE: siirl/utils/megatron/optimizer.py
function get_megatron_optimizer (line 21) | def get_megatron_optimizer(
function get_megatron_optimizer_param_scheduler (line 38) | def get_megatron_optimizer_param_scheduler(
function get_megatron_last_lr (line 78) | def get_megatron_last_lr(optimizer):
FILE: siirl/utils/megatron/pipeline_parallel.py
function compute_transformers_input_shapes (line 22) | def compute_transformers_input_shapes(batches, meta_info):
function make_batch_generator (line 49) | def make_batch_generator(batches, vpp_size):
FILE: siirl/utils/megatron/sequence_parallel.py
function mark_parameter_as_sequence_parallel (line 21) | def mark_parameter_as_sequence_parallel(parameter):
function is_sequence_parallel_param (line 25) | def is_sequence_parallel_param(param):
function pad_to_sequence_parallel (line 29) | def pad_to_sequence_parallel(unpad_tokens: torch.Tensor):
FILE: siirl/utils/megatron/tensor_parallel.py
function update_kwargs_with_config (line 30) | def update_kwargs_with_config(dictionary: Dict, config: "ModelParallelCo...
function get_default_kwargs_for_model_parallel_config (line 35) | def get_default_kwargs_for_model_parallel_config():
function get_default_model_parallel_config (line 46) | def get_default_model_parallel_config():
function get_common_default_kwargs_for_parallel_linear (line 52) | def get_common_default_kwargs_for_parallel_linear():
function get_default_kwargs_for_column_parallel_linear (line 63) | def get_default_kwargs_for_column_parallel_linear():
function get_default_kwargs_for_row_parallel_linear (line 79) | def get_default_kwargs_for_row_parallel_linear():
function get_default_kwargs_for_parallel_embedding (line 84) | def get_default_kwargs_for_parallel_embedding():
function is_tensor_parallel_param (line 95) | def is_tensor_parallel_param(param):
function get_tensor_parallel_partition_dim (line 99) | def get_tensor_parallel_partition_dim(param):
function get_tensor_parallel_partition_stride (line 104) | def get_tensor_parallel_partition_stride(param):
class _VocabParallelEntropy (line 109) | class _VocabParallelEntropy(torch.autograd.Function):
method forward (line 111) | def forward(ctx, vocab_parallel_logits: torch.Tensor) -> torch.Tensor:
method backward (line 130) | def backward(ctx, grad_output: torch.Tensor) -> torch.Tensor:
function vocab_parallel_entropy (line 142) | def vocab_parallel_entropy(vocab_parallel_logits: torch.Tensor) -> torch...
function vocab_parallel_log_probs_from_logits (line 154) | def vocab_parallel_log_probs_from_logits(logits, labels):
function vocab_parallel_log_probs_from_logits_response_rmpad (line 161) | def vocab_parallel_log_probs_from_logits_response_rmpad(input_ids, atten...
FILE: siirl/utils/memory_utils.py
function aggressive_empty_cache (line 24) | def aggressive_empty_cache(force_sync: bool = True, max_retries: int = 3...
function reset_memory_stats (line 69) | def reset_memory_stats() -> None:
function get_memory_info (line 77) | def get_memory_info() -> dict:
function log_memory_usage (line 95) | def log_memory_usage(stage: str = "current") -> None:
function optimize_memory_for_inference (line 110) | def optimize_memory_for_inference() -> None:
function optimize_memory_for_training (line 124) | def optimize_memory_for_training() -> None:
FILE: siirl/utils/metrics/metric_utils.py
function _compute_response_info (line 37) | def _compute_response_info(batch: TensorDict) -> Dict[str, Any]:
function compute_data_metric (line 70) | def compute_data_metric(data: TensorDict):
function compute_timing_metrics (line 186) | def compute_timing_metrics(batch: TensorDict, timing_raw: Dict[str, floa...
function compute_throughout_metrics (line 225) | def compute_throughout_metrics(batch: TensorDict, timing_raw: Dict[str, ...
function _calculate_bootstrap_metrics (line 261) | def _calculate_bootstrap_metrics(group: pd.DataFrame, variable_name: str...
function _process_prompt_group_task (line 329) | def _process_prompt_group_task(group: pd.DataFrame, numeric_variables: L...
function bootstrap_metric (line 385) | def bootstrap_metric(
function calc_maj_val (line 425) | def calc_maj_val(data: list[dict[str, Any]], vote_key: str, val_key: str...
function process_validation_metrics (line 461) | def process_validation_metrics(
function aggregate_validation_metrics (line 601) | def aggregate_validation_metrics(data_sources: List[str], sample_inputs:...
FILE: siirl/utils/model_utils/activation_offload.py
function _get_unique_tensor_key (line 32) | def _get_unique_tensor_key(tensor):
class FSDPParameterFilter (line 37) | class FSDPParameterFilter:
method __init__ (line 38) | def __init__(self):
method __call__ (line 41) | def __call__(self, tensor):
method update_model_parameters (line 44) | def update_model_parameters(self, model):
class CpuOffloadHookWithOffloadHandler (line 51) | class CpuOffloadHookWithOffloadHandler:
method __init__ (line 59) | def __init__(
method __enter__ (line 70) | def __enter__(self):
method __exit__ (line 74) | def __exit__(self, *args: Any):
method on_save_for_backward (line 78) | def on_save_for_backward(self, tensor: torch.Tensor) -> Any:
method on_get_saved_tensor (line 82) | def on_get_saved_tensor(self, saved_state: Any) -> torch.Tensor:
class OffloadHandler (line 87) | class OffloadHandler:
method __init__ (line 90) | def __init__(self) -> None:
method tensor_push (line 93) | def tensor_push(self, tensor: torch.Tensor, **kwargs) -> Any:
method tensor_pop (line 97) | def tensor_pop(self, tensor_tag: Any, **kwargs):
class GroupCommitFunction (line 102) | class GroupCommitFunction(torch.autograd.Function):
method forward (line 110) | def forward(ctx, tensor, cpu_offload_handler):
method backward (line 118) | def backward(ctx, grad_output):
class SynchronizedGroupOffloadHandler (line 128) | class SynchronizedGroupOffloadHandler(OffloadHandler):
method __init__ (line 134) | def __init__(self, num_offload_group, tensor_need_offloading_checker=(...
method groupid_reset (line 142) | def groupid_reset(self):
method on_group_commit_forward (line 152) | def on_group_commit_forward(self):
method on_group_commit_backward (line 158) | def on_group_commit_backward(self):
method offload (line 164) | def offload(src_tensor, pin_memory=True):
method reload (line 179) | def reload(state, non_blocking=None):
method tensor_push (line 186) | def tensor_push(self, tensor: torch.Tensor, **kwargs):
method tensor_pop (line 201) | def tensor_pop(self, tensor_tag, **kwargs):
class AsyncDoubleBufferGroupOffloadHandler (line 212) | class AsyncDoubleBufferGroupOffloadHandler(SynchronizedGroupOffloadHandl...
method __init__ (line 219) | def __init__(
method tensor_push (line 254) | def tensor_push(self, tensor: torch.Tensor, **kwargs) -> Any:
method tensor_pop (line 279) | def tensor_pop(self, tensor_tag, **kwargs):
method bulk_offload_group (line 292) | def bulk_offload_group(self, group_to_offload):
method synchronize_on_group_commit_forward (line 313) | def synchronize_on_group_commit_forward(self, current_group):
method on_group_commit_forward (line 341) | def on_group_commit_forward(self):
method bulk_reload_group (line 349) | def bulk_reload_group(self, group_to_reload):
method on_group_commit_backward (line 367) | def on_group_commit_backward(self):
function get_activation_offload_context (line 392) | def get_activation_offload_context(num_layers: int = 1, model_layers: in...
class ActivationHandler (line 408) | class ActivationHandler:
method __init__ (line 409) | def __init__(self, offload_ctx, sync_func, tensor_filter, enable_ckpt):
method pre_forward (line 420) | def pre_forward(self, module):
method post_forward (line 425) | def post_forward(self, module):
method _pack_kwargs (line 429) | def _pack_kwargs(self, *args, **kwargs):
method _unpack_kwargs (line 438) | def _unpack_kwargs(self, flat_args, kwarg_keys):
method _ckpt_forward (line 446) | def _ckpt_forward(self, forward_method, *args, **kwargs):
method forward (line 461) | def forward(self, module, forward_method, *args, **kwargs):
method wrap_module_forward_method (line 477) | def wrap_module_forward_method(self, module):
function enable_activation_offloading (line 492) | def enable_activation_offloading(model, strategy, enable_ckpt=False):
FILE: siirl/utils/model_utils/attention_utils.py
function _get_attention_functions (line 20) | def _get_attention_functions() -> tuple[Callable, Callable, Callable, Ca...
function index_first_axis (line 37) | def index_first_axis(*args, **kwargs):
function pad_input (line 53) | def pad_input(*args, **kwargs):
function rearrange (line 69) | def rearrange(*args, **kwargs):
function unpad_input (line 84) | def unpad_input(*args, **kwargs):
FILE: siirl/utils/model_utils/flops_counter.py
function get_device_flops (line 33) | def get_device_flops(unit="T"):
class FlopsCounter (line 65) | class FlopsCounter:
method __init__ (line 75) | def __init__(self, config: PretrainedConfig, forward_only: bool = False):
method _estimate_unknown_flops (line 97) | def _estimate_unknown_flops(self, tokens_sum, batch_seqlens, delta_time):
method _estimate_qwen2_flops (line 100) | def _estimate_qwen2_flops(self, tokens_sum, batch_seqlens, delta_time):
method _estimate_internvl_flops (line 142) | def _estimate_internvl_flops(self, tokens_sum, batch_seqlens, delta_ti...
method _estimate_deepseek_v3_flops (line 198) | def _estimate_deepseek_v3_flops(self, tokens_sum, batch_seqlens, delta...
method _estimate_qwen3_moe_flops (line 258) | def _estimate_qwen3_moe_flops(self, tokens_sum, batch_seqlens, delta_t...
method _estimate_openvla_flops (line 305) | def _estimate_openvla_flops(self, tokens_sum, batch_seqlens, delta_time):
method estimate_flops (line 371) | def estimate_flops(self, batch_seqlens, delta_time):
FILE: siirl/utils/model_utils/fsdp_utils.py
function init_fn (line 45) | def init_fn(x: torch.nn.Module):
function get_init_weight_context_manager (line 52) | def get_init_weight_context_manager(use_meta_tensor=Tru
Condensed preview — 391 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (4,152K chars).
[
{
"path": ".gitignore",
"chars": 1381,
"preview": "**/*.pt\n**/checkpoints\n**/wget-log\n**/_build/\n**/*.ckpt\n**/outputs\n**/*.tar.gz\n**/playground\n**/wandb\n\n# Byte-compiled /"
},
{
"path": ".pre-commit-config.yaml",
"chars": 960,
"preview": "\n# Default list of files to exclude from checks.\n# Add any other paths that should be ignored by all hooks.\nexclude: |\n "
},
{
"path": ".readthedocs.yaml",
"chars": 635,
"preview": "# Read the Docs configuration file\n# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details\n\n# Requir"
},
{
"path": "CONTRIBUTING.md",
"chars": 1931,
"preview": "# Contributing to siiRL\n\nThank you for considering contributing to siiRL!\n\nWe welcome contributions in various forms, in"
},
{
"path": "LICENSE",
"chars": 11358,
"preview": "\n Apache License\n Version 2.0, January 2004\n "
},
{
"path": "README-zh.md",
"chars": 6559,
"preview": "<div align=\"center\">\n <img src=\"asset/sii.png\" width=\"100%\"/>\n <br>\n</div>\n<br>\n\n<h1 align=\"center\">\nsiiRL: Shanghai I"
},
{
"path": "README.md",
"chars": 11383,
"preview": "\n<div align=\"center\">\n <img src=\"asset/sii.png\" width=\"100%\"/>\n <br>\n</div>\n<br>\n\n<h1 align=\"center\">\nsiiRL: Shanghai "
},
{
"path": "docker/Dockerfile.cu124",
"chars": 2392,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "docker/Dockerfile.cu126",
"chars": 2167,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "docs/Makefile",
"chars": 633,
"preview": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line, and also\n# from the "
},
{
"path": "docs/conf.py",
"chars": 2268,
"preview": "# Configuration file for the Sphinx documentation builder.\n#\n# For the full list of built-in configuration values, see t"
},
{
"path": "docs/examples/config.rst",
"chars": 11514,
"preview": ".. _config-explain-page:\n\n===================\nConfiguration Guide\n===================\n\nsiiRL uses Hydra-based configurat"
},
{
"path": "docs/examples/cpgd_example.rst",
"chars": 3868,
"preview": "DeepScaleR Example with CPGD\n==============================\n\nIntroduction\n------------\n\nThis example demonstrates how to"
},
{
"path": "docs/examples/deepscaler_example.rst",
"chars": 4061,
"preview": "DeepScaleR Example with PPO\n=============================\n\nIntroduction\n------------\n\nThis example demonstrates how to f"
},
{
"path": "docs/examples/embodied_srpo_example.rst",
"chars": 19959,
"preview": "Embodied SRPO Training\n======================\n\nIntroduction\n------------\n\nThis guide explains how to perform Embodied AI"
},
{
"path": "docs/examples/megatron_backend_example.rst",
"chars": 9145,
"preview": "Megatron-LM Training Backend\n============================================\n\nIntroduction\n------------\n\nThis guide explain"
},
{
"path": "docs/examples/mm_eureka_example.rst",
"chars": 4296,
"preview": "MM-Eureka Example with GRPO\n===========================\n\nIntroduction\n------------\n\nThis guide details how to fine-tune "
},
{
"path": "docs/hardware_tutorial/ascend_profiling_en.rst",
"chars": 3448,
"preview": "Data Collection on Ascend Devices Based on the FSDP Backend\n============================================================"
},
{
"path": "docs/hardware_tutorial/ascend_quickstart.rst",
"chars": 7327,
"preview": "Ascend NPU\n==========\n\nSiiRL is also supports for Huawei's Ascend NPU devices. This guide has been tested with the follo"
},
{
"path": "docs/hardware_tutorial/metax_quickstart.rst",
"chars": 13301,
"preview": "MetaX(沐曦) GPU\n===============\n\nSiiRL is also supports for MetaX's GPU devices. This guide has been tested with the follo"
},
{
"path": "docs/index.rst",
"chars": 1370,
"preview": ".. siiRL documentation master file, created by\n sphinx-quickstart on Wed Jul 9 15:26:45 2025.\n You can adapt this f"
},
{
"path": "docs/preparation/prepare_data.rst",
"chars": 5644,
"preview": "Prepare Data for Post-Training\n========================================\n\nBefore starting the post-training job, we need "
},
{
"path": "docs/preparation/reward_function.rst",
"chars": 4598,
"preview": "Implementing Reward Functions for Datasets\n===========================================\n\nIn Reinforcement Learning for LL"
},
{
"path": "docs/programming_guide/code_structure.rst",
"chars": 11132,
"preview": "===============\nCode Structure\n===============\n\nThis document describes the code structure and architecture of siiRL.\n\nD"
},
{
"path": "docs/programming_guide/siiRL_code_explained.rst",
"chars": 15336,
"preview": "siiRL's Implementation Explained\n================================\n\nsiiRL is under active development with an extensive r"
},
{
"path": "docs/programming_guide/siirl_architecture_guide.rst",
"chars": 113099,
"preview": "=======================================\nsiiRL Complete Architecture Guide\n=======================================\n\n.. no"
},
{
"path": "docs/programming_guide/srpo_code_explained.rst",
"chars": 35534,
"preview": "SRPO Code Implementation Explained\n==================================\n\nThis document provides a comprehensive guide to u"
},
{
"path": "docs/requirements-docs.txt",
"chars": 224,
"preview": "# markdown support\nrecommonmark\nmyst_parser\n# markdown table support\nsphinx-markdown-tables\n\n# theme default rtd\n\n# crat"
},
{
"path": "docs/start/install.rst",
"chars": 2993,
"preview": "Installation\n============\n\nsiiRL provides three primary installation methods. We **strongly recommend** using the Docker"
},
{
"path": "docs/start/quickstart.rst",
"chars": 12556,
"preview": ".. _quickstart:\n\n=========================================================\nQuickstart: GRPO training on GSM8K dataset\n=="
},
{
"path": "docs/user_interface/filter_interface.rst",
"chars": 9602,
"preview": "================\nFilter Interface\n================\n\nFilter interface is used for dynamic sampling and data filtering in "
},
{
"path": "docs/user_interface/metrics_interface.rst",
"chars": 29275,
"preview": "=================\nMetrics Interface\n=================\n\nCustom metrics allow you to track and aggregate any quantitative "
},
{
"path": "docs/user_interface/pipeline_interface.rst",
"chars": 6842,
"preview": "============\nPipeline API\n============\n\nPipeline is a declarative Python API for defining training workflows. Each Pipel"
},
{
"path": "docs/user_interface/reward_interface.rst",
"chars": 9042,
"preview": "================\nReward Interface\n================\n\nCustom reward functions allow you to score model-generated responses"
},
{
"path": "examples/cpgd_trainer/run_qwen2_5-7b.sh",
"chars": 9251,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/cpgd_trainer/run_qwen2_5_vl-72b.sh",
"chars": 9400,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/cpgd_trainer/run_qwen2_5_vl-7b.sh",
"chars": 9340,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/cpgd_trainer/run_qwen3-1.7b.sh",
"chars": 9241,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/cpgd_trainer/run_qwen3-8b.sh",
"chars": 9237,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/custom_pipeline_example/custom_grpo.py",
"chars": 4228,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "examples/custom_reward/rewardfunc_gsm8k.py",
"chars": 2092,
"preview": "import re\n\n\ndef extract_solution(solution_str, method=\"strict\"):\n assert method in [\"strict\", \"flexible\"]\n\n if met"
},
{
"path": "examples/custom_reward/run_qwen2_5-7b-custom_reward.sh",
"chars": 11514,
"preview": "#!/usr/bin/env bash\n# Exit immediately if a command exits with a non-zero status.\nset -e\nset -o pipefail\n# Print command"
},
{
"path": "examples/dapo_trainer/run_qwen2_5-7b.sh",
"chars": 12499,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/dapo_trainer/run_qwen3-235b-megatron-gspo.sh",
"chars": 14507,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/dapo_trainer/run_qwen3-8b.sh",
"chars": 12203,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/data_preprocess/deepscaler.py",
"chars": 3224,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2."
},
{
"path": "examples/data_preprocess/geo3k.py",
"chars": 2878,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "examples/data_preprocess/gsm8k.py",
"chars": 2965,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "examples/data_preprocess/math_dataset.py",
"chars": 2839,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "examples/data_preprocess/mm_eureka.py",
"chars": 4084,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2."
},
{
"path": "examples/embodied_srpo_trainer/run_openvla_oft_libero_goal.sh",
"chars": 13201,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === Embod"
},
{
"path": "examples/embodied_srpo_trainer/run_openvla_oft_libero_long.sh",
"chars": 13194,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === Embod"
},
{
"path": "examples/embodied_srpo_trainer/run_openvla_oft_libero_object.sh",
"chars": 13205,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === Embod"
},
{
"path": "examples/embodied_srpo_trainer/run_openvla_oft_libero_spatial.sh",
"chars": 13204,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === Embod"
},
{
"path": "examples/experimental/marft/config/code_env.py",
"chars": 1443,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "examples/experimental/marft/config/math_env.py",
"chars": 1403,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "examples/experimental/marft/config/process.py",
"chars": 1352,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "examples/experimental/marft/config/workflow_marft.yaml",
"chars": 4880,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "examples/experimental/marft/config/workflow_marft_code.yaml",
"chars": 4995,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "examples/experimental/marft/run_qwen2_5-3b_marft.sh",
"chars": 9874,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/experimental/multiturn_server/run_qwen2_5-3b_grpo_multiturn_vllm.sh",
"chars": 9808,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/grpo_trainer/run_qwen2_5-32b-metax.sh",
"chars": 10347,
"preview": " #!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/grpo_trainer/run_qwen2_5-32b-npu.sh",
"chars": 9756,
"preview": " #!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/grpo_trainer/run_qwen2_5-72b-npu.sh",
"chars": 9773,
"preview": " #!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/grpo_trainer/run_qwen2_5-7b-npu-e2e_prof.sh",
"chars": 8863,
"preview": " #!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/grpo_trainer/run_qwen2_5-7b-npu-mindspeed.sh",
"chars": 10690,
"preview": " #!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/grpo_trainer/run_qwen2_5-7b-npu.sh",
"chars": 9753,
"preview": " #!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/grpo_trainer/run_qwen2_5-7b.sh",
"chars": 9205,
"preview": " #!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/grpo_trainer/run_qwen2_5_vl-72b.sh",
"chars": 9265,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/grpo_trainer/run_qwen2_5_vl-7b-npu.sh",
"chars": 9047,
"preview": " #!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/grpo_trainer/run_qwen2_5_vl-7b.sh",
"chars": 9253,
"preview": "\n#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/grpo_trainer/run_qwen3-235b-megatron.sh",
"chars": 11711,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/grpo_trainer/run_qwen3-235b-npu-mindspeed.sh",
"chars": 12311,
"preview": " #!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/grpo_trainer/run_qwen3-30b-npu-mindspeed.sh",
"chars": 10749,
"preview": " #!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/grpo_trainer/run_qwen3-8b-megatron.sh",
"chars": 9820,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/grpo_trainer/run_qwen3-8b.sh",
"chars": 9190,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/gspo_trainer/run_qwen3-1.7b.sh",
"chars": 11007,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/gspo_trainer/run_qwen3-235b-megatron.sh",
"chars": 13002,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/gspo_trainer/run_qwen3-30b-gspo-megatron.sh",
"chars": 12728,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/multi_turn/config/interaction_config/gsm8k_interaction_config.yaml",
"chars": 150,
"preview": "interaction:\n - name: \"gsm8k\"\n class_name: \"siirl.execution.rollout_flow.multiturn.interactions.gsm8k_interaction.Gs"
},
{
"path": "examples/multi_turn/config/tool_config/gsm8k_tool_config.yaml",
"chars": 612,
"preview": "tools:\n - class_name: \"siirl.execution.rollout_flow.multiturn.tools.gsm8k_tool.Gsm8kTool\"\n config: \n type: nati"
},
{
"path": "examples/multi_turn/gsm8k/run_qwen2_5-3b_grpo_multiturn_sglang.sh",
"chars": 9664,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/ppo_trainer/run_qwen2_5-72b.sh",
"chars": 9620,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/ppo_trainer/run_qwen3-8b-megatron.sh",
"chars": 10891,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "examples/ppo_trainer/run_qwen3-8b.sh",
"chars": 9546,
"preview": "#!/usr/bin/env bash\n# ===================================================================================\n# === "
},
{
"path": "pyproject.toml",
"chars": 3942,
"preview": "# ===================================================================\n# pyproject.toml for siirl\n#\n# PEP 621-compliant c"
},
{
"path": "requirements-npu.txt",
"chars": 362,
"preview": "accelerate\ncodetiming\ndatasets>=4.0.0\ndill\nhydra-core\nnumpy\npandas\npeft\npyarrow>=19.0.0\npybind11\npylatexenc\nray[default]"
},
{
"path": "requirements.txt",
"chars": 316,
"preview": "accelerate\ncodetiming\ndatasets>=4.0.0\ndill\nhydra-core\nnumpy\npandas\npeft\npyarrow>=19.0.0\npybind11\npylatexenc\nray[default]"
},
{
"path": "setup.py",
"chars": 1023,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/__init__.py",
"chars": 1975,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/dag_worker/__init__.py",
"chars": 616,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/dag_worker/checkpoint_manager.py",
"chars": 10379,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/dag_worker/constants.py",
"chars": 1683,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/dag_worker/core_algos.py",
"chars": 62763,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2022 The HuggingFace Team. All rights reserved.\n#\n# Li"
},
{
"path": "siirl/dag_worker/dag_utils.py",
"chars": 37470,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/dag_worker/dagworker.py",
"chars": 70882,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/dag_worker/data_structures.py",
"chars": 1466,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/dag_worker/metric_aggregator.py",
"chars": 8790,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n# Copyright 2025, Infrawaves. All rights reserved."
},
{
"path": "siirl/dag_worker/metrics_collector.py",
"chars": 12771,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/dag_worker/validator.py",
"chars": 32285,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/data_coordinator/__init__.py",
"chars": 148,
"preview": "# Copyright (c) 2025, Shanghai Innovation Institute. All rights reserved.\n\nfrom .protocol import *\nfrom .data_buffer im"
},
{
"path": "siirl/data_coordinator/data_buffer.py",
"chars": 23600,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/data_coordinator/dataloader/__init__.py",
"chars": 693,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2."
},
{
"path": "siirl/data_coordinator/dataloader/data_loader_node.py",
"chars": 21822,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2."
},
{
"path": "siirl/data_coordinator/dataloader/embodied_preprocess.py",
"chars": 5679,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2."
},
{
"path": "siirl/data_coordinator/dataloader/partitioned_dataset.py",
"chars": 23512,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/data_coordinator/dataloader/vision_utils.py",
"chars": 4603,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/data_coordinator/protocol.py",
"chars": 5511,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/data_coordinator/sample.py",
"chars": 15027,
"preview": "import torch\nimport numpy as np\nimport asyncio\nimport ray\nimport uuid\nfrom pydantic import BaseModel, Field\nfrom typing "
},
{
"path": "siirl/engine/__init__.py",
"chars": 600,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/actor/__init__.py",
"chars": 804,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/actor/base.py",
"chars": 2078,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/actor/dp_actor.py",
"chars": 25570,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. "
},
{
"path": "siirl/engine/actor/embodied_actor.py",
"chars": 28172,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/actor/megatron_actor.py",
"chars": 27031,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/base_worker/__init__.py",
"chars": 914,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/engine/base_worker/base/__init__.py",
"chars": 616,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/engine/base_worker/base/worker.py",
"chars": 11054,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/base_worker/megatron/__init__.py",
"chars": 600,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/base_worker/megatron/npu_mbridge_patch.py",
"chars": 8232,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/engine/base_worker/megatron/worker.py",
"chars": 5776,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2025, Infrawaves. All rights reserved.\n#\n# Licensed un"
},
{
"path": "siirl/engine/base_worker/register_center/__init__.py",
"chars": 600,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/base_worker/register_center/register_center.py",
"chars": 1236,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/base_worker/resouce_pool.py",
"chars": 12811,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/critic/__init__.py",
"chars": 732,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/critic/base.py",
"chars": 1103,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/critic/dp_critic.py",
"chars": 11961,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/critic/megatron_critic.py",
"chars": 14484,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/fsdp_workers.py",
"chars": 104427,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2025, Shanghai Innovation Institute. All rights reserv"
},
{
"path": "siirl/engine/megatron_workers.py",
"chars": 73412,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright (c) 2025, Infrawaves. All rights reserved.\n#\n# License"
},
{
"path": "siirl/engine/reward_manager/__init__.py",
"chars": 1189,
"preview": "# Copyright 2024 PRIME team and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# "
},
{
"path": "siirl/engine/reward_manager/dapo.py",
"chars": 5579,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/reward_manager/embodied.py",
"chars": 9182,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/engine/reward_manager/naive.py",
"chars": 4589,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/reward_manager/parallel.py",
"chars": 4516,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/reward_model/__init__.py",
"chars": 672,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/reward_model/base.py",
"chars": 1718,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/reward_model/megatron/__init__.py",
"chars": 682,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/reward_model/megatron/reward_model.py",
"chars": 15424,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/rollout/__init__.py",
"chars": 705,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/rollout/async_server.py",
"chars": 3854,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/rollout/base.py",
"chars": 895,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/rollout/embodied_rollout.py",
"chars": 30560,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/engine/rollout/hf_rollout.py",
"chars": 7256,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/rollout/schemas.py",
"chars": 31328,
"preview": "# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. and/or its affiliates\n#\n# Licensed under the Apache Li"
},
{
"path": "siirl/engine/rollout/sglang_rollout/__init__.py",
"chars": 639,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/rollout/sglang_rollout/async_sglang_server.py",
"chars": 2958,
"preview": "# Copyright 2023-2024 SGLang Team\n# Copyright 2025 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache Li"
},
{
"path": "siirl/engine/rollout/sglang_rollout/sglang_rollout.py",
"chars": 67247,
"preview": "# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. and/or its affiliates\n# Copyright 2024 Bytedance Ltd. "
},
{
"path": "siirl/engine/rollout/sglang_rollout/utils.py",
"chars": 2517,
"preview": "# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. and/or its affiliates\n#\n# Licensed under the Apache Li"
},
{
"path": "siirl/engine/rollout/vllm_rollout/__init__.py",
"chars": 1576,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/rollout/vllm_rollout/vllm_async_server.py",
"chars": 9235,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/rollout/vllm_rollout/vllm_rollout_spmd.py",
"chars": 24875,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/sharding_manager/__init__.py",
"chars": 812,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/sharding_manager/base.py",
"chars": 992,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/sharding_manager/fsdp_hf.py",
"chars": 3924,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/engine/sharding_manager/fsdp_sglang.py",
"chars": 10859,
"preview": "# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. and/or its affiliates\n#\n# Licensed under the Apache Li"
},
{
"path": "siirl/engine/sharding_manager/fsdp_ulysses.py",
"chars": 2863,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/sharding_manager/fsdp_vllm.py",
"chars": 11388,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/engine/sharding_manager/megatron_sglang.py",
"chars": 8880,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. "
},
{
"path": "siirl/engine/sharding_manager/megatron_vllm.py",
"chars": 16925,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/environment/embodied/__init__.py",
"chars": 890,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/environment/embodied/adapters/__init__.py",
"chars": 717,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/environment/embodied/adapters/libero.py",
"chars": 15671,
"preview": "\n# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2."
},
{
"path": "siirl/environment/embodied/base.py",
"chars": 2185,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/environment/embodied/venv.py",
"chars": 37717,
"preview": "# Modified from https://github.com/Lifelong-Robot-Learning/LIBERO/blob/master/libero/libero/envs/venv.py\n\nimport cloudpi"
},
{
"path": "siirl/execution/dag/__init__.py",
"chars": 1082,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/execution/dag/builtin_pipelines.py",
"chars": 11151,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/execution/dag/config_loader.py",
"chars": 10678,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/execution/dag/node.py",
"chars": 22180,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/execution/dag/pipeline.py",
"chars": 6131,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/execution/dag/task_graph.py",
"chars": 23323,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/execution/dag/task_loader.py",
"chars": 34758,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/execution/metric_worker/metric_worker.py",
"chars": 10921,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/execution/metric_worker/utils.py",
"chars": 1859,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/execution/rollout_flow/multi_agent/multiagent_generate.py",
"chars": 40277,
"preview": "# Copyright 2025, Shanghai Innovation Institute. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0"
},
{
"path": "siirl/execution/rollout_flow/multi_agent/utils.py",
"chars": 656,
"preview": "from pydantic import BaseModel\nfrom typing import List, Optional, Union, Any\n\nclass AgentOutputStatus:\n RUNNING = 0\n "
},
{
"path": "siirl/execution/rollout_flow/multiturn/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "siirl/execution/rollout_flow/multiturn/agent_loop/__init__.py",
"chars": 706,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/execution/rollout_flow/multiturn/agent_loop/agent_loop.py",
"chars": 16114,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/execution/rollout_flow/multiturn/agent_loop/single_turn_agent_loop.py",
"chars": 2607,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/execution/rollout_flow/multiturn/agent_loop/tool_agent_loop.py",
"chars": 10147,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/execution/rollout_flow/multiturn/interactions/__init__.py",
"chars": 688,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. "
},
{
"path": "siirl/execution/rollout_flow/multiturn/interactions/base.py",
"chars": 3022,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. "
},
{
"path": "siirl/execution/rollout_flow/multiturn/interactions/gsm8k_interaction.py",
"chars": 3194,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\n# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. "
},
{
"path": "siirl/execution/rollout_flow/multiturn/interactions/utils/__init__.py",
"chars": 634,
"preview": "# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. and/or its affiliates\n#\n# Licensed under the Apache Li"
},
{
"path": "siirl/execution/rollout_flow/multiturn/interactions/utils/interaction_registry.py",
"chars": 3050,
"preview": "# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. and/or its affiliates\n#\n# Licensed under the Apache Li"
},
{
"path": "siirl/execution/rollout_flow/multiturn/tools/__init__.py",
"chars": 634,
"preview": "# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. and/or its affiliates\n#\n# Licensed under the Apache Li"
},
{
"path": "siirl/execution/rollout_flow/multiturn/tools/base_tool.py",
"chars": 3064,
"preview": "# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. and/or its affiliates\n#\n# Licensed under the Apache Li"
},
{
"path": "siirl/execution/rollout_flow/multiturn/tools/geo3k_tool.py",
"chars": 3776,
"preview": "# Copyright 2023-2025 SGLang Team\n# Copyright Amazon.com, Inc. or its affiliates.\n# Copyright 2025 ModelBest Inc. and/or"
},
{
"path": "siirl/execution/rollout_flow/multiturn/tools/gsm8k_tool.py",
"chars": 3758,
"preview": "# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. and/or its affiliates\n#\n# Licensed under the Apache Li"
},
{
"path": "siirl/execution/rollout_flow/multiturn/tools/mcp_base_tool.py",
"chars": 4490,
"preview": "# Copyright 2025 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/execution/rollout_flow/multiturn/tools/mcp_search_tool.py",
"chars": 2357,
"preview": "# Copyright 2025 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/execution/rollout_flow/multiturn/tools/sandbox_fusion_tools.py",
"chars": 7311,
"preview": "# Copyright 2025 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/execution/rollout_flow/multiturn/tools/schemas.py",
"chars": 2594,
"preview": "# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. and/or its affiliates\n#\n# Licensed under the Apache Li"
},
{
"path": "siirl/execution/rollout_flow/multiturn/tools/search_tool.py",
"chars": 10671,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\r\n# Copyright 2023-2024 SGLang Team\r\n#\r\n# Licensed under the Apache"
},
{
"path": "siirl/execution/rollout_flow/multiturn/tools/utils/__init__.py",
"chars": 634,
"preview": "# Copyright 2023-2024 SGLang Team\n# Copyright 2025 ModelBest Inc. and/or its affiliates\n#\n# Licensed under the Apache Li"
},
{
"path": "siirl/execution/rollout_flow/multiturn/tools/utils/mcp_clients/McpClientManager.py",
"chars": 3650,
"preview": "# Copyright 2025 Bytedance Ltd. and/or its affiliates\r\n#\r\n# Licensed under the Apache License, Version 2.0 (the \"License"
},
{
"path": "siirl/execution/rollout_flow/multiturn/tools/utils/mcp_clients/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "siirl/execution/rollout_flow/multiturn/tools/utils/mcp_clients/utils.py",
"chars": 1887,
"preview": "# Copyright 2025 Bytedance Ltd. and/or its affiliates\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\")"
},
{
"path": "siirl/execution/rollout_flow/multiturn/tools/utils/search_r1_like_utils.py",
"chars": 9870,
"preview": "# Copyright 2024 Bytedance Ltd. and/or its affiliates\r\n# Copyright 2023-2024 SGLang Team\r\n#\r\n# Licensed under the Apache"
}
]
// ... and 191 more files (download for full content)
About this extraction
This page contains the full source code of the sii-research/siiRL GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 391 files (3.8 MB, approximately 1.0M tokens) and a symbol index of 2,598 extracted functions, classes, methods, constants, and types. Use the output with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.