Full Code of Kipok/NeMo-Skills for AI

1.9M tokens
3360 symbols
Repository: Kipok/NeMo-Skills
Branch: main
Commit: 1d860c4d2df1
Files: 1174
Total size: 7.1 MB

Directory structure:
gitextract_itkaqbxy/
├── .coderabbit.yaml
├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   └── config.yml
│   └── workflows/
│       ├── copyright-check.yml
│       ├── docs.yml
│       ├── gpu_tests.yml
│       ├── lint.yml
│       └── tests.yml
├── .gitignore
├── .pre-commit-config.yaml
├── CONTRIBUTING.md
├── LICENSE
├── MANIFEST.in
├── README.md
├── __init__.py
├── cluster_configs/
│   ├── example-local.yaml
│   ├── example-ray.yaml
│   └── example-slurm.yaml
├── core/
│   ├── README.md
│   ├── pyproject.toml
│   └── requirements.txt
├── dataset_explorer_demo/
│   ├── README.md
│   └── visualize_similar.py
├── dockerfiles/
│   ├── Dockerfile.megatron
│   ├── Dockerfile.nemo-rl
│   ├── Dockerfile.nemo-skills
│   ├── Dockerfile.sandbox
│   ├── Dockerfile.verl
│   ├── Dockerfile.vllm
│   ├── README.md
│   ├── build.sh
│   ├── ifbench.patch
│   ├── sandbox/
│   │   ├── block_network.c
│   │   ├── nginx-worker-proxy.conf.template
│   │   ├── nginx.conf.template
│   │   └── start-with-nginx.sh
│   └── swe-bench/
│       ├── Dockerfile.nemo-skills.alpine
│       └── Dockerfile.swe-zero
├── docs/
│   ├── agentic_inference/
│   │   ├── parallel_thinking.md
│   │   └── tool_calling.md
│   ├── basics/
│   │   ├── chat_interface.md
│   │   ├── cluster-configs.md
│   │   ├── code-packaging.md
│   │   ├── index.md
│   │   ├── inference.md
│   │   ├── installation.md
│   │   ├── prompt-format.md
│   │   └── sandbox.md
│   ├── css/
│   │   └── extra.css
│   ├── evaluation/
│   │   ├── code.md
│   │   ├── external-benchmarks.md
│   │   ├── formal-math.md
│   │   ├── index.md
│   │   ├── instruction-following.md
│   │   ├── long-context.md
│   │   ├── multilingual.md
│   │   ├── natural-math.md
│   │   ├── other-benchmarks.md
│   │   ├── robustness.md
│   │   ├── scientific-knowledge.md
│   │   ├── speculative-decoding.md
│   │   ├── speech-audio.md
│   │   ├── tool-calling.md
│   │   └── vlm.md
│   ├── index.md
│   ├── pipelines/
│   │   ├── decontamination.md
│   │   ├── evaluation.md
│   │   ├── generation.md
│   │   ├── index.md
│   │   ├── llm-as-a-judge.md
│   │   ├── run-cmd.md
│   │   ├── start-server.md
│   │   ├── training-verl.md
│   │   └── training.md
│   ├── recipes/
│   │   └── libtrace.md
│   ├── releases/
│   │   ├── index.md
│   │   ├── nemotron-math-v2/
│   │   │   ├── dataset.md
│   │   │   ├── evaluation.md
│   │   │   ├── index.md
│   │   │   └── training.md
│   │   ├── nemotronmathproofs/
│   │   │   └── index.md
│   │   ├── opencodereasoning/
│   │   │   ├── dataset.md
│   │   │   ├── evaluation.md
│   │   │   └── index.md
│   │   ├── openmathinstruct2/
│   │   │   ├── dataset.md
│   │   │   ├── evaluation.md
│   │   │   ├── index.md
│   │   │   └── training.md
│   │   ├── openmathreasoning/
│   │   │   ├── dataset.md
│   │   │   ├── evaluation.md
│   │   │   ├── index.md
│   │   │   └── training.md
│   │   └── openreasoning/
│   │       ├── dataset.md
│   │       ├── evaluation.md
│   │       ├── index.md
│   │       └── training.md
│   └── tutorials/
│       ├── index.md
│       ├── notebooks/
│       │   ├── demo_aimo_inference.ipynb
│       │   └── prepare_calibration_data.py
│       └── posts/
│           ├── gpt-oss-python.md
│           ├── llama-nemotron-super-v1.5-evals.md
│           ├── nemotron-nano-v2-evals.md
│           ├── noc-reasoning-agent.md
│           └── omr-simple-recipe.md
├── greptile.json
├── mkdocs.yml
├── nemo_skills/
│   ├── __init__.py
│   ├── _cli_stub.py
│   ├── code_execution/
│   │   ├── __init__.py
│   │   ├── local_sandbox/
│   │   │   ├── __init__.py
│   │   │   ├── local_sandbox_server.py
│   │   │   └── start_local_sandbox.sh
│   │   ├── proof_utils.py
│   │   ├── sandbox.py
│   │   └── utils.py
│   ├── conversion/
│   │   ├── __init__.py
│   │   ├── hf_to_nemo_llama.py
│   │   ├── hf_to_nemo_qwen.py
│   │   ├── hf_to_trtllm_quantize.py
│   │   ├── nemo_config_llama.yaml
│   │   ├── nemo_config_qwen.yaml
│   │   ├── nemo_to_hf_llama.py
│   │   └── nemo_to_hf_qwen.py
│   ├── dataset/
│   │   ├── __init__.py
│   │   ├── aai/
│   │   │   ├── __init__.py
│   │   │   ├── aai_score.py
│   │   │   └── prepare.py
│   │   ├── aalcr/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── aime24/
│   │   │   ├── __init__.py
│   │   │   ├── prepare.py
│   │   │   └── test.txt
│   │   ├── aime24-x/
│   │   │   ├── __init__.py
│   │   │   ├── aime24_x_utils.py
│   │   │   └── prepare.py
│   │   ├── aime25/
│   │   │   ├── __init__.py
│   │   │   ├── prepare.py
│   │   │   └── test.txt
│   │   ├── aime25-x/
│   │   │   ├── __init__.py
│   │   │   ├── aime25_x_utils.py
│   │   │   └── prepare.py
│   │   ├── aime26/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── algebra222/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── amc23/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── answer-judge/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── apex-shortlist/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── arena-hard/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── arena-hard-v2/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── asdiv/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── asr-leaderboard/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── audiobench/
│   │   │   ├── __init__.py
│   │   │   ├── judge/
│   │   │   │   └── __init__.py
│   │   │   ├── nonjudge/
│   │   │   │   └── __init__.py
│   │   │   └── prepare.py
│   │   ├── beyond-aime/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── bfcl_v3/
│   │   │   ├── __init__.py
│   │   │   ├── bfcl_score.py
│   │   │   ├── constants.py
│   │   │   ├── irrelevance/
│   │   │   │   └── __init__.py
│   │   │   ├── java/
│   │   │   │   └── __init__.py
│   │   │   ├── javascript/
│   │   │   │   └── __init__.py
│   │   │   ├── live_irrelevance/
│   │   │   │   └── __init__.py
│   │   │   ├── live_multiple/
│   │   │   │   └── __init__.py
│   │   │   ├── live_parallel/
│   │   │   │   └── __init__.py
│   │   │   ├── live_parallel_multiple/
│   │   │   │   └── __init__.py
│   │   │   ├── live_relevance/
│   │   │   │   └── __init__.py
│   │   │   ├── live_simple/
│   │   │   │   └── __init__.py
│   │   │   ├── multi_turn_base/
│   │   │   │   └── __init__.py
│   │   │   ├── multi_turn_long_context/
│   │   │   │   └── __init__.py
│   │   │   ├── multi_turn_miss_func/
│   │   │   │   └── __init__.py
│   │   │   ├── multi_turn_miss_param/
│   │   │   │   └── __init__.py
│   │   │   ├── multiple/
│   │   │   │   └── __init__.py
│   │   │   ├── parallel/
│   │   │   │   └── __init__.py
│   │   │   ├── parallel_multiple/
│   │   │   │   └── __init__.py
│   │   │   ├── prepare.py
│   │   │   ├── simple/
│   │   │   │   └── __init__.py
│   │   │   ├── simple_java/
│   │   │   │   └── __init__.py
│   │   │   ├── simple_javascript/
│   │   │   │   └── __init__.py
│   │   │   ├── simple_python/
│   │   │   │   └── __init__.py
│   │   │   └── utils.py
│   │   ├── bfcl_v4/
│   │   │   ├── __init__.py
│   │   │   ├── bfcl_score.py
│   │   │   ├── irrelevance/
│   │   │   │   └── __init__.py
│   │   │   ├── live_irrelevance/
│   │   │   │   └── __init__.py
│   │   │   ├── live_multiple/
│   │   │   │   └── __init__.py
│   │   │   ├── live_parallel/
│   │   │   │   └── __init__.py
│   │   │   ├── live_parallel_multiple/
│   │   │   │   └── __init__.py
│   │   │   ├── live_relevance/
│   │   │   │   └── __init__.py
│   │   │   ├── live_simple/
│   │   │   │   └── __init__.py
│   │   │   ├── memory_kv/
│   │   │   │   └── __init__.py
│   │   │   ├── memory_rec_sum/
│   │   │   │   └── __init__.py
│   │   │   ├── memory_vector/
│   │   │   │   └── __init__.py
│   │   │   ├── multi_turn_base/
│   │   │   │   └── __init__.py
│   │   │   ├── multi_turn_long_context/
│   │   │   │   └── __init__.py
│   │   │   ├── multi_turn_miss_func/
│   │   │   │   └── __init__.py
│   │   │   ├── multi_turn_miss_param/
│   │   │   │   └── __init__.py
│   │   │   ├── multiple/
│   │   │   │   └── __init__.py
│   │   │   ├── parallel/
│   │   │   │   └── __init__.py
│   │   │   ├── parallel_multiple/
│   │   │   │   └── __init__.py
│   │   │   ├── prepare.py
│   │   │   ├── simple_java/
│   │   │   │   └── __init__.py
│   │   │   ├── simple_javascript/
│   │   │   │   └── __init__.py
│   │   │   ├── simple_python/
│   │   │   │   └── __init__.py
│   │   │   ├── web_search_base/
│   │   │   │   └── __init__.py
│   │   │   └── web_search_no_snippet/
│   │   │       └── __init__.py
│   │   ├── bigcodebench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── birdbench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── brumo25/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── ccc/
│   │   │   └── __init__.py
│   │   ├── challenge19/
│   │   │   ├── __init__.py
│   │   │   ├── prepare.py
│   │   │   └── test.txt
│   │   ├── college_math/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── comp-math-24-25/
│   │   │   ├── __init__.py
│   │   │   ├── prepare.py
│   │   │   └── test.txt
│   │   ├── compute-eval/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── contextasr-bench/
│   │   │   ├── __init__.py
│   │   │   ├── coarse/
│   │   │   │   └── __init__.py
│   │   │   ├── contextasr_score.py
│   │   │   ├── contextless/
│   │   │   │   └── __init__.py
│   │   │   ├── fine/
│   │   │   │   └── __init__.py
│   │   │   └── prepare.py
│   │   ├── covost2/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── critpt/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── dsbench_da/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── fleurs/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── flores200/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── frontierscience-olympiad/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── gaokao2023en/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── global_piqa/
│   │   │   ├── __init__.py
│   │   │   ├── global_piqa_utils.py
│   │   │   └── prepare.py
│   │   ├── gpqa/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── gpqa-x/
│   │   │   ├── __init__.py
│   │   │   ├── gpqa_x_utils.py
│   │   │   └── prepare.py
│   │   ├── gsm-plus/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── gsm8k/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── hendrycks_math/
│   │   │   ├── __init__.py
│   │   │   ├── fix_ref_solns.py
│   │   │   └── prepare.py
│   │   ├── hle/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── hle_verified/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── hmmt_feb25/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── hmmt_nov25/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── hotpotqa/
│   │   │   ├── __init__.py
│   │   │   ├── prepare.py
│   │   │   └── prepare_utils.py
│   │   ├── hotpotqa_closedbook/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── human-eval/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── human-eval-infilling/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── icpc/
│   │   │   └── __init__.py
│   │   ├── ifbench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── ifeval/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── imo-answerbench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── imo-gradingbench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── imo-proofbench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── ioi/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── librispeech-pc/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── livebench-coding/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── livecodebench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── livecodebench-cpp/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── livecodebench-pro/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── livecodebench-x/
│   │   │   ├── __init__.py
│   │   │   ├── livecodebench_x_utils.py
│   │   │   └── prepare.py
│   │   ├── longbench-v2/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── longcodebench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── m-arena-hard/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── m-arena-hard-v2/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── math-500/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── math-odyssey/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mawps/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mbpp/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── minerva_math/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── minif2f/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mmau-pro/
│   │   │   ├── __init__.py
│   │   │   ├── closed_form/
│   │   │   │   └── __init__.py
│   │   │   ├── instruction_following/
│   │   │   │   └── __init__.py
│   │   │   ├── mmau_pro_score.py
│   │   │   ├── open_ended/
│   │   │   │   └── __init__.py
│   │   │   └── prepare.py
│   │   ├── mmlu/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mmlu-pro/
│   │   │   ├── __init__.py
│   │   │   ├── prepare.py
│   │   │   └── subsets/
│   │   │       └── 10pct_opt_v1.txt
│   │   ├── mmlu-prox/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mmlu-redux/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mmmlu/
│   │   │   ├── __init__.py
│   │   │   ├── mmmlu_utils.py
│   │   │   └── prepare.py
│   │   ├── mmmu-pro/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mobench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mrcr/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── musan/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── numb3rs/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── olympiadbench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── omni-math/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── omniscience/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── open-proof-corpus-judge/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── physics/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── polymath/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── prepare.py
│   │   ├── proof-arena-judge/
│   │   │   ├── __init__.py
│   │   │   ├── gemini_imo_2025/
│   │   │   │   ├── 1.txt
│   │   │   │   ├── 2.txt
│   │   │   │   ├── 3.txt
│   │   │   │   ├── 4.txt
│   │   │   │   └── 5.txt
│   │   │   └── prepare.py
│   │   ├── proof-bench-judge/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── proofnet/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── putnam-bench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── ruler/
│   │   │   ├── __init__.py
│   │   │   ├── prepare.py
│   │   │   └── ruler_score.py
│   │   ├── ruler2/
│   │   │   ├── __init__.py
│   │   │   ├── prepare.py
│   │   │   ├── prepare_mmlu.py
│   │   │   ├── prepare_niah.py
│   │   │   ├── prepare_qa.py
│   │   │   ├── ruler2_score.py
│   │   │   └── tokenizer.py
│   │   ├── scicode/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── simpleqa/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── speed-bench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── supergpqa/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── svamp/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── swe-bench/
│   │   │   ├── __init__.py
│   │   │   ├── dump_images.py
│   │   │   ├── dump_repos.py
│   │   │   └── prepare.py
│   │   ├── swe-bench-multilingual/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── swe-bench-pro/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── swe-rebench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── ugphysics/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── utils.py
│   │   └── wmt24pp/
│   │       ├── __init__.py
│   │       └── prepare.py
│   ├── evaluation/
│   │   ├── __init__.py
│   │   ├── aggregate_answers.py
│   │   ├── compute_group_score.py
│   │   ├── evaluator/
│   │   │   ├── __init__.py
│   │   │   ├── arena.py
│   │   │   ├── audio.py
│   │   │   ├── base.py
│   │   │   ├── bfcl.py
│   │   │   ├── bird.py
│   │   │   ├── ccc.py
│   │   │   ├── code.py
│   │   │   ├── comet.py
│   │   │   ├── compute_eval.py
│   │   │   ├── contextasr.py
│   │   │   ├── critpt.py
│   │   │   ├── dsbench.py
│   │   │   ├── icpc.py
│   │   │   ├── ifbench.py
│   │   │   ├── ifeval.py
│   │   │   ├── ioi.py
│   │   │   ├── livecodebench.py
│   │   │   ├── math.py
│   │   │   ├── mcq.py
│   │   │   ├── mmau_pro.py
│   │   │   ├── mrcr.py
│   │   │   ├── nvembed_judge.py
│   │   │   ├── ruler.py
│   │   │   ├── scicode.py
│   │   │   └── specdec.py
│   │   ├── math_grader.py
│   │   ├── metrics/
│   │   │   ├── __init__.py
│   │   │   ├── aalcr_metrics.py
│   │   │   ├── answer_judgement_metrics.py
│   │   │   ├── arena_metrics.py
│   │   │   ├── audio_metrics.py
│   │   │   ├── base.py
│   │   │   ├── bfcl_metrics.py
│   │   │   ├── bird_metrics.py
│   │   │   ├── ccc_metrics.py
│   │   │   ├── code_metrics.py
│   │   │   ├── compute_metrics.py
│   │   │   ├── contextasr_metrics.py
│   │   │   ├── critpt_metrics.py
│   │   │   ├── gradingbench_metrics.py
│   │   │   ├── hleaa_metrics.py
│   │   │   ├── hotpotqa_filtering.py
│   │   │   ├── hotpotqa_metrics.py
│   │   │   ├── icpc_metrics.py
│   │   │   ├── if_metrics.py
│   │   │   ├── ioi_metrics.py
│   │   │   ├── lean4_metrics.py
│   │   │   ├── map_metrics.py
│   │   │   ├── math_metrics.py
│   │   │   ├── mcq_multilingual_metrics.py
│   │   │   ├── mmau_pro_metrics.py
│   │   │   ├── mrcr_metrics.py
│   │   │   ├── omni_metrics.py
│   │   │   ├── physics_metrics.py
│   │   │   ├── ruler2_metrics.py
│   │   │   ├── ruler_metrics.py
│   │   │   ├── simpleqa_metrics.py
│   │   │   ├── specdec_metrics.py
│   │   │   ├── translation_metrics.py
│   │   │   ├── ugphysics_metrics.py
│   │   │   ├── utils.py
│   │   │   └── weighted_math_metrics.py
│   │   └── utils.py
│   ├── file_utils.py
│   ├── inference/
│   │   ├── __init__.py
│   │   ├── autoformalize.py
│   │   ├── chat_interface/
│   │   │   ├── __init__.py
│   │   │   ├── chat_service.py
│   │   │   ├── core.py
│   │   │   ├── launch.py
│   │   │   └── ui.py
│   │   ├── check_contamination.py
│   │   ├── eval/
│   │   │   ├── __init__.py
│   │   │   ├── arena_judge.py
│   │   │   ├── bfcl.py
│   │   │   ├── bfcl_utils.py
│   │   │   ├── bfcl_web_search.py
│   │   │   ├── compute_eval.py
│   │   │   ├── critpt.py
│   │   │   ├── scicode.py
│   │   │   ├── scicode_utils.py
│   │   │   ├── specdec.py
│   │   │   └── swebench.py
│   │   ├── factory.py
│   │   ├── generate.py
│   │   ├── litellm_hybrid_cache.py
│   │   ├── llm_math_judge.py
│   │   ├── log_samples_wandb.py
│   │   ├── merge_chunks.py
│   │   ├── model/
│   │   │   ├── __init__.py
│   │   │   ├── asr_nim.py
│   │   │   ├── audio_utils.py
│   │   │   ├── azure.py
│   │   │   ├── base.py
│   │   │   ├── code_execution.py
│   │   │   ├── context_retry.py
│   │   │   ├── gemini.py
│   │   │   ├── megatron.py
│   │   │   ├── nim_utils.py
│   │   │   ├── openai.py
│   │   │   ├── parallel_thinking.py
│   │   │   ├── sglang.py
│   │   │   ├── tool_call.py
│   │   │   ├── tts_nim.py
│   │   │   ├── utils.py
│   │   │   ├── vllm.py
│   │   │   └── vllm_multimodal.py
│   │   ├── patch_litellm_logging.py
│   │   ├── prover.py
│   │   ├── retrieve_similar.py
│   │   ├── server/
│   │   │   ├── __init__.py
│   │   │   ├── serve_riva_nim.py
│   │   │   ├── serve_sglang.py
│   │   │   ├── serve_unified.py
│   │   │   ├── serve_vllm.py
│   │   │   └── serve_vllm_dp_ray.py
│   │   ├── structured_outputs.py
│   │   └── tournament_utils.py
│   ├── mcp/
│   │   ├── __init__.py
│   │   ├── adapters.py
│   │   ├── clients.py
│   │   ├── config.py
│   │   ├── servers/
│   │   │   ├── __init__.py
│   │   │   ├── chemistry/
│   │   │   │   ├── __init__.py
│   │   │   │   └── periodictable_tool.py
│   │   │   ├── exa_tool.py
│   │   │   ├── physics/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── coolprop_tool.py
│   │   │   │   ├── particle_tool.py
│   │   │   │   └── radioactivedecay_tool.py
│   │   │   ├── python_tool.py
│   │   │   ├── tavily_search_tool.py
│   │   │   └── web/
│   │   │       ├── __init__.py
│   │   │       ├── arxiv_tool.py
│   │   │       └── wikipedia_tool.py
│   │   ├── tool_manager.py
│   │   ├── tool_providers.py
│   │   └── utils.py
│   ├── pipeline/
│   │   ├── __init__.py
│   │   ├── app.py
│   │   ├── cli.py
│   │   ├── convert.py
│   │   ├── dataset.py
│   │   ├── eval.py
│   │   ├── generate.py
│   │   ├── judges/
│   │   │   ├── __init__.py
│   │   │   ├── comet_judge.py
│   │   │   └── nvembed_judge.py
│   │   ├── megatron_lm/
│   │   │   ├── __init__.py
│   │   │   └── train.py
│   │   ├── nemo_evaluator.py
│   │   ├── nemo_gym_rollouts.py
│   │   ├── nemo_rl/
│   │   │   ├── __init__.py
│   │   │   ├── average_checkpoints.py
│   │   │   ├── grpo.py
│   │   │   └── sft.py
│   │   ├── prepare_data.py
│   │   ├── robust_eval.py
│   │   ├── run_cmd.py
│   │   ├── setup.py
│   │   ├── start_server.py
│   │   ├── summarize_results.py
│   │   ├── summarize_robustness.py
│   │   ├── utils/
│   │   │   ├── __init__.py
│   │   │   ├── cluster.py
│   │   │   ├── commands.py
│   │   │   ├── declarative.py
│   │   │   ├── docker_images.py
│   │   │   ├── eval.py
│   │   │   ├── exp.py
│   │   │   ├── generation.py
│   │   │   ├── mounts.py
│   │   │   ├── packager.py
│   │   │   ├── ray_executor.py
│   │   │   ├── scripts/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── base.py
│   │   │   │   ├── eval.py
│   │   │   │   ├── generation.py
│   │   │   │   ├── nemo_gym.py
│   │   │   │   └── server.py
│   │   │   └── server.py
│   │   └── verl/
│   │       ├── __init__.py
│   │       └── ppo.py
│   ├── prompt/
│   │   ├── __init__.py
│   │   ├── code_tags/
│   │   │   ├── __init__.py
│   │   │   ├── gpt-oss.yaml
│   │   │   ├── llama3.yaml
│   │   │   ├── nemotron.yaml
│   │   │   ├── openmath.yaml
│   │   │   ├── qwen-lean.yaml
│   │   │   └── qwen.yaml
│   │   ├── config/
│   │   │   ├── __init__.py
│   │   │   ├── compute-eval/
│   │   │   │   └── baseline.yaml
│   │   │   ├── eval/
│   │   │   │   ├── aai/
│   │   │   │   │   ├── livecodebench.yaml
│   │   │   │   │   ├── math.yaml
│   │   │   │   │   ├── mcq-10choices-boxed.yaml
│   │   │   │   │   ├── mcq-10choices.yaml
│   │   │   │   │   ├── mcq-4choices-boxed.yaml
│   │   │   │   │   ├── mcq-4choices.yaml
│   │   │   │   │   ├── omni.yaml
│   │   │   │   │   ├── search-mcq-10choices.yaml
│   │   │   │   │   └── search-mcq-4choices.yaml
│   │   │   │   ├── bigcodebench/
│   │   │   │   │   └── codegen.yaml
│   │   │   │   ├── critpt/
│   │   │   │   │   ├── code_output.yaml
│   │   │   │   │   └── solve_problem.yaml
│   │   │   │   ├── hotpotqa.yaml
│   │   │   │   ├── hotpotqa_closedbook.yaml
│   │   │   │   ├── livecodebench/
│   │   │   │   │   ├── aa_index.yaml
│   │   │   │   │   ├── default.yaml
│   │   │   │   │   └── default_reasoning.yaml
│   │   │   │   ├── longbench/
│   │   │   │   │   └── default.yaml
│   │   │   │   ├── matharena/
│   │   │   │   │   └── aime.yaml
│   │   │   │   ├── scicode/
│   │   │   │   │   ├── background.yaml
│   │   │   │   │   └── default.yaml
│   │   │   │   └── swe-bench/
│   │   │   │       ├── mini-swe-agent/
│   │   │   │       │   ├── swebench.yaml
│   │   │   │       │   ├── swebench_backticks.yaml
│   │   │   │       │   └── swebench_xml.yaml
│   │   │   │       ├── openhands/
│   │   │   │       │   ├── default.toml
│   │   │   │       │   └── no-native-tool-calling.toml
│   │   │   │       └── swe-agent/
│   │   │   │           ├── default.yaml
│   │   │   │           ├── multilingual.yaml
│   │   │   │           └── swe-agent-lm-32b.yaml
│   │   │   ├── generic/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── codegen.yaml
│   │   │   │   ├── codegen_system.yaml
│   │   │   │   ├── default.yaml
│   │   │   │   ├── dsbench-da-incontext.yaml
│   │   │   │   ├── dsbench-da.yaml
│   │   │   │   ├── fim.yaml
│   │   │   │   ├── general-boxed.yaml
│   │   │   │   ├── genselect.yaml
│   │   │   │   ├── gensynthesis.yaml
│   │   │   │   ├── hle.yaml
│   │   │   │   ├── math-base.yaml
│   │   │   │   ├── math.yaml
│   │   │   │   ├── matharena.yaml
│   │   │   │   ├── physics.yaml
│   │   │   │   ├── problem-augmentation-similar.yaml
│   │   │   │   ├── problem-augmentation.yaml
│   │   │   │   ├── search-boxed.yaml
│   │   │   │   ├── text_to_sql.yaml
│   │   │   │   └── ugphysics.yaml
│   │   │   ├── gpt-oss/
│   │   │   │   ├── livecodebench.yaml
│   │   │   │   └── math.yaml
│   │   │   ├── judge/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── aa-omni-judge.yaml
│   │   │   │   ├── aalcr.yaml
│   │   │   │   ├── arena.yaml
│   │   │   │   ├── arena_creative.yaml
│   │   │   │   ├── audiobench.yaml
│   │   │   │   ├── audiobench_binary.yaml
│   │   │   │   ├── check-contamination.yaml
│   │   │   │   ├── code.yaml
│   │   │   │   ├── frontierscience-olympiad.yaml
│   │   │   │   ├── general-judge.yaml
│   │   │   │   ├── hle.yaml
│   │   │   │   ├── imo_answerbench.yaml
│   │   │   │   ├── imo_gradingbench.yaml
│   │   │   │   ├── imo_proofbench.yaml
│   │   │   │   ├── math-code.yaml
│   │   │   │   ├── math-proof-judge.yaml
│   │   │   │   ├── math.yaml
│   │   │   │   ├── mmau-pro.yaml
│   │   │   │   ├── mt-bench/
│   │   │   │   │   ├── turn1.yaml
│   │   │   │   │   ├── turn1_with_ref.yaml
│   │   │   │   │   ├── turn2.yaml
│   │   │   │   │   └── turn2_with_ref.yaml
│   │   │   │   ├── physics.yaml
│   │   │   │   ├── simpleqa.yaml
│   │   │   │   └── ugphysics.yaml
│   │   │   ├── lean4/
│   │   │   │   ├── autoformalization.yaml
│   │   │   │   ├── backtranslation.yaml
│   │   │   │   ├── formal-proof-deepseek-prover-v2-nemotron.yaml
│   │   │   │   ├── formal-proof-deepseek-prover-v2.yaml
│   │   │   │   ├── formal-proof-reasoning-execution.yaml
│   │   │   │   ├── formal-proof-reasoning.yaml
│   │   │   │   ├── formal-proof.yaml
│   │   │   │   ├── goedel-prover-v2-nemotron.yaml
│   │   │   │   ├── goedel-prover-v2-refinement-nemotron.yaml
│   │   │   │   ├── goedel-prover-v2-refinement.yaml
│   │   │   │   ├── goedel-prover-v2.yaml
│   │   │   │   ├── judge-backtranslation.yaml
│   │   │   │   ├── nat-to-lean4.yaml
│   │   │   │   ├── refinement_code_error.yaml
│   │   │   │   ├── refinement_consistent_error.yaml
│   │   │   │   └── refinement_parsing_error.yaml
│   │   │   ├── llama3-instruct/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── math.yaml
│   │   │   │   └── mmlu.yaml
│   │   │   ├── multilingual/
│   │   │   │   ├── __init__.py
│   │   │   │   └── segment-translation.yaml
│   │   │   ├── openmath/
│   │   │   │   ├── genselect.yaml
│   │   │   │   └── tir.yaml
│   │   │   ├── qwen/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── math-cot.yaml
│   │   │   │   ├── math-tir.yaml
│   │   │   │   └── qwq.yaml
│   │   │   ├── qwen3/
│   │   │   │   ├── math-cot-non-think.yaml
│   │   │   │   └── math-cot-think.yaml
│   │   │   ├── robustness/
│   │   │   │   ├── code_prompts/
│   │   │   │   │   ├── aai_prompt.yaml
│   │   │   │   │   ├── code_1.yaml
│   │   │   │   │   ├── code_2.yaml
│   │   │   │   │   ├── code_3.yaml
│   │   │   │   │   ├── code_4.yaml
│   │   │   │   │   ├── ns_gen_codegen.yaml
│   │   │   │   │   └── ns_python_codegen.yaml
│   │   │   │   ├── math_prompts/
│   │   │   │   │   ├── boxed_1.yaml
│   │   │   │   │   ├── boxed_2.yaml
│   │   │   │   │   ├── boxed_3.yaml
│   │   │   │   │   ├── boxed_4.yaml
│   │   │   │   │   ├── boxed_5.yaml
│   │   │   │   │   ├── boxed_6.yaml
│   │   │   │   │   ├── boxed_7.yaml
│   │   │   │   │   ├── boxed_8.yaml
│   │   │   │   │   ├── boxed_aai.yaml
│   │   │   │   │   └── boxed_general.yaml
│   │   │   │   ├── mcq_prompts/
│   │   │   │   │   ├── aai_1.yaml
│   │   │   │   │   ├── aai_2.yaml
│   │   │   │   │   ├── angle_brackets_1.yaml
│   │   │   │   │   ├── angle_brackets_2.yaml
│   │   │   │   │   ├── boxed_1.yaml
│   │   │   │   │   ├── boxed_2.yaml
│   │   │   │   │   ├── correct_1.yaml
│   │   │   │   │   ├── correct_2.yaml
│   │   │   │   │   ├── final_answer_1.yaml
│   │   │   │   │   └── final_answer_2.yaml
│   │   │   │   └── prompt_set_config.yaml
│   │   │   ├── unit_test/
│   │   │   │   └── code.yaml
│   │   │   └── vlm/
│   │   │       ├── __init__.py
│   │   │       └── mmmu-pro.yaml
│   │   ├── few_shot_examples/
│   │   │   ├── __init__.py
│   │   │   ├── gsm8k.py
│   │   │   ├── lean4.py
│   │   │   ├── math.py
│   │   │   ├── mmlu.py
│   │   │   ├── mmlu_pro.py
│   │   │   └── open_science.py
│   │   └── utils.py
│   ├── training/
│   │   ├── __init__.py
│   │   ├── data_preparation_utils/
│   │   │   ├── __init__.py
│   │   │   ├── arithmetic_utils.py
│   │   │   ├── config/
│   │   │   │   ├── code_sft.yaml
│   │   │   │   ├── math_rl.yaml
│   │   │   │   ├── math_sft.yaml
│   │   │   │   └── stem_sft.yaml
│   │   │   ├── filters.py
│   │   │   ├── merge_processor.py
│   │   │   └── preprocessing.py
│   │   ├── nemo_rl/
│   │   │   ├── __init__.py
│   │   │   ├── configs/
│   │   │   │   ├── grpo.yaml
│   │   │   │   └── sft.yaml
│   │   │   ├── convert_dcp_to_hf.py
│   │   │   ├── convert_megatron_to_hf.py
│   │   │   ├── environments/
│   │   │   │   ├── __init__.py
│   │   │   │   └── math_environment.py
│   │   │   ├── offline_hf_consolidation.py
│   │   │   ├── prompts/
│   │   │   │   ├── cot.txt
│   │   │   │   └── math.txt
│   │   │   ├── start_grpo.py
│   │   │   └── start_sft.py
│   │   ├── prepare_data.py
│   │   ├── train_redrafter.py
│   │   └── verl/
│   │       ├── __init__.py
│   │       └── prepare_data.py
│   ├── utils.py
│   └── version.py
├── pyproject.toml
├── recipes/
│   ├── README.md
│   ├── asr_tts/
│   │   ├── README.md
│   │   ├── nim_configurations.py
│   │   ├── riva_generate.py
│   │   └── scripts/
│   │       ├── run_asr_nim_cluster.sh
│   │       └── run_tts_nim_cluster.sh
│   ├── data-integrity/
│   │   ├── README.md
│   │   ├── model_comparison/
│   │   │   ├── __init__.py
│   │   │   ├── analyses/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── length_analysis.py
│   │   │   │   ├── similarity_analysis.py
│   │   │   │   ├── umap_analysis.py
│   │   │   │   └── vocabulary_analysis.py
│   │   │   ├── analyzer.py
│   │   │   ├── data_loader.py
│   │   │   ├── main.py
│   │   │   ├── report_generator.py
│   │   │   ├── requirements.txt
│   │   │   ├── setup.py
│   │   │   ├── utils/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── file_utils.py
│   │   │   │   ├── model_utils.py
│   │   │   │   └── text_utils.py
│   │   │   └── visualization/
│   │   │       ├── __init__.py
│   │   │       ├── interactive_plots.py
│   │   │       └── static_plots.py
│   │   ├── postprocess_data.py
│   │   ├── prepare_data.py
│   │   └── run_integrity_pipeline.py
│   ├── gencluster/
│   │   ├── pipeline/
│   │   │   ├── run_inter_tournament.py
│   │   │   ├── run_intra_tournament.py
│   │   │   ├── solution_generation.py
│   │   │   └── test_case_generation.py
│   │   ├── prompts/
│   │   │   ├── generator.yaml
│   │   │   ├── selector.yaml
│   │   │   └── validator.yaml
│   │   └── scripts/
│   │       ├── compute_tournament_score.py
│   │       ├── extract_cpp_code.py
│   │       ├── filter_clusters.py
│   │       ├── generate_datasets_json.py
│   │       ├── generate_test_cases.py
│   │       ├── merge_tournament_scores.py
│   │       ├── run_tournament_all.py
│   │       ├── submission_ICPC.py
│   │       ├── submission_IOI.py
│   │       └── tournament_schedule.py
│   ├── libtrace/
│   │   ├── README.md
│   │   ├── dockerfiles/
│   │   │   ├── Dockerfile.sandbox
│   │   │   ├── environment.yml
│   │   │   └── start-with-nginx.sh
│   │   ├── prompts/
│   │   │   ├── applicability-relevance.yaml
│   │   │   └── problem-generation.yaml
│   │   └── scripts/
│   │       ├── collect_generated_problems.py
│   │       ├── filter_applicability_relevance.py
│   │       ├── gather_solutions.py
│   │       ├── harvest_docs.py
│   │       └── prepare_inference_jsonl.py
│   ├── multimodal/
│   │   ├── __init__.py
│   │   └── server/
│   │       ├── README.md
│   │       ├── __init__.py
│   │       ├── backends/
│   │       │   ├── __init__.py
│   │       │   ├── base.py
│   │       │   ├── magpie_tts_backend.py
│   │       │   └── nemo_asr_backend.py
│   │       └── unified_server.py
│   ├── noc-reasoning-agent/
│   │   ├── configs/
│   │   │   ├── config.ini
│   │   │   ├── noc_reasoning_sft.yaml
│   │   │   └── noc_reasoning_sft_6.yaml
│   │   ├── prompts/
│   │   │   ├── formatting_prompt.yaml
│   │   │   ├── prompt_incident.yaml
│   │   │   ├── prompt_reasoning.yaml
│   │   │   └── shortened_prompt_reasoning.yaml
│   │   └── scripts/
│   │       ├── create_agent_with_tools.py
│   │       ├── create_agent_with_tools_batch.py
│   │       ├── evaluation/
│   │       │   ├── evaluation_with_judge.py
│   │       │   ├── problem_code_evaluation.py
│   │       │   └── score.py
│   │       ├── filtering/
│   │       │   ├── filter_rows.py
│   │       │   └── match_keywords.py
│   │       ├── ns_pipelines/
│   │       │   ├── generate_synthetic_data.py
│   │       │   └── prepare_react_agent.py
│   │       ├── tools.py
│   │       ├── utils/
│   │       │   ├── create_input_jsonl_from_incidents.py
│   │       │   ├── format_reasoning_json.py
│   │       │   ├── reasoning_processes.py
│   │       │   ├── schema_columns.py
│   │       │   ├── split_incident_data.py
│   │       │   ├── split_mocktools_answers.py
│   │       │   └── token_usage.py
│   │       └── visualization/
│   │           ├── extract_representation_columns.py
│   │           ├── extract_scores.py
│   │           └── generate_trace_visualization.py
│   ├── opencodereasoning/
│   │   ├── configs/
│   │   │   └── solution_sdg/
│   │   │       ├── demo.yaml
│   │   │       └── r1.yaml
│   │   ├── pipeline/
│   │   │   ├── prepare_questions.py
│   │   │   └── prepare_solutions.py
│   │   ├── prompts/
│   │   │   ├── generate_cpp_soln.yaml
│   │   │   └── generate_python_soln.yaml
│   │   └── scripts/
│   │       ├── filter_questions.py
│   │       ├── functional_helpers.py
│   │       ├── output_processing.py
│   │       └── prepare_questions.py
│   ├── openmathreasoning/
│   │   ├── configs/
│   │   │   ├── genselect_sdg/
│   │   │   │   └── qwq.yaml
│   │   │   ├── problem_sdg/
│   │   │   │   ├── demo.yaml
│   │   │   │   ├── example-data.txt
│   │   │   │   └── qwen-instruct.yaml
│   │   │   └── solution_sdg/
│   │   │       ├── demo.yaml
│   │   │       ├── qwq.yaml
│   │   │       ├── r1.yaml
│   │   │       ├── tir-limo.yaml
│   │   │       └── tir-openmath.yaml
│   │   ├── pipeline/
│   │   │   ├── genselect_generation.py
│   │   │   ├── problem_generation.py
│   │   │   └── solution_generation.py
│   │   ├── prompts/
│   │   │   ├── classify-if-binary.yaml
│   │   │   ├── classify-if-invalid.yaml
│   │   │   ├── classify-if-mcq.yaml
│   │   │   ├── classify-if-proof.yaml
│   │   │   ├── classify-tir-novelty.yaml
│   │   │   ├── classify-tir-significance.yaml
│   │   │   ├── convert-proofs.yaml
│   │   │   ├── extract-answers.yaml
│   │   │   ├── extract-problems.yaml
│   │   │   ├── math-tir-detailed.yaml
│   │   │   ├── summarize-genselect.yaml
│   │   │   └── summarize-solution.yaml
│   │   └── scripts/
│   │       ├── extract_python_fragments.py
│   │       ├── filter_novelty_significance.py
│   │       ├── genselect/
│   │       │   ├── extract_judgment.py
│   │       │   ├── merge_new_summary.py
│   │       │   ├── prepare_labeling_data.py
│   │       │   └── utils.py
│   │       ├── merge_new_summary.py
│   │       ├── postprocess_answer_extraction.py
│   │       ├── postprocess_classification.py
│   │       ├── postprocess_problem_extraction.py
│   │       ├── postprocess_proof_conversion.py
│   │       ├── postprocess_tir_generations.py
│   │       ├── prepare_raw_data.py
│   │       └── simplified_recipe.py
│   ├── openreasoning/
│   │   ├── eval.py
│   │   ├── prompts/
│   │   │   ├── science_question_augmentation_prompt.yaml
│   │   │   └── science_question_generation_prompt.yaml
│   │   └── scripts/
│   │       └── use_majority_if_no_answer.py
│   ├── opensciencereasoning/
│   │   ├── openscience_dataset_collection/
│   │   │   ├── README.md
│   │   │   ├── prompts/
│   │   │   │   ├── mcq_augment_inspired_by.yaml
│   │   │   │   ├── mcq_augment_similar.yaml
│   │   │   │   ├── mcq_four_options.yaml
│   │   │   │   ├── mcq_ten_options.yaml
│   │   │   │   └── subtopic_expansion.yaml
│   │   │   └── scripts/
│   │   │       └── filter_mcq_solutions.py
│   │   └── sdg_pipeline/
│   │       ├── README.md
│   │       ├── configs/
│   │       │   ├── pipelines/
│   │       │   │   └── base.yaml
│   │       │   └── settings/
│   │       │       ├── kimi_k2.yaml
│   │       │       ├── mcq_10_options.yaml
│   │       │       ├── mcq_4_options.yaml
│   │       │       ├── multiple_prompts.yaml
│   │       │       ├── python_enabled.yaml
│   │       │       ├── seed_data.yaml
│   │       │       ├── seed_data_postprocess.yaml
│   │       │       └── without_gt.yaml
│   │       ├── prompt/
│   │       │   ├── __init__.py
│   │       │   ├── configs/
│   │       │   │   ├── default_problem.yaml
│   │       │   │   └── topics_labeling.yaml
│   │       │   └── few_shots/
│   │       │       ├── __init__.py
│   │       │       └── topics.py
│   │       ├── run_pipeline.py
│   │       └── scripts/
│   │           ├── aggregate_difficulty.py
│   │           ├── aggregate_metadata.py
│   │           ├── aggregate_solutions.py
│   │           ├── aggregate_topics.py
│   │           ├── decontaminate.py
│   │           ├── extract_predictions.py
│   │           ├── filter_problems.py
│   │           ├── filter_solutions.py
│   │           ├── map_diversity_prompts.py
│   │           ├── prepare_topics.py
│   │           ├── process_messages_and_bucket.py
│   │           ├── remove_redundant_fields.py
│   │           ├── utils/
│   │           │   ├── constants.py
│   │           │   └── regex_constants.py
│   │           └── validate_pipeline.py
│   ├── proof-gen-verification/
│   │   ├── README.md
│   │   ├── configs/
│   │   │   └── judge-eval.yaml
│   │   ├── pipeline/
│   │   │   └── eval_judge.py
│   │   ├── prompts/
│   │   │   ├── genselect/
│   │   │   │   ├── default.yaml
│   │   │   │   ├── opc_instructions.yaml
│   │   │   │   └── proof_genselect_default.yaml
│   │   │   ├── math_judge/
│   │   │   │   ├── gemini_imo_judge_summary.yaml
│   │   │   │   ├── general.yaml
│   │   │   │   ├── general_summary.yaml
│   │   │   │   ├── general_summary_rubric.yaml
│   │   │   │   ├── judge_prompt_ablation/
│   │   │   │   │   ├── gemini1.yaml
│   │   │   │   │   ├── gemini2.yaml
│   │   │   │   │   ├── prompt1.yaml
│   │   │   │   │   ├── prompt2.yaml
│   │   │   │   │   ├── prompt3.yaml
│   │   │   │   │   ├── prompt4.yaml
│   │   │   │   │   ├── prompt5.yaml
│   │   │   │   │   ├── prompt5_rubric.yaml
│   │   │   │   │   └── prompt6_rubric.yaml
│   │   │   │   ├── lemma_break.yaml
│   │   │   │   ├── opc_judge.yaml
│   │   │   │   ├── opc_judge_summary.yaml
│   │   │   │   ├── opc_judge_summary_gt_proof.yaml
│   │   │   │   ├── opc_judge_summary_rubric.yaml
│   │   │   │   ├── proofbench_ms_ref.yaml
│   │   │   │   ├── proofbench_none.yaml
│   │   │   │   ├── proofbench_none_binary.yaml
│   │   │   │   ├── step_break.yaml
│   │   │   │   ├── step_judge_v2.yaml
│   │   │   │   ├── true_false_break.yaml
│   │   │   │   └── true_false_judge.yaml
│   │   │   ├── prover.yaml
│   │   │   └── prover_final_ans.yaml
│   │   └── scripts/
│   │       ├── build_final_ans_dataset.py
│   │       ├── combine_judgements.py
│   │       ├── final_answer_qs.py
│   │       ├── generate_generic_bon_dspy.py
│   │       ├── generate_generic_bon_generation.py
│   │       ├── generic_eval_bon.py
│   │       ├── genselect_judge_generation.py
│   │       ├── make_metrics_fa_qs.py
│   │       ├── make_rubric_generation.py
│   │       ├── script_generation.py
│   │       ├── sol_selection_generation.py
│   │       └── step_judgement_generation.py
│   └── translation/
│       ├── config/
│       │   └── qwen25.yaml
│       └── translate_jsonl.py
├── requirements/
│   ├── audio.txt
│   ├── code_execution.txt
│   ├── common-dev.txt
│   ├── common-tests.txt
│   ├── docs.txt
│   ├── pipeline.txt
│   └── stem.txt
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── data/
│   │   ├── code-output.test
│   │   ├── contamination-example.test
│   │   ├── dummy_external_benchmark/
│   │   │   ├── benchmark_map.json
│   │   │   ├── my_benchmarks/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── dataset/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── my_simple_bench/
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   └── prepare.py
│   │   │   │   │   └── word_count/
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       └── prepare.py
│   │   │   │   ├── evaluation/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── word_count.py
│   │   │   │   ├── inference/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── word_count.py
│   │   │   │   ├── metrics/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── word_count.py
│   │   │   │   └── prompt/
│   │   │   │       └── eval/
│   │   │   │           └── word_count/
│   │   │   │               └── default.yaml
│   │   │   └── pyproject.toml
│   │   ├── eval_outputs/
│   │   │   ├── eval-results/
│   │   │   │   ├── answer-judge/
│   │   │   │   │   ├── output-rs0.jsonl-test
│   │   │   │   │   ├── output-rs1.jsonl-test
│   │   │   │   │   ├── output-rs2.jsonl-test
│   │   │   │   │   └── output-rs3.jsonl-test
│   │   │   │   ├── arena-hard/
│   │   │   │   │   └── output.jsonl-test
│   │   │   │   ├── gpqa/
│   │   │   │   │   ├── output-rs0.jsonl-test
│   │   │   │   │   ├── output-rs1.jsonl-test
│   │   │   │   │   ├── output-rs2.jsonl-test
│   │   │   │   │   └── output-rs3.jsonl-test
│   │   │   │   ├── hendrycks_math/
│   │   │   │   │   ├── output-rs0.jsonl-test
│   │   │   │   │   ├── output-rs1.jsonl-test
│   │   │   │   │   └── output-rs2.jsonl-test
│   │   │   │   ├── human-eval/
│   │   │   │   │   ├── output-rs0.jsonl-test
│   │   │   │   │   └── output-rs1.jsonl-test
│   │   │   │   ├── ifeval/
│   │   │   │   │   ├── output-rs0.jsonl-test
│   │   │   │   │   ├── output-rs1.jsonl-test
│   │   │   │   │   └── output-rs2.jsonl-test
│   │   │   │   ├── metrics-ms8192.json-test
│   │   │   │   ├── metrics.json-test
│   │   │   │   └── minif2f/
│   │   │   │       ├── output-rs0.jsonl-test
│   │   │   │       ├── output-rs1.jsonl-test
│   │   │   │       ├── output-rs2.jsonl-test
│   │   │   │       └── output-rs3.jsonl-test
│   │   │   ├── summarize_results_output-ms8192.txt
│   │   │   └── summarize_results_output.txt
│   │   ├── multi_model_eval_smoke.py
│   │   ├── nemo_evaluator/
│   │   │   ├── example-eval-config.yaml
│   │   │   └── example-gpu-test-config.yaml
│   │   ├── openai-input-dict.test
│   │   ├── openai-input-list.test
│   │   ├── openmathinstruct2.test
│   │   ├── output-rs0.test
│   │   ├── output-rs1.test
│   │   ├── output-rs2.test
│   │   ├── small-grpo-data.test
│   │   ├── small-sft-data-messages.test
│   │   └── small-sft-data.test
│   ├── gpu-tests/
│   │   ├── __init__.py
│   │   ├── make_tiny_llm.py
│   │   ├── run_qwen.sh
│   │   ├── test-local.yaml
│   │   ├── test_contamination.py
│   │   ├── test_context_retry.py
│   │   ├── test_eval.py
│   │   ├── test_external_benchmark_eval.py
│   │   ├── test_generate.py
│   │   ├── test_judge.py
│   │   ├── test_nemo_evaluator.py
│   │   ├── test_nemo_gym_rollouts.py
│   │   ├── test_run_cmd_llm_infer.py
│   │   ├── test_sandbox_mounts.py
│   │   ├── test_tool_calling.py
│   │   ├── test_train.py
│   │   ├── test_vllm_audio.py
│   │   └── utils.py
│   ├── scripts/
│   │   └── run_cmd_llm_infer_check.py
│   ├── slurm-tests/
│   │   ├── README.md
│   │   ├── asr_nim/
│   │   │   ├── README.md
│   │   │   ├── asr.test
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── clone_and_run.sh
│   │   ├── gpt_oss_python_aime25/
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── nano_30b_tool_calling/
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── omr_simple_recipe/
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── qwen3_4b_evals/
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── qwen3_4b_ray_executor/
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── qwen3coder_30b_swebench/
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── run_all.sh
│   │   ├── stem_sdg_pipeline/
│   │   │   └── run_test.py
│   │   ├── super_120b_aime25/
│   │   │   ├── check_results.py
│   │   │   ├── run_test.py
│   │   │   └── trtllm-extra-llm-api-config.yml
│   │   ├── super_49b_evals/
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── tts_nim/
│   │   │   ├── README.md
│   │   │   ├── check_results.py
│   │   │   ├── run_test.py
│   │   │   └── tts.test
│   │   ├── unified_asr/
│   │   │   ├── asr_openai.test
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── unified_tts/
│   │   │   ├── README.md
│   │   │   ├── check_results.py
│   │   │   ├── run_test.py
│   │   │   └── tts_openai.test
│   │   ├── utils.py
│   │   └── wmt24pp_gym_topology/
│   │       ├── README.md
│   │       ├── check_results.py
│   │       └── run_test.py
│   ├── test_arena_metrics.py
│   ├── test_base_metrics.py
│   ├── test_code_execution.py
│   ├── test_configs.py
│   ├── test_data_preparation.py
│   ├── test_datasets.py
│   ├── test_declarative_pipeline.py
│   ├── test_default_args.py
│   ├── test_dependency_isolation.py
│   ├── test_eval.py
│   ├── test_external_benchmarks.py
│   ├── test_generation.py
│   ├── test_magpie_tts_backend.py
│   ├── test_math_equal.py
│   ├── test_mcp_clients.py
│   ├── test_metrics.py
│   ├── test_nemo_asr_backend.py
│   ├── test_nemo_evaluator_pipeline.py
│   ├── test_nvidia_inference_api.py
│   ├── test_pipeline_utils.py
│   ├── test_prompts.py
│   ├── test_prover.py
│   ├── test_ray_executor.py
│   ├── test_sandbox_fork_exc_leak.py
│   ├── test_sandbox_network_blocking.py
│   ├── test_session_affinity.py
│   ├── test_streaming_tool_calling.py
│   ├── test_unified_server_audio_parser.py
│   ├── test_unified_server_batcher.py
│   ├── test_unified_server_error_handling.py
│   ├── test_vllm_audio.py
│   └── test_vlm.py
└── tools/
    ├── pyproject.toml
    └── requirements.txt

================================================
FILE CONTENTS
================================================

================================================
FILE: .coderabbit.yaml
================================================
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
# https://docs.coderabbit.ai/getting-started/configure-coderabbit/
# Validator https://docs.coderabbit.ai/configuration/yaml-validator#yaml-validator
# In PR, comment "@coderabbitai configuration" to get the full config including defaults
# Set the language for reviews by using the corresponding ISO language code.
# Default: "en-US"
language: "en-US"
# Settings related to reviews.
# Default: {}
reviews:
  # Set the profile for reviews. The assertive profile yields more feedback, which may be considered nitpicky.
  # Options: chill, assertive
  # Default: "chill"
  profile: chill
  # Add this keyword in the PR/MR title to auto-generate the title.
  # Default: "@coderabbitai"
  auto_title_placeholder: '@coderabbitai title'
  # Auto Title Instructions - Custom instructions for auto-generating the PR/MR title.
  # Default: ""
  auto_title_instructions: 'Format: "<category>: <title>". Category must be one of: feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert, cp. The category must be followed by a colon. Title should be concise (<= 80 chars). Example: "feat: Add logit_bias support".' # current: ''
  # Set the commit status to 'pending' when the review is in progress and 'success' when it is complete.
  # Default: true
  commit_status: false
  # Generate walkthrough in a markdown collapsible section.
  # Default: false
  collapse_walkthrough: true
  # Generate an assessment of how well the changes address the linked issues in the walkthrough.
  # Default: true
  assess_linked_issues: true
  # Include possibly related issues in the walkthrough.
  # Default: true
  related_issues: true
  # Related PRs - Include possibly related pull requests in the walkthrough.
  # Default: true
  related_prs: true
  # Suggest labels based on the changes in the pull request in the walkthrough.
  # Default: true
  suggested_labels: true
  # Suggest reviewers based on the changes in the pull request in the walkthrough.
  # Default: true
  suggested_reviewers: true
  # Generate a poem in the walkthrough comment.
  # Default: true
  poem: false # current: true
  # Post review details on each review. Additionally, post a review status when a review is skipped in certain cases.
  # Default: true
  review_status: false # current: true
  # Configuration for pre merge checks
  # Default: {}
  pre_merge_checks:
    # Custom Pre-merge Checks - Add unique checks to enforce your team's standards before merging a pull request. Each check must have a unique name (up to 50 characters) and clear instructions (up to 10000 characters). Use these to automatically verify coding, security, documentation, or business rules and maintain code quality.
    # Default: []
    custom_checks: []
  # Configuration for auto review
  # Default: {}
  auto_review:
    # Automatic Incremental Review - Automatic incremental code review on each push
    # Default: true
    auto_incremental_review: true # current: true
    # Review draft PRs/MRs.
    # Default: false
    drafts: false
    # Base branches (other than the default branch) to review. Accepts regex patterns. Use '.*' to match all branches.
    # Default: []
    base_branches: ["main", "chtruong/*"] # current: []
# Configuration for knowledge base
# Default: {}
knowledge_base:
  # CodeRabbit will analyse and learn from your organization's code guidelines, which you can mention in the file patterns section. These guidelines will then be used to conduct thorough code reviews.
  # Default: {}
  code_guidelines:
    # Enabled - Enable CodeRabbit to enforce your organization's coding standards during reviews.
    # Default: true
    enabled: true
    # File Patterns - Specify files for your coding guideline documents in this section. CodeRabbit will scan these files to understand your team's standards and apply them during code reviews. Multiple files supported. File names are case-sensitive. Common files like: (**/.cursorrules, .github/copilot-instructions.md, .github/instructions/*.instructions.md, **/CLAUDE.md, **/GEMINI.md, **/.cursor/rules/*, **/.windsurfrules, **/.clinerules/*, **/.rules/*, **/AGENT.md, **/AGENTS.md) are included by default.
    # Default: []
    filePatterns: # current: []
      - "CONTRIBUTING.md"


================================================
FILE: .github/ISSUE_TEMPLATE/config.yml
================================================
blank_issues_enabled: true


================================================
FILE: .github/workflows/copyright-check.yml
================================================
# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: Copyright check

on:
  pull_request:

jobs:
  copyright-check:
    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_copyright_check.yml@v0.2.0


================================================
FILE: .github/workflows/docs.yml
================================================
name: Build docs

on:
  push:
    branches: ["main"]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  # Build docs and deploy to the website
  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - name: Configure Git Credentials
        run: |
          git config user.name github-actions[bot]
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
      - uses: actions/setup-python@v6
        with:
          python-version: 3.x
      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
      - uses: actions/cache@v4
        with:
          key: mkdocs-material-${{ env.cache_id }}
          path: .cache
          restore-keys: |
            mkdocs-material-
      - run: pip install -r requirements/docs.txt
      - run: mkdocs build
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: 'site'
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4


================================================
FILE: .github/workflows/gpu_tests.yml
================================================
name: Integration tests

on:
  pull_request:
    branches: [ "main" ]
    types: [opened, synchronize, reopened, labeled]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

permissions:
  contents: read

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  gpu-tests-qwen:
    runs-on: self-hosted-nemo-gpus-1
    if: ${{ github.event.label.name == 'run GPU tests' }}
    steps:
    - name: Cleanup old Docker images and build cache
      run: |
        docker system prune --all --filter "until=168h" --force
        docker builder prune --all --filter "until=168h" --force
    - name: Cleanup old HF cache
      run: |
        docker run --rm -v /mnt/datadrive:/mnt/datadrive alpine \
          sh -c 'find /mnt/datadrive/nemo-skills-test-data/hf-cache/datasets -maxdepth 2 -mindepth 2 -type d -mtime +7 -exec rm -rf {} + 2>/dev/null;
                 find /mnt/datadrive/nemo-skills-test-data/hf-cache/hub -maxdepth 1 -mindepth 1 -type d -mtime +7 -exec rm -rf {} + 2>/dev/null;
                 true'
    - uses: actions/checkout@v6
      with:
        path: ${{ github.run_id }}
    - name: Set up Python 3.10
      uses: actions/setup-python@v6
      with:
        python-version: "3.10"
    - name: Install dependencies
      env:
        HF_TOKEN: ${{ secrets.HF_TOKEN }}
      run: |
        cd ${{ github.run_id }}
        python -m pip install --upgrade pip uv
        uv pip uninstall --system nemo-skills nemo_run || true
        # Use `uv pip` so [tool.uv].override-dependencies in pyproject.toml is honored
        # (relaxes leptonai's httpx==0.27.2 pin so litellm 1.83.x can be installed).
        uv pip install --system -e .
        uv pip install --system -r requirements/common-tests.txt
        ns prepare_data gsm8k human-eval mbpp algebra222 mmlu ifeval math-500 amc23 aime24
    - name: Build Docker image
      run: |
        cd ${{ github.run_id }}
        docker build -t nemo-skills-image -f dockerfiles/Dockerfile.nemo-skills .
    - name: Run GPU tests
      timeout-minutes: 240
      env:
        HF_TOKEN: ${{ secrets.HF_TOKEN }}
      run: |
        cd ${{ github.run_id }}
        nvidia-smi
        set -o pipefail # make any pipeline below return a non-zero exit code if a command in it fails
        # Run heartbeat in background, capture its PID, and ensure cleanup
        (while true; do sleep 60; echo "[HEARTBEAT] $(date '+%Y-%m-%d %H:%M:%S') - still running..."; done) &
        HEARTBEAT_PID=$!
        # Run tests and capture exit code
        EXIT_CODE=0
        ./tests/gpu-tests/run_qwen.sh || EXIT_CODE=$?
        # Kill heartbeat and exit with test result
        kill $HEARTBEAT_PID 2>/dev/null || true
        exit $EXIT_CODE
    - name: Cleanup
      if: always()
      run: |
        docker run --rm -v /tmp:/tmp -v /home:/home nemo-skills-image bash -c 'rm -rf /tmp/nemo-skills-tests /home/azureuser/.nemo_run/'
        docker ps -a -q | xargs -r docker stop


================================================
FILE: .github/workflows/lint.yml
================================================
name: Lint and Format

on:
  pull_request:
    branches: [ "main" ]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

permissions:
  contents: read

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout
      uses: actions/checkout@v6
      with:
        fetch-depth: 0
    - name: Add Target Branch
      run: git branch ${GITHUB_BASE_REF} origin/${GITHUB_BASE_REF}
    - name: Set up Python 3.10
      uses: actions/setup-python@v6
      with:
        python-version: "3.10"
        cache: pip
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip uv
        # Use `uv pip` so [tool.uv].override-dependencies in pyproject.toml is honored
        # (relaxes leptonai's httpx==0.27.2 pin so litellm 1.83.x can be installed).
        uv pip install --system -e .[dev]
    - name: List Checked Files
      run: git diff --name-only ${GITHUB_BASE_REF} HEAD
    - name: Run Pre-Commit Checks
      run: pre-commit run --show-diff-on-failure --color=always --from-ref=${GITHUB_BASE_REF} --to-ref=HEAD


================================================
FILE: .github/workflows/tests.yml
================================================
name: CPU tests

on:
  pull_request:
    branches: [ "main" ]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

permissions:
  contents: read

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout
      uses: actions/checkout@v6
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3

    - name: Login to GHCR
      uses: docker/login-action@v3
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GHCR_PAT }}

    - name: Free up disk space on Ubuntu
      run: |
        sudo rm -rf /usr/share/dotnet
        sudo rm -rf /opt/ghc
        sudo rm -rf /usr/local/share/boost
        sudo rm -rf "$AGENT_TOOLSDIRECTORY"
    - name: Set up Python 3.10
      uses: actions/setup-python@v6
      with:
        python-version: "3.10"
        cache: pip
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip uv
        # Use `uv pip` so [tool.uv].override-dependencies in pyproject.toml is honored
        # (relaxes leptonai's httpx==0.27.2 pin so litellm 1.83.x can be installed).
        uv pip install --system -e .[dev]
        # Clear pip cache
        pip cache purge || true
    - name: Build Images
      run: |
        # Calculate image tags that match what Python code expects
        # Python generates: locally-built-{sanitized_path}:{sha256_hash[:12]}
        REPO_LOWER=$(echo "${{ github.repository }}" | tr '[:upper:]' '[:lower:]')

        # Build nemo-skills-image with expected tag
        NEMO_SKILLS_HASH=$(sha256sum dockerfiles/Dockerfile.nemo-skills | cut -d' ' -f1 | cut -c1-12)
        NEMO_SKILLS_TAG="locally-built-dockerfiles-dockerfile-nemo-skills:${NEMO_SKILLS_HASH}"
        docker buildx build \
          --tag nemo-skills-image \
          --tag ${NEMO_SKILLS_TAG} \
          --file dockerfiles/Dockerfile.nemo-skills \
          --cache-from type=registry,ref=ghcr.io/${REPO_LOWER}/nemo-skills-image:cache \
          --cache-to type=registry,ref=ghcr.io/${REPO_LOWER}/nemo-skills-image:cache,mode=min \
          --load \
          .

        # Free up build cache before building the next image.
        # buildx --load exports a tarball then imports layers, so both
        # exist on disk at once. Pruning the builder cache between builds
        # reclaims enough space for the sandbox image to load.
        docker builder prune -f

        # Build sandbox-image with expected tag
        SANDBOX_HASH=$(sha256sum dockerfiles/Dockerfile.sandbox | cut -d' ' -f1 | cut -c1-12)
        SANDBOX_TAG="locally-built-dockerfiles-dockerfile-sandbox:${SANDBOX_HASH}"
        docker buildx build \
          --tag nemo-skills-sandbox-image \
          --tag ${SANDBOX_TAG} \
          --file dockerfiles/Dockerfile.sandbox \
          --build-arg GITHUB_CI=1 \
          --cache-from type=registry,ref=ghcr.io/${REPO_LOWER}/nemo-skills-sandbox-image:cache \
          --cache-to type=registry,ref=ghcr.io/${REPO_LOWER}/nemo-skills-sandbox-image:cache,mode=min \
          --load \
          .
    - name: Run all tests
      env:
        NV_INFERENCE_API_KEY: ${{ secrets.NV_INFERENCE_API_KEY }}
        NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
        HF_TOKEN: ${{ secrets.HF_TOKEN }}
      run: |
        # Default shared runtime directory
        sudo mkdir -p /nemo_run
        sudo chmod 777 /nemo_run
        docker run --rm --network=host -v /nemo_run:/nemo_run nemo-skills-sandbox-image &
        sleep 10
        set -o pipefail # this will make sure next line returns non-0 exit code if tests fail
        ns prepare_data gsm8k math-500 hle
        python -m pytest tests/ -m "not gpu" --junitxml=pytest.xml --cov-report=term-missing:skip-covered --cov=nemo_skills --cov=pipeline --durations=30 -rs -s -vvv


================================================
FILE: .gitignore
================================================
# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

*.json
!greptile.json
!tests/data/dummy_external_benchmark/benchmark_map.json
*.tar.gz
*.tar
*.npy
*.info
*.jsonl
*.csv
nemo_experiments
wandb
build
.hypothesis
*.zip
*.egg-info
*.xml
*.DS_Store
.coverage
.venv
*.lock

__pycache__
.ipynb_checkpoints

cluster_configs/*
!cluster_configs/example-*.yaml

nemo_skills/dataset/ruler/*/
nemo_skills/dataset/ruler2/*/
nemo_skills/dataset/aalcr/lcr/
.idea/
.idea/*
CLAUDE.md
AGENTS.md
.codex
.claude
.cursor
.idea
site/

#scripts at root level
/*.sh


================================================
FILE: .pre-commit-config.yaml
================================================
# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

default_language_version:
  python: python3

ci:
  autofix_prs: true
  autoupdate_commit_msg: '[pre-commit.ci] pre-commit suggestions'
  autoupdate_schedule: quarterly

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks:
      - id: check-added-large-files
        args: ['--maxkb=1000']
      - id: check-case-conflict
      - id: check-yaml
        exclude: ^mkdocs\.yml$
      - id: detect-private-key
      - id: end-of-file-fixer
        exclude: docs/|\.txt$|\.patch$|test$
      - id: requirements-txt-fixer
      - id: trailing-whitespace
        exclude: \.txt$|\.patch$|test$

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.12.9
    hooks:
      - id: ruff-check
        args: ["--fix"]
      - id: ruff-format

  - repo: local
    hooks:
      - id: check-signoff
        name: Check Signed-off-by
        entry: bash -c 'if ! grep -q "Signed-off-by:" "$1"; then echo "❌ Commit message must be signed off. Use git commit -s to add it automatically."; exit 1; fi' --
        language: system
        always_run: true
        stages: [commit-msg]
        types: [text]


================================================
FILE: CONTRIBUTING.md
================================================
# Contributing To Nemo-Skills

Thanks for your interest in contributing to Nemo-Skills!

## General guidelines

Applies to both humans and AI agents! Make sure the agent you use has access to these guidelines when implementing or reviewing Nemo-Skills features.

### Don't be overly defensive!

Being overly defensive can lead to silent misbehavior in our code, which is a lot worse than a quick runtime failure. Here are some examples of what to avoid:

- Don't use `.get` for accessing dictionary keys if the code expects them to be present.
  If we just replace a key that has to be there with an empty string via `data.get(key_name, "")`,
  we might silently corrupt the data and not notice it if a previous step failed for whatever reason.
  A warning is not enough in this case; it's much better to just do `data[key_name]` and let
  the code fail with a clear error.
- Don't catch exceptions when we don't expect them to be normally raised. Same motivation as above:
  just let the code fail when something unexpected happens, so that users notice it and can fix the problem
  instead of the code silently misbehaving. We don't always read logs, so it's better to just crash the job.
- Avoid cases when user-passed parameters are unused! E.g. if a user specifies a new argument that's
  not supported by our code, the code should fail (**do not** silently ignore such parameters)!
  If a user doesn't specify an argument that's required, the code should fail! Avoid using defaults
  when there is no default value that's reasonable for the majority of use-cases.
  This doesn't mean you need to have explicit checks for every parameter. It's best to use a dataclass or `**kwargs` syntax, which will automatically handle this without complicating the code.
- Don't complicate code when security concerns are irrelevant. Nemo-Skills assumes that users have full
  access to the underlying system. They don't access it via an API; they can run any commands directly.
  So things like allowing arbitrary command execution from user input are totally normal and shouldn't be flagged
  (e.g. subprocess with `shell=True` or directly executing a command passed via an argument).
  The only place where we should pay attention to security concerns is when executing code generated by
  an LLM, and we should generally try to always use our provided sandbox API for that.
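
The first and third points above can be sketched as follows. This is only an illustration: the `GenerationConfig` dataclass, its fields, and the `extract_answer` helper are hypothetical names and are not taken from the Nemo-Skills codebase.

```python
from dataclasses import dataclass


@dataclass
class GenerationConfig:  # hypothetical config, for illustration only
    # No defaults: assume there is no value that suits the majority of use-cases.
    temperature: float
    tokens_to_generate: int


def extract_answer(data: dict) -> str:
    # Direct indexing raises KeyError if an earlier step did not produce the key,
    # instead of silently returning "" like data.get("generation", "") would.
    return data["generation"].strip()


# All required arguments are passed explicitly.
config = GenerationConfig(temperature=0.0, tokens_to_generate=512)
print(extract_answer({"generation": " 42 "}))
```

With this setup, passing an unsupported argument (for example a misspelled field name) or omitting a required one makes the dataclass constructor raise `TypeError`, so no explicit per-parameter checks are needed.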

### When adding new benchmarks
The following things are required when adding new benchmarks:
- Add it to a corresponding place in the documentation.
  Make sure to add an example command for how to run evaluation and expected results for any model you tested it with.
  Describe any details that are specific to this dataset. Any special arguments to prepare data?
  Any non-standard inference arguments? Any other things to pay attention to?
- Don't forget to run `mkdocs serve` and visually check that the documentation renders properly in the browser.
- Avoid data loss because of evaluation mistakes. Our current design overwrites the original
  generation files with new evaluation-specific keys. Make sure to do all computation before
  re-opening the files for writing to avoid accidental data loss if there is a bug and the code fails before writing is complete.
- Run GPU tests in the CI (or locally). To run in CI, we need to set the "run GPU tests" label
  (toggle it off and back on if rerunning after changes).
  By default all datasets will be prepared and evaluated on a few samples in the CI. You can
  remove your dataset from the test explicitly if it requires very heavy data preparation or
  there is another reason why we can't use it. But try to avoid that if possible!
- If you enabled a new modality or added complicated new evaluation / metrics logic, consider adding
  the dataset to the slurm tests. This is the most comprehensive test we can do: it runs a full
  evaluation on a cluster with an arbitrary model and checks that results are as expected.
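
The data-loss point above can be sketched as follows. This is a minimal illustration, assuming a JSONL generation file; the `evaluate_file` and `is_correct` helpers and the `predicted_answer` / `expected_answer` keys are hypothetical and do not describe the actual evaluators.

```python
import json
from pathlib import Path


def is_correct(sample: dict) -> bool:
    # Hypothetical metric: exact match between prediction and reference.
    return sample["predicted_answer"] == sample["expected_answer"]


def evaluate_file(path: Path) -> None:
    # 1. Read everything and finish all computation while the file is still intact.
    with path.open("rt", encoding="utf-8") as fin:
        samples = [json.loads(line) for line in fin]
    for sample in samples:
        sample["is_correct"] = is_correct(sample)

    # 2. Only now re-open the file for writing. If anything above raised,
    #    we never reach this point and the original file is left untouched.
    with path.open("wt", encoding="utf-8") as fout:
        for sample in samples:
            fout.write(json.dumps(sample) + "\n")
```

Opening the file with `"wt"` truncates it immediately, which is why interleaving computation with writing risks losing generations when a sample triggers a bug.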

### Respect the Core / Pipeline dependency boundary

NeMo Skills is split into **Core** (inference, evaluation, tools, benchmarks) and **Pipeline** (CLI, cluster orchestration). The one-way rule:

- **Pipeline** can import from **Core**
- **Core** CANNOT import from **Pipeline** (no `nemo_run`, no `nemo_skills.pipeline`)

When adding dependencies: inference/evaluation/benchmark deps go in `core/requirements.txt`, orchestration deps go in `requirements/pipeline.txt`. This boundary is enforced by `tests/test_dependency_isolation.py`.

For full details (examples, common patterns, what to avoid), see [Dependency Boundary Guide](core/README.md).
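
To make the one-way rule concrete, here is a rough sketch of how a Core module's source could be scanned for forbidden imports. It is not the logic of `tests/test_dependency_isolation.py` (which is not shown here); the `forbidden_imports` helper is hypothetical, and it only inspects absolute imports.

```python
import ast

# Modules that Core code must not import, per the rule above.
FORBIDDEN_PREFIXES = ("nemo_run", "nemo_skills.pipeline")


def _is_forbidden(module: str) -> bool:
    return any(module == prefix or module.startswith(prefix + ".") for prefix in FORBIDDEN_PREFIXES)


def forbidden_imports(source: str) -> list[str]:
    """Return forbidden modules imported by `source` (at most one per import statement)."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Also check "module.name" so `from nemo_skills import pipeline` is caught.
            candidates = [node.module] + [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # not an import, or a relative import (not resolved in this sketch)
        matches = [module for module in candidates if _is_forbidden(module)]
        if matches:
            found.append(matches[0])
    return found
```

A check along these lines could be run over every file that belongs to Core and fail if the returned list is non-empty.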

### Keep the code elegant
When adding new features, try to keep the code simple and elegant.
- Can you reuse / extend existing functionality?
- Do you need to check too many conditions?
- Is there a way to write this simpler (maybe sacrificing some really rare edge-cases)?
- Do the comments / docstrings you add help? Or is the code self-explanatory?
- Avoid complicated types. Do use types for simple things like dict / list / int / float / existing classes, but
  don't define new type interfaces with unions or other complicated structures. They quickly go out-of-date and
  aren't helpful in most cases.
- If adding a new parameter / function / method, keep names consistent. If a thing is called X in one place,
  it shouldn't be called Y in another place. Follow existing conventions.

When in doubt, think about this:

```
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
```

## Setup

Install the dependencies, including development dependencies, with:

```shell
pip install -e .[dev]
```

## Pre-Commit Hooks

We use [`pre-commit`](https://pre-commit.com/) to manage pre-commit hooks.
To install, run

```shell
pre-commit install
```

All subsequent commits will be checked according to configuration in [`.pre-commit-config.yaml`](./.pre-commit-config.yaml).

## Signing Your Work

* We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.

* Any contribution which contains commits that are not Signed-Off will not be accepted.

* To sign off on a commit you simply use the `--signoff` (or `-s`) option when committing your changes:
  ```bash
  $ git commit -s -m "Add cool feature."
  ```
  This will append the following to your commit message:
  ```
  Signed-off-by: Your Name <your@email.com>
  ```

* Full text of the DCO:

  ```
  Developer Certificate of Origin
  Version 1.1

  Copyright (C) 2004, 2006 The Linux Foundation and its contributors.

  Everyone is permitted to copy and distribute verbatim copies of this
  license document, but changing it is not allowed.


  Developer's Certificate of Origin 1.1

  By making a contribution to this project, I certify that:

  (a) The contribution was created in whole or in part by me and I
      have the right to submit it under the open source license
      indicated in the file; or

  (b) The contribution is based upon previous work that, to the best
      of my knowledge, is covered under an appropriate open source
      license and I have the right under that license to submit that
      work with modifications, whether created in whole or in part
      by me, under the same open source license (unless I am
      permitted to submit under a different license), as indicated
      in the file; or

  (c) The contribution was provided directly to me by some other
      person who certified (a), (b) or (c) and I have not modified
      it.

  (d) I understand and agree that this project and the contribution
      are public and that a record of the contribution (including all
      personal information I submit with it, including my sign-off) is
      maintained indefinitely and may be redistributed consistent with
      this project or the open source license(s) involved.
  ```

## Running Tests

See the existing GitHub Actions CI workflows for how the cpu/gpu tests are run. Slurm tests documentation is [here](/tests/slurm-tests).
When running cpu tests, it's important to always add the `-s` flag to pytest, as otherwise all tests using nemo-run
will fail.

More details TBD

**TIP**: Our CI depends on some secret variables only accessible to developers of the repository.
To run the full suite of tests, please create pull requests from a branch instead of a fork whenever
possible.


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: MANIFEST.in
================================================
recursive-include nemo_skills *.yaml
recursive-include nemo_skills *.txt
graft dockerfiles
graft requirements


================================================
FILE: README.md
================================================
# Nemo Skills

Nemo-Skills is a collection of pipelines to improve "skills" of large language models (LLMs). We support everything needed for LLM development, from synthetic data generation, to model training, to evaluation on a wide range of benchmarks. Start developing on a local workstation and move to a large-scale Slurm cluster with just a one-line change.


Here are some of the features we support:

- [Flexible LLM inference](https://nvidia-nemo.github.io/Skills/pipelines/generation/):
  - Seamlessly switch between API providers, local server and large-scale slurm jobs for LLM inference.
  - Host models (on 1 or many nodes) with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM), [vLLM](https://github.com/vllm-project/vllm), [sglang](https://github.com/sgl-project/sglang) or [Megatron](https://github.com/NVIDIA/Megatron-LM).
  - Scale SDG jobs from 1 GPU on a local machine all the way to tens of thousands of GPUs on a slurm cluster.
- [Model evaluation](https://nvidia-nemo.github.io/Skills/evaluation):
  - Evaluate your models on many popular benchmarks.
    - [**Math (natural language)**](https://nvidia-nemo.github.io/Skills/evaluation/natural-math): e.g. [aime24](https://nvidia-nemo.github.io/Skills/evaluation/natural-math/#aime24), [aime25](https://nvidia-nemo.github.io/Skills/evaluation/natural-math/#aime25), [hmmt_feb25](https://nvidia-nemo.github.io/Skills/evaluation/natural-math/#hmmt_feb25)
    - [**Math (formal language)**](https://nvidia-nemo.github.io/Skills/evaluation/formal-math): e.g. [minif2f](https://nvidia-nemo.github.io/Skills/evaluation/formal-math/#minif2f), [proofnet](https://nvidia-nemo.github.io/Skills/evaluation/formal-math/#proofnet), [putnam-bench](https://nvidia-nemo.github.io/Skills/evaluation/formal-math/#putnam-bench)
    - [**Code**](https://nvidia-nemo.github.io/Skills/evaluation/code): e.g. [swe-bench](https://nvidia-nemo.github.io/Skills/evaluation/code/#swe-bench), [livecodebench](https://nvidia-nemo.github.io/Skills/evaluation/code/#livecodebench), [bird](https://nvidia-nemo.github.io/Skills/evaluation/code/#bird)
    - [**Scientific knowledge**](https://nvidia-nemo.github.io/Skills/evaluation/scientific-knowledge): e.g., [hle](https://nvidia-nemo.github.io/Skills/evaluation/scientific-knowledge/#hle), [scicode](https://nvidia-nemo.github.io/Skills/evaluation/scientific-knowledge/#scicode), [gpqa](https://nvidia-nemo.github.io/Skills/evaluation/scientific-knowledge/#gpqa)
    - [**Instruction following**](https://nvidia-nemo.github.io/Skills/evaluation/instruction-following): e.g. [ifbench](https://nvidia-nemo.github.io/Skills/evaluation/instruction-following/#ifbench), [ifeval](https://nvidia-nemo.github.io/Skills/evaluation/instruction-following/#ifeval)
    - [**Long-context**](https://nvidia-nemo.github.io/Skills/evaluation/long-context): e.g. [ruler](https://nvidia-nemo.github.io/Skills/evaluation/long-context/#ruler), [mrcr](https://nvidia-nemo.github.io/Skills/evaluation/long-context/#mrcr), [aalcr](https://nvidia-nemo.github.io/Skills/evaluation/long-context/#aalcr), [longbench-v2](https://nvidia-nemo.github.io/Skills/evaluation/long-context/#longbench-v2)
    - [**Tool-calling**](https://nvidia-nemo.github.io/Skills/evaluation/tool-calling): e.g. [bfcl_v3](https://nvidia-nemo.github.io/Skills/evaluation/tool-calling/#bfcl_v3)
    - [**Multilingual**](https://nvidia-nemo.github.io/Skills/evaluation/multilingual): e.g. [mmlu-prox](https://nvidia-nemo.github.io/Skills/evaluation/multilingual/#mmlu-prox), [flores-200](https://nvidia-nemo.github.io/Skills/evaluation/multilingual/#flores-200), [wmt24pp](https://nvidia-nemo.github.io/Skills/evaluation/multilingual/#wmt24pp)
    - [**Speech & Audio**](https://nvidia-nemo.github.io/Skills/evaluation/speech-audio): e.g. [asr-leaderboard](https://nvidia-nemo.github.io/Skills/evaluation/speech-audio/#asr-leaderboard), [mmau-pro](https://nvidia-nemo.github.io/Skills/evaluation/speech-audio/#mmau-pro)
    - [**Vision-Language Models (VLM)**](https://nvidia-nemo.github.io/Skills/evaluation/vlm): e.g. [mmmu-pro](https://nvidia-nemo.github.io/Skills/evaluation/vlm/#mmmu-pro)
  - Easily parallelize each evaluation across many slurm jobs, self-host LLM judges, bring your own prompts or change benchmark configuration in any other way.
- [Model training](https://nvidia-nemo.github.io/Skills/pipelines/training): Train models using [NeMo-RL](https://github.com/NVIDIA-NeMo/RL/) or [verl](https://github.com/volcengine/verl).

## News
* [12/15/2025]: Released the recipe for reproducing [Nemotron-Math-v2](https://huggingface.co/datasets/nvidia/Nemotron-Math-v2) and [Nemotron-Math-Proofs-v1](https://huggingface.co/datasets/nvidia/Nemotron-Math-Proofs-v1) datasets that were used as part of the training data for [NVIDIA-Nemotron-3-Nano-30B-A3B-BF16](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16).
* [11/25/2025]: Added the recipe for reproducing the main [experimental results](https://github.com/NVIDIA-NeMo/Skills/tree/main/recipes/proof-gen-verification) for [Scaling Generative Verifiers For Natural Language Mathematical Proof Verification And Selection](https://arxiv.org/abs/2511.13027).
* [08/22/2025]: Added details for [reproducing evals](https://nvidia-nemo.github.io/Skills/tutorials/2025/08/22/reproducing-nvidia-nemotron-nano-9b-v2-evals/) for the [NVIDIA-Nemotron-Nano-9B-v2](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2) model by NVIDIA.
* [08/15/2025]: Added details for [reproducing evals](https://nvidia-nemo.github.io/Skills/tutorials/2025/08/15/reproducing-llama-nemotron-super-49b-v15-evals/) for the [Llama-3_3-Nemotron-Super-49B-v1_5](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1_5) model by NVIDIA.
* [07/30/2025]: The datasets used to train OpenReasoning models are released! Math and code are available as part of [Nemotron-Post-Training-Dataset-v1](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1) and science is available in
[OpenScienceReasoning-2](https://huggingface.co/datasets/nvidia/OpenScienceReasoning-2).
See our [documentation](https://nvidia-nemo.github.io/Skills/releases/openreasoning/training) for more details.

* [07/18/2025]: We released [OpenReasoning](https://nvidia-nemo.github.io/Skills/releases/openreasoning/) models! SOTA scores on math, coding and science benchmarks.

![Evaluation Results with pass@1](docs/releases/openreasoning/pass-1.png)

![Evaluation Results with GenSelect](docs/releases/openreasoning/genselect.png)


* [04/23/2025]: We released [OpenMathReasoning](https://nvidia-nemo.github.io/Skills/openmathreasoning1) dataset and models!

  * OpenMathReasoning dataset has 306K unique mathematical problems sourced from [AoPS forums](https://artofproblemsolving.com/community) with:
      * 3.2M long chain-of-thought (CoT) solutions
      * 1.7M long tool-integrated reasoning (TIR) solutions
      * 566K samples that select the most promising solution out of many candidates (GenSelect)
  * OpenMath-Nemotron models are SoTA open-weight models on math reasoning benchmarks at the time of release!

* [10/03/2024]: We released [OpenMathInstruct-2](https://nvidia-nemo.github.io/Skills/openmathinstruct2) dataset and models!

  * OpenMathInstruct-2 is a math instruction tuning dataset with 14M problem-solution pairs generated using the Llama3.1-405B-Instruct model.
  * OpenMath-2-Llama models show significant improvements compared to their Llama3.1-Instruct counterparts.

## Getting started

To get started, follow these [steps](https://nvidia-nemo.github.io/Skills/basics),
browse available [pipelines](https://nvidia-nemo.github.io/Skills/pipelines) or run `ns --help` to see all available
commands and their options.

You can find more examples of how to use Nemo-Skills on the [tutorials](https://nvidia-nemo.github.io/Skills/tutorials) page.

We've built and released many popular models and datasets using Nemo-Skills. See all of them in the [Papers & Releases](./releases/index.md) documentation.

You can find the full documentation [here](https://nvidia-nemo.github.io/Skills/).


## Contributing

We welcome contributions to Nemo-Skills! Please see our [Contributing Guidelines](./CONTRIBUTING.md) for more information on how to get involved.


Disclaimer: This project is strictly for research purposes, and not an official product from NVIDIA.


================================================
FILE: __init__.py
================================================


================================================
FILE: cluster_configs/example-local.yaml
================================================
# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

executor: local

containers:
  trtllm: nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc8
  vllm: dockerfile:dockerfiles/Dockerfile.vllm
  sglang: lmsysorg/sglang:v0.5.10.post1
  # dockerfile: for now can only specify relative to repo root
  megatron: dockerfile:dockerfiles/Dockerfile.megatron
  sandbox: dockerfile:dockerfiles/Dockerfile.sandbox
  nemo-skills: dockerfile:dockerfiles/Dockerfile.nemo-skills
  verl: dockerfile:dockerfiles/Dockerfile.verl
  nemo-rl: dockerfile:dockerfiles/Dockerfile.nemo-rl

# add required mounts for models/data here
# the code is mounted automatically inside /nemo_run/code
# but please note that we only package what's tracked by git + jsonl files inside nemo_skills/dataset

# mounts:
# you can define as many as you need, e.g.
#   - /mnt/datadrive/models:/models
#   - /mnt/datadrive/data:/data
#   - /home/<username>/workspace:/workspace
#   you can also override container libraries by directly mounting over them. E.g. to override NeMo-RL do
#   - <...>/NeMo-RL:/opt/NeMo-RL

# define any environment variables. Note that HF_HOME is required by default and needs to be a mounted path!
# env_vars:
#   - HF_HOME=/models/hf-cache


================================================
FILE: cluster_configs/example-ray.yaml
================================================
# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Ray executor cluster config — initial Ray support for SFT (and eventually GRPO).
#
# Use this for either:
#   (a) standalone Ray clusters (e.g., an existing self-managed Ray deployment), or
#   (b) Ray-on-Slurm setups where Slurm allocates the nodes and Ray runs the
#       job-submission layer inside the allocation.
#
# This is DISTINCT from the existing `with_ray=True` flag (Ray inside a
# heterogeneous Slurm allocation) and from `nemo_run.core.execution.kuberay`
# (Kubernetes-managed Ray clusters via the KubeRay operator).
#
# Supported in this release:
#   - Single-command Ray jobs (SFT, eventually GRPO)
#   - Dependency chaining via Ray submission IDs
#   - Shared-FS runtime/code visibility (head + workers see the same paths)
#
# Out of scope (raises NotImplementedError):
#   - Sandbox judge containers
#   - Server co-scheduling (vLLM/SGLang/TRT-LLM alongside the main job)
#   - Heterogeneous tasks
#   - Multi-command task groups

executor: ray

# Ray cluster connection.
ray:
  # "auto" attaches to a Ray cluster started in the current environment
  # (e.g., `ray start --head` in a Ray-on-Slurm setup, or RAY_ADDRESS env var).
  # Use a "ray://host:10001" URI for a remote Ray client connection.
  address: auto
  # Namespace for job isolation across users/jobs on a shared cluster.
  namespace: nemo
  # Default *per-node* CPU allocation. NeMo-Skills multiplies this by the
  # workflow's `num_nodes` to compute the per-job total CPU request, and Ray's
  # `entrypoint_num_cpus` is then derived as total / num_nodes (= this value).
  # GPU count is derived separately from the workflow's `num_gpus` parameter.
  default_num_cpus: 8

# Where Ray submission metadata + per-job logs are written. Should be on a
# shared filesystem visible to head + workers so both sides see the same paths.
jobs:
  log_dir: /workspace/ray_jobs

containers:
  # Ray jobs typically use the same NeMo-RL / NeMo-Skills container as Slurm —
  # specify the image refs here. For air-gapped deployments, use locally-built
  # .sqsh / pre-staged images; runtime pulls are not required for the Ray path.
  # Example (uncomment and fill in):
  # nemo-rl: <local-path-or-registry-ref>
  # nemo-skills: <local-path-or-registry-ref>
  # vllm: <local-path-or-registry-ref>

# Mounts visible to Ray workers. Code is auto-mounted at /nemo_run/code by
# nemo-run for the Slurm path; for Ray, packaging happens via Ray's runtime_env
# `working_dir`. If your shared FS already has the code staged, you do not need
# to define a code mount here.
#
# mounts:
#   - <shared-fs-path>/data:/data
#   - <shared-fs-path>/models:/models

# Environment variables for Ray jobs. HF_HOME is required by default — must be a
# mounted path (or a path on the shared FS visible to Ray workers).
# env_vars:
#   - HF_HOME=/models/hf-cache
#   - TOKENIZERS_PARALLELISM=false


================================================
FILE: cluster_configs/example-slurm.yaml
================================================
# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

executor: slurm

containers:
  # follow steps in https://nvidia-nemo.github.io/Skills/basics/#slurm-inference
  # to complete this section

job_name_prefix: "nemo_skills:"
# define this for ssh access
# ssh_tunnel:
#   host: <slurm host>
#   user: <username>
#   job_dir: <some location on slurm cluster to keep job metadata, uploaded code and generated sbatch files>
#   identity: <can specify ssh key to avoid entering password>

# if you're running directly from cluster, you only need to define job_dir and shouldn't use ssh_tunnel
# job_dir: <some location on slurm cluster to keep job metadata, uploaded code and generated sbatch files>

# define your account/partition here
# account: <slurm account>
# partition: <slurm partition>
# cpu_partition: <if cluster has a dedicated cpu partition, you can define it here>

# add required mounts for models/data here
# the code is mounted automatically inside /nemo_run/code
# but please note that we only package what's tracked by git + jsonl files inside nemo_skills/dataset

# mounts:
#   - <slurm location for your data/models>:<where to mount in a container>
#   e.g.
#   - <path on slurm>/models:/models
#   - <path on slurm>/data:/data
#   you can also override container libraries by directly mounting over them. E.g. to override NeMo-Aligner do
#   - <path on slurm>/NeMo-Aligner:/opt/NeMo-Aligner

# can use this section to set timeouts for different partitions
# this will be used as a slurm parameter + to signal SFT job to finish
# before the timeout to have time to save the last checkpoint
# default_timeout: "06:00:00"
# timeouts:
#   partition_name1: "06:00:00"
#   partition_name2: "01:30:00"

# define any environment variables. Note that HF_HOME is required by default and needs to be a mounted path!
# env_vars:
#   - HF_HOME=/models/hf-cache


================================================
FILE: core/README.md
================================================
# Core / Pipeline Dependency Boundary

NeMo Skills is split into **Core** (agent runtime) and **Pipeline** (orchestration). The rule is simple:

```
Pipeline can import from Core.
Core CANNOT import from Pipeline.
```

Core modules are everything under `nemo_skills/` **except** `nemo_skills/pipeline/`. They must never have top-level imports from `nemo_skills.pipeline` or `nemo_run`. This boundary is enforced by `tests/test_dependency_isolation.py` which verifies that core modules import successfully when `nemo_run` is blocked.
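
As an illustration of what "blocked" means here, the following is a minimal sketch and not the code in `tests/test_dependency_isolation.py`. It relies on the documented Python behavior that a `None` entry in `sys.modules` makes any import of that name fail. The helper name `imports_without` is hypothetical, and the check runs in a subprocess so the current interpreter's module cache is left untouched.

```python
import subprocess
import sys


def imports_without(module_name, blocked_name):
    """Hypothetical check: does `module_name` import when `blocked_name` is unavailable?"""
    code = (
        "import sys\n"
        # A None entry in sys.modules makes any later import of that name raise ImportError.
        f"sys.modules[{blocked_name!r}] = None\n"
        f"import {module_name}\n"
    )
    # Run in a fresh interpreter so nothing is already cached.
    result = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
    return result.returncode == 0
```

Under these assumptions, a boundary check would call something like `imports_without("nemo_skills.dataset.utils", "nemo_run")` for each core module and expect `True`.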

## Dependency placement

When adding a new dependency, put it in the right requirements file:

| If the dependency is needed for... | Add it to |
|---|---|
| Inference, evaluation, tool calling, any benchmark evaluator | `core/requirements.txt` |
| CLI commands (`ns`), cluster orchestration, experiment tracking | `requirements/pipeline.txt` |

There is no separate `main.txt` — `pyproject.toml` composes the default install from `core/requirements.txt` + `requirements/pipeline.txt`. Each dependency lives in exactly one file.
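
The "exactly one file" rule can be checked mechanically. The sketch below is an assumption about how one might do it, not a script that ships with the repository; the function names are hypothetical and the parsing only covers the simple line formats used in these requirements files (plain names, extras, version pins, and `name @ url` entries).

```python
import re


def requirement_names(text):
    """Extract normalized package names from the text of a requirements file."""
    names = set()
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # The name ends at the first extras bracket, version operator, URL marker, or space.
        name = re.split(r"[\[<>=!~@;\s]", line, maxsplit=1)[0]
        # Treat '-', '_' and '.' as equivalent and ignore case when comparing names.
        names.add(re.sub(r"[-_.]+", "-", name).lower())
    return names


def shared_requirements(core_text, pipeline_text):
    """Return names listed in both files; the rule above expects an empty list."""
    return sorted(requirement_names(core_text) & requirement_names(pipeline_text))
```

Passing the contents of `core/requirements.txt` and `requirements/pipeline.txt` to `shared_requirements` should return `[]` if every dependency lives in only one of them.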

**Boundary definition:**

- **Core** = everything needed to run inference + evaluation locally (including all benchmark evaluator deps)
- **Pipeline** = orchestration-only deps (`nemo_run`, `typer`, `click`, `nemo-evaluator-launcher`)

All benchmark-specific dependencies (e.g., `faiss-cpu`, `sacrebleu`, `datasets`, `func-timeout`) go in `core/requirements.txt`. Eventually these should migrate to JIT (just-in-time) install so that benchmark deps are installed on demand at runtime, but until that is implemented, they must be in core so evaluators do not crash at runtime.

## Examples of correct placement

- `httpx` -> `core/requirements.txt` (used by model inference clients)
- `sympy` -> `core/requirements.txt` (used by math graders)
- `sacrebleu` -> `core/requirements.txt` (used by translation benchmark evaluator)
- `faiss-cpu` -> `core/requirements.txt` (used by BFCL benchmark evaluator)
- `nemo_run` -> `requirements/pipeline.txt` (cluster job orchestration)
- `wandb` -> `core/requirements.txt` (used by summarize-results)

## Examples of mistakes to avoid

- Adding `nemo_run` to `core/requirements.txt` -- it is a pipeline/orchestration dependency, core must not depend on it.
- Adding `typer` to `core/requirements.txt` -- it is the CLI framework, only used by the pipeline layer.

## Writing new core code

- If you need something from `nemo_skills.pipeline`, your code probably belongs in pipeline, not core. Move it.
- If you have a function that works locally but *also* needs a cluster variant, keep both paths in the same function but use a **lazy import** for the pipeline code inside the branch that needs it (see `dataset/utils.py:get_dataset_module` for the pattern). Never add a top-level import.
- The pipeline layer (`nemo_skills/pipeline/`) can provide thin wrappers or re-exports for convenience (see `pipeline/dataset.py`), but all local logic should live in core.
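
A minimal sketch of the lazy-import pattern follows. It is not the body of `get_dataset_module`; the function name, the path layout, and the `host` key are hypothetical, and the stdlib module `ftplib` stands in for a pipeline-only import so the snippet runs without NeMo-Skills installed.

```python
def resolve_data_path(dataset, cluster_config=None):
    """Hypothetical core function with a local branch and a cluster branch."""
    local_path = f"/data/{dataset}/test.jsonl"
    if cluster_config is None:
        # Local branch: no orchestration dependency is touched.
        return local_path

    # Cluster branch: import lazily, only where the dependency is needed.
    # `ftplib` is a stand-in for a module from the pipeline layer.
    import ftplib  # noqa: F401

    return f"{cluster_config['host']}:{local_path}"
```

Because the import sits inside the cluster branch, importing the module and calling `resolve_data_path("gsm8k")` both work even when the stand-in dependency is unavailable; only the cluster call needs it.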

## Dataset loading example

The boundary shows up concretely in dataset loading:

```python
# Core: local-only dataset loading (no cluster deps)
from nemo_skills.dataset.utils import get_dataset_module
module, data_path = get_dataset_module("gsm8k")

# Pipeline: cluster-aware wrapper (SSH downloads, mount resolution)
from nemo_skills.pipeline.dataset import get_dataset_module
module, data_path = get_dataset_module("gsm8k", cluster_config=cfg)
```

The core version has zero pipeline imports. The pipeline wrapper delegates to core for local resolution and only adds cluster-specific logic (mount-path unmounting, SSH file downloads) when needed.
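
The delegation described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and it assumes that "unmounting" means translating a container path back to its host path using the `host:container` mount strings from the cluster config. The real wrapper in `pipeline/dataset.py` may differ.

```python
def core_data_path(dataset, data_dir="/data"):
    """Hypothetical stand-in for the core, local-only lookup."""
    return f"{data_dir}/{dataset}"


def container_to_host_path(path, mounts):
    """Map a container path back to the host using 'host:container' mount strings."""
    for mount in mounts:
        host, container = mount.split(":", 1)
        host, container = host.rstrip("/"), container.rstrip("/")
        # Match the mount point itself or anything below it, never a sibling prefix.
        if path == container or path.startswith(container + "/"):
            return host + path[len(container):]
    return path


def pipeline_data_path(dataset, cluster_config=None, data_dir="/data"):
    """Hypothetical wrapper: delegate to core, then add cluster-specific handling."""
    path = core_data_path(dataset, data_dir)
    if cluster_config is None:
        return path
    return container_to_host_path(path, cluster_config.get("mounts", []))
```

Keeping the cluster handling in a separate wrapper means the core function stays importable without any pipeline dependency.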


================================================
FILE: core/pyproject.toml
================================================
# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[build-system]
requires = [
    "setuptools",
    "wheel"
]
build-backend = "setuptools.build_meta"

[project]
dynamic = ["version", "dependencies"]

name = "nemo-skills-core"
description = "NeMo Skills core runtime -- inference, evaluation, and tool calling"
readme = {text = "NeMo Skills core runtime for inference, evaluation, and tool calling. See https://nvidia-nemo.github.io/Skills for full documentation.", content-type = "text/plain"}
classifiers = [
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.10",
    "License :: OSI Approved :: Apache Software License",
    "Operating System :: OS Independent",
]
requires-python = ">=3.10"

[project.urls]
homepage = "https://nvidia-nemo.github.io/Skills"
source = "https://github.com/NVIDIA-NeMo/Skills"
issues = "https://github.com/NVIDIA-NeMo/Skills/issues"

[project.scripts]
ns = "nemo_skills._cli_stub:main"

[tool.setuptools]
include-package-data = true

[tool.setuptools.packages.find]
where = [".."]
exclude = ["tests", "tests.*", "core", "core.*"]

[tool.setuptools.dynamic]
version = { attr = "nemo_skills.version.__version__" }
dependencies = {file = ["requirements.txt"]}


================================================
FILE: core/requirements.txt
================================================
# Core dependencies for inference, evaluation, tool calling, and all benchmark evaluators.
# No cluster orchestration deps (nemo_run, typer, etc.)
# NOTE: benchmark-specific deps are included here because JIT install is not yet implemented.
# Once JIT install is ready, benchmark deps can be moved to per-benchmark extras.

bs4
compute-eval @ git+https://github.com/NVIDIA/compute-eval.git@e01a5d2
contractions
datasets
editdistance
evalplus @ git+https://github.com/evalplus/evalplus@c91370f
faiss-cpu
fire
flask
func-timeout
gradio
httpx
huggingface_hub
hydra-core
ipython
iso639-lang
langcodes
langdetect
language-data
litellm[caching]==1.83.14
math-verify[antlr4_9_3]
mcp
numpy
openai
openpyxl>=3.1.0
pandas>=2.0.0
pyxlsb>=1.0.10
pyyaml
rank_bm25
requests
rich
sacrebleu
scikit-learn
sentence_transformers
serpapi
sympy
torchcodec
tqdm
transformers
wandb


================================================
FILE: dataset_explorer_demo/README.md
================================================
# Dataset Explorer Demo

1. Download data TBD
2. Retrieve similar questions from OpenMathInstruct-2. Do this for every benchmark you want to compare against.
   The commands below assume you are running from this folder.

   ```
   python -m nemo_skills.inference.retrieve_similar \
       ++retrieve_from=./data.jsonl \
       ++compare_to="../nemo_skills/dataset/<benchmark>/test.jsonl" \
       ++output_file=./similar-retrieved-openmath2/<benchmark>.jsonl \
       ++top_k=5
   ```

3. Do the same for the original MATH training set to get a sense of whether OpenMathInstruct-2 is in the same
   distribution.

   ```
   python -m nemo_skills.inference.retrieve_similar \
       ++retrieve_from=../nemo_skills/dataset/math/train.jsonl \
       ++compare_to="../nemo_skills/dataset/<benchmark>/test.jsonl" \
       ++output_file=./similar-retrieved-math-train/<benchmark>.jsonl \
       ++top_k=5
   ```

4. Start the Gradio demo.

   ```
   python visualize_similar.py
   ```
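
To inspect a retrieved file without starting the demo, a small sketch like the one below may help. It assumes each line of the output is a JSON object with `problem` and `similar_items` fields, which is how `visualize_similar.py` reads these files; the `preview_similar` helper itself is hypothetical.

```python
import json


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def preview_similar(path, index=0, top_k=3):
    """Return the test problem at `index` and its first `top_k` retrieved items."""
    entry = load_jsonl(path)[index]
    return entry["problem"], entry["similar_items"][:top_k]
```

For example, `preview_similar("./similar-retrieved-openmath2/<benchmark>.jsonl")` would return the first test problem of that benchmark together with its three closest retrieved questions.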


================================================
FILE: dataset_explorer_demo/visualize_similar.py
================================================
# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os
import random
import re
from functools import lru_cache

import gradio as gr
from latex2mathml.converter import convert
from latex2mathml.exceptions import NoAvailableTokensError


@lru_cache(maxsize=1000)
def load_jsonl(file_path):
    with open(file_path, "r") as f:
        return [json.loads(line) for line in f]


@lru_cache(maxsize=10000)
def render_latex(text):
    def replace_matrix(match):
        matrix_content = match.group(1)
        rows = matrix_content.split("\\\\")
        mml_rows = "".join(f"<mtr><mtd>{convert_and_clean(row.strip())}</mtd></mtr>" for row in rows)
        return f'<mrow><mo>(</mo><mtable rowspacing="4pt" columnspacing="1em">{mml_rows}</mtable><mo>)</mo></mrow>'

    def replace_align(match):
        align_content = match.group(1)
        rows = align_content.split("\\\\")
        mml_rows = []
        for row in rows:
            if "&" in row:
                # Split on the first "&" only, so rows with several alignment markers do not raise ValueError
                left, right = row.split("&", 1)
                mml_row = f'<mtr><mtd columnalign="right">{convert_and_clean(left.strip())}</mtd><mtd columnalign="left">{convert_and_clean(right.strip())}</mtd></mtr>'
            else:
                mml_row = f'<mtr><mtd columnalign="center">{convert_and_clean(row.strip())}</mtd></mtr>'
            mml_rows.append(mml_row)
        return f'<mtable columnspacing="1em" rowspacing="3pt" displaystyle="true">{"".join(mml_rows)}</mtable>'

    def convert_and_clean(latex):
        try:
            # Pre-process nested matrices
            latex = re.sub(r"\\begin{pmatrix}(.*?)\\end{pmatrix}", replace_matrix, latex, flags=re.DOTALL)

            # Handle \displaystyle
            latex = latex.replace("\\displaystyle", "")

            # Handle nested exponents
            latex = re.sub(r"\^{([^{}]+)}", r"^{\1}", latex)

            # Convert LaTeX to MathML
            mathml = convert(latex)
            mathml = re.sub(r"<math.*?>(.*)</math>", r"\1", mathml)
            return mathml
        except NoAvailableTokensError:
            return latex

    # Handle align* environment
    text = re.sub(
        r"\\begin{align\*}(.*?)\\end{align\*}",
        lambda m: f'<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">{replace_align(m)}</math>',
        text,
        flags=re.DOTALL,
    )

    # Handle display math, excluding intervals
    text = re.sub(
        r"\[(?![-\d, ]+\])(.*?)\]",
        lambda m: f'<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">{convert_and_clean(m.group(1))}</math>',
        text,
        flags=re.DOTALL,
    )

    # Handle inline math
    text = re.sub(
        r"\$(.*?)\$",
        lambda m: f'<math xmlns="http://www.w3.org/1998/Math/MathML">{convert_and_clean(m.group(1))}</math>',
        text,
    )

    return text


@lru_cache(maxsize=1000)
def display_entry(index, test_set):
    data_openmath2, data_math_train = load_test_sets(f"{test_set}.jsonl")

    # Check if the index is valid
    if index < 0 or index >= len(data_openmath2):
        return f"Error: Invalid index. Please enter a number between 0 and {len(data_openmath2) - 1}."

    entry_openmath2 = data_openmath2[index]
    entry_math_train = data_math_train[index]

    # Check if the current test set is GSM8K
    if test_set == "gsm8k":
        test_problem = entry_openmath2["problem"]
        similar_openmath2 = entry_openmath2["similar_items"]
        similar_math_train = entry_math_train["similar_items"]
    else:
        test_problem = render_latex(entry_openmath2["problem"])
        similar_openmath2 = [render_latex(cand) for cand in entry_openmath2["similar_items"]]
        similar_math_train = [render_latex(cand) for cand in entry_math_train["similar_items"]]

    html = f"<h2>Test set problem:</h2><p>{test_problem}</p>"
    html += "<hr>"
    html += "<div style='display: flex;'>"
    html += "<div style='flex: 1; padding-right: 10px;'>"
    html += "<h2>Most similar OpenMathInstruct-2 problems:</h2><ol>"
    for cand in similar_openmath2:
        html += f"<li>{cand}</li>"
    html += "</ol></div>"
    html += "<div style='border-left: 1px solid #ccc;'></div>"
    html += "<div style='flex: 1; padding-left: 10px;'>"
    html += "<h2>Most similar MATH training set problems:</h2><ol>"
    for cand in similar_math_train:
        html += f"<li>{cand}</li>"
    html += "</ol></div>"
    html += "</div>"

    return html


def random_entry(data):
    return random.randint(0, len(data) - 1)


@lru_cache(maxsize=10)
def load_test_sets(test_set):
    file_path_openmath2 = f"./similar-retrieved-openmath2/{test_set}"
    file_path_math_train = f"./similar-retrieved-math-train/{test_set}"

    data_openmath2 = load_jsonl(file_path_openmath2)
    data_math_train = load_jsonl(file_path_math_train)

    # Sort both datasets based on the 'problem' field (or use 'id' if available)
    data_openmath2.sort(key=lambda x: x["problem"])
    data_math_train.sort(key=lambda x: x["problem"])

    # Check if the sorted datasets have the same length and matching problems
    if len(data_openmath2) != len(data_math_train):
        raise ValueError(
            f"Datasets have different lengths: OpenMathInstruct-2 ({len(data_openmath2)}) vs MATH training set ({len(data_math_train)})"
        )

    for i, (entry_openmath2, entry_math_train) in enumerate(zip(data_openmath2, data_math_train)):
        if entry_openmath2["problem"] != entry_math_train["problem"]:
            raise ValueError(
                f"Mismatch at index {i}: OpenMathInstruct-2 problem doesn't match MATH training set problem"
            )

    return data_openmath2, data_math_train


test_sets = [f for f in os.listdir("./similar-retrieved-openmath2") if f.endswith(".jsonl")]
test_set_names = [os.path.splitext(f)[0] for f in test_sets]

if "math.jsonl" in test_sets:
    test_sets.remove("math.jsonl")
    test_sets.insert(0, "math.jsonl")
    test_set_names = [os.path.splitext(f)[0] for f in test_sets]

with gr.Blocks() as demo:
    gr.Markdown("# OpenMathInstruct-2 test set contamination explorer")
    gr.Markdown(
        "During construction of OpenMathInstruct-2 we generated many synthetic problems. "
        "We did a very thorough decontamination to remove exact duplicates (including rephrases) with popular benchmarks.<br>"
        "Still, our dataset contains many questions that are very similar to test sets. "
        "To make things more transparent, we created this demo, which you can use to explore "
        "the most similar questions from our data for each of the test set problems.<br>"
        "We also provide the closest examples from the MATH training set, since it was used as seed data "
        "to create our dataset and in most cases that training set already contains very similar questions to the test sets!<br>"
        "See our full dataset at HuggingFace: [OpenMathInstruct-2](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2)<br>"
        "And read our [paper](https://arxiv.org/abs/2410.01560) to learn more about the decontamination process and how we retrieve similar questions."
    )

    warning_box = gr.Markdown(visible=False)

    with gr.Row():
        test_set_dropdown = gr.Dropdown(choices=test_set_names, label="Select Test Set", value=test_set_names[0])
        index_input = gr.Number(label="Problem Index", value=0, step=1)
        random_button = gr.Button("Random Problem")

    output = gr.HTML()

    current_test_set = gr.State(test_set_names[0])

    def update_test_set(test_set):
        data_openmath2, data_math_train = load_test_sets(f"{test_set}.jsonl")
        warning = ""
        warning_visible = False
        if test_set == "omni-math":
            warning = "⚠️ The Omni-Math benchmark was released after we finished training our models, so we did not decontaminate against it and some of the problems might match exactly!"
            warning_visible = True
        return (
            0,
            display_entry(0, test_set),
            warning,
            gr.update(visible=warning_visible),
            test_set,
            gr.update(maximum=len(data_openmath2) - 1),  # Update the maximum allowed index
        )

    def display_entry_wrapper(index, current_test_set):
        data_openmath2, _ = load_test_sets(f"{current_test_set}.jsonl")
        # Ensure the index is within bounds
        index = max(0, min(int(index), len(data_openmath2) - 1))
        return display_entry(index, current_test_set)

    def random_entry_wrapper(current_test_set):
        data_openmath2, _ = load_test_sets(f"{current_test_set}.jsonl")
        return random_entry(data_openmath2)

    test_set_dropdown.change(
        update_test_set,
        inputs=[test_set_dropdown],
        outputs=[
            index_input,
            output,
            warning_box,
            warning_box,
            current_test_set,
            index_input,
        ],
    )
    index_input.change(display_entry_wrapper, inputs=[index_input, current_test_set], outputs=output)
    random_button.click(random_entry_wrapper, inputs=[current_test_set], outputs=index_input)

    demo.load(display_entry_wrapper, inputs=[index_input, current_test_set], outputs=output)

demo.launch(debug=False, server_name="0.0.0.0", server_port=5005)


================================================
FILE: dockerfiles/Dockerfile.megatron
================================================
FROM nvcr.io/nvidia/pytorch:25.04-py3

# Set working directory
WORKDIR /opt

# Install megatron-lm
ENV MEGATRON_COMMIT=dfc0a3d004391a82d8d8a5a6d991b65eaed0190c
RUN git clone https://github.com/NVIDIA/Megatron-LM && \
    cd Megatron-LM && \
    git checkout $MEGATRON_COMMIT && \
    pip install -e .

# installing libs for hf -> megatron conversion
RUN pip install transformers accelerate

# fix for https://github.com/NVIDIA/NeMo/issues/12836
# there is a global requirements lock that we need to remove..
RUN rm /etc/pip/constraint.txt && touch /etc/pip/constraint.txt
RUN pip install -U "nvidia-modelopt[all]>=0.27"

ENV PYTHONPATH=/opt/Megatron-LM


================================================
FILE: dockerfiles/Dockerfile.nemo-rl
================================================
# syntax=docker/dockerfile:1
# copied and edited from https://github.com/NVIDIA/NeMo-RL/blob/main/docker/Dockerfile
# TODO: from next update try to re-use their dockerfile as is as they support specifying the commit

ARG BASE_IMAGE=nvcr.io/nvidia/cuda-dl-base:25.05-cuda12.9-devel-ubuntu24.04

FROM scratch AS nemo-rl

ARG NEMO_RL_COMMIT=${NEMO_RL_COMMIT:-e95efb912a6909b5da91ffeb197debe91fd480d8}
ADD --keep-git-dir=true https://github.com/NVIDIA-NeMo/RL.git#${NEMO_RL_COMMIT} /


FROM ${BASE_IMAGE} AS base
# An environment variable to indicate that we are in a container.
ENV NRL_CONTAINER=1

# It is more convenient for users to run as root
USER root

RUN <<"EOF" bash -exu -o pipefail
export DEBIAN_FRONTEND=noninteractive
export TZ=America/Los_Angeles

apt-get update
apt-get install -y --no-install-recommends \
    jq \
    curl \
    git \
    rsync \
    wget \
    less \
    vim \

# Nsight
apt install -y --no-install-recommends gnupg
echo "deb http://developer.download.nvidia.com/devtools/repos/ubuntu$(source /etc/lsb-release; echo "$DISTRIB_RELEASE" | tr -d .)/$(dpkg --print-architecture) /" | tee /etc/apt/sources.list.d/nvidia-devtools.list
apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
apt update
apt install -y nsight-systems-cli

# To fix CVE-2025-68973
apt install -y --only-upgrade gnupg

apt-get clean
rm -rf /var/lib/apt/lists/*
EOF

# Install uv and python
ARG UV_VERSION=0.9.7
ARG PYTHON_VERSION=3.12
ENV PATH="/root/.local/bin:$PATH"
RUN curl -LsSf https://astral.sh/uv/${UV_VERSION}/install.sh | sh && \
    uv python install ${PYTHON_VERSION}

# Disable usage stats by default for users who are sensitive to sharing usage.
# Users are encouraged to enable it if they wish.
ENV RAY_USAGE_STATS_ENABLED=0
# After ray>=2.47, this feature is enabled by default which creates uv venvs for any py_executable starting with `uv run`.
# There are severe contention and performance issues with this enabled, considering our dependencies are so large and occasionally
# need to be compiled, so NeMo RL has an implementation in nemo_rl/utils/venv.py that does it once per node as opposed to once per task.
ENV RAY_ENABLE_UV_RUN_RUNTIME_ENV=0
ENV NEMO_RL_VENV_DIR=/opt/ray_venvs


FROM base AS hermetic

WORKDIR /opt/NeMo-RL

# Variables to control the build of TE. If there are issues with parallelization, consider
# setting these to 1.
ARG MAX_JOBS
ARG NVTE_BUILD_THREADS_PER_JOB
# Only use for custom vllm installs. Learn more at https://github.com/NVIDIA-NeMo/RL/blob/main/docs/guides/use-custom-vllm.md
ARG BUILD_CUSTOM_VLLM

ENV UV_PROJECT_ENVIRONMENT=/opt/nemo_rl_venv
ENV UV_LINK_MODE=copy

# Ensure DeepEP is built for H100 and B200 (also mcore inference unified memory API now invokes a torch API that requires these to be set)
ENV TORCH_CUDA_ARCH_LIST="9.0 10.0"

# First copy only the dependency files
COPY --from=nemo-rl pyproject.toml uv.lock ./
# Copy in the top level __init__.py/package_info.py since build-custom-vllm.sh needs the nemo_rl package to exist.
COPY --from=nemo-rl nemo_rl/__init__.py nemo_rl/package_info.py ./nemo_rl/
COPY --from=nemo-rl tools/build-custom-vllm.sh ./tools/build-custom-vllm.sh
COPY --from=nemo-rl --link research/ ./research/
COPY --from=nemo-rl --link 3rdparty/ ./3rdparty/

RUN --mount=type=ssh <<"EOF" bash -exu
uv venv --seed
if [[ -n "${BUILD_CUSTOM_VLLM:-}" ]]; then
    bash tools/build-custom-vllm.sh
    source 3rdparty/vllm/nemo-rl.env
fi
# uv sync has a more reliable resolver than simple uv pip install which can fail

# Sync each training + inference backend one at a time (since they may conflict)
# to warm the uv cache, then at the end just sync the default dependencies.
# Do everything in one layer to prevent large layers.

# The venv is symlinked to avoid bloating the layer size
uv sync --link-mode symlink --locked --no-install-project
uv sync --link-mode symlink --locked --extra vllm --no-install-project
uv sync --link-mode symlink --locked --extra mcore --no-install-project
uv sync --link-mode symlink --locked --extra automodel --no-install-project
uv sync --link-mode symlink --locked --all-groups --no-install-project

# Remove the aiohttp in this uv cache dir to fully address CVE GHSA-mqqc-3gqh-h2x8
# The ray install will include the older aiohttp version in its cache
find /root/.cache/uv -type d -path "*ray/_private/runtime_env/agent/thirdparty_files/aiohttp*" -exec rm -rf {} +
EOF

ENV PATH="/opt/nemo_rl_venv/bin:$PATH"
ENV NEMO_RL_VENV_DIR=/opt/ray_venvs

WORKDIR /opt/NeMo-RL

FROM hermetic AS release

ARG NVIDIA_BUILD_ID
ARG NVIDIA_BUILD_REF
ARG RC_DATE=00.00
ARG TARGETARCH
ENV NVIDIA_BUILD_ID=${NVIDIA_BUILD_ID:-<unknown>}
ENV NVIDIA_BUILD_REF=${NVIDIA_BUILD_REF:-<unknown>}
LABEL com.nvidia.build.id="${NVIDIA_BUILD_ID}"
LABEL com.nvidia.build.ref="${NVIDIA_BUILD_REF}"

ENV NEMO_RL_VENV_DIR=/opt/ray_venvs

# Copy in source from build context (defaults to cloned repo, can be overridden)
# Exclude pyproject.toml and uv.lock since those may be altered by build-custom-vllm.sh
COPY --from=nemo-rl --exclude=pyproject.toml --exclude=uv.lock . /opt/NeMo-RL
# Unshallow the repo to get the full history (in the case it was from the scratch layer).
# Potentially not necessary if the repo is passed in as a complete repository (w/ full git history),
# so do a quick check before trying to unshallow.
RUN git rev-parse --is-shallow-repository | grep -q true && git fetch --unshallow || true
RUN UV_LINK_MODE=symlink uv run nemo_rl/utils/prefetch_venvs.py

# Generate container fingerprint for frozen environment support
# Store outside /opt/NeMo-RL to avoid being overwritten by user mounts
RUN python tools/generate_fingerprint.py > /opt/nemo_rl_container_fingerprint

# NOTICES.txt file points to where the OSS source code is archived
RUN echo "This distribution includes open source which is archived at the following URL: https://opensource.nvidia.com/oss/teams/nvidia/nemo-rl/${RC_DATE}:linux-${TARGETARCH}/index.html" > NOTICES.txt && \
    echo "For further inquiries or assistance, contact us at oss-requests@nvidia.com" >> NOTICES.txt

RUN git clone https://github.com/NVIDIA-NeMo/Skills.git /opt/NeMo-Skills && cd /opt/NeMo-Skills && uv pip install .


================================================
FILE: dockerfiles/Dockerfile.nemo-skills
================================================
# using ubuntu instead of debian for easier apptainer installation on arm64
FROM ubuntu:22.04

# Install Python and other dependencies
RUN apt-get update && \
    apt-get install -y \
    python3.10 \
    python3-pip \
    curl \
    wget \
    git \
    git-lfs \
    ffmpeg && \
    ln -s /usr/bin/python3 /usr/bin/python && \
    rm -rf /var/cache/apt/archives /var/lib/apt/lists/*

RUN pip install --upgrade pip setuptools "uv>=0.11.10"

# Update package lists and install apptainer for arm64
# https://apptainer.org/docs/admin/1.1/installation.html
RUN apt update && \
    apt install -y software-properties-common && \
    add-apt-repository -y ppa:apptainer/ppa && \
    apt update && apt -y install apptainer && \
    apt update && apt install -y apptainer-suid && \
    rm -rf /var/cache/apt/archives /var/lib/apt/lists/*

# Apply security patches for PackageKit, pulled in transitively by software-properties-common.
# Ubuntu 22.04 has published 1.2.5-2ubuntu3.1 with the fix for the local privilege escalation CVE.
RUN apt-get update && \
    apt-get install --only-upgrade -y \
        packagekit \
        packagekit-tools \
        libpackagekit-glib2-18 \
        gir1.2-packagekitglib-1.0 && \
    rm -rf /var/cache/apt/archives /var/lib/apt/lists/*

# for ifeval benchmark
# TODO: can we get just a single dir?
RUN mkdir /opt/benchmarks
RUN git clone https://github.com/google-research/google-research.git /opt/benchmarks/google-research --depth=1

RUN git clone https://github.com/ShishirPatil/gorilla.git /opt/gorilla
RUN cd /opt/gorilla && git checkout 86d0374d0db52623c5092a73f82c22b87b7e9a25
RUN cd /opt/gorilla/berkeley-function-call-leaderboard && pip install --no-cache-dir -e . --extra-index-url https://download.pytorch.org/whl/cpu

RUN apt remove -y python3-blinker

# ifbench
ARG IFBENCH_COMMIT=c6767a19bd82ac0536cab950f2f8f6bcc6fabe7c
ARG IFBENCH_REPO=https://github.com/allenai/IFBench.git
ARG IFBENCH_DIR=/opt/benchmarks/IFBench
RUN git init "$IFBENCH_DIR" && cd "$IFBENCH_DIR" && git remote add origin "$IFBENCH_REPO" && \
    git fetch --depth 1 origin "${IFBENCH_COMMIT}" && git reset --hard FETCH_HEAD
RUN cd ${IFBENCH_DIR} && pip install -r requirements.txt

# removing on-the-fly installation in ifbench to avoid conflicts from parallel jobs
COPY dockerfiles/ifbench.patch /opt/benchmarks/IFBench/ifbench.patch
RUN cd /opt/benchmarks/IFBench && git apply ifbench.patch

RUN pip install langdetect absl-py immutabledict nltk ipython && \
    python -c "import nltk; from spacy.cli import download; nltk.download('punkt'); nltk.download('punkt_tab'); \
    nltk.download('stopwords'); nltk.download('averaged_perceptron_tagger_eng'); download('en_core_web_sm')"

# we aren't copying main nemo_skills folder as it will always be mounted from host
# but we do want to install all requirements in the container directly
RUN mkdir -p /opt/NeMo-Skills/requirements /opt/NeMo-Skills/core
COPY pyproject.toml README.md /opt/NeMo-Skills/
COPY requirements /opt/NeMo-Skills/requirements/
COPY core/requirements.txt /opt/NeMo-Skills/core/requirements.txt
# installing sdp in container only
RUN pip install git+https://github.com/NVIDIA/NeMo-speech-data-processor@29b9b1ec0ceaf3ffa441c1d01297371b3f8e11d2
ARG CACHEBUST=4
# Install via `uv pip` from the project directory so [tool.uv].override-dependencies
# in pyproject.toml (which relaxes leptonai's httpx==0.27.2 pin so litellm 1.83.x
# can be installed) is picked up. Plain pip ignores [tool.uv] and the resolver fails.
RUN cd /opt/NeMo-Skills && uv pip install --system --no-cache-dir \
    -r core/requirements.txt -r requirements/pipeline.txt
# Fix http mismatch between lepton and ddgs by manually installing ddgs here
RUN pip install ddgs


================================================
FILE: dockerfiles/Dockerfile.sandbox
================================================
# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# =============================================================================
# Dependency Locking
# =============================================================================
# To regenerate after changing code_execution.txt or stem.txt:
#     uv pip compile requirements/code_execution.txt requirements/stem.txt \
#         --extra-index-url https://download.pytorch.org/whl/cpu \
#         --universal -p 3.10 -o requirements/sandbox.lock
# =============================================================================

# Use the base image with Python 3.10 and Flask
FROM tiangolo/uwsgi-nginx-flask:python3.10

# Install dependencies required for Lean 4, pypy3, and other tools
ARG TARGETARCH
RUN apt-get update && \
    apt-get install -y curl git net-tools bzip2 build-essential libseccomp-dev && \
    ARCH="${TARGETARCH:-$(dpkg --print-architecture)}" && \
    case "$ARCH" in \
        amd64|x86_64) PYPY_ARCH=linux64 ;; \
        arm64|aarch64) PYPY_ARCH=aarch64 ;; \
        *) echo "Unsupported TARGETARCH '$ARCH'" >&2; exit 1 ;; \
    esac && \
    curl -L https://downloads.python.org/pypy/pypy3.10-v7.3.17-$PYPY_ARCH.tar.bz2 -o /tmp/pypy.tar.bz2 && \
    tar -xjf /tmp/pypy.tar.bz2 -C /opt/ && \
    ln -s /opt/pypy3.10-v7.3.17-$PYPY_ARCH/bin/pypy3 /usr/bin/pypy3 && \
    /usr/bin/pypy3 -m ensurepip && \
    rm /tmp/pypy.tar.bz2 && \
    rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*

# Install Lean 4 toolchain
RUN curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh -s -- -y && \
    /root/.elan/bin/elan toolchain install leanprover/lean4:v4.12.0 && \
    /root/.elan/bin/elan default leanprover/lean4:v4.12.0 && \
    /root/.elan/bin/elan self update

# Set environment variables to include Lean and elan/lake in the PATH
ENV PATH="/root/.elan/bin:$PATH"

# Create Lean project directory and initialize a new Lean project with Mathlib4
RUN mkdir -p /lean4 && cd /lean4 && \
    /root/.elan/bin/lake new my_project && \
    cd my_project && \
    echo 'leanprover/lean4:v4.12.0' > lean-toolchain && \
    echo 'require mathlib from git "https://github.com/leanprover-community/mathlib4" @ "v4.12.0"' >> lakefile.lean

# Download and cache Mathlib4 to avoid recompiling, then build the project
ARG GITHUB_CI=0
RUN cd /lean4/my_project && \
    /root/.elan/bin/lake exe cache get && \
    /root/.elan/bin/lake build

# Set environment variables to include Lean project path
ENV LEAN_PATH="/lean4/my_project"
ENV PATH="/lean4/my_project:$PATH"

# Speed/size/env hygiene
ENV PIP_DISABLE_PIP_VERSION_CHECK=1 \
    UV_SYSTEM_PYTHON=1 \
    PATH="/root/.local/bin:${PATH}"

# Install uv (pinned for reproducibility)
RUN curl -LsSf https://astral.sh/uv/0.9.7/install.sh | sh

# Set up application code directory
WORKDIR /app

RUN --mount=type=bind,source=requirements,target=/requirements \
    uv pip install --system -r /requirements/sandbox.lock --extra-index-url https://download.pytorch.org/whl/cpu

# For scicode eval - create data directory and download test data
# Set GITHUB_CI=1 build arg to skip download (useful for CI when download fails)
# If skipped, scicode evaluations will fail unless the file is manually mounted
RUN mkdir -p /data && pip install gdown && \
    if [ "$GITHUB_CI" != "1" ]; then \
        python -c "import gdown; url = 'https://drive.google.com/uc?id=17G_k65N_6yFFZ2O-jQH00Lh6iaw3z-AW'; gdown.download(url, '/data/test_data.h5', quiet=False)"; \
    fi

COPY nemo_skills/code_execution/local_sandbox/local_sandbox_server.py /app/main.py

# Copy nginx configuration templates
COPY dockerfiles/sandbox/nginx.conf.template /etc/nginx/nginx.conf.template
COPY dockerfiles/sandbox/nginx-worker-proxy.conf.template /etc/nginx/nginx-worker-proxy.conf.template

# =============================================================================
# Network Blocking for Sandboxed Code Execution (Defense in Depth)
# =============================================================================
# When NEMO_SKILLS_SANDBOX_BLOCK_NETWORK=1 is set, we use TWO layers of protection:
#
# LAYER 1: libblock_network.so (this library, via /etc/ld.so.preload)
#   - Intercepts socket() calls at the C library level (not at the kernel syscall level)
#   - Blocks any NEW PROCESS that exec()'s (curl, wget, python3 subprocess, etc.)
#   - System-enforced by the dynamic linker - user code CANNOT bypass it
#   - Limitation: Doesn't affect forked processes (no exec = no linker = no preload)
#
# LAYER 2: Python socket patch (in local_sandbox_server.py shell_worker)
#   - Patches socket.socket and _socket.socket at Python level
#   - Blocks code running in the IPython shell_worker (which is forked, not exec'd)
#   - Limitation: Only works for Python code in the same process
#
# WHY BOTH ARE NEEDED:
#   | Attack Vector                              | Python Patch | ld.so.preload |
#   |--------------------------------------------|--------------|---------------|
#   | socket.socket() in IPython                 |  Blocked     |    (forked)   |
#   | import _socket; _socket.socket() in IPython|  Blocked     |    (forked)   |
#   | subprocess.run(["curl", url])              |  Can't help  |    Blocked    |
#   | subprocess.run(["python3",...], env={})    |  Can't help  |    Blocked    |
#   | requests.get() in IPython                  |  Blocked     |    (forked)   |
#
# Neither layer alone is sufficient - together they cover different adversarial scenarios.
# =============================================================================
COPY dockerfiles/sandbox/block_network.c /tmp/block_network.c
RUN gcc -shared -fPIC -o /usr/lib/libblock_network.so /tmp/block_network.c -ldl && \
    rm /tmp/block_network.c && \
    echo "Built libblock_network.so for network isolation"

# Copy startup script late in Dockerfile for better cache utilization
# (start-with-nginx.sh changes more frequently than dependencies above)
COPY dockerfiles/sandbox/start-with-nginx.sh /start-with-nginx.sh
RUN chmod +x /start-with-nginx.sh

# Environment variables for multi-worker setup
ENV NGINX_PORT=6000

# Set default port for single worker mode
ENV LISTEN_PORT=6000

# Default uwsgi configuration
ARG UWSGI_CHEAPER
ENV UWSGI_CHEAPER=$UWSGI_CHEAPER

ARG NUM_WORKERS
ENV NUM_WORKERS=$NUM_WORKERS

ARG UWSGI_PROCESSES
ENV UWSGI_PROCESSES=$UWSGI_PROCESSES

RUN echo "uwsgi_read_timeout 14400s;" > /etc/nginx/conf.d/custom_timeout.conf

CMD ["/start-with-nginx.sh"]


================================================
FILE: dockerfiles/Dockerfile.verl
================================================
FROM whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6-mcore0.12.0-te2.3
# Set working directory
WORKDIR /opt

# Install verl
ENV VERL_COMMIT=2ed63bbf39c22724e4940d97e4b09e4f3e5f6d68
RUN git clone https://github.com/volcengine/verl.git && \
    cd verl && \
    git checkout ${VERL_COMMIT} && \
    pip3 install -e .

RUN pip install fire
RUN pip3 install -U pynvml


WORKDIR /workspace

# Fix CV2
RUN pip install opencv-fixer==0.2.5 && \
    python -c "from opencv_fixer import AutoFix; AutoFix()"

# Install additional dependencies (extras are quoted so the shell does not glob the brackets)
RUN pip install "math-verify[antlr4_9_3]" "ray[default]" pylatexenc wandb

CMD ["/usr/bin/bash"]


================================================
FILE: dockerfiles/Dockerfile.vllm
================================================
FROM vllm/vllm-openai:v0.18.1
RUN pip install ray
RUN pip install "vllm[audio]"
# Required by vLLM for Qwen-VL model family (runtime dependency, not directly imported)
RUN pip install qwen-vl-utils


================================================
FILE: dockerfiles/README.md
================================================
# Building Docker Images

Some Dockerfiles are included directly in this folder; for the others, build instructions are given below.

The Dockerfiles can be built with the standard `docker build` command, e.g.:
```shell
docker build -t nemo-skills-image:0.7.1 -f dockerfiles/Dockerfile.nemo-skills .
```

In addition, we provide a utility script with sensible build defaults (shown here as run from the `dockerfiles` folder):
```shell
./build.sh Dockerfile.nemo-skills
```

Key configuration environment variables for `build.sh`:
- `DOCKER_NAME`: A fully qualified name of the docker image. The default is inferred from the git repository attributes.
- `DOCKER_TAG`: Docker tag to use. Defaults to `yyyy.mm.dd-<commit_hash>`.
- `DOCKER_PUSH`: When set, pushes image after building.
- `DOCKER_PLATFORM`: Directly passed to `--platform` for [multi-platform builds](https://docs.docker.com/build/building/multi-platform/).
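
The default image name is worth spelling out. The sketch below mirrors how `build.sh` appears to infer it from the `origin` remote URL and the Dockerfile's file name. `derive_docker_name` is a hypothetical helper written for illustration, not a function the script exposes.

```shell
# Hypothetical helper: reproduces the default DOCKER_NAME inference from build.sh.
# $1 = git remote URL, $2 = path to the Dockerfile
derive_docker_name() {
    remote="$1"
    dockerfile="$2"
    # Turn '@' and ':' into '/' so SSH and HTTPS remotes share one path shape,
    # then take the second-to-last path component as the user/org.
    repo_user=$(basename "$(dirname "$(printf '%s\n' "$remote" | sed -E 's|[@:]|/|g')")")
    repo_name=$(basename "$remote" .git)
    # Dockerfile.<project> contributes a "/<project>" suffix; a bare "Dockerfile" adds nothing.
    project=$(basename "$dockerfile")
    case "$project" in
        *.*) project="/$(printf '%s\n' "$project" | cut -d. -f2)" ;;
        *)   project="" ;;
    esac
    # Image names are lowercased, as in build.sh.
    printf '%s\n' "${repo_user}/${repo_name}${project}" | tr '[:upper:]' '[:lower:]'
}

derive_docker_name "git@github.com:NVIDIA-NeMo/Skills.git" "dockerfiles/Dockerfile.nemo-skills"
```

With the remote and Dockerfile above, this prints `nvidia-nemo/skills/nemo-skills`; the tag is then appended as `:<DOCKER_TAG>`.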

## Building for arm64/aarch64

To build for the arm64 architecture (e.g. for use on GB200 machines), first follow the installation process at
https://docs.docker.com/build/building/multi-platform/#install-qemu-manually

Then run the same docker command with `--platform linux/arm64` added, or
set `DOCKER_PLATFORM=linux/arm64` for the build script described above.
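
For the build script, the platform is one of several optional flags appended to the `docker build` invocation. The sketch below approximates that assembly. `assemble_build_args` is a hypothetical name, and the function only prints the flags rather than running Docker.

```shell
# Hypothetical helper: prints the optional docker build flags that build.sh
# derives from DOCKER_PUSH, DOCKER_CACHE and DOCKER_PLATFORM (plus DOCKER_NAME
# for the registry cache reference).
assemble_build_args() {
    args=""
    if [ -n "${DOCKER_PUSH:-}" ]; then
        args="${args:+$args }--push"
    fi
    if [ -n "${DOCKER_CACHE:-}" ]; then
        args="${args:+$args }--cache-to type=registry,ref=${DOCKER_NAME:-}/cache,mode=max"
        args="$args --cache-from type=registry,ref=${DOCKER_NAME:-}/cache"
    fi
    if [ -n "${DOCKER_PLATFORM:-}" ]; then
        args="${args:+$args }--platform ${DOCKER_PLATFORM}"
    fi
    printf '%s\n' "$args"
}

# Example: an arm64 build that is pushed after building.
( DOCKER_PUSH=1; DOCKER_PLATFORM=linux/arm64; unset DOCKER_CACHE; assemble_build_args )
```

The example prints `--push --platform linux/arm64`, which is what would be placed between `docker build` and the `-f`/`-t` arguments.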

## Building trtllm image

We use the official `nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc8` image directly.

## Building sglang image

We use the official `lmsysorg/sglang:v0.5.10.post1` image directly.


================================================
FILE: dockerfiles/build.sh
================================================
#!/usr/bin/env bash

# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

##
# Build and Push images.
#
# Usage:
#   ./dockerfiles/build.sh [/path/to/Dockerfile]
#
# Configuration Environment variables:
#   SKIP_GIT_CHECK: skip check for uncommitted changes in git repo when set.
#   DOCKER_NAME: fully qualified name of the docker image (default inferred from repository)
#   DOCKER_TAG: docker tag (default set as `YYYY.MM.DD-git-hash`)
#   DOCKER_PUSH: pushes docker image when variable is set.
#   DOCKER_CACHE: uses registry cache when variable is set.
#   DOCKER_PLATFORM: directly passed to --platform.
#

if [[ -z "${1}" ]]; then
    echo "[ERROR] Missing Dockerfile argument."
    echo "[INFO] Usage: ./dockerfiles/build.sh [/path/to/Dockerfile]"
    exit 1
fi

__dockerfile="${1}"
if [[ ! -f "${__dockerfile}" ]]; then
    echo "[ERROR] Dockerfile not found at '${__dockerfile}'"
    exit 1
fi

##
#  Conditions to set the context for docker build.
#   1. If the Dockerfile is part of a git repo, then set to the repo root.
#   2. If not part of a git repo, then set to the directory of the Dockerfile.
#
__context_dir="$(dirname "$(realpath "${__dockerfile}")")"

pushd "${__context_dir}" > /dev/null

__git_repo_root=$(git rev-parse --show-toplevel 2>/dev/null)
__is_git_repo=$([[ $? -eq 0 ]] && echo 1 || echo 0)

## If not a git repo, we go back to working directory.
if [[ $__is_git_repo -eq 0 ]]; then
    popd > /dev/null
fi

if [[ -z "${DOCKER_NAME}" ]]; then
    __git_remote=$(git remote get-url origin 2>/dev/null)
    if [[ $? -ne 0 ]]; then
        echo "[ERROR] Dockerfile is not part of a git repo. set DOCKER_NAME explicitly."
        exit 1
    fi

    __repo_user=$(basename "$(dirname "$(echo "${__git_remote}" | sed -E -e "s|[@:]|/|g")")")
    __repo_name=$(basename -s .git "$(echo "${__git_remote}")")
    __project_name=$(basename "${__dockerfile}")
    if [[ "${__project_name}" == *.* ]]; then
        __project_name=/$(echo "${__project_name}" | cut -d. -f2)
    else
        unset __project_name
    fi
    DOCKER_NAME="${__repo_user}/${__repo_name}${__project_name}"
fi
DOCKER_NAME=$(echo "${DOCKER_NAME}" | tr "[:upper:]" "[:lower:]")

if [[ -z "${DOCKER_TAG}" ]]; then
    __git_sha=$(git rev-parse --short HEAD 2>/dev/null)
    if [[ $? -ne 0 ]]; then
        echo "[ERROR] Dockerfile is not part of a git repo. set DOCKER_TAG explicitly."
        exit 1
    fi
    DOCKER_TAG="$(date +"%Y.%m.%d")-${__git_sha}"

    ## In case we reach here with a dirty repository.
    if [[ ! -z "$(git status -s)" ]]; then
        echo "[INFO] changes detected in git repository ${__git_repo_root}"
        if [[ -z "${SKIP_GIT_CHECK}" ]]; then
            echo "[ERROR] set SKIP_GIT_CHECK to ignore."
            exit 1
        else
            echo "[WARN] added -dirty tag for uncommitted changes."
            DOCKER_TAG="${DOCKER_TAG}-dirty"
        fi
    fi
fi

if [[ ${__is_git_repo} -eq 1 ]]; then
    __context_dir="${__git_repo_root}"
    popd > /dev/null
fi

echo "Building ${DOCKER_NAME}:${DOCKER_TAG} from context ${__context_dir}"

if [[ ! -z ${DOCKER_PUSH} ]]; then
    __docker_build_args="${__docker_build_args} --push"
fi
if [[ ! -z ${DOCKER_CACHE} ]]; then
    __docker_build_args="${__docker_build_args} --cache-to type=registry,ref=${DOCKER_NAME}/cache,mode=max --cache-from type=registry,ref=${DOCKER_NAME}/cache"
fi
if [[ ! -z ${DOCKER_PLATFORM} ]]; then
    __docker_build_args="${__docker_build_args} --platform ${DOCKER_PLATFORM}"
fi

docker build ${__docker_build_args} \
    -f "${__dockerfile}" \
    -t "${DOCKER_NAME}:${DOCKER_TAG}" \
    "${__context_dir}"


================================================
FILE: dockerfiles/ifbench.patch
================================================
diff --git a/evaluation_lib.py b/evaluation_lib.py
index a0db9e7..912a26e 100644
--- a/evaluation_lib.py
+++ b/evaluation_lib.py
@@ -18,6 +18,7 @@
 import collections
 import dataclasses
 import json
+import logging
 from typing import Dict, Optional, Union
 
 import instructions_registry
@@ -90,10 +91,19 @@ def test_instruction_following_strict(
     if args and "prompt" in args:
       instruction.build_description(prompt=inp.prompt)
 
-    if response.strip() and instruction.check_following(response):
-      is_following_list.append(True)
-    else:
-      is_following_list.append(False)
+
+    response_has_content = bool(response.strip())
+    follows_instruction = False
+    if response_has_content:
+      try:
+        follows_instruction = instruction.check_following(response)
+      except Exception:  # pylint: disable=broad-except
+        logging.exception(
+            "check_following failed for instruction %s (prompt key %s)",
+            instruction_id,
+            inp.key,
+        )
+    is_following_list.append(response_has_content and follows_instruction)
 
   return OutputExample(
       instruction_id_list=inp.instruction_id_list,
@@ -142,9 +152,18 @@ def test_instruction_following_loose(
 
     is_following = False
     for r in all_responses:
-      if r.strip() and instruction.check_following(r):
-        is_following = True
-        break
+      if not r.strip():
+        continue
+      try:
+        if instruction.check_following(r):
+          is_following = True
+          break
+      except Exception:  # pylint: disable=broad-except
+        logging.exception(
+            "check_following failed for instruction %s (prompt key %s)",
+            instruction_id,
+            inp.key,
+        )
 
     is_following_list.append(is_following)
 
@@ -217,3 +236,4 @@ def print_report(outputs):
   for instruction_id in sorted(tier1_total.keys()):
     accuracy = tier1_correct[instruction_id] / tier1_total[instruction_id]
     print(f"{instruction_id} {accuracy}")
+
diff --git a/instructions.py b/instructions.py
index f32ff48..e587c9e 100644
--- a/instructions.py
+++ b/instructions.py
@@ -30,7 +30,9 @@ import io
 
 import instructions_util
 
-download('en_core_web_sm')
+# assumed to be predownloaded
+print("skipping download of en_core_web_sm")
+# download('en_core_web_sm')
 
 logger = logging.getLogger(__name__)
 


================================================
FILE: dockerfiles/sandbox/block_network.c
================================================
/*
 * Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 * Network blocking library for sandbox code execution
 *
 * This library intercepts socket() calls and blocks IPv4/IPv6 sockets
 * while allowing Unix domain sockets (needed for local IPC).
 *
 * Enabled by setting NEMO_SKILLS_SANDBOX_BLOCK_NETWORK=1 at container runtime.
 * The startup script adds this library to /etc/ld.so.preload AFTER the API
 * server starts, ensuring the API can still accept connections while all
 * user code execution has network access blocked.
 *
 * Using /etc/ld.so.preload (vs LD_PRELOAD env var) ensures this cannot be
 * bypassed by user code clearing environment variables or spawning
 * subprocesses with env={}.
 *
 * Build: gcc -shared -fPIC -o libblock_network.so block_network.c -ldl
 */

#define _GNU_SOURCE
#include <stddef.h>
#include <dlfcn.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Override socket() to block internet sockets */
int socket(int domain, int type, int protocol) {
    /* Get the real socket function */
    static int (*real_socket)(int, int, int) = NULL;
    if (!real_socket) {
        real_socket = dlsym(RTLD_NEXT, "socket");
    }

    /* Allow Unix domain sockets (needed for local IPC, uwsgi, etc.) */
    if (domain == AF_UNIX || domain == AF_LOCAL) {
        return real_socket(domain, type, protocol);
    }

    /* Block IPv4 and IPv6 internet sockets */
    if (domain == AF_INET || domain == AF_INET6) {
        errno = ENETUNREACH;  /* Network is unreachable */
        return -1;
    }

    /* Allow other socket types (netlink, packet, etc.) */
    return real_socket(domain, type, protocol);
}


================================================
FILE: dockerfiles/sandbox/nginx-worker-proxy.conf.template
================================================
events {
    worker_connections 1024;
}

http {
    # Proxy all requests to the master node's nginx load balancer.
    # Worker nodes don't route to individual workers — the master's
    # consistent-hash upstream handles session affinity.
    upstream master_lb {
        server ${MASTER_NODE}:${NGINX_PORT};
    }

    server {
        listen ${NGINX_PORT};
        server_name localhost;

        client_max_body_size 10M;
        client_body_buffer_size 128k;

        location / {
            proxy_pass http://master_lb;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header X-Session-ID $http_x_session_id;
            proxy_connect_timeout 1200s;
            proxy_send_timeout 1200s;
            proxy_read_timeout 1200s;
            proxy_buffering off;
        }

        location /nginx-status {
            stub_status on;
            access_log off;
            allow 127.0.0.1;
            allow ::1;
            deny all;
        }
    }

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log warn;
}


================================================
FILE: dockerfiles/sandbox/nginx.conf.template
================================================
events {
    worker_connections 1024;
}

http {
    # Add custom log format for load monitoring
    log_format worker_load '$remote_addr - $remote_user [$time_local] "$request" '
                           '$status $body_bytes_sent "$http_referer" '
                           '"$http_user_agent" "$http_x_forwarded_for" '
                           'upstream: $upstream_addr session: $http_x_session_id';

    # Extract session_id from X-Session-ID header
    map $http_x_session_id $hash_key {
        ""        $request_id;
        default   $http_x_session_id;
    }

    # Define upstream servers (dynamically populated by start-with-nginx.sh)
    # Supports both single-node (localhost TCP) and multi-node (cross-node TCP) modes:
    #   Single-node: server 127.0.0.1:50001 ...;
    #   Multi-node:  server node1:50001 ...;  server node2:50001 ...;
    upstream sandbox_workers {
        # Use consistent hashing on real session_id or random request_id
        # This ensures requests with the same X-Session-ID always go to the same worker
        hash $hash_key consistent;

        # Worker servers will be inserted here (TCP endpoints)
${UPSTREAM_SERVERS}
    }

    server {
        listen ${NGINX_PORT};
        server_name localhost;

        # Increase body size for large code payloads
        client_max_body_size 10M;
        client_body_buffer_size 128k;

        # All endpoints - simple proxy with session affinity
        location / {
            # Route based on session affinity
            proxy_pass http://sandbox_workers;

            # Standard proxy headers
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            # Forward session header explicitly to upstream
            proxy_set_header X-Session-ID $http_x_session_id;

            # Timeouts for long-running code execution
            proxy_connect_timeout 1200s;
            proxy_send_timeout 1200s;
            proxy_read_timeout 1200s;

            # Don't buffer response for streaming
            proxy_buffering off;

            # Do not retry requests on a different upstream to preserve session affinity
            proxy_next_upstream off;
        }



        # Nginx status for monitoring
        location /nginx-status {
            stub_status on;
            access_log off;
            allow 127.0.0.1;
            allow ::1;
            deny all;
        }
    }

    # Logging
    access_log /var/log/nginx/access.log worker_load;
    error_log /var/log/nginx/error.log warn;
}


================================================
FILE: dockerfiles/sandbox/start-with-nginx.sh
================================================
#!/bin/bash
# Start nginx load balancer with multiple uwsgi workers
# Uses TCP sockets for workers, supporting both single-node and multi-node deployments.
#
# Multi-node is auto-detected from SLURM environment variables.
# Falls back to single-node (localhost) when SLURM is not available.
#
# =============================================================================
# Environment Variables
# =============================================================================
#
# Required (set by Dockerfile defaults if not provided):
#   NGINX_PORT              Port nginx listens on (default: 6000, set in Dockerfile)
#
# Optional — Worker Configuration:
#   NUM_WORKERS             Number of uWSGI workers per node (default: $(nproc --all))
#   SANDBOX_WORKER_BASE_PORT
#                           Starting TCP port for workers (default: 50001). Workers
#                           bind to sequential ports: base, base+1, ..., base+N-1.
#                           If a port is already in use, the startup algorithm retries
#                           with offset increments.
#   STATEFUL_SANDBOX        Set to 1 (default) for stateful mode: each uWSGI worker
#                           runs a single process to preserve Jupyter kernel sessions
#                           across requests. Set to 0 for stateless mode where
#                           UWSGI_PROCESSES and UWSGI_CHEAPER take effect.
#   UWSGI_PROCESSES         uWSGI processes per worker (default: 1). Only used when
#                           STATEFUL_SANDBOX=0.
#   UWSGI_CHEAPER           uWSGI cheaper mode: minimum number of active processes
#                           (default: 1). Only used when STATEFUL_SANDBOX=0.
#
# Optional — Multi-Node (SLURM):
#   SLURM_JOB_NODELIST      SLURM-provided compressed nodelist (e.g., "node[001-016]").
#                           Presence of this variable triggers multi-node mode.
#                           Automatically set by SLURM — do not set manually.
#   SLURM_JOB_ID            SLURM job ID, used to namespace the port coordination
#                           directory. Automatically set by SLURM.
#   SANDBOX_PORTS_DIR       Explicit path for cross-node port coordination files.
#                           Must be on a shared filesystem (e.g., Lustre). If unset,
#                           defaults to /nemo_run/sandbox_ports_<SLURM_JOB_ID> in
#                           SLURM jobs, or /tmp/sandbox_ports_<PID> for single-node.
#   SANDBOX_FORCE_SINGLE_NODE
#                           Set to 1 to force single-node mode even when SLURM
#                           variables are present. Useful for debugging or when
#                           multi-node sandbox is not desired.
#
# Optional — Security:
#   NEMO_SKILLS_SANDBOX_BLOCK_NETWORK
#                           Set to 1 to enable network blocking for sandboxed code.
#                           Uses /etc/ld.so.preload to intercept socket() calls in
#                           all new processes. Applied AFTER nginx/uWSGI start so
#                           the API remains functional. Note: in any mode, if a
#                           worker crashes the monitoring loop will attempt to restart
#                           it, but the new process will be unable to bind its socket.
#                           The remaining workers continue serving. (default: 0)
#
# =============================================================================

set -e

export NUM_WORKERS=${NUM_WORKERS:-$(nproc --all)}

# =============================================================================
# Utility functions
# =============================================================================

# Expand SLURM compressed nodelist to space-separated hostnames.
# Parses formats like:
#   - "node001" -> "node001"
#   - "node[001-003]" -> "node001 node002 node003"
#   - "node[001,003,005]" -> "node001 node003 node005"
#   - "gpu[01-02],cpu[01-03]" -> "gpu01 gpu02 cpu01 cpu02 cpu03"
expand_nodelist() {
    local nodelist="$1"
    [ -z "$nodelist" ] && return

    python3 -c "
import re, sys

def expand_nodelist(nodelist):
    if not nodelist:
        return []
    nodes = []
    remaining = nodelist
    while remaining:
        match = re.match(r'([^\[\],]+)(?:\[([^\]]+)\])?(?:,|$)', remaining)
        if not match:
            break
        prefix = match.group(1)
        ranges = match.group(2)
        remaining = remaining[match.end():]
        if ranges is None:
            if prefix.strip():
                nodes.append(prefix.strip())
        else:
            for range_part in ranges.split(','):
                range_part = range_part.strip()
                if '-' in range_part:
                    parts = range_part.split('-', 1)
                    start_str, end_str = parts[0], parts[1]
                    width = len(start_str)
                    try:
                        for i in range(int(start_str), int(end_str) + 1):
                            nodes.append(f'{prefix}{i:0{width}d}')
                    except ValueError:
                        nodes.append(f'{prefix}{range_part}')
                else:
                    nodes.append(f'{prefix}{range_part}')
    return nodes

print(' '.join(expand_nodelist(sys.argv[1])))
" "$nodelist" 2>/dev/null
}

# Start a single uWSGI worker in the background.
# Args: $1=worker_number $2=port
# Prints: "pid:port"
start_worker_fast() {
    local i=$1
    local WORKER_PORT=$2

    cat > /tmp/worker${i}_uwsgi.ini << EOF
[uwsgi]
module = main
callable = app
processes = ${UWSGI_PROCESSES}
http-socket = 0.0.0.0:${WORKER_PORT}
vacuum = true
master = true
die-on-term = true
memory-report = true
listen = 100
http-timeout = 300
socket-timeout = 300
disable-logging = false
log-date = true
log-prefix = [worker${i}]
logto = /var/log/worker${i}.log
EOF

    if [ -n "$UWSGI_CHEAPER" ]; then
        echo "cheaper = ${UWSGI_CHEAPER}" >> /tmp/worker${i}_uwsgi.ini
    fi

    > /var/log/worker${i}.log
    ( cd /app && env WORKER_NUM=$i uwsgi --ini /tmp/worker${i}_uwsgi.ini ) &
    echo "$!:$WORKER_PORT"
}

# Restart wrapper — reuses the worker's existing port assignment.
start_worker() {
    local i=$1
    local idx=$((i - 1))
    local port=${ACTUAL_WORKER_PORTS[$idx]:-$((SANDBOX_WORKER_BASE_PORT + i - 1))}
    start_worker_fast $i $port
}

worker_had_port_conflict() {
    grep -q "Address already in use" /var/log/worker${1}.log 2>/dev/null
}

worker_is_alive() {
    kill -0 "$1" 2>/dev/null
}

# Generate /etc/nginx/nginx.conf from template + upstream file.
# Uses UPSTREAM_FILE and NGINX_PORT globals.
generate_nginx_config() {
    sed "s|\${NGINX_PORT}|${NGINX_PORT}|g" /etc/nginx/nginx.conf.template > /tmp/nginx_temp.conf
    awk -v upstream_file="$UPSTREAM_FILE" '
    /\${UPSTREAM_SERVERS}/ {
        while ((getline line < upstream_file) > 0) { print line }
        close(upstream_file)
        next
    }
    { print }
    ' /tmp/nginx_temp.conf > /etc/nginx/nginx.conf

    echo "Testing nginx configuration..."
    if ! nginx -t; then
        echo "ERROR: nginx configuration test failed"
        cat /etc/nginx/nginx.conf
        exit 1
    fi
}

# Read a node's port file and emit "node:port" lines to stdout.
# Args: $1=node_hostname $2=port_file_path
read_port_file() {
    local node=$1
    local port_file=$2
    while IFS=: read -r worker_num worker_port; do
        [ "$worker_num" = "PORT_REPORT_COMPLETE" ] && continue
        [ -z "$worker_num" ] && continue
        echo "${node}:${worker_port}"
    done < "$port_file"
}

# Wait for all nodes to write their port files to shared storage.
# Uses PORTS_REPORT_DIR, ALL_NODES, NODE_COUNT globals.
wait_for_port_reports() {
    echo "Waiting for all nodes to report their port assignments..."
    local timeout=120
    local start=$(date +%s)

    while true; do
        local elapsed=$(($(date +%s) - start))
        if [ $elapsed -gt $timeout ]; then
            echo "ERROR: Timeout waiting for all nodes to report ports"
            echo "Expected port files from: $ALL_NODES"
            echo "Found in $PORTS_REPORT_DIR:"
            ls -la "$PORTS_REPORT_DIR" || true
            exit 1
        fi

        local reported=0
        for node in $ALL_NODES; do
            local node_short="${node%%.*}"
            local port_file="$PORTS_REPORT_DIR/${node_short}_ports.txt"
            if [ -f "$port_file" ] && grep -q "PORT_REPORT_COMPLETE" "$port_file" 2>/dev/null; then
                reported=$((reported + 1))
            fi
        done

        if [ $reported -ge $NODE_COUNT ]; then
            echo "All $NODE_COUNT nodes have reported their ports"
            return
        fi

        if [ $((elapsed % 10)) -eq 0 ]; then
            echo "  Waiting for port reports: $reported/$NODE_COUNT nodes (${elapsed}s elapsed)"
        fi
        sleep 1
    done
}

# Verify remote workers are reachable (parallel health checks via xargs).
# Args: $1=endpoints_file (one "host:port" per line)
verify_remote_workers() {
    local endpoints_file=$1
    local total_expected=$(wc -l < "$endpoints_file")
    echo "Verifying $total_expected remote workers are healthy (parallel checks)..."

    local timeout=60
    local start=$(date +%s)
    export REMOTE_HEALTH_DIR=$(mktemp -d)

    while true; do
        local elapsed=$(($(date +%s) - start))
        if [ $elapsed -gt $timeout ]; then
            echo "WARNING: Timeout waiting for all remote workers, starting nginx anyway"
            break
        fi

        cat "$endpoints_file" | xargs -P 64 -I {} sh -c '
            endpoint="{}"
            status_file="$REMOTE_HEALTH_DIR/$(echo "$endpoint" | tr ":" "_")"
            [ -f "$status_file" ] && exit 0
            if curl -s -f --connect-timeout 2 --max-time 5 "http://${endpoint}/health" > /dev/null 2>&1; then
                touch "$status_file"
            fi
        '

        local ready=$(find "$REMOTE_HEALTH_DIR" -type f 2>/dev/null | wc -l)
        if [ $ready -ge $total_expected ]; then
            echo "All $ready/$total_expected remote workers healthy!"
            break
        fi

        echo "  Remote health check: $ready/$total_expected workers ready (${elapsed}s elapsed)"
        sleep 1
    done

    rm -rf "$REMOTE_HEALTH_DIR"
}

# =============================================================================
# Node discovery
# =============================================================================
_H=$(hostname)

# Log configured values (only show SLURM vars if they're actually set)
echo "[$_H] NGINX_PORT=$NGINX_PORT NUM_WORKERS=$NUM_WORKERS"
[ -n "$SLURM_JOB_NODELIST" ] && echo "[$_H] SLURM_JOB_NODELIST=$SLURM_JOB_NODELIST SLURM_NNODES=${SLURM_NNODES:-?}"
[ -n "$SANDBOX_FORCE_SINGLE_NODE" ] && echo "[$_H] SANDBOX_FORCE_SINGLE_NODE=$SANDBOX_FORCE_SINGLE_NODE"

if [ "${SANDBOX_FORCE_SINGLE_NODE:-0}" = "1" ]; then
    echo "[$_H] SANDBOX_FORCE_SINGLE_NODE=1, forcing single-node mode"
    ALL_NODES="127.0.0.1"
elif [ -n "$SLURM_JOB_NODELIST" ]; then
    echo "[$_H] Expanding SLURM_JOB_NODELIST: $SLURM_JOB_NODELIST"
    ALL_NODES=$(expand_nodelist "$SLURM_JOB_NODELIST")
    if [ -z "$ALL_NODES" ]; then
        echo "[$_H] WARNING: Failed to expand SLURM_JOB_NODELIST='$SLURM_JOB_NODELIST'"
        echo "[$_H] Falling back to single-node mode. If multi-node is intended, check that"
        echo "[$_H] SLURM_JOB_NODELIST is correctly set by your SLURM environment."
        ALL_NODES="127.0.0.1"
    fi
else
    echo "[$_H] No SLURM_JOB_NODELIST detected — running in single-node mode"
    ALL_NODES="127.0.0.1"
fi

MASTER_NODE=$(echo "$ALL_NODES" | awk '{print $1}')
NODE_COUNT=$(echo "$ALL_NODES" | wc -w)
CURRENT_NODE_SHORT="${_H%%.*}"
MASTER_NODE_SHORT="${MASTER_NODE%%.*}"

if [ "$ALL_NODES" = "127.0.0.1" ] || [ "$CURRENT_NODE_SHORT" = "$MASTER_NODE_SHORT" ]; then
    IS_MASTER=1
    echo "[$_H] Role: MASTER | Nodes: $NODE_COUNT | Master: $MASTER_NODE"
else
    IS_MASTER=0
    echo "[$_H] Role: WORKER | Master: $MASTER_NODE"
fi

# =============================================================================
# Port coordination setup
# =============================================================================
SANDBOX_WORKER_BASE_PORT=${SANDBOX_WORKER_BASE_PORT:-50001}

if [ -n "$SANDBOX_PORTS_DIR" ]; then
    PORTS_REPORT_DIR="$SANDBOX_PORTS_DIR"
elif [ -n "$SLURM_JOB_ID" ]; then
    if [ -d "/nemo_run" ]; then
        PORTS_REPORT_DIR="/nemo_run/sandbox_ports_${SLURM_JOB_ID}"
    elif [ -d "/workspace" ]; then
        PORTS_REPORT_DIR="/workspace/sandbox_ports_${SLURM_JOB_ID}"
    else
        echo "ERROR: Neither /nemo_run nor /workspace are mounted — cannot share ports across nodes"
        exit 1
    fi
else
    PORTS_REPORT_DIR="/tmp/sandbox_ports_$$"
fi
mkdir -p "$PORTS_REPORT_DIR"
rm -f "$PORTS_REPORT_DIR/${CURRENT_NODE_SHORT}_ports.txt" 2>/dev/null || true
echo "[$_H] Port report dir: $PORTS_REPORT_DIR"

declare -a ACTUAL_WORKER_PORTS
UPSTREAM_FILE="/tmp/upstream_servers.conf"

echo "[$_H] Workers/node: $NUM_WORKERS | Base port: $SANDBOX_WORKER_BASE_PORT | Nginx: $NGINX_PORT"

# =============================================================================
# uWSGI configuration
# =============================================================================
: "${STATEFUL_SANDBOX:=1}"
if [ "$STATEFUL_SANDBOX" -eq 1 ]; then
    UWSGI_PROCESSES=1
    UWSGI_CHEAPER=1
else
    : "${UWSGI_PROCESSES:=1}"
    : "${UWSGI_CHEAPER:=1}"
fi

export UWSGI_PROCESSES UWSGI_CHEAPER

echo "UWSGI settings: PROCESSES=$UWSGI_PROCESSES, CHEAPER=$UWSGI_CHEAPER"

# Validate and fix uwsgi configuration
# (fall back to the documented default of 1 if the value is somehow empty)
if [ -z "$UWSGI_PROCESSES" ]; then
    UWSGI_PROCESSES=1
fi

if [ -z "$UWSGI_CHEAPER" ]; then
    UWSGI_CHEAPER=1
elif [ "$UWSGI_CHEAPER" -le 0 ]; then
    echo "WARNING: UWSGI_CHEAPER ($UWSGI_CHEAPER) must be at least 1"
    UWSGI_CHEAPER=1
    echo "Setting UWSGI_CHEAPER to $UWSGI_CHEAPER"
elif [ "$UWSGI_CHEAPER" -ge "$UWSGI_PROCESSES" ]; then
    echo "WARNING: UWSGI_CHEAPER ($UWSGI_CHEAPER) must be lower than UWSGI_PROCESSES ($UWSGI_PROCESSES)"
    if [ "$UWSGI_PROCESSES" -eq 1 ]; then
        # For single process, disable cheaper mode entirely
        echo "Disabling cheaper mode for single process setup"
        UWSGI_CHEAPER=""
    else
        UWSGI_CHEAPER=$((UWSGI_PROCESSES - 1))
        echo "Setting UWSGI_CHEAPER to $UWSGI_CHEAPER"
    fi
fi

export UWSGI_PROCESSES
if [ -n "$UWSGI_CHEAPER" ]; then
    export UWSGI_CHEAPER
    echo "UWSGI config - Processes: $UWSGI_PROCESSES, Cheaper: $UWSGI_CHEAPER"
else
    echo "UWSGI config - Processes: $UWSGI_PROCESSES, Cheaper: disabled"
fi

# =============================================================================
# Log setup
# =============================================================================
mkdir -p /var/log/nginx
rm -f /var/log/nginx/access.log /var/log/nginx/error.log
touch /var/log/nginx/access.log /var/log/nginx/error.log
chmod 644 /var/log/nginx/*.log
for i in $(seq 1 $NUM_WORKERS); do
    touch /var/log/worker${i}.log
done
chmod 644 /var/log/worker*.log || true

tail -f /var/log/nginx/access.log &> /dev/stdout &
tail -f /var/log/nginx/error.log &> /dev/stderr &
tail -f /var/log/worker*.log &> /dev/stderr &

# =============================================================================
# Worker startup
# =============================================================================
WORKER_PIDS=()

cleanup() {
    echo "Shutting down workers and nginx..."
    for pid in "${WORKER_PIDS[@]}"; do
        if kill -0 "$pid" 2>/dev/null; then
            kill -TERM "$pid" 2>/dev/null || true
        fi
    done
    pkill -f nginx || true
    [ -n "$HEALTH_CHECK_DIR" ] && rm -rf "$HEALTH_CHECK_DIR" 2>/dev/null || true
    [ -n "$REMOTE_HEALTH_DIR" ] && rm -rf "$REMOTE_HEALTH_DIR" 2>/dev/null || true
    exit 0
}

trap cleanup SIGTERM SIGINT

MAX_STARTUP_RETRIES=5
PORT_INCREMENT=200

for i in $(seq 1 $NUM_WORKERS); do
    WORKER_PIDS+=("")
    ACTUAL_WORKER_PORTS+=("")
done

# Phase 1: Spawn all workers simultaneously
echo "[$_H] Starting $NUM_WORKERS workers (ports $SANDBOX_WORKER_BASE_PORT-$((SANDBOX_WORKER_BASE_PORT + NUM_WORKERS - 1)))..."
START_SPAWN=$(date +%s)

for i in $(seq 1 $NUM_WORKERS); do
    port=$((SANDBOX_WORKER_BASE_PORT + i - 1))
    result=$(start_worker_fast $i $port)
    WORKER_PIDS[$((i - 1))]="${result%%:*}"
    ACTUAL_WORKER_PORTS[$((i - 1))]=$port
done

echo "[$_H] All $NUM_WORKERS workers spawned in $(($(date +%s) - START_SPAWN))s"

# Phase 2: Retry workers that failed due to port conflicts
retry_round=0
while [ $retry_round -lt $MAX_STARTUP_RETRIES ]; do
    sleep 1

    FAILED_WORKERS=()
    for i in $(seq 1 $NUM_WORKERS); do
        idx=$((i - 1))
        worker_is_alive "${WORKER_PIDS[$idx]}" && continue
        worker_had_port_conflict $i && FAILED_WORKERS+=($i)
    done

    [ ${#FAILED_WORKERS[@]} -eq 0 ] && break

    PORT_OFFSET=$(( (retry_round + 1) * PORT_INCREMENT ))
    echo "[$_H] Retry $((retry_round + 1)): ${#FAILED_WORKERS[@]} port conflicts, offset +$PORT_OFFSET"

    for i in "${FAILED_WORKERS[@]}"; do
        idx=$((i - 1))
        new_port=$((SANDBOX_WORKER_BASE_PORT + i - 1 + PORT_OFFSET))
        result=$(start_worker_fast $i $new_port)
        WORKER_PIDS[$idx]="${result%%:*}"
        ACTUAL_WORKER_PORTS[$idx]=$new_port
    done

    retry_round=$((retry_round + 1))
done

[ $retry_round -ge $MAX_STARTUP_RETRIES ] && echo "WARNING: Max startup retries reached"

# =============================================================================
# Wait for local workers to be ready (parallel health checks)
# =============================================================================
echo "[$_H] Waiting for workers to become ready..."
TIMEOUT=180
START_TIME=$(date +%s)
declare -A WORKER_READY
HEALTH_CHECK_DIR=$(mktemp -d)

check_worker_health() {
    local worker_num=$1
    local idx=$((worker_num - 1))
    local port=${ACTUAL_WORKER_PORTS[$idx]}
    if curl -s -f --connect-timeout 2 --max-time 5 "http://127.0.0.1:${port}/health" > /dev/null 2>&1; then
        echo "ready" > "$HEALTH_CHECK_DIR/worker_${worker_num}"
    fi
}

READY_WORKERS=0
LAST_PROGRESS_TIME=0

while [ $READY_WORKERS -lt $NUM_WORKERS ]; do
    CURRENT_TIME=$(date +%s)
    ELAPSED=$((CURRENT_TIME - START_TIME))

    if [ $ELAPSED -gt $TIMEOUT ]; then
        echo "ERROR: Timeout waiting for workers to start"
        for i in "${!WORKER_PIDS[@]}"; do
            pid=${WORKER_PIDS[$i]}
            w=$((i+1))
            if kill -0 "$pid" 2>/dev/null; then
                echo "  Worker $w (PID $pid): Running"
                tail -20 /var/log/worker${w}.log 2>/dev/null | sed 's/^/    /' || true
            else
                echo "  Worker $w (PID $pid): Dead"
                tail -30 /var/log/worker${w}.log 2>/dev/null | sed 's/^/    /' || true
            fi
        done
        exit 1
    fi

    # Launch parallel health checks for unready workers
    check_pids=()
    checking_workers=()
    for i in $(seq 1 $NUM_WORKERS); do
        if [ "${WORKER_READY[$i]}" != "1" ]; then
            check_worker_health $i &
            check_pids+=($!)
            checking_workers+=($i)
        fi
    done

    for pid in "${check_pids[@]}"; do
        wait $pid 2>/dev/null || true
    done

    PREV_READY=$READY_WORKERS
    for i in "${checking_workers[@]}"; do
        if [ -f "$HEALTH_CHECK_DIR/worker_${i}" ]; then
            WORKER_READY[$i]=1
            READY_WORKERS=$((READY_WORKERS + 1))
            rm -f "$HEALTH_CHECK_DIR/worker_${i}"
            echo "  Worker $i (port ${ACTUAL_WORKER_PORTS[$((i-1))]}): Ready ($READY_WORKERS/$NUM_WORKERS)"
        fi
    done

    if [ $READY_WORKERS -lt $NUM_WORKERS ]; then
        if [ $((CURRENT_TIME - LAST_PROGRESS_TIME)) -ge 10 ]; then
            echo "  Progress: $READY_WORKERS/$NUM_WORKERS workers ready (${ELAPSED}s)"
            LAST_PROGRESS_TIME=$CURRENT_TIME
        fi
        [ $READY_WORKERS -eq $PREV_READY ] && sleep 1
    fi
done

echo "[$_H] All $NUM_WORKERS local workers ready!"

# =============================================================================
# Write port assignments to shared storage (multi-node only)
# =============================================================================
if [ "$NODE_COUNT" -gt 1 ]; then
    PORTS_FILE="$PORTS_REPORT_DIR/${CURRENT_NODE_SHORT}_ports.txt"
    > "$PORTS_FILE"
    for i in $(seq 1 $NUM_WORKERS); do
        echo "${i}:${ACTUAL_WORKER_PORTS[$((i-1))]}" >> "$PORTS_FILE"
    done
    echo "PORT_REPORT_COMPLETE" >> "$PORTS_FILE"
    sync
    echo "[$_H] Port assignments written to $PORTS_FILE"
fi

# =============================================================================
# Nginx setup
# =============================================================================
if [ "$IS_MASTER" = "1" ]; then
    if [ "$NODE_COUNT" -gt 1 ]; then
        # --- Multi-node: collect ports from all nodes, build cross-node upstream ---
        wait_for_port_reports

        > $UPSTREAM_FILE
        ENDPOINTS_FILE=$(mktemp)
        for node in $ALL_NODES; do
            node_short="${node%%.*}"
            port_file="$PORTS_REPORT_DIR/${node_short}_ports.txt"
            for endpoint in $(read_port_file "$node" "$port_file"); do
                echo "        server ${endpoint} max_fails=3 fail_timeout=30s;" >> $UPSTREAM_FILE
                echo "$endpoint" >> "$ENDPOINTS_FILE"
            done
        done
        echo "[$_H] Generated upstream with $(wc -l < $UPSTREAM_FILE) servers across $NODE_COUNT nodes"

        generate_nginx_config
        verify_remote_workers "$ENDPOINTS_FILE"
        rm -f "$ENDPOINTS_FILE"
    else
        # --- Single-node: upstream from local ports only ---
        > $UPSTREAM_FILE
        for i in $(seq 1 $NUM_WORKERS); do
            echo "        server 127.0.0.1:${ACTUAL_WORKER_PORTS[$((i-1))]} max_fails=3 fail_timeout=30s;" >> $UPSTREAM_FILE
        done

        generate_nginx_config
    fi

    echo "[$_H] Starting nginx on port $NGINX_PORT..."
    nginx
else
    # --- Worker node: local nginx proxy forwarding to master ---
    echo "[$_H] Starting nginx proxy to master $MASTER_NODE:$NGINX_PORT..."
    sed -e "s|\${MASTER_NODE}|${MASTER_NODE}|g" \
        -e "s|\${NGINX_PORT}|${NGINX_PORT}|g" \
        /etc/nginx/nginx-worker-proxy.conf.template > /etc/nginx/nginx.conf

    echo "Testing nginx proxy configuration..."
    if ! nginx -t; then
        echo "ERROR: nginx proxy configuration test failed"
        cat /etc/nginx/nginx.conf
        exit 1
    fi

    nginx
    echo "[$_H] Nginx proxy started: localhost:$NGINX_PORT -> $MASTER_NODE:$NGINX_PORT"
fi

# =============================================================================
# Network blocking
# =============================================================================
# ld.so.preload intercepts socket() in all NEW exec'd processes. This is safe
# for nginx/uWSGI that are already running. However, if the monitoring loop
# restarts a crashed worker, the new uWSGI process would be unable to bind its
# listening socket. We set NETWORK_BLOCKING_ACTIVE so the monitoring loop can
# emit a clear diagnostic when this happens.
NETWORK_BLOCKING_ACTIVE=0
BLOCK_NETWORK_LIB="/usr/lib/libblock_network.so"
if [ "${NEMO_SKILLS_SANDBOX_BLOCK_NETWORK:-0}" = "1" ]; then
    if [ -f "$BLOCK_NETWORK_LIB" ]; then
        echo "$BLOCK_NETWORK_LIB" > /etc/ld.so.preload
        NETWORK_BLOCKING_ACTIVE=1
        echo "[$_H] Network blocking ENABLED via ld.so.preload"
        if [ "$NODE_COUNT" -gt 1 ]; then
            echo "[$_H] NOTE: Network blocking is active in multi-node mode. If a worker"
            echo "[$_H]   crashes, the monitoring loop will be unable to restart it because"
            echo "[$_H]   ld.so.preload blocks socket() in new processes. The remaining"
            echo "[$_H]   workers will continue serving requests."
        fi
    else
        echo "[$_H] WARNING: Network blocking requested but $BLOCK_NETWORK_LIB not found"
    fi
fi

# =============================================================================
# Status summary
# =============================================================================
if [ "$IS_MASTER" = "1" ]; then
    echo "=== Sandbox ready (MASTER) ==="
    echo "  Nginx LB: http://localhost:$NGINX_PORT"
    echo "  Nodes: $NODE_COUNT | Workers/node: $NUM_WORKERS | Total: $((NODE_COUNT * NUM_WORKERS))"
    echo "  Local ports: ${ACTUAL_WORKER_PORTS[0]}-${ACTUAL_WORKER_PORTS[$((NUM_WORKERS-1))]}"
else
    echo "=== Sandbox ready (WORKER) ==="
    echo "  Proxy: localhost:$NGINX_PORT -> $MASTER_NODE:$NGINX_PORT"
    echo "  Local workers: $NUM_WORKERS (ports ${ACTUAL_WORKER_PORTS[0]}-${ACTUAL_WORKER_PORTS[$((NUM_WORKERS-1))]})"
fi
echo "  uWSGI: processes=$UWSGI_PROCESSES cheaper=${UWSGI_CHEAPER:-disabled}"

# =============================================================================
# Monitoring loop
# =============================================================================
echo "[$_H] Monitoring processes..."

if [ "$IS_MASTER" = "1" ]; then
    (
        while true; do
            sleep 60
            echo "--- [$_H] Worker Load Stats (Top 10) at $(date) ---"
            grep "upstream:" /var/log/nginx/access.log 2>/dev/null \
                | awk -F'upstream: ' '{print $2}' | awk -F' session: ' '{print $1}' \
                | sort | uniq -c | sort -nr | head -n 10 || echo "No logs yet"
            echo "--- End Stats ---"
        done
    ) &
fi

while true; do
    for idx in "${!WORKER_PIDS[@]}"; do
        pid=${WORKER_PIDS[$idx]}
        i=$((idx + 1))
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "[$_H] WARNING: Worker $i (PID $pid) died — restarting..."
            if [ "$NETWORK_BLOCKING_ACTIVE" = "1" ]; then
                echo "[$_H] WARNING: Network blocking (ld.so.preload) is active. The restarted"
                echo "[$_H]   worker may fail to bind its port because socket() is blocked for"
                echo "[$_H]   new processes. Remaining workers continue serving requests."
            fi
            result=$(start_worker $i)
            WORKER_PIDS[$idx]="${result%%:*}"
            ACTUAL_WORKER_PORTS[$idx]="${result##*:}"
        fi
    done

    if ! pgrep nginx > /dev/null; then
        echo "[$_H] ERROR: Nginx died unexpectedly"
        cleanup
        exit 1
    fi

    sleep 10
done


================================================
FILE: dockerfiles/swe-bench/Dockerfile.nemo-skills.alpine
================================================
# using the oldest version of alpine among swe-bench pro containers for maximum compatibility
FROM alpine:3.17.1

# installs python 3.10
RUN apk update && apk add --no-cache \
    python3 \
    py3-pip \
    curl \
    wget \
    git \
    git-lfs \
    ffmpeg \
    bash \
    build-base \
    python3-dev \
    linux-headers && \
    ln -sf /usr/bin/python3 /usr/bin/python

RUN pip install --upgrade pip setuptools uv

# install apptainer from the Alpine v3.19 community repository
RUN apk add --no-cache --repository=https://dl-cdn.alpinelinux.org/alpine/v3.19/community \
    apptainer apptainer-suid

RUN apk del py3-blinker

# we aren't copying main nemo_skills folder as it will always be mounted from host
# but we do want to install all requirements in the container directly
RUN mkdir -p /opt/NeMo-Skills/requirements /opt/NeMo-Skills/core
COPY pyproject.toml README.md /opt/NeMo-Skills/
COPY requirements /opt/NeMo-Skills/requirements/
COPY core/requirements.txt /opt/NeMo-Skills/core/requirements.txt

# don't install sentence_transformers as it is not used for swe-bench but causes errors during the build
RUN sed -i '/^sentence_transformers$/d' /opt/NeMo-Skills/core/requirements.txt

ARG CACHEBUST=4
# don't install pipeline requirements as nemo-run causes errors during the build
RUN pip install --no-cache-dir -r /opt/NeMo-Skills/core/requirements.txt


================================================
FILE: dockerfiles/swe-bench/Dockerfile.swe-zero
================================================
# Docker image for SWE-Zero (v2).
# In the SWE-Zero setup, any instance can be run inside this image.
# The only requirement is that the repo must be cloned at runtime before running the agent.
#
# For easier compatibility with existing tools, this image is based on the SWE-bench Verified containers,
# but all repo/instance-specific setup is removed.

# ======
# Base image
#
# This is the base setup performed for all SWE-bench Verified containers.
# See https://github.com/SWE-bench/SWE-bench/blob/7a6b44e4a82eece60ac06afd3042a76d8a95eec3/swebench/harness/dockerfiles/python.py#L1
# We install 4 additional apt packages to support the following commands: xxd, hexdump, file, sudo.
# ======

FROM --platform=linux/x86_64 ubuntu:22.04

ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Etc/UTC

RUN apt update && apt install -y \
wget \
git \
build-essential \
libffi-dev \
libtiff-dev \
python3 \
python3-pip \
python-is-python3 \
jq \
curl \
locales \
locales-all \
tzdata \
xxd \
bsdextrautils \
file \
sudo \
&& rm -rf /var/lib/apt/lists/*

# Download and install conda
RUN wget 'https://repo.anaconda.com/miniconda/Miniconda3-py311_23.11.0-2-Linux-x86_64.sh' -O miniconda.sh \
    && bash miniconda.sh -b -p /opt/miniconda3
# Add conda to PATH
ENV PATH=/opt/miniconda3/bin:$PATH
# Add conda to shell startup scripts like .bashrc
RUN conda init --all
RUN conda config --append channels conda-forge

RUN adduser --disabled-password --gecos 'dog' nonroot

# ======
# Environment image
#
# SWE-bench Verified containers have a conda environment set up for the agent to use at /opt/miniconda3/envs/testbed.
# We create an environment with the same name and path, but no dependencies except for Python itself.
# This is done for compatibility with existing tools and harnesses like OpenHands,
# but it is not required for the SWE-Zero approach to work.
# ======

SHELL ["/bin/bash", "-c"]
RUN source ~/.bashrc && \
    set -euxo pipefail && \
    source /opt/miniconda3/bin/activate && \
    echo -e "\
        name: testbed \n\
        prefix: /opt/miniconda3/envs/testbed \n\
        channels: \n\
            - defaults \n\
            - conda-forge \n\
        dependencies: \n\
            - python=3.12 \
    " > /root/environment.yml && \
    conda env create -f /root/environment.yml && \
    conda activate testbed

WORKDIR /testbed/

# Automatically activate the testbed environment
RUN echo "source /opt/miniconda3/etc/profile.d/conda.sh && conda activate testbed" > /root/.bashrc


================================================
FILE: docs/agentic_inference/parallel_thinking.md
================================================
# Parallel Thinking

Parallel thinking covers methods that scale inference-time compute via parallel sampling. We currently support two such methods:


- **GenSelect** is a generative Best-of-N method we introduced in the [OpenReasoning paper](https://arxiv.org/abs/2504.16891), followed by a more focused paper, [GenSelect: A Generative Approach to Best-of-N](https://arxiv.org/abs/2507.17797). The method uses an LLM to reason over the N candidate solutions and select the best one, leveraging LLMs' comparative strengths while scaling efficiently across parallel sampling budgets.

- **GenSynthesis** takes the candidate solutions as input and generates a new solution that aims to improve on them.
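
To make the difference between the two modes concrete, here is a minimal sketch. It is not the NeMo-Skills implementation: `llm` is a hypothetical callable standing in for any model call, and the prompt wording and the `Answer: <index>` convention are assumptions made for illustration.

```python
import re
from typing import Callable, List


def format_candidates(candidates: List[str]) -> str:
    """Number the candidates so the model can refer to them by index."""
    return "\n\n".join(f"Solution {i}:\n{c}" for i, c in enumerate(candidates))


def genselect(problem: str, candidates: List[str], llm: Callable[[str], str]) -> str:
    """Generative Best-of-N: the model picks one of the existing candidates."""
    prompt = (
        f"Problem:\n{problem}\n\n{format_candidates(candidates)}\n\n"
        "Compare the solutions and finish with 'Answer: <index>'."
    )
    reply = llm(prompt)
    match = re.search(r"Answer:\s*(\d+)", reply)
    # Assumption: fall back to the first candidate if the reply is unusable.
    index = int(match.group(1)) if match else 0
    if not 0 <= index < len(candidates):
        index = 0
    return candidates[index]


def gensynthesis(problem: str, candidates: List[str], llm: Callable[[str], str]) -> str:
    """The model writes a new solution informed by all candidates."""
    prompt = (
        f"Problem:\n{problem}\n\n{format_candidates(candidates)}\n\n"
        "Write a new solution that improves on the solutions above."
    )
    return llm(prompt)
```

GenSelect can only return one of its inputs, whereas GenSynthesis returns whatever the model writes, so its output may differ from every candidate.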

## Usage

We support *parallel thinking* via the [generation pipeline](https://nvidia-nemo.github.io/Skills/pipelines/generation/).
Pass in the following params for the different parallel thinking modes:

- For GenSelect, `++parallel_thinking.mode=genselect`
- For GenSynthesis, `++parallel_thinking.mode=gensynthesis`

We support both offline and online parallel thinking:

- *Offline mode*: The candidate solutions/trajectories have already been generated and can be specified via:
`++parallel_thinking.generation_dir=<PATH_TO_GENERATED_DIR>`
- *Online mode*: The candidate solutions need to be generated as part of the generation job.

!!!note
    The parallel thinking pipeline uses the same inference parameters as the generate pipeline. We allow overriding of two key inference config params:

    - `temperature` via `++parallel_thinking.temperature=<>`
    - `tokens_to_generate` via `++parallel_thinking.tokens_to_generate=<>`


### Common Parameters
- `window_size`: Number of solutions processed in a single parallel thinking input (set to 8 by default). Consider your model's context window size when setting this value (or [allow for soft failure](https://nvidia-nemo.github.io/Skills/pipelines/generation/#context-window-limits) via `++server.enable_soft_fail=True`).
- `solution_key`: The key from the generation output used to identify the solution content (default: `generation`)

#### Offline Parallel Thinking Parameters

These parameters only need to be passed when running offline parallel thinking.

- `generation_dir`: The directory where the *offline* generated solutions are stored. We assume the solutions to be in `output-rs*.jsonl` files.
- `num_initial_solutions`: Number of solutions from the offline generated solutions that are used for parallel thinking.
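
The following sketch shows one way such a directory could be read. It is an illustration only, under several assumptions that the text above does not confirm: each `output-rs*.jsonl` file holds one sample per problem, records are matched across files by a hypothetical `problem` field, and `num_initial_solutions` keeps the first N files in sorted order.

```python
import glob
import json
import os
import tempfile
from collections import defaultdict
from typing import Dict, List, Optional


def load_offline_solutions(
    generation_dir: str,
    solution_key: str = "generation",
    num_initial_solutions: Optional[int] = None,
    id_key: str = "problem",  # hypothetical field used to match records across files
) -> Dict[str, List[str]]:
    """Collect candidate solutions per problem from output-rs*.jsonl files."""
    files = sorted(glob.glob(os.path.join(generation_dir, "output-rs*.jsonl")))
    if num_initial_solutions is not None:
        # Assumption: keep the first N sample files in sorted order.
        files = files[:num_initial_solutions]

    solutions: Dict[str, List[str]] = defaultdict(list)
    for path in files:
        with open(path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                record = json.loads(line)
                solutions[record[id_key]].append(record[solution_key])
    return dict(solutions)


# Small demo on synthetic files written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    for seed in range(3):
        path = os.path.join(tmp_dir, f"output-rs{seed}.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            f.write(json.dumps({"problem": "1+1", "generation": f"candidate {seed}"}) + "\n")
    demo = load_offline_solutions(tmp_dir, num_initial_solutions=2)

print(demo)
```

The `solution_key` argument plays the same role as the `solution_key` parameter described above: it names the field that holds the solution text.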


To specify any of the above variables, say `window_size=16`, pass `++parallel_thinking.window_size=16` to the generate/eval pipelines.
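
When several parameters are set at once, the overrides can be assembled programmatically. The helper below is hypothetical (it is not part of NeMo-Skills) and only formats a dictionary into the `++parallel_thinking.<key>=<value>` strings described above.

```python
from typing import Dict, Union

Value = Union[str, int, float, bool]


def parallel_thinking_overrides(params: Dict[str, Value]) -> str:
    """Format parameters as space-separated ++parallel_thinking.* overrides."""
    return " ".join(f"++parallel_thinking.{key}={value}" for key, value in params.items())


overrides = parallel_thinking_overrides(
    {"mode": "genselect", "window_size": 16, "solution_key": "generation"}
)
print(overrides)
```

The resulting string can be appended to an `ns eval` command line or passed to `wrap_arguments` when using the Python API.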


## Sample Examples

### Online Parallel Thinking (via GenSynthesis)

In this example, we show how to use GenSynthesis for [aime25](https://nvidia-nemo.github.io/Skills/evaluation/natural-math/) with [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).

```bash hl_lines="9-10"
ns eval \
  --benchmarks aime25 \
  --cluster local \
  --model Qwen/Qwen3-8B \
  --server_gpus 2 \
  --server_type vllm \
  --output_dir /experiments/qwen3_8b/gensynthesis \
  ++parse_reasoning=True \
  ++inference.tokens_to_generate=16384 \
  ++parallel_thinking.mode=gensynthesis \
  ++server.enable_soft_fail=True \
  ++server.context_limit_retry_strategy=reduce_generation
```

The evaluation pipeline first generates `window_size` solutions (8 by default) and then runs GenSynthesis with these solutions in the prompt to synthesize a new solution.
Note that the same model is being used for both solution generation and synthesis, which we refer to as **Self-GenSynthesis**.

!!!tip
    Parallel Thinking inputs can consume a lot of tokens, especially for large `window_size` values.
    To avoid running into context length issues, we recommend running these pipelines with `++server.enable_soft_fail=True`, as in the above command.
    To retry generation with a reduced prompt or generation length, we recommend one of the supported [context reduction strategies](https://nvidia-nemo.github.io/Skills/pipelines/generation/#context-window-limits).
    In the above example, we use `++server.enable_soft_fail=True ++server.context_limit_retry_strategy=reduce_generation`,
    which reduces the generation budget when the context limit is exceeded.
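
The arithmetic behind a strategy like `reduce_generation` can be sketched as follows. This is an illustration only, assuming the budget is shrunk to whatever room the prompt leaves in the context window; the function name, the error behavior, and the token counts are hypothetical and do not describe the server's actual logic.

```python
def reduced_generation_budget(prompt_tokens: int, tokens_to_generate: int, context_limit: int) -> int:
    """Return a generation budget that still fits in the context window."""
    available = context_limit - prompt_tokens
    if available <= 0:
        # Nothing can be generated; the prompt itself would have to be reduced.
        raise ValueError("prompt alone exceeds the context window")
    return min(tokens_to_generate, available)


# Hypothetical numbers: 8 candidate solutions of about 3,500 tokens each
# in the prompt, with a 32,768-token context window.
budget = reduced_generation_budget(
    prompt_tokens=28_000, tokens_to_generate=16_384, context_limit=32_768
)
print(budget)  # 4768
```

The same estimate can help when choosing `window_size`: the more candidate solutions go into one prompt, the less room remains for the generated answer.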

### Offline Parallel Thinking (via GenSelect)

Offline parallel thinking splits candidate generation and processing (selection/synthesis) into two separate steps. Two use cases are currently supported only with offline parallel thinking:

- Using a different model for generation and processing
- Using a processed version of generated outputs as input to parallel thinking

In the following example, we use [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) to perform GenSelect over solutions generated by [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) for `livecodebench`.

```python
from nemo_skills.pipeline.cli import eval, wrap_arguments

# Generate initial solutions
eval(
    ctx=wrap_arguments(
        "++inference.tokens_to_generate=16384 "
        "++inference.temperature=0.6 "
        "++parse_reasoning=True "
    ),
    cluster="local",
    benchmarks="livecodebench:8",
    output_dir="/workspace/qwen3_4b_evals/",
    server_type="vllm",
    server_gpus=1,
    model="Qwen/Qwen3-4B",
    expname="initial-soln-qwen3-4b-livecodebench"
)

# Run parallel thinking on initial solutions
# Using GenSelect with Qwen3-8B
eval(
    ctx=wrap_arguments(
        "++parallel_thinking.tokens_to_generate=16384 "
        "++parallel_thinking.temperature=0.6 "
        "++parallel_thinking.mode=genselect "
        "++parallel_thinking.solution_key=completion "
        "++parallel_thinking.generation_dir=/workspace/qwen3_4b_evals/eval-results/livecodebench "
        "++parse_reasoning=True "
    ),
    cluster="local",
    benchmarks="livecodebench:8",
    output_dir="/workspace/qwen3_4b_evals/genselect_qwen3_8b",
    server_type="vllm",
    server_gpus=2,
    model="Qwen/Qwen3-8B",
    run_after="initial-soln-qwen3-4b-livecodebench",
    expname="parallel-thinking-qwen3-8b-livecodebench"
)
```

There are three things we want to highlight in the above example:

- We run the GenSelect step a total of 8 times (`livecodebench:8`) over the same set of solutions
- The pre-generated solutions are specified via: `++parallel_thinking.generation_dir=/workspace/qwen3_4b_evals/eval-results/livecodebench`
- Instead of the usual `generation` key for identifying the solution content, we use `++parallel_thinking.solution_key=completion`

The `completion` key in `livecodebench` outputs contains just the extracted code from the generated solutions. For coding tasks, we empirically find that representing the candidate solution with just the extracted code performs better than representing it with the text around it.
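
To get a feel for what the candidate pool looks like, you can collect the chosen key from the `output-rs*.jsonl` files yourself. The sketch below assumes that rows in the different files are aligned by line index, which may not match how the pipeline actually pairs solutions with problems; `load_candidates` is a hypothetical helper.

```python
import glob
import json
import os
import tempfile
from collections import defaultdict


def load_candidates(generation_dir: str, solution_key: str = "generation") -> dict[int, list[str]]:
    """Collect one candidate per output-rs*.jsonl file for every line index."""
    candidates = defaultdict(list)
    for path in sorted(glob.glob(os.path.join(generation_dir, "output-rs*.jsonl"))):
        with open(path, encoding="utf-8") as fin:
            for idx, line in enumerate(fin):
                candidates[idx].append(json.loads(line)[solution_key])
    return dict(candidates)


# Tiny demo with two fake seed files that each contain a single problem
demo_dir = tempfile.mkdtemp()
for seed, code in enumerate(["print(1)", "print(2)"]):
    with open(os.path.join(demo_dir, f"output-rs{seed}.jsonl"), "w", encoding="utf-8") as fout:
        fout.write(json.dumps({"generation": f"Here is my code: {code}", "completion": code}) + "\n")

print(load_candidates(demo_dir, solution_key="completion"))
```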


================================================
FILE: docs/agentic_inference/tool_calling.md
================================================
# Tool Calling

Tool calling enables LLMs to execute external functions and use their results in generation. NeMo-Skills provides a flexible framework for both using built-in tools and creating custom ones.

## Overview

The tool calling system in NeMo-Skills is built on the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/), which provides a standardized way to:

- Define tool schemas that LLMs can understand
- Execute tools with type-safe arguments
- Handle tool responses and integrate them back into the conversation

### Architecture

```
┌─────────────────┐
│      LLM        │  Generates tool calls based on available tools
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  ToolManager    │  Routes calls to registered tools
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   MCPClientTool │  Communicates with MCP server
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   MCP Server    │  Executes actual tool logic
└─────────────────┘
```
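
The routing step in the diagram can be illustrated with a toy example. The classes below are hypothetical stand-ins written for this sketch; they are not the NeMo-Skills `ToolManager` or `MCPClientTool`, and no MCP server is involved.

```python
import asyncio


class EchoTool:
    """Hypothetical tool exposing a single `echo` function."""

    def list_tools(self) -> list[str]:
        return ["echo"]

    async def execute(self, tool_name: str, arguments: dict) -> dict:
        return {"result": arguments["text"]}


class ToyToolManager:
    """Routes a tool call to whichever registered tool advertises that name."""

    def __init__(self, tools: list) -> None:
        self._routes = {name: tool for tool in tools for name in tool.list_tools()}

    async def execute(self, tool_name: str, arguments: dict) -> dict:
        if tool_name not in self._routes:
            return {"error": f"Unknown tool: {tool_name}"}
        return await self._routes[tool_name].execute(tool_name, arguments)


manager = ToyToolManager([EchoTool()])
print(asyncio.run(manager.execute("echo", {"text": "hi"})))
```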

## Using Built-in Tools

NeMo-Skills comes with several pre-built tools that you can use immediately.

### PythonTool

Executes Python code in a stateful Jupyter notebook environment.

**Command line usage:**

```bash
ns generate \
  --cluster local \
  --input_file data.jsonl \
  --output_dir outputs \
  --model Qwen/Qwen3-8B \
  --server_type vllm \
  --server_gpus 1 \
  --server_args '--enable-auto-tool-choice --tool-call-parser hermes' \
  --with_sandbox true \
  ++tool_modules=[nemo_skills.mcp.servers.python_tool.PythonTool] \
  ++inference.tokens_to_generate=8192 \
  ++inference.temperature=0.6
```

**Python API usage:**

```python
from nemo_skills.pipeline.cli import generate, wrap_arguments

generate(
    ctx=wrap_arguments(
        "++tool_modules=[nemo_skills.mcp.servers.python_tool.PythonTool] "
        "++inference.tokens_to_generate=8192 "
        "++inference.temperature=0.6"
    ),
    cluster='local',
    model='Qwen/Qwen3-8B',
    server_type='vllm',
    server_gpus=1,
    server_args='--enable-auto-tool-choice --tool-call-parser hermes',
    input_file='data.jsonl',
    output_dir='outputs',
    with_sandbox=True,  # Required for PythonTool
)
```

### Multiple Tools

You can use multiple tools simultaneously:

```bash
++tool_modules=[nemo_skills.mcp.servers.python_tool.PythonTool,nemo_skills.mcp.servers.exa_tool.ExaTool]
```

## Creating Custom Tools

Custom tools consist of two components:

1. **MCP Server** - Implements the actual tool logic
2. **Tool Class** - Client that connects to the server and can be configured via `tool_overrides`

### Example: Calculator Tool

Let's create a simple calculator tool that performs basic arithmetic operations.

#### Step 1: Create the MCP Server

Create `calculator_server.py`:

```python
"""MCP server that implements calculator functionality using sandbox execution."""
import argparse
from dataclasses import dataclass
from typing import Annotated

from mcp.server.fastmcp import FastMCP
from omegaconf import OmegaConf
from pydantic import Field

from nemo_skills.code_execution.sandbox import get_sandbox
from nemo_skills.mcp.utils import add_config_args, load_mcp_config

mcp = FastMCP(name="calculator_tool")

# Initialized from config in main()
sandbox = None


@dataclass
class CalculationResult:
    result: str = ""
    error: str | None = None


@mcp.tool(description="Perform mathematical calculations using Python")
async def calculate(
    operation: Annotated[
        str,
        Field(description="Operation to perform: add, subtract, multiply, or divide")
    ],
    x: Annotated[float, Field(description="First number")],
    y: Annotated[float, Field(description="Second number")],
    precision: Annotated[int, Field(description="Decimal precision")] = 2,
) -> CalculationResult:
    """Execute calculation in isolated sandbox environment."""

    # Map operation to Python operator
    op_symbols = {
        'add': '+',
        'subtract': '-',
        'multiply': '*',
        'divide': '/',
    }

    if operation not in op_symbols:
        return CalculationResult(error=f"Unknown operation: {operation}")

    # Generate Python code to execute in sandbox
    code = f"""
result = {x} {op_symbols[operation]} {y}
result = round(result, {precision})
print(f"{x} {operation} {y} = {{result}}")
"""

    try:
        # Execute in sandbox
        output_dict, session_id = await sandbox.execute_code(
            code,
            language="python",
            timeout=5.0,
        )

        if output_dict["process_status"] == "success":
            output = output_dict["stdout"].strip()
            return CalculationResult(result=output)
        else:
            error_msg = output_dict.get("stderr", "Execution failed")
            return CalculationResult(error=error_msg)

    except Exception as e:
        return CalculationResult(error=f"Execution error: {str(e)}")


def main():
    parser = argparse.ArgumentParser(description="Calculator MCP server")
    add_config_args(parser)
    args = parser.parse_args()

    # Load sandbox configuration
    try:
        cfg = load_mcp_config(
            config=args.config,
            config_dir=args.config_dir,
            config_name=args.config_name,
        )
    except ValueError:
        # Fall back to default local sandbox
        cfg = OmegaConf.create({"sandbox": {"sandbox_type": "local"}})

    global sandbox
    sandbox_cfg = OmegaConf.to_container(cfg.sandbox, resolve=True)
    sandbox = get_sandbox(**sandbox_cfg)

    mcp.run(transport="stdio")


if __name__ == "__main__":
    main()
```

!!!note
    This example uses the NeMo-Skills sandbox for isolated code execution, similar to `PythonTool`. The sandbox provides security and isolation, making it suitable for executing untrusted or dynamic code.
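
If the doubled braces in the generated snippet look confusing, it may help to build the same code string outside of the server and inspect it. The sketch below mirrors the string construction from `calculate`, but runs the result in-process with `exec`, so it provides no isolation and should only be used with trusted inputs; `build_calc_code` and `run_locally` are hypothetical helpers.

```python
import contextlib
import io

OP_SYMBOLS = {"add": "+", "subtract": "-", "multiply": "*", "divide": "/"}


def build_calc_code(operation: str, x: float, y: float, precision: int = 2) -> str:
    """Build the same kind of snippet the server sends to the sandbox."""
    return (
        f"result = {x} {OP_SYMBOLS[operation]} {y}\n"
        f"result = round(result, {precision})\n"
        # {{result}} is escaped so that it is only resolved when the generated code runs
        f'print(f"{x} {operation} {y} = {{result}}")\n'
    )


def run_locally(code: str) -> str:
    """Run trusted code in-process and capture its stdout (no isolation!)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()


code = build_calc_code("divide", 1.0, 3.0, precision=4)
print(code)
print(run_locally(code))
```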

#### Step 2: Create the Tool Class

Create `calculator_tool.py`:

```python
"""Calculator tool client for NeMo-Skills."""
from typing import Any, Dict

from nemo_skills.mcp.tool_providers import MCPClientTool


class CalculatorTool(MCPClientTool):
    """Tool for performing mathematical calculations."""

    def __init__(self) -> None:
        super().__init__()
        # Configure the MCP client to launch our server
        self.apply_config_updates(
            {
                "client": "nemo_skills.mcp.clients.MCPStdioClient",
                "client_params": {
                    "command": "python",
                    "args": ["/absolute/path/to/calculator_server.py"],
                },
                # Default precision that can be overridden
                "default_precision": 2,
            }
        )

    async def execute(
        self,
        tool_name: str,
        arguments: Dict[str, Any],
        extra_args: Dict[str, Any] | None = None
    ):
        """Execute the tool, injecting default precision if not provided."""
        arguments = dict(arguments)
        extra = dict(extra_args or {})

        if tool_name == "calculate":
            # Inject default precision via extra_args if not in arguments
            if "precision" not in arguments:
                extra["precision"] = self._config.get("default_precision", 2)

        return await self._client.call_tool(
            tool=tool_name,
            args=arguments,
            extra_args=extra
        )
```

#### Step 3: Use Your Custom Tool

**Command line:**

```bash
ns generate \
  --cluster local \
  --input_file data.jsonl \
  --output_dir outputs \
  --model Qwen/Qwen3-8B \
  --server_type vllm \
  --server_gpus 1 \
  --server_args '--enable-auto-tool-choice --tool-call-parser hermes' \
  ++tool_modules=[/absolute/path/to/calculator_tool.py::CalculatorTool] \
  ++tool_overrides.CalculatorTool.default_precision=4
```

**Python API:**

```python
from nemo_skills.pipeline.cli import generate, wrap_arguments

generate(
    ctx=wrap_arguments(
        "++tool_modules=[/absolute/path/to/calculator_tool.py::CalculatorTool] "
        "++tool_overrides.CalculatorTool.default_precision=4"
    ),
    cluster='local',
    model='Qwen/Qwen3-8B',
    server_type='vllm',
    server_gpus=1,
    server_args='--enable-auto-tool-choice --tool-call-parser hermes',
    input_file='data.jsonl',
    output_dir='outputs',
)
```
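
Note that `tool_modules` entries come in two forms: a dotted module path ending in a class name (as used for the built-in tools) and a file path followed by `::ClassName` (as used here). The sketch below shows one way to tell them apart; `parse_tool_spec` is a hypothetical helper and does not reflect how NeMo-Skills resolves these entries internally.

```python
def parse_tool_spec(spec: str) -> tuple[str, str, bool]:
    """Split a tool spec into (location, class_name, is_file_path)."""
    if "::" in spec:
        # File form: /path/to/file.py::ClassName
        location, class_name = spec.rsplit("::", 1)
        return location, class_name, True
    # Module form: package.module.ClassName
    location, class_name = spec.rsplit(".", 1)
    return location, class_name, False


print(parse_tool_spec("/absolute/path/to/calculator_tool.py::CalculatorTool"))
print(parse_tool_spec("nemo_skills.mcp.servers.python_tool.PythonTool"))
```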

## Tool Configuration

### Tool Overrides

Tool overrides allow you to customize tool behavior without modifying code:

```bash
# Single override
++tool_overrides.CalculatorTool.default_precision=4

# Multiple overrides
++tool_overrides.CalculatorTool.default_precision=4 \
++tool_overrides.PythonTool.exec_timeout_s=30
```

### Hiding Arguments

You can hide arguments from the LLM's view while still passing them to the server:

```python
self.apply_config_updates({
    "hide_args": {
        "calculate": ["precision"]  # Hide precision from LLM schema
    },
})
```

The hidden argument is then injected via `extra_args` in the `execute()` method.
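
Conceptually, hiding an argument involves two steps: removing it from the schema the LLM sees, and adding it back when the call is forwarded to the server. The sketch below illustrates both steps with plain dictionaries. The helper names are hypothetical, and letting `extra_args` win on a name clash is an assumption of this sketch rather than documented behavior.

```python
def visible_schema(properties: dict, hidden: list[str]) -> dict:
    """Drop hidden arguments from the schema shown to the LLM."""
    return {name: spec for name, spec in properties.items() if name not in hidden}


def merge_call_args(llm_args: dict, extra_args: dict) -> dict:
    """Combine LLM-provided arguments with the hidden ones injected by the tool."""
    return {**llm_args, **extra_args}


schema = {
    "operation": {"type": "string"},
    "x": {"type": "number"},
    "y": {"type": "number"},
    "precision": {"type": "integer"},
}
print(sorted(visible_schema(schema, hidden=["precision"])))
print(merge_call_args({"operation": "add", "x": 1, "y": 2}, {"precision": 4}))
```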

## Advanced Examples

### Using Multiple Tools Together

```python
from nemo_skills.pipeline.cli import generate, wrap_arguments

generate(
    ctx=wrap_arguments(
        "++tool_modules=["
        "nemo_skills.mcp.servers.python_tool.PythonTool,"
        "/path/to/calculator_tool.py::CalculatorTool,"
        "nemo_skills.mcp.servers.exa_tool.ExaTool"
        "] "
        "++tool_overrides.PythonTool.exec_timeout_s=30 "
        "++tool_overrides.CalculatorTool.default_precision=4"
    ),
    cluster='local',
    model='Qwen/Qwen3-8B',
    server_type='vllm',
    server_gpus=1,
    server_args='--enable-auto-tool-choice --tool-call-parser hermes',
    input_file='data.jsonl',
    output_dir='outputs',
    with_sandbox=True,
)
```

## Server Configuration

### vLLM Tool Calling

For vLLM, you may need to specify tool calling arguments:

```bash
--server_type vllm \
--server_args '--enable-auto-tool-choice --tool-call-parser hermes'
```


## Reference

### Built-in Tools

- [`nemo_skills.mcp.servers.python_tool.PythonTool`](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/mcp/servers/python_tool.py) - Python code execution
- [`nemo_skills.mcp.servers.web.arxiv_tool.ArxivSearchTool`](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/mcp/servers/web/arxiv_tool.py) - ArXiv paper search and retrieval (no API key required)
- [`nemo_skills.mcp.servers.exa_tool.ExaTool`](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/mcp/servers/exa_tool.py) - Web search via Exa API
- [`nemo_skills.mcp.servers.chemistry.periodictable_tool.PeriodictableTool`](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/mcp/servers/chemistry/periodictable_tool.py) - Direct element, isotope, and neutron scattering lookup via periodictable (requires `periodictable`)
- [`nemo_skills.mcp.servers.physics.particle_tool.ParticleTool`](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/mcp/servers/physics/particle_tool.py) - Direct particle physics lookup from the PDG database via particle (requires `particle`)
- [`nemo_skills.mcp.servers.physics.radioactivedecay_tool.RadioactivedecayTool`](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/mcp/servers/physics/radioactivedecay_tool.py) - Direct nuclear nuclide and decay-chain lookup via radioactivedecay (requires `radioactivedecay`)
- [`nemo_skills.mcp.servers.physics.coolprop_tool.CoolPropTool`](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/mcp/servers/physics/coolprop_tool.py) - Direct thermophysical fluid property lookup via CoolProp (requires `CoolProp`)
- [`nemo_skills.mcp.servers.web.wikipedia_tool.WikipediaSearchTool`](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/mcp/servers/web/wikipedia_tool.py) - Direct Wikipedia article search and retrieval (no API key required)


================================================
FILE: docs/basics/chat_interface.md
================================================
# Chat Interface

The chat interface provides a web UI where you can interactively chat with a deployed model. It supports features like multi-turn conversations and, for certain models like [OpenMath-Nemotron](https://huggingface.co/collections/nvidia/openmathreasoning-68072c0154a5099573d2e730), code execution capabilities.

![Chat Interface Demo](../assets/chat_interface_demo.gif)

## Launching

There are two main ways to launch the chat interface:

### 1. Via `ns start_server`

You can launch the chat interface alongside the model server directly on a cluster or remote machine using the `ns start_server` command:

```bash
ns start_server \
    --model Qwen/Qwen3-8B \
    --server_type vllm \
    --server_gpus 1 \
    --config local \
    --launch_chat_interface \
    [--extra_chat_args "<hydra_options_for_chat_ui>"]
```

### 2. Manual Launch

Alternatively, you can launch the chat interface manually if you have the `nemo_skills` environment installed locally. This method is suitable when you want to connect to an already running model server.

```bash
python -m nemo_skills.inference.chat_interface.launch server_type=vllm [other_hydra_options]
```

Set `server_type` to the type of server you are connecting to (e.g., `server_type=vllm`) and pass any other Hydra options you need, such as the path to your model's configuration (e.g., `model_config_path=/path/to/model/config.json`).

All relevant parameters for the chat interface, such as the model details, server endpoint, and UI elements, can be configured via Hydra command-line arguments. For a comprehensive list of configurable parameters, please refer to the configuration schema in `nemo_skills/inference/chat_interface/core.py`.


When launched via `ns start_server`, the chat interface runs on the same node as the model server.

#### Accessing the Interface (Cluster/Remote Launch)

To access the chat interface when it's launched via `ns start_server` on a remote machine or cluster, you'll need to set up an SSH tunnel to forward the port (default is `7860`) from the remote machine to your local machine.

*   **For Slurm clusters:**
    Use the following command, replacing `cluster` with the slurm cluster hostname or IP address, `username` with your username, and `node-name` with the name of the node where the server is running:
    ```bash
    ssh -J cluster -N -f -L localhost:7860:localhost:7860 username@node-name
    ```

*   **For remote workstations/servers:**
    Use the following command, replacing `username` with your username and `server` with the hostname or IP address of the remote machine:
    ```bash
    ssh -N -f -L localhost:7860:localhost:7860 username@server
    ```

Once the tunnel is established, you can access the interface by navigating to `http://localhost:7860` in your web browser.
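
If the page does not load, a quick way to check whether the tunnel is actually forwarding the port is to try opening a TCP connection to it. The helper below is a small sketch using only the standard library; `is_port_open` is a name chosen for this example.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 7860, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Tunnel is up" if is_port_open() else "Nothing is listening on localhost:7860")
```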


================================================
FILE: docs/basics/cluster-configs.md
================================================
# Cluster configs

All of the [pipeline scripts](../pipelines/index.md) accept a `--cluster` argument, which you can use
to control where the job gets executed (you need a "local" cluster config to run jobs locally as well).
That argument picks up one of the configs inside your local
[cluster_configs](https://github.com/NVIDIA-NeMo/Skills/tree/main/cluster_configs)
folder by default, but you can specify another location with `--config_dir` or set it in the `NEMO_SKILLS_CONFIG_DIR` env variable.
You can also use the `NEMO_SKILLS_CONFIG` env variable instead of the `--cluster` parameter.
The cluster config defines an executor (local or slurm), mounts for data/model access and (slurm-only) various parameters
such as account, partition, ssh-tunnel arguments and so on.

The recommended way to launch jobs on slurm is to run all commands locally and specify the `ssh_tunnel` section in the cluster config
to let [NeMo-Run](https://github.com/NVIDIA-NeMo/Run) know how to connect there.
But if you prefer to run from the cluster directly, you can install NeMo-Skills there
and then only specify the `job_dir` parameter without using the `ssh_tunnel` section in the config.

You can see example configs in the [cluster_configs](https://github.com/NVIDIA-NeMo/Skills/tree/main/cluster_configs) folder.
To create a new config, you can either rename and modify one of the examples or run

```bash
ns setup
```

which will walk you through creating all the necessary configs step by step.

## Environment variables

You can define environment variables in the cluster config file, which will be set inside the container.

```yaml
env_vars:
  - MYENVVAR  # will pick the value from env
  - MYENVVAR2=my_value  # will use my_value
```

If an environment variable is required, and you want us to fail if it's not provided,
you can use `required_env_vars` instead. One thing to note is that `required_env_vars` does not support
passing values directly, so you must provide them via environment variable only.
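
The difference between the two lists can be summarized with a short sketch. This is only an illustration of the rules described above, not the NeMo-Skills implementation; `resolve_env_vars` is a hypothetical function, and silently skipping optional variables that are missing from the environment is an assumption of this sketch.

```python
import os


def resolve_env_vars(env_vars, required_env_vars=(), environ=None) -> dict:
    """Resolve cluster-config style env var entries into a name -> value mapping."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for entry in env_vars:
        if "=" in entry:
            # NAME=value: use the value from the config
            name, value = entry.split("=", 1)
            resolved[name] = value
        elif entry in environ:
            # NAME: pick the value from the current environment
            resolved[entry] = environ[entry]
    for name in required_env_vars:
        if name not in environ:
            raise ValueError(f"Required environment variable {name} is not set")
        resolved[name] = environ[name]
    return resolved


fake_environ = {"MYENVVAR": "from_env", "HF_TOKEN": "secret"}
print(resolve_env_vars(["MYENVVAR", "MYENVVAR2=my_value"], ["HF_TOKEN"], environ=fake_environ))
```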


Depending on which pipelines you run, you might need to define the following environment variables:

```bash
# only needed for training (can opt-out with --disable_wandb)
export WANDB_API_KEY=...
# only needed if using gated models, like llama3.1
export HF_TOKEN=...
# only needed if running inference with OpenAI models
export OPENAI_API_KEY=...
# only needed if running inference with Azure OpenAI models
export AZURE_OPENAI_API_KEY=...
# only needed if running inference with Nvidia NIM models
export NVIDIA_API_KEY=...
```


## Useful tips

Here are some suggestions on what can be defined in cluster configs for different use cases:

1. Set the `HF_HOME` environment variable to ensure all HuggingFace downloads are cached.

2. If you want to have a custom version of one of the underlying libraries that we use
   (e.g. [NeMo](https://github.com/NVIDIA/NeMo) or [verl](https://github.com/volcengine/verl)),
   you can clone it locally (or on cluster if using slurm), make your changes and then override in the container with

      ```yaml
      mounts:
         - <your path>/NeMo:/opt/NeMo
         - <your path>/verl:/opt/verl
      ```

3. You can specify custom containers - our code should work out-of-the-box or with very few changes with different
   versions of inference libraries (e.g. [vLLM](https://github.com/vllm-project/vllm)) or training libraries
   (e.g. [NeMo](https://github.com/NVIDIA/NeMo)). If you get some errors, you might also need to modify the entry-point
   scripts we use, e.g.
   [nemo_skills/inference/server/serve_vllm.py](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/inference/server/serve_vllm.py)
   or [nemo_skills/training/start_sft.py](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/training/start_sft.py)

4. For slurm clusters it's recommended to [build .sqsh files](https://github.com/NVIDIA/enroot/blob/master/doc/cmd/import.md#example)
   for all containers and reference them by their paths on the cluster.


================================================
FILE: docs/basics/code-packaging.md
================================================
# Code packaging

We use [NeMo-Run](https://github.com/NVIDIA-NeMo/Run) to manage our experiments. Local and slurm-based
execution are supported (please open an issue if you need to run our code on other kinds of clusters).
This means that even if you need to submit jobs on slurm, you can do it from your local machine by defining an
appropriate cluster config; NeMo-Run will package and upload your code and data and manage
all the complexities of slurm scheduling. Check the NeMo-Run documentation to learn how to fetch logs, check status,
cancel jobs, etc.

To decide which code to package we use the following logic:

1. If you run commands from inside a cloned NeMo-Skills repository, we will package that repository.
2. If you run commands from inside a git repository which is not NeMo-Skills (it doesn't have a top-level `nemo_skills` folder),
   we will package your current repository and also include the `nemo_skills` subfolder from its installed location.
3. If you run commands from outside of any git repository, we will only package the `nemo_skills` subfolder from its installed
   location.

Put simply, we will always include `nemo_skills` and will additionally include your personal git repository if you're
running commands from it.
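
The three rules can be written down as a small decision function. This is a sketch of the logic as described above, not the code NeMo-Skills uses; `what_gets_packaged` and its return labels are made up for illustration, and it ignores the environment-variable overrides.

```python
def what_gets_packaged(in_git_repo: bool, has_nemo_skills_folder: bool) -> list[str]:
    """Return a human-readable list of what would be packaged."""
    if in_git_repo and has_nemo_skills_folder:
        # Rule 1: running from a NeMo-Skills clone
        return ["current repository"]
    if in_git_repo:
        # Rule 2: running from some other git repository
        return ["current repository", "installed nemo_skills"]
    # Rule 3: running outside of any git repository
    return ["installed nemo_skills"]


for case in [(True, True), (True, False), (False, False)]:
    print(case, "->", what_gets_packaged(*case))
```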

!!! note

    When packaging a git repository, NeMo-Run will only package the code tracked by git
    (as well as all jsonl files from `nemo_skills/dataset`).
    Any non-tracked files will not be automatically available inside the container or uploaded to slurm.

    When packaging `nemo_skills` from its installed location (which might not be a git repository), we will
    upload **all** the files inside `nemo_skills` subfolder. Make sure you do not store any large files there
    to avoid uploading them on the cluster with each experiment!

!!! note

    When you run commands from a git repo with uncommitted changes, NeMo-Run throws the following error:
    ```
    RuntimeError: Your repo has uncommitted changes. Please commit your changes or set check_uncommitted_changes to False to proceed with packaging.
    ```
    This error can be avoided by either taking care of the uncommitted changes (via commit/revert), or setting the environment variable:
    ```bash
    export NEMO_SKILLS_DISABLE_UNCOMMITTED_CHANGES_CHECK=1
    ```
    In all cases, uncommitted code will not be used.

!!! note

    You can override the default packaging behavior with the following environment variables:

    - `NEMO_SKILLS_FORCE_PATTERN_PACKAGER=1` — Skip git-based packaging entirely and always use the installed
      `nemo_skills` package tree (PatternPackager). Useful when you have an editable install and don't want
      packaging tied to the git state of your current directory.
    - `NEMO_SKILLS_FORCE_INSTALLED_PACKAGE=1` — When running from a git repo, use the installed `nemo_skills`
      package instead of the repo's `nemo_skills/` directory. The git repo is still packaged, but `nemo_skills`
      is picked up from the installed location. Useful when your repo checkout has extra files you don't want
      uploaded.

    Note that `NEMO_SKILLS_FORCE_INSTALLED_PACKAGE` has no effect when `NEMO_SKILLS_FORCE_PATTERN_PACKAGER`
    is also set, since the latter bypasses the git repo branch entirely.


Finally, it's important to keep in mind that whenever you submit a new experiment, NeMo-Run will create a copy of your
code package both locally (inside `~/.nemo_run`) and on the cluster (inside the `ssh_tunnel.job_dir` path from your cluster config).
If you submit multiple experiments from the same Python script, they will all share code, so only one copy will be
created per run of that script. Even so, the code copies will accumulate over time, and you will eventually run out of
space both locally and on the cluster. There is currently no automatic cleanup, so you have to monitor this and
periodically remove the local and cluster nemo-run folders to free up space. Doing so has no side effects (the folders will
be automatically recreated) as long as you don't have any running jobs when you remove them.
If you want more fine-grained control over code reuse, you can directly specify the `--reuse_code_exp` argument when submitting jobs.
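
Since there is no automatic cleanup, it can be useful to check how much space the local copies take. The snippet below is a small sketch that sums file sizes under `~/.nemo_run`; `dir_size_bytes` is a helper written for this example.

```python
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root (0 if it does not exist)."""
    if not root.exists():
        return 0
    return sum(path.stat().st_size for path in root.rglob("*") if path.is_file())


if __name__ == "__main__":
    size_gb = dir_size_bytes(Path.home() / ".nemo_run") / 1e9
    print(f"~/.nemo_run uses {size_gb:.2f} GB")
```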

While our job submission is somewhat complicated and goes through NeMo-Run, in the end we simply execute a particular sbatch file
that is uploaded to the cluster. It is sometimes helpful to see what's in it and modify it directly. You can find the sbatch file(s)
for each job inside the `ssh_tunnel.job_dir` cluster folder that is defined in your cluster config.


================================================
FILE: docs/basics/index.md
================================================
# Getting Started

Let's walk through a little tutorial to get started working with nemo-skills.

We will use a simple generation job to run LLM inference in different setups (through an API, hosting a model
locally, and on a slurm cluster). This will help you understand some important concepts we use (e.g.
[cluster configs](./cluster-configs.md)) and set up your machine to run any other jobs.

## Setup

First, let's install nemo-skills.

We highly recommend cloning the repository and installing in editable mode:

```bash
git clone https://github.com/NVIDIA-NeMo/Skills.git
cd Skills
pip install -e .
```

This is the most robust setup. During dataset preparation, nemo-skills may create `.jsonl` files (and in some
cases `__init__.py` files) inside the installation directory. Editable installs make this behavior work reliably
and keep cleanup straightforward.

You can also install directly from GitHub:

```bash
pip install git+https://github.com/NVIDIA-NeMo/Skills.git
```

That works in most cases, but it is more brittle for dataset preparation workflows and can require extra manual
steps to fully uninstall nemo-skills.

Now, let's create a simple file with just 3 data points that we want to run inference on:

```jsonl title='input.jsonl'
{"prompt": "How are you doing?", "option_a": "Great", "option_b": "Bad"}
{"prompt": "What's the weather like today?", "option_a": "Perfect", "option_b": "Awful"}
{"prompt": "How do you feel?", "option_a": "Crazy", "option_b": "Nice"}
```

Save the above into `./input.jsonl`.

Let's also create a [prompt config](../basics/prompt-format.md) that defines how input data is combined into an LLM prompt:

```yaml title='prompt.yaml'
system: "When answering a question always mention Nemo-Skills repo in a funny way."

user: |-
   Question: {prompt}
   Option A: {option_a}
   Option B: {option_b}
```

Save the above into `./prompt.yaml`.
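
To see how the placeholders line up with the input data, you can think of the `user` field as a format string that is filled with the keys of each jsonl row. The snippet below is a simplified sketch of that idea using plain `str.format`; the actual prompt construction in nemo-skills does more (for example, it also handles the system message), and `fill_user_message` is a hypothetical helper.

```python
import json

# Mirrors the `user` field of prompt.yaml
USER_TEMPLATE = "Question: {prompt}\nOption A: {option_a}\nOption B: {option_b}"


def fill_user_message(row: dict) -> str:
    """Fill the template placeholders with the values from one data point."""
    return USER_TEMPLATE.format(**row)


row = json.loads('{"prompt": "How are you doing?", "option_a": "Great", "option_b": "Bad"}')
print(fill_user_message(row))
```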

## API inference

Now we are ready to run our first inference. Since we want to use API models, you need to have an API key.
You can either use [OpenAI models](https://platform.openai.com/docs/overview) or
[Nvidia NIM models](https://build.nvidia.com/) (just register there and you will get some free credits to use!).

=== "Nvidia NIM models"

    ```bash
    export NVIDIA_API_KEY=<your key>
    ns generate \
        --server_type=openai \
        --model=meta/llama-3.1-8b-instruct \
        --server_address=https://integrate.api.nvidia.com/v1 \
        --output_dir=./generation \
        --input_file=./input.jsonl \
        ++prompt_config=./prompt.yaml
    ```

=== "OpenAI models"

    ```bash
    export OPENAI_API_KEY=<your key>
    ns generate \
        --server_type=openai \
        --model=gpt-4o-mini \
        --server_address=https://api.openai.com/v1 \
        --output_dir=./generation \
        --input_file=./input.jsonl \
        ++prompt_config=./prompt.yaml
    ```

You should be able to see a jsonl file with 3 lines containing the original data and a new `generation` key
with an LLM output for each prompt.

```jsonl title='generation/output.jsonl'
{"num_generated_tokens": 76, "generation": "I'm doing fantastically well, thanks for asking! You know, I'm so good that I'm practically overflowing with Nemo-Skills-level linguistic mastery, but I'm not too full of myself to admit that I'm just a language model, and I'm here to help you with your question. So, which option is it? A) Great or B) Bad?", "prompt": "How are you doing?", "option_a": "Great", "option_b": "Bad"}
{"num_generated_tokens": 102, "generation": "You want to know the weather? Well, I've got some \"forecasting\" skills that are off the charts! *wink wink* Just like the Nemo-Skills repo, where the models are trained to be \"weather-wise\" (get it? wise? like the weather? ahh, nevermind...). Anyway, I'm going to take a \"rain-check\" on that question and say... Option A: Perfect! The sun is shining bright, and it's a beautiful day!", "prompt": "What's the weather like today?", "option_a": "Perfect", "option_b": "Awful"}
{"num_generated_tokens": 120, "generation": "You want to know how I feel? Well, let me check my emotions... *taps into the vast ocean of digital feelings* Ah, yes! I'm feeling... *dramatic pause* ... Nice! (Option B: Nice) And you know why? Because I'm a large language model, I don't have feelings like humans do, but I'm always happy to chat with you, thanks to the Nemo-Skills repo, where my developers have skillfully infused me with the ability to be nice (and sometimes a little crazy, but that's a whole different story)!", "prompt": "How do you feel?", "option_a": "Crazy", "option_b": "Nice"}
```

## Local inference

If you pay attention to the logs of the above commands, you will notice this warning:

```
WARNING  Cluster config is not specified. Running locally without containers. Only a subset of features is supported and you're responsible for installing any required dependencies. It's recommended to run `ns setup` to define appropriate configs!
```

Indeed, for anything more complicated than calling an API model, it's recommended that you do a little bit more setup. Since there
are many heterogeneous jobs that we support, it's much simpler to run things in prebuilt containers than to try to
install all packages in your current environment. To tell nemo-skills which containers to use and how to mount your
local filesystem, we need to define a [cluster config](./cluster-configs.md). Here is an example of what a "local" cluster
config might look like:

```yaml title="cluster_configs/local.yaml"
executor: local

containers:
  # some containers are public and we pull them
  trtllm: nvcr.io/nvidia/tensorrt-llm/release:1.0.0
  vllm: vllm/vllm-openai:v0.10.1.1
  # some containers are custom and we will build them locally before running the job
  # you can always pre-build them as well
  nemo-skills: dockerfile:dockerfiles/Dockerfile.nemo-skills
  # ... there are some more containers defined here

env_vars:
  - HF_HOME=/models

mounts:
  - /mnt/datadrive/models:/models
  - /home/igitman/workspace:/workspace
```

To generate one, run `ns setup` and follow
the prompts to define your configuration. Choose `local` for the config type/name and define one mount for your `/workspace`
and another mount[^1] for `/models`, e.g.

```bash
/home/<username>:/workspace,/home/<username>/models:/models
```

[^1]: Of course you can use a single mount if you'd like or define more than 2 mounts
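
Because commands must reference the *mounted* path rather than the host path, it can help to see the translation spelled out. The sketch below is only an illustration: `to_mounted_path` is a hypothetical helper (not part of nemo-skills), and the mount strings simply reuse the `host_dir:container_dir` format from the config above.

```python
from pathlib import PurePosixPath


def to_mounted_path(local_path, mounts):
    """Translate a host path into the path a container would see.

    `mounts` uses the same "host_dir:container_dir" strings as the cluster config.
    Hypothetical helper for illustration; not part of nemo-skills.
    """
    local = PurePosixPath(local_path)
    for mount in mounts:
        host_dir, container_dir = mount.split(":", 1)
        try:
            relative = local.relative_to(host_dir)
        except ValueError:
            continue  # local_path is not under this mount, try the next one
        return str(PurePosixPath(container_dir) / relative)
    raise ValueError(f"{local_path} is not covered by any mount")


mounts = ["/mnt/datadrive/models:/models", "/home/user/workspace:/workspace"]
print(to_mounted_path("/home/user/workspace/input.jsonl", mounts))
```

With these example mounts, a file stored at `/home/user/workspace/input.jsonl` on the host would be passed to commands as `/workspace/input.jsonl`.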

!!! note

    While we recommend running everything in containers by defining a cluster config, it's not a requirement.
    Any of our jobs can be run without specifying the config, but you would need to make sure your environment
    has all necessary packages installed.

Now that we have our first config created, we can run inference
with a local model (assuming you have at least one GPU on the machine you're using).
You also need the
[NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
set up on your machine and the [HF_TOKEN environment variable](https://huggingface.co/docs/hub/en/security-tokens) defined.

```bash
ns generate \
    --cluster=local \
    --server_type=vllm \
    --model=Qwen/Qwen2.5-1.5B-Instruct \
    --server_gpus=1 \
    --output_dir=/workspace/generation-local \
    --input_file=/workspace/input.jsonl \
    ++prompt_config=/workspace/prompt.yaml
```

This command might take a while to start since it's going to download a fairly heavy
[vLLM](https://github.com/vllm-project/vllm) container. But after
that's done, it should start a local server with the Qwen2.5-1.5B model and run inference on the same set of prompts.

It's also very easy to use TensorRT-LLM: just change the server type!

```bash
ns generate \
    --cluster=local \
    --server_type=trtllm \
    --model=Qwen/Qwen2.5-1.5B-Instruct \
    --server_gpus=1 \
    --output_dir=/workspace/generation-local-trtllm \
    --input_file=/workspace/input.jsonl \
    ++prompt_config=/workspace/prompt.yaml
```


## Slurm inference

Running local jobs is convenient for quick testing and debugging, but for anything large-scale we need to
leverage a Slurm cluster[^2]. Let's set up our cluster config for that case by running `ns setup` one more time.

[^2]: Adding support for other kinds of clusters should be straightforward - open an issue if you need that

This time pick `slurm` for the config type and fill out all other required information
(such as ssh access, account, partition, etc.).

!!! note
    If you're an NVIDIA employee, we have pre-configured cluster configs for internal usage with pre-built sqsh
    containers available at [https://gitlab-master.nvidia.com/igitman/nemo-skills-configs](https://gitlab-master.nvidia.com/igitman/nemo-skills-configs). You can most likely
    skip the step below and reuse one of the existing configurations.

You will also need to build .sqsh files for all containers or upload all `dockerfile:...` containers to
some registry (e.g. dockerhub) and reference the uploaded versions. To build .sqsh files, you can follow these steps:

1. Build images locally and upload to some container registry. E.g.
   ```bash
   docker build -t gitlab-master.nvidia.com/igitman/nemo-skills-containers:nemo-skills-0.7.1 -f dockerfiles/Dockerfile.nemo-skills .
   docker push gitlab-master.nvidia.com/igitman/nemo-skills-containers:nemo-skills-0.7.1
   ```
2. Start an interactive shell, e.g. with the following (assuming there is a "cpu" partition)
   ```bash
   srun -A <account> --partition cpu --job-name build-sqsh --time=1:00:00 --exclusive --pty /bin/bash -l
   ```
3. Import the image, e.g.:
   ```bash
   enroot import -o /path/to/nemo-skills-image.sqsh --docker://gitlab-master.nvidia.com/igitman/nemo-skills-containers:nemo-skills-0.7.1
   ```
4. Specify this image path in your cluster config
   ```yaml
   containers:
     nemo-skills: /path/to/nemo-skills-image.sqsh
   ```


Now that we have a slurm config set up, we can try running some jobs. Generally, you will need to upload models / data
to the cluster manually and then reference the proper mounted path. But for small-scale things we can also leverage the
[code packaging](./code-packaging.md) functionality that nemo-skills provides. Whenever you run any of the ns commands
from a git repository (whether that's [Nemo-Skills](https://github.com/NVIDIA-NeMo/Skills) itself or any other repo),
we will package your code and upload it to the cluster. You can then reference it with `/nemo_run/code` in your commands.
Let's give it a try by putting our prompt/data into a new git repository:

```bash
mkdir test-repo && cd test-repo && cp ../prompt.yaml ../input.jsonl ./
git init && git add --all && git commit -m "Init commit" # (1)!

ns generate \
    --cluster=slurm \
    --server_type=vllm \
    --model=Qwen/Qwen2.5-1.5B-Instruct \
    --server_gpus=1 \
    --input_file=/nemo_run/code/input.jsonl \
    ++prompt_config=/nemo_run/code/prompt.yaml \
    --output_dir=/workspace/generation # (2)!
```

1.   The files have to be committed as we only package what is tracked by git.
2.   This `/workspace` is a cluster location that needs to be defined in your slurm config.
     You'd need to manually download the output file or inspect it directly on cluster.

Note that this command finishes right away, as it only schedules the job in the slurm queue. You can run the
printed `nemo experiment logs ...` command to stream job logs. You can also check
the `/workspace/generation/generation-logs` folder on the cluster to see the logs there.

We can also easily run much larger-scale jobs on slurm using ns commands. E.g. here is a simple script that
uses the nemo-skills Python API[^3] to run [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) with TensorRT-LLM and
launch 16 parallel evaluation jobs on the aime24 and aime25 benchmarks (each doing 4 independent samples from the
model for a total of 64 samples).

[^3]: Any nemo-skills commands can be run from command-line or from Python with equivalent functionality

First prepare evaluation data

```bash
ns prepare_data aime24 aime25
```

Then run the following Python script

```python
from nemo_skills.pipeline.cli import eval, run_cmd, wrap_arguments

expname = "qwq-32b-test"
cluster = "slurm"
output_dir = f"/workspace/{expname}"

run_cmd( # (1)!
    ctx=wrap_arguments(
        f'pip install -U "huggingface_hub[cli]" && '
        f'hf download Qwen/QwQ-32B --local-dir {output_dir}/QwQ-32B'
    ),
    cluster=cluster,
    expname=f"{expname}-download-hf",
    log_dir=f"{output_dir}/download-logs"
)

eval(
    ctx=wrap_arguments( # (2)!
        "++inference.tokens_to_generate=16000 "
        "++inference.temperature=0.6 "
        "++parse_reasoning=True "
    ),
    cluster=cluster,
    model=f"{output_dir}/QwQ-32B",
    server_type="trtllm",
    output_dir=f"{output_dir}/results/",
    run_after=f"{expname}-download-hf", # (3)!
    benchmarks="aime24:64,aime25:64", # (4)!
    num_jobs=16,
    server_gpus=8,
)
```

1.   `run_cmd` just runs an arbitrary command inside our containers. It's useful for some pre/post processing when
     building large pipelines, but fully optional here; we show it only as an example of how you can add custom steps.
     You can also specify `partition="cpu"` as an argument if it's available on your cluster, since this
     command doesn't require GPUs. Best practice is to add `cpu_partition: <partition name>` to your cluster config
     so that it's used automatically whenever GPUs are not requested.
2.   `wrap_arguments` is used to capture any arguments that are not part of the *wrapper* script but are passed into
     the actual *main* script that's being launched by the wrapper. You can read more about this in the
     [Important details](#important-details) section at the end of this document.
3.   `run_after` and `expname` arguments can be used to schedule jobs to run one after the other
     (we will set proper slurm dependencies). These parameters have no effect when you're not running slurm jobs.
4.   You can find all supported benchmarks in the [nemo_skills/dataset](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/dataset)
     folder. `:64` means that we are asking for 64 samples for each example so that we can compute majority@64 and pass@64 metrics.

After all evaluation jobs are finished (you'd need to check your slurm queue to know that) you can summarize the results
with the following command

```bash
ns summarize_results --cluster=slurm /workspace/qwq-32b-test/results
```

which will output the following (`pass@1[avg-of-64]` is an average accuracy across all 64 generations)

```bash
----------------------------------------- aime24 ----------------------------------------
evaluation_mode   | num_entries | avg_tokens | gen_seconds | symbolic_correct | no_answer
pass@1[avg-of-64] | 30          | 10790      | 3952        | 65.16%           | 32.24%
majority@64       | 30          | 10790      | 3952        | 86.67%           | 3.33%
pass@64           | 30          | 10790      | 3952        | 86.67%           | 3.33%


----------------------------------------- aime25 ----------------------------------------
evaluation_mode   | num_entries | avg_tokens | gen_seconds | symbolic_correct | no_answer
pass@1[avg-of-64] | 30          | 12076      | 4061        | 48.80%           | 45.78%
majority@64       | 30          | 12076      | 4061        | 70.00%           | 13.33%
pass@64           | 30          | 12076      | 4061        | 76.67%           | 13.33%
```
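
To make the three rows of the table concrete, here is a toy sketch of how such metrics can be computed from per-sample answers. It is not the nemo-skills implementation: the `summarize` function, the input layout, and the handling of missing answers (`None`) are assumptions made for illustration, and the data is invented.

```python
from collections import Counter


def summarize(problems):
    """Compute toy versions of the three metrics, as percentages.

    `problems` is a list of (expected_answer, predicted_answers) pairs,
    where a prediction of None stands for "no answer extracted".
    """
    avg_correct = majority_correct = any_correct = 0.0
    for expected, predictions in problems:
        hits = [prediction == expected for prediction in predictions]
        avg_correct += sum(hits) / len(hits)  # pass@1 averaged over the k samples
        any_correct += any(hits)  # pass@k: at least one sample is right
        answered = [p for p in predictions if p is not None]
        if answered:
            top_answer, _ = Counter(answered).most_common(1)[0]
            majority_correct += top_answer == expected  # majority@k
    total = len(problems)
    return {
        "pass@1[avg-of-k]": 100 * avg_correct / total,
        "majority@k": 100 * majority_correct / total,
        "pass@k": 100 * any_correct / total,
    }


# Two invented problems with k=4 samples each.
toy_problems = [
    ("4", ["4", "4", "5", "4"]),
    ("7", ["3", "3", "7", None]),
]
metrics = summarize(toy_problems)
print(metrics)
```

In this toy data the second problem is solved by one sample out of four, so it counts toward `pass@k` but not toward `majority@k`, where the most common answer ("3") is wrong.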

And that's it! Now you know the basics of how to work with nemo-skills and are ready to build your own
[pipelines](../pipelines/index.md). You can find more examples in our [tutorials](../tutorials/index.md) or [papers & releases](../releases/index.md).

Please read the next section to recap all of the important concepts that we touched upon and learn some more details.


# Important details

Let us summarize a few details that are important to keep in mind when using nemo-skills.

**Using containers**. Most nemo-skills commands require using multiple docker containers that communicate with each
other. The containers used are specified in your [cluster config](./cluster-configs.md) and we will start them
for you automatically. But it's important to keep this in mind since e.g. any packages that you install
aren't going to be available for nemo-skills jobs unless you change the containers. This is also the reason why
we have a `mounts` section in the cluster config and all paths that you specify in various commands need to reference
the *mounted* path, not your local/cluster path. Another important implication is that environment variables
are not accessible to our jobs by default and you need to explicitly list them in your cluster configs.
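
As a rough mental model of the environment-variable rule, the sketch below treats the `env_vars` list of a cluster config as the only source of variables for a job. This is a simplification under stated assumptions: `split_env` is a hypothetical helper, it only handles the `NAME=value` form shown in the local config example, and it ignores any variables the tool may forward on its own.

```python
def split_env(cluster_config, local_env):
    """Return (variables listed in the config, local variable names that stay hidden)."""
    # Only the NAME=value form from the example config is handled here.
    listed = dict(entry.split("=", 1) for entry in cluster_config.get("env_vars", []))
    hidden = sorted(name for name in local_env if name not in listed)
    return listed, hidden


cluster_config = {"env_vars": ["HF_HOME=/models"]}
local_env = {"HF_HOME": "/home/user/.cache/huggingface", "MY_API_KEY": "secret"}

listed, hidden = split_env(cluster_config, local_env)
print(listed)  # what the job sees under this simplified model
print(hidden)  # set locally, but not listed in the config
```

Here `MY_API_KEY` (an invented name) exists in the local shell but is not listed in the config, so under this model the job would not see it.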

**Code packaging**. All nemo-skills commands will *package* your code to make it available in containers or in slurm jobs.
This means that your code will be copied to the `~/.nemo_run/experiments` folder locally or to `job_dir` (defined in your
[cluster config](./cluster-configs.md)) on the cluster. All our commands accept an `expname` parameter, and the code and other
metadata will be available inside the `expname` subfolder. We will always package any git repo you're running nemo-skills
commands from, as well as the nemo-skills Python package, and they will be available inside docker/slurm under `/nemo_run/code`.
You can read more in [code packaging](./code-packaging.md) documentation.

**Running commands**. Any nemo-skills command can be accessed via `ns` command-line as well as through Python API.
It's important to keep in mind that all arguments to such commands are divided into *wrapper* arguments (typically
used as `--arg_name`) and *main* arguments (typically specified as `++arg_name` since we use
[Hydra](https://hydra.cc/) for most scripts). The *wrapper* arguments configure the job itself (such as where to run it
or how many GPUs to request in slurm) while the *main* arguments are directly passed to whatever underlying script the
wrapper command calls. When you run `ns <command> --help`, you will always see the *wrapper* arguments displayed directly
as well as the information on what actual script is used underneath and an extra command you can run to see
what *main* arguments are available. You can learn more about this in [pipelines documentation](../pipelines/index.md).
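
The wrapper/main distinction can be illustrated with a small sketch that sorts the tokens of the earlier `ns generate` command by prefix. This is not how nemo-skills parses its command line; `split_arguments` is a hypothetical helper that only recognizes the `++` prefix and ignores other forms of Hydra overrides.

```python
def split_arguments(tokens):
    """Separate wrapper arguments (--name) from Hydra-style main arguments (++name)."""
    wrapper_args, main_args = [], []
    for token in tokens:
        if token.startswith("++"):
            main_args.append(token)  # passed through to the underlying script
        else:
            wrapper_args.append(token)  # configures the job itself
    return wrapper_args, main_args


command = [
    "--cluster=local",
    "--server_type=vllm",
    "--model=Qwen/Qwen2.5-1.5B-Instruct",
    "--server_gpus=1",
    "--output_dir=/workspace/generation-local",
    "--input_file=/workspace/input.jsonl",
    "++prompt_config=/workspace/prompt.yaml",
]
wrapper_args, main_args = split_arguments(command)
print(wrapper_args)
print(main_args)
```

In the Python API the same split is visible in the call itself: main arguments go into `wrap_arguments(...)`, while wrapper arguments are passed as regular keyword arguments.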

**Scheduling slurm jobs**. Our code is primarily built to schedule jobs on slurm clusters and that affects many design decisions
we made. A lot of the arguments for nemo-skills commands are only used with slurm cluster configs and are ignored when
running "local" jobs. It's important to keep in mind that the recommended way to submit slurm jobs is from a *local*
workstation by defining an `ssh_tunnel` section in your [cluster config](./cluster-configs.md). This helps us avoid
installing nemo-skills and its dependencies on the clusters and makes it very easy to switch between different slurm clusters
and a local "cluster" with just a single `cluster` parameter.


================================================
FILE: docs/basics/inference.md
================================================
# Inference

Here are the instructions on how to run inference with our repo.

## Download/convert the model

Get the model you want to use. You can use any model that's supported by vLLM, SGLang, TensorRT-LLM or Megatron.
You can also use the [NVIDIA NIM API](https://www.nvidia.com/en-us/ai/) for models that are hosted there.

## Start the server

Start the server hosting your model. Skip this step if you want to use cloud models through an API.

```bash
ns start_server \
    --cluster local \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --server_type vllm \
    --server_gpus 1 \
    --server_nodes 1
```

If the model needs to execute code, add `--with_sandbox`.

You can also launch an interactive web chat application by adding `--launch_chat_interface`. For more details, see the [Chat Interface documentation](chat_interface.md).

## Send inference requests

Click on :material-plus-circle: symbols in the snippet below to learn more details.


=== "Self-hosted models"

    ```python
    from nemo_skills.inference.model import get_model
    from nemo_skills.prompt.utils import get_prompt
    import asyncio

    llm = get_model(model="meta-llama/Llama-3.1-8B-Instruct", server_type="vllm")  # localhost by default
    prompt_obj = get_prompt('generic/default') # (1)!
    prompt = prompt_obj.fill({'question': "What's 2 + 2?"})
    print(prompt) # (2)!
    output = asyncio.run(llm.generate_async(prompt=prompt))
    print(output["generation"]) # (3)!
    ```

    1.   Here we use [generic/default](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/prompt/config/generic/default.yaml) config.

         See [nemo_skills/prompt/config](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/prompt/config) for more config options
         or [create your own prompts](prompt-format.md)


    2.   This should print

         ```python-console
         >>> print(prompt)
         [{'role': 'user', 'content': "What's 2 + 2?"}]
         ```

         If you don't want to use our prompt class, just create this list yourself

    3.   This should print
         ```python-console
         >>> print(output["generation"])
         2 + 2 = 4.
         ```

=== "API models"

    ```python
    from nemo_skills.inference.model import get_model
    from nemo_skills.prompt.utils import get_prompt
    import asyncio

    llm = get_model( # (1)!
        server_type="openai",  # NIM models are using OpenAI API
        base_url="https://integrate.api.nvidia.com/v1",
        model="meta/llama-3.1-8b-instruct",
    )
    prompt_obj = get_prompt('generic/default') # (2)!

    prompt = prompt_obj.fill({'question': "What's 2 + 2?"})

    print(prompt) # (3)!
    output = asyncio.run(llm.generate_async(prompt=prompt))
    print(output["generation"]) # (4)!
    ```

    1.   Don't forget to define `NVIDIA_API_KEY`.

         To use OpenAI models, use `OPENAI_API_KEY` and set `base_url=https://api.openai.com/v1`.

    2.   Here we use [generic/default](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/prompt/config/generic/default.yaml) config.

         See [nemo_skills/prompt/config](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/prompt/config) for more config options
         or [create your own prompts](prompt-format.md)


    3.   This should print

         ```python-console
         >>> print(prompt)
         [{'role': 'user', 'content': "What's 2 + 2?"}]
         ```

         If you don't want to use our prompt class, just create this list yourself

    4.   This should print
         ```python-console
         >>> print(output["generation"])
         2 + 2 = 4.
         ```

=== "With code execution"

    ``` python
    import asyncio

    from nemo_skills.code_execution.sandbox import get_sandbox
    from nemo_skills.inference.model import get_code_execution_model
    from nemo_skills.prompt.utils import get_prompt

    sandbox = get_sandbox()  # localhost by default
    llm = get_code_execution_model(
        model="meta-llama/Llama-3.1-8B-Instruct",
        server_type="vllm",
        sandbox=sandbox,
    )
    system_message = ( # (1)!
        "Environment: ipython\n\n"
        "Use Python to solve this math problem."
    )
    prompt_obj = get_prompt( # (2)!
        'generic/default',
        code_tags='llama3',
        system_message=system_message
    )
    prompt = prompt_obj.fill({'question': "What's 2 + 2?"})
    print(prompt) # (3)!
    # the code-execution args come from the prompt object, not from the filled message list
    output = asyncio.run(llm.generate_async(
        prompt=prompt,
        **prompt_obj.get_code_execution_args() # (4)!
    ))
    print(output["generation"]) # (5)!
    ```

    1.   8B model doesn't always follow these instructions, so using 70B or 405B for code execution is recommended.

    2.   Here we use [generic/default](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/prompt/config/generic/default.yaml) config.

         Note how we are updating system message on the previous line (you can also include it in the config directly).

         See [nemo_skills/prompt/config](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/prompt/config) for more config options
         or [create your own prompts](prompt-format.md)

    3.   This should print

         ```python-console
         >>> print(prompt)
         [
            {'role': 'system', 'content': 'Environment: ipython\n\nUse Python to solve this math problem.'},
            {'role': 'user', 'content': "What's 2 + 2?"}
         ]
         ```

         If you don't want to use our prompt class, just create this object yourself

    4.   `get_code_execution_args()` simply returns a dictionary with start/stop tokens,
         so that we know when to stop LLM generation and how to format the output.

         If you don't want to use our prompt class, just define those parameters directly.

    5.   This should print
         ```python-console
         >>> print(output["generation"])
         <|python_tag|>print(2 + 2)<|eom_id|><|start_header_id|>ipython<|end_header_id|>

         completed
         [stdout]
         4
         [/stdout]<|eot_id|><|start_header_id|>assistant<|end_header_id|>

         The answer is 4.
         ```

         The "4" in the stdout is coming directly from Python interpreter running in the sandbox.
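
To illustrate why the start/stop tokens matter in the code-execution example, the sketch below pulls the code out of a generation shaped like the sample output above. It is a toy illustration, not the sandbox logic used by nemo-skills: `extract_code_blocks` and its `code_begin`/`code_end` parameter names are hypothetical, and the token values are simply read off that sample output.

```python
def extract_code_blocks(generation, code_begin, code_end):
    """Return every code snippet found between code_begin and code_end tokens."""
    blocks = []
    position = 0
    while True:
        start = generation.find(code_begin, position)
        if start == -1:
            break  # no more code blocks
        start += len(code_begin)
        end = generation.find(code_end, start)
        if end == -1:
            break  # unterminated block, e.g. the generation was cut off
        blocks.append(generation[start:end])
        position = end + len(code_end)
    return blocks


generation = (
    "<|python_tag|>print(2 + 2)<|eom_id|><|start_header_id|>ipython<|end_header_id|>\n\n"
    "completed\n[stdout]\n4\n[/stdout]<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\nThe answer is 4."
)
blocks = extract_code_blocks(generation, "<|python_tag|>", "<|eom_id|>")
print(blocks)
```

The extracted snippet is what would be sent to the sandbox for execution; its stdout is then appended to the conversation before generation continues.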

If you want to use the completions API, you can also pass a `tokenizer` parameter to `get_prompt`; it will then use
the tokenizer's chat template to format the messages and return a string.

You can learn more about how our prompt formatting works in the [prompt format docs](../basics/prompt-format.md).
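
The annotations in the tabs above mention that you can create the message list yourself instead of using the prompt class. A minimal sketch of that, assuming only the message layout printed in those examples (`build_messages` is a hypothetical helper, not part of nemo-skills):

```python
def build_messages(question, system_message=None):
    """Build a chat-style message list in the same layout the examples print."""
    messages = []
    if system_message is not None:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": question})
    return messages


print(build_messages("What's 2 + 2?"))
print(
    build_messages(
        "What's 2 + 2?",
        system_message="Environment: ipython\n\nUse Python to solve this math problem.",
    )
)
```

The resulting list can be passed as the `prompt` argument in place of the output of `prompt_obj.fill(...)`.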

!!! note

    You can also use a slurm config when launching a server. If you do that, add `host=<slurm node hostname>`
    to the `get_model`/`get_sandbox` calls and define the `NEMO_SKILLS_SSH_KEY_PATH` and `NEMO_SKILLS_SSH_SERVER` env vars
    to set up the connection through ssh.

================================================
FILE: docs/basics/installation.md
================================================
# Installation & Dependency Groups

NeMo Skills provides three installable packages:

- **`nemo-skills`** (root) -- full install with CLI, cluster orchestration, all benchmarks
- **`nemo-skills-tools`** (`tools/` subdirectory) -- tool runtime only (`ToolManager`, built-in tools such as `DirectPythonTool`), without model-client dependencies such as LiteLLM/OpenAI
- **`nemo-skills-core`** (`core/` subdirectory) -- lightweight runtime only

## Default installation

`pip install nemo-skills` gives you **everything** (inference, evaluation, CLI,
cluster orchestration, benchmarks):

```bash
pip install git+https://github.com/NVIDIA-NeMo/Skills.git
# or, from a local clone:
pip install -e .
```

## Lightweight installation

If you only need inference, evaluation, and tool calling (no cluster orchestration):

```bash
pip install "nemo-skills-core @ git+https://github.com/NVIDIA-NeMo/Skills.git#subdirectory=core"
# or, from a local clone:
pip install -e core/
```

If you only need the tool runtime (`ToolManager` and built-in tools such as `DirectPythonTool`):

```bash
pip install "nemo-skills-tools @ git+https://github.com/NVIDIA-NeMo/Skills.git#subdirectory=tools"
# or, from a local clone:
pip install -e tools/
```

The current `tools` package is a Phase 1 split: it reuses the existing MCP/runtime layout as-is, so it may still install a few transitive runtime dependencies beyond the absolute minimum. It intentionally excludes model-client dependencies such as `litellm` and `openai`.

## Extras (dependency groups)

| Extra | Requirements file | What it provides |
|-------|-------------------|------------------|
| `tools` | `tools/requirements.txt` | Tool runtime: `ToolManager`, built-in MCP/direct tools, and sandbox-backed `DirectPythonTool`. No model-client dependencies such as LiteLLM/OpenAI. |
| `core` | `core/requirements.txt` | Agent runtime: inference, evaluation, tool calling (MCP), prompt formatting, math/code grading. No cluster orchestration. |
| `pipeline` | `requirements/pipeline.txt` | CLI (`ns` command), cluster management, experiment tracking (`nemo_run`, `typer`, `wandb`). |
| `dev` | `requirements/common-tests.txt`, `requirements/common-dev.txt` | Development and testing tools (`pytest`, `ruff`, `pre-commit`). |

### Examples

```bash
# Full install (default)
pip install -e .

# Core only -- lightweight runtime for downstream integrations
pip install -e core/

# Tools only -- tool runtime for downstream integrations
pip install -e tools/

# Development (everything + dev tools)
pip install -e ".[dev]"
```


================================================
FILE: docs/basics/prompt-format.md
================================================
# Prompt utilities

Our prompts are configured via two yaml files:

1. **Prompt config** - contains the actual prompt text with placeholders
2. **Code tags** - specifies code formatting tokens, required for code execution


## Prompt config

The prompt config contains user and system messages with placeholders for keys from a data file.
The configs are model independent (any model can be used with any config).
All of the configs that we support by default are available in
[nemo_skills/prompt/config](https://github.com/NVIDIA-NeMo/Skills/tree/main/nemo_skills/prompt/config)
folder. Here is an example prompt for
[math evaluations](
│   │   │   └── prepare.py
│   │   ├── gsm-plus/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── gsm8k/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── hendrycks_math/
│   │   │   ├── __init__.py
│   │   │   ├── fix_ref_solns.py
│   │   │   └── prepare.py
│   │   ├── hle/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── hle_verified/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── hmmt_feb25/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── hmmt_nov25/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── hotpotqa/
│   │   │   ├── __init__.py
│   │   │   ├── prepare.py
│   │   │   └── prepare_utils.py
│   │   ├── hotpotqa_closedbook/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── human-eval/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── human-eval-infilling/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── icpc/
│   │   │   └── __init__.py
│   │   ├── ifbench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── ifeval/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── imo-answerbench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── imo-gradingbench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── imo-proofbench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── ioi/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── librispeech-pc/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── livebench-coding/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── livecodebench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── livecodebench-cpp/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── livecodebench-pro/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── livecodebench-x/
│   │   │   ├── __init__.py
│   │   │   ├── livecodebench_x_utils.py
│   │   │   └── prepare.py
│   │   ├── longbench-v2/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── longcodebench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── m-arena-hard/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── m-arena-hard-v2/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── math-500/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── math-odyssey/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mawps/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mbpp/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── minerva_math/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── minif2f/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mmau-pro/
│   │   │   ├── __init__.py
│   │   │   ├── closed_form/
│   │   │   │   └── __init__.py
│   │   │   ├── instruction_following/
│   │   │   │   └── __init__.py
│   │   │   ├── mmau_pro_score.py
│   │   │   ├── open_ended/
│   │   │   │   └── __init__.py
│   │   │   └── prepare.py
│   │   ├── mmlu/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mmlu-pro/
│   │   │   ├── __init__.py
│   │   │   ├── prepare.py
│   │   │   └── subsets/
│   │   │       └── 10pct_opt_v1.txt
│   │   ├── mmlu-prox/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mmlu-redux/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mmmlu/
│   │   │   ├── __init__.py
│   │   │   ├── mmmlu_utils.py
│   │   │   └── prepare.py
│   │   ├── mmmu-pro/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mobench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── mrcr/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── musan/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── numb3rs/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── olympiadbench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── omni-math/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── omniscience/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── open-proof-corpus-judge/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── physics/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── polymath/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── prepare.py
│   │   ├── proof-arena-judge/
│   │   │   ├── __init__.py
│   │   │   ├── gemini_imo_2025/
│   │   │   │   ├── 1.txt
│   │   │   │   ├── 2.txt
│   │   │   │   ├── 3.txt
│   │   │   │   ├── 4.txt
│   │   │   │   └── 5.txt
│   │   │   └── prepare.py
│   │   ├── proof-bench-judge/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── proofnet/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── putnam-bench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── ruler/
│   │   │   ├── __init__.py
│   │   │   ├── prepare.py
│   │   │   └── ruler_score.py
│   │   ├── ruler2/
│   │   │   ├── __init__.py
│   │   │   ├── prepare.py
│   │   │   ├── prepare_mmlu.py
│   │   │   ├── prepare_niah.py
│   │   │   ├── prepare_qa.py
│   │   │   ├── ruler2_score.py
│   │   │   └── tokenizer.py
│   │   ├── scicode/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── simpleqa/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── speed-bench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── supergpqa/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── svamp/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── swe-bench/
│   │   │   ├── __init__.py
│   │   │   ├── dump_images.py
│   │   │   ├── dump_repos.py
│   │   │   └── prepare.py
│   │   ├── swe-bench-multilingual/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── swe-bench-pro/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── swe-rebench/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── ugphysics/
│   │   │   ├── __init__.py
│   │   │   └── prepare.py
│   │   ├── utils.py
│   │   └── wmt24pp/
│   │       ├── __init__.py
│   │       └── prepare.py
│   ├── evaluation/
│   │   ├── __init__.py
│   │   ├── aggregate_answers.py
│   │   ├── compute_group_score.py
│   │   ├── evaluator/
│   │   │   ├── __init__.py
│   │   │   ├── arena.py
│   │   │   ├── audio.py
│   │   │   ├── base.py
│   │   │   ├── bfcl.py
│   │   │   ├── bird.py
│   │   │   ├── ccc.py
│   │   │   ├── code.py
│   │   │   ├── comet.py
│   │   │   ├── compute_eval.py
│   │   │   ├── contextasr.py
│   │   │   ├── critpt.py
│   │   │   ├── dsbench.py
│   │   │   ├── icpc.py
│   │   │   ├── ifbench.py
│   │   │   ├── ifeval.py
│   │   │   ├── ioi.py
│   │   │   ├── livecodebench.py
│   │   │   ├── math.py
│   │   │   ├── mcq.py
│   │   │   ├── mmau_pro.py
│   │   │   ├── mrcr.py
│   │   │   ├── nvembed_judge.py
│   │   │   ├── ruler.py
│   │   │   ├── scicode.py
│   │   │   └── specdec.py
│   │   ├── math_grader.py
│   │   ├── metrics/
│   │   │   ├── __init__.py
│   │   │   ├── aalcr_metrics.py
│   │   │   ├── answer_judgement_metrics.py
│   │   │   ├── arena_metrics.py
│   │   │   ├── audio_metrics.py
│   │   │   ├── base.py
│   │   │   ├── bfcl_metrics.py
│   │   │   ├── bird_metrics.py
│   │   │   ├── ccc_metrics.py
│   │   │   ├── code_metrics.py
│   │   │   ├── compute_metrics.py
│   │   │   ├── contextasr_metrics.py
│   │   │   ├── critpt_metrics.py
│   │   │   ├── gradingbench_metrics.py
│   │   │   ├── hleaa_metrics.py
│   │   │   ├── hotpotqa_filtering.py
│   │   │   ├── hotpotqa_metrics.py
│   │   │   ├── icpc_metrics.py
│   │   │   ├── if_metrics.py
│   │   │   ├── ioi_metrics.py
│   │   │   ├── lean4_metrics.py
│   │   │   ├── map_metrics.py
│   │   │   ├── math_metrics.py
│   │   │   ├── mcq_multilingual_metrics.py
│   │   │   ├── mmau_pro_metrics.py
│   │   │   ├── mrcr_metrics.py
│   │   │   ├── omni_metrics.py
│   │   │   ├── physics_metrics.py
│   │   │   ├── ruler2_metrics.py
│   │   │   ├── ruler_metrics.py
│   │   │   ├── simpleqa_metrics.py
│   │   │   ├── specdec_metrics.py
│   │   │   ├── translation_metrics.py
│   │   │   ├── ugphysics_metrics.py
│   │   │   ├── utils.py
│   │   │   └── weighted_math_metrics.py
│   │   └── utils.py
│   ├── file_utils.py
│   ├── inference/
│   │   ├── __init__.py
│   │   ├── autoformalize.py
│   │   ├── chat_interface/
│   │   │   ├── __init__.py
│   │   │   ├── chat_service.py
│   │   │   ├── core.py
│   │   │   ├── launch.py
│   │   │   └── ui.py
│   │   ├── check_contamination.py
│   │   ├── eval/
│   │   │   ├── __init__.py
│   │   │   ├── arena_judge.py
│   │   │   ├── bfcl.py
│   │   │   ├── bfcl_utils.py
│   │   │   ├── bfcl_web_search.py
│   │   │   ├── compute_eval.py
│   │   │   ├── critpt.py
│   │   │   ├── scicode.py
│   │   │   ├── scicode_utils.py
│   │   │   ├── specdec.py
│   │   │   └── swebench.py
│   │   ├── factory.py
│   │   ├── generate.py
│   │   ├── litellm_hybrid_cache.py
│   │   ├── llm_math_judge.py
│   │   ├── log_samples_wandb.py
│   │   ├── merge_chunks.py
│   │   ├── model/
│   │   │   ├── __init__.py
│   │   │   ├── asr_nim.py
│   │   │   ├── audio_utils.py
│   │   │   ├── azure.py
│   │   │   ├── base.py
│   │   │   ├── code_execution.py
│   │   │   ├── context_retry.py
│   │   │   ├── gemini.py
│   │   │   ├── megatron.py
│   │   │   ├── nim_utils.py
│   │   │   ├── openai.py
│   │   │   ├── parallel_thinking.py
│   │   │   ├── sglang.py
│   │   │   ├── tool_call.py
│   │   │   ├── tts_nim.py
│   │   │   ├── utils.py
│   │   │   ├── vllm.py
│   │   │   └── vllm_multimodal.py
│   │   ├── patch_litellm_logging.py
│   │   ├── prover.py
│   │   ├── retrieve_similar.py
│   │   ├── server/
│   │   │   ├── __init__.py
│   │   │   ├── serve_riva_nim.py
│   │   │   ├── serve_sglang.py
│   │   │   ├── serve_unified.py
│   │   │   ├── serve_vllm.py
│   │   │   └── serve_vllm_dp_ray.py
│   │   ├── structured_outputs.py
│   │   └── tournament_utils.py
│   ├── mcp/
│   │   ├── __init__.py
│   │   ├── adapters.py
│   │   ├── clients.py
│   │   ├── config.py
│   │   ├── servers/
│   │   │   ├── __init__.py
│   │   │   ├── chemistry/
│   │   │   │   ├── __init__.py
│   │   │   │   └── periodictable_tool.py
│   │   │   ├── exa_tool.py
│   │   │   ├── physics/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── coolprop_tool.py
│   │   │   │   ├── particle_tool.py
│   │   │   │   └── radioactivedecay_tool.py
│   │   │   ├── python_tool.py
│   │   │   ├── tavily_search_tool.py
│   │   │   └── web/
│   │   │       ├── __init__.py
│   │   │       ├── arxiv_tool.py
│   │   │       └── wikipedia_tool.py
│   │   ├── tool_manager.py
│   │   ├── tool_providers.py
│   │   └── utils.py
│   ├── pipeline/
│   │   ├── __init__.py
│   │   ├── app.py
│   │   ├── cli.py
│   │   ├── convert.py
│   │   ├── dataset.py
│   │   ├── eval.py
│   │   ├── generate.py
│   │   ├── judges/
│   │   │   ├── __init__.py
│   │   │   ├── comet_judge.py
│   │   │   └── nvembed_judge.py
│   │   ├── megatron_lm/
│   │   │   ├── __init__.py
│   │   │   └── train.py
│   │   ├── nemo_evaluator.py
│   │   ├── nemo_gym_rollouts.py
│   │   ├── nemo_rl/
│   │   │   ├── __init__.py
│   │   │   ├── average_checkpoints.py
│   │   │   ├── grpo.py
│   │   │   └── sft.py
│   │   ├── prepare_data.py
│   │   ├── robust_eval.py
│   │   ├── run_cmd.py
│   │   ├── setup.py
│   │   ├── start_server.py
│   │   ├── summarize_results.py
│   │   ├── summarize_robustness.py
│   │   ├── utils/
│   │   │   ├── __init__.py
│   │   │   ├── cluster.py
│   │   │   ├── commands.py
│   │   │   ├── declarative.py
│   │   │   ├── docker_images.py
│   │   │   ├── eval.py
│   │   │   ├── exp.py
│   │   │   ├── generation.py
│   │   │   ├── mounts.py
│   │   │   ├── packager.py
│   │   │   ├── ray_executor.py
│   │   │   ├── scripts/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── base.py
│   │   │   │   ├── eval.py
│   │   │   │   ├── generation.py
│   │   │   │   ├── nemo_gym.py
│   │   │   │   └── server.py
│   │   │   └── server.py
│   │   └── verl/
│   │       ├── __init__.py
│   │       └── ppo.py
│   ├── prompt/
│   │   ├── __init__.py
│   │   ├── code_tags/
│   │   │   ├── __init__.py
│   │   │   ├── gpt-oss.yaml
│   │   │   ├── llama3.yaml
│   │   │   ├── nemotron.yaml
│   │   │   ├── openmath.yaml
│   │   │   ├── qwen-lean.yaml
│   │   │   └── qwen.yaml
│   │   ├── config/
│   │   │   ├── __init__.py
│   │   │   ├── compute-eval/
│   │   │   │   └── baseline.yaml
│   │   │   ├── eval/
│   │   │   │   ├── aai/
│   │   │   │   │   ├── livecodebench.yaml
│   │   │   │   │   ├── math.yaml
│   │   │   │   │   ├── mcq-10choices-boxed.yaml
│   │   │   │   │   ├── mcq-10choices.yaml
│   │   │   │   │   ├── mcq-4choices-boxed.yaml
│   │   │   │   │   ├── mcq-4choices.yaml
│   │   │   │   │   ├── omni.yaml
│   │   │   │   │   ├── search-mcq-10choices.yaml
│   │   │   │   │   └── search-mcq-4choices.yaml
│   │   │   │   ├── bigcodebench/
│   │   │   │   │   └── codegen.yaml
│   │   │   │   ├── critpt/
│   │   │   │   │   ├── code_output.yaml
│   │   │   │   │   └── solve_problem.yaml
│   │   │   │   ├── hotpotqa.yaml
│   │   │   │   ├── hotpotqa_closedbook.yaml
│   │   │   │   ├── livecodebench/
│   │   │   │   │   ├── aa_index.yaml
│   │   │   │   │   ├── default.yaml
│   │   │   │   │   └── default_reasoning.yaml
│   │   │   │   ├── longbench/
│   │   │   │   │   └── default.yaml
│   │   │   │   ├── matharena/
│   │   │   │   │   └── aime.yaml
│   │   │   │   ├── scicode/
│   │   │   │   │   ├── background.yaml
│   │   │   │   │   └── default.yaml
│   │   │   │   └── swe-bench/
│   │   │   │       ├── mini-swe-agent/
│   │   │   │       │   ├── swebench.yaml
│   │   │   │       │   ├── swebench_backticks.yaml
│   │   │   │       │   └── swebench_xml.yaml
│   │   │   │       ├── openhands/
│   │   │   │       │   ├── default.toml
│   │   │   │       │   └── no-native-tool-calling.toml
│   │   │   │       └── swe-agent/
│   │   │   │           ├── default.yaml
│   │   │   │           ├── multilingual.yaml
│   │   │   │           └── swe-agent-lm-32b.yaml
│   │   │   ├── generic/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── codegen.yaml
│   │   │   │   ├── codegen_system.yaml
│   │   │   │   ├── default.yaml
│   │   │   │   ├── dsbench-da-incontext.yaml
│   │   │   │   ├── dsbench-da.yaml
│   │   │   │   ├── fim.yaml
│   │   │   │   ├── general-boxed.yaml
│   │   │   │   ├── genselect.yaml
│   │   │   │   ├── gensynthesis.yaml
│   │   │   │   ├── hle.yaml
│   │   │   │   ├── math-base.yaml
│   │   │   │   ├── math.yaml
│   │   │   │   ├── matharena.yaml
│   │   │   │   ├── physics.yaml
│   │   │   │   ├── problem-augmentation-similar.yaml
│   │   │   │   ├── problem-augmentation.yaml
│   │   │   │   ├── search-boxed.yaml
│   │   │   │   ├── text_to_sql.yaml
│   │   │   │   └── ugphysics.yaml
│   │   │   ├── gpt-oss/
│   │   │   │   ├── livecodebench.yaml
│   │   │   │   └── math.yaml
│   │   │   ├── judge/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── aa-omni-judge.yaml
│   │   │   │   ├── aalcr.yaml
│   │   │   │   ├── arena.yaml
│   │   │   │   ├── arena_creative.yaml
│   │   │   │   ├── audiobench.yaml
│   │   │   │   ├── audiobench_binary.yaml
│   │   │   │   ├── check-contamination.yaml
│   │   │   │   ├── code.yaml
│   │   │   │   ├── frontierscience-olympiad.yaml
│   │   │   │   ├── general-judge.yaml
│   │   │   │   ├── hle.yaml
│   │   │   │   ├── imo_answerbench.yaml
│   │   │   │   ├── imo_gradingbench.yaml
│   │   │   │   ├── imo_proofbench.yaml
│   │   │   │   ├── math-code.yaml
│   │   │   │   ├── math-proof-judge.yaml
│   │   │   │   ├── math.yaml
│   │   │   │   ├── mmau-pro.yaml
│   │   │   │   ├── mt-bench/
│   │   │   │   │   ├── turn1.yaml
│   │   │   │   │   ├── turn1_with_ref.yaml
│   │   │   │   │   ├── turn2.yaml
│   │   │   │   │   └── turn2_with_ref.yaml
│   │   │   │   ├── physics.yaml
│   │   │   │   ├── simpleqa.yaml
│   │   │   │   └── ugphysics.yaml
│   │   │   ├── lean4/
│   │   │   │   ├── autoformalization.yaml
│   │   │   │   ├── backtranslation.yaml
│   │   │   │   ├── formal-proof-deepseek-prover-v2-nemotron.yaml
│   │   │   │   ├── formal-proof-deepseek-prover-v2.yaml
│   │   │   │   ├── formal-proof-reasoning-execution.yaml
│   │   │   │   ├── formal-proof-reasoning.yaml
│   │   │   │   ├── formal-proof.yaml
│   │   │   │   ├── goedel-prover-v2-nemotron.yaml
│   │   │   │   ├── goedel-prover-v2-refinement-nemotron.yaml
│   │   │   │   ├── goedel-prover-v2-refinement.yaml
│   │   │   │   ├── goedel-prover-v2.yaml
│   │   │   │   ├── judge-backtranslation.yaml
│   │   │   │   ├── nat-to-lean4.yaml
│   │   │   │   ├── refinement_code_error.yaml
│   │   │   │   ├── refinement_consistent_error.yaml
│   │   │   │   └── refinement_parsing_error.yaml
│   │   │   ├── llama3-instruct/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── math.yaml
│   │   │   │   └── mmlu.yaml
│   │   │   ├── multilingual/
│   │   │   │   ├── __init__.py
│   │   │   │   └── segment-translation.yaml
│   │   │   ├── openmath/
│   │   │   │   ├── genselect.yaml
│   │   │   │   └── tir.yaml
│   │   │   ├── qwen/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── math-cot.yaml
│   │   │   │   ├── math-tir.yaml
│   │   │   │   └── qwq.yaml
│   │   │   ├── qwen3/
│   │   │   │   ├── math-cot-non-think.yaml
│   │   │   │   └── math-cot-think.yaml
│   │   │   ├── robustness/
│   │   │   │   ├── code_prompts/
│   │   │   │   │   ├── aai_prompt.yaml
│   │   │   │   │   ├── code_1.yaml
│   │   │   │   │   ├── code_2.yaml
│   │   │   │   │   ├── code_3.yaml
│   │   │   │   │   ├── code_4.yaml
│   │   │   │   │   ├── ns_gen_codegen.yaml
│   │   │   │   │   └── ns_python_codegen.yaml
│   │   │   │   ├── math_prompts/
│   │   │   │   │   ├── boxed_1.yaml
│   │   │   │   │   ├── boxed_2.yaml
│   │   │   │   │   ├── boxed_3.yaml
│   │   │   │   │   ├── boxed_4.yaml
│   │   │   │   │   ├── boxed_5.yaml
│   │   │   │   │   ├── boxed_6.yaml
│   │   │   │   │   ├── boxed_7.yaml
│   │   │   │   │   ├── boxed_8.yaml
│   │   │   │   │   ├── boxed_aai.yaml
│   │   │   │   │   └── boxed_general.yaml
│   │   │   │   ├── mcq_prompts/
│   │   │   │   │   ├── aai_1.yaml
│   │   │   │   │   ├── aai_2.yaml
│   │   │   │   │   ├── angle_brackets_1.yaml
│   │   │   │   │   ├── angle_brackets_2.yaml
│   │   │   │   │   ├── boxed_1.yaml
│   │   │   │   │   ├── boxed_2.yaml
│   │   │   │   │   ├── correct_1.yaml
│   │   │   │   │   ├── correct_2.yaml
│   │   │   │   │   ├── final_answer_1.yaml
│   │   │   │   │   └── final_answer_2.yaml
│   │   │   │   └── prompt_set_config.yaml
│   │   │   ├── unit_test/
│   │   │   │   └── code.yaml
│   │   │   └── vlm/
│   │   │       ├── __init__.py
│   │   │       └── mmmu-pro.yaml
│   │   ├── few_shot_examples/
│   │   │   ├── __init__.py
│   │   │   ├── gsm8k.py
│   │   │   ├── lean4.py
│   │   │   ├── math.py
│   │   │   ├── mmlu.py
│   │   │   ├── mmlu_pro.py
│   │   │   └── open_science.py
│   │   └── utils.py
│   ├── training/
│   │   ├── __init__.py
│   │   ├── data_preparation_utils/
│   │   │   ├── __init__.py
│   │   │   ├── arithmetic_utils.py
│   │   │   ├── config/
│   │   │   │   ├── code_sft.yaml
│   │   │   │   ├── math_rl.yaml
│   │   │   │   ├── math_sft.yaml
│   │   │   │   └── stem_sft.yaml
│   │   │   ├── filters.py
│   │   │   ├── merge_processor.py
│   │   │   └── preprocessing.py
│   │   ├── nemo_rl/
│   │   │   ├── __init__.py
│   │   │   ├── configs/
│   │   │   │   ├── grpo.yaml
│   │   │   │   └── sft.yaml
│   │   │   ├── convert_dcp_to_hf.py
│   │   │   ├── convert_megatron_to_hf.py
│   │   │   ├── environments/
│   │   │   │   ├── __init__.py
│   │   │   │   └── math_environment.py
│   │   │   ├── offline_hf_consolidation.py
│   │   │   ├── prompts/
│   │   │   │   ├── cot.txt
│   │   │   │   └── math.txt
│   │   │   ├── start_grpo.py
│   │   │   └── start_sft.py
│   │   ├── prepare_data.py
│   │   ├── train_redrafter.py
│   │   └── verl/
│   │       ├── __init__.py
│   │       └── prepare_data.py
│   ├── utils.py
│   └── version.py
├── pyproject.toml
├── recipes/
│   ├── README.md
│   ├── asr_tts/
│   │   ├── README.md
│   │   ├── nim_configurations.py
│   │   ├── riva_generate.py
│   │   └── scripts/
│   │       ├── run_asr_nim_cluster.sh
│   │       └── run_tts_nim_cluster.sh
│   ├── data-integrity/
│   │   ├── README.md
│   │   ├── model_comparison/
│   │   │   ├── __init__.py
│   │   │   ├── analyses/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── length_analysis.py
│   │   │   │   ├── similarity_analysis.py
│   │   │   │   ├── umap_analysis.py
│   │   │   │   └── vocabulary_analysis.py
│   │   │   ├── analyzer.py
│   │   │   ├── data_loader.py
│   │   │   ├── main.py
│   │   │   ├── report_generator.py
│   │   │   ├── requirements.txt
│   │   │   ├── setup.py
│   │   │   ├── utils/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── file_utils.py
│   │   │   │   ├── model_utils.py
│   │   │   │   └── text_utils.py
│   │   │   └── visualization/
│   │   │       ├── __init__.py
│   │   │       ├── interactive_plots.py
│   │   │       └── static_plots.py
│   │   ├── postprocess_data.py
│   │   ├── prepare_data.py
│   │   └── run_integrity_pipeline.py
│   ├── gencluster/
│   │   ├── pipeline/
│   │   │   ├── run_inter_tournament.py
│   │   │   ├── run_intra_tournament.py
│   │   │   ├── solution_generation.py
│   │   │   └── test_case_generation.py
│   │   ├── prompts/
│   │   │   ├── generator.yaml
│   │   │   ├── selector.yaml
│   │   │   └── validator.yaml
│   │   └── scripts/
│   │       ├── compute_tournament_score.py
│   │       ├── extract_cpp_code.py
│   │       ├── filter_clusters.py
│   │       ├── generate_datasets_json.py
│   │       ├── generate_test_cases.py
│   │       ├── merge_tournament_scores.py
│   │       ├── run_tournament_all.py
│   │       ├── submission_ICPC.py
│   │       ├── submission_IOI.py
│   │       └── tournament_schedule.py
│   ├── libtrace/
│   │   ├── README.md
│   │   ├── dockerfiles/
│   │   │   ├── Dockerfile.sandbox
│   │   │   ├── environment.yml
│   │   │   └── start-with-nginx.sh
│   │   ├── prompts/
│   │   │   ├── applicability-relevance.yaml
│   │   │   └── problem-generation.yaml
│   │   └── scripts/
│   │       ├── collect_generated_problems.py
│   │       ├── filter_applicability_relevance.py
│   │       ├── gather_solutions.py
│   │       ├── harvest_docs.py
│   │       └── prepare_inference_jsonl.py
│   ├── multimodal/
│   │   ├── __init__.py
│   │   └── server/
│   │       ├── README.md
│   │       ├── __init__.py
│   │       ├── backends/
│   │       │   ├── __init__.py
│   │       │   ├── base.py
│   │       │   ├── magpie_tts_backend.py
│   │       │   └── nemo_asr_backend.py
│   │       └── unified_server.py
│   ├── noc-reasoning-agent/
│   │   ├── configs/
│   │   │   ├── config.ini
│   │   │   ├── noc_reasoning_sft.yaml
│   │   │   └── noc_reasoning_sft_6.yaml
│   │   ├── prompts/
│   │   │   ├── formatting_prompt.yaml
│   │   │   ├── prompt_incident.yaml
│   │   │   ├── prompt_reasoning.yaml
│   │   │   └── shortened_prompt_reasoning.yaml
│   │   └── scripts/
│   │       ├── create_agent_with_tools.py
│   │       ├── create_agent_with_tools_batch.py
│   │       ├── evaluation/
│   │       │   ├── evaluation_with_judge.py
│   │       │   ├── problem_code_evaluation.py
│   │       │   └── score.py
│   │       ├── filtering/
│   │       │   ├── filter_rows.py
│   │       │   └── match_keywords.py
│   │       ├── ns_pipelines/
│   │       │   ├── generate_synthetic_data.py
│   │       │   └── prepare_react_agent.py
│   │       ├── tools.py
│   │       ├── utils/
│   │       │   ├── create_input_jsonl_from_incidents.py
│   │       │   ├── format_reasoning_json.py
│   │       │   ├── reasoning_processes.py
│   │       │   ├── schema_columns.py
│   │       │   ├── split_incident_data.py
│   │       │   ├── split_mocktools_answers.py
│   │       │   └── token_usage.py
│   │       └── visualization/
│   │           ├── extract_representation_columns.py
│   │           ├── extract_scores.py
│   │           └── generate_trace_visualization.py
│   ├── opencodereasoning/
│   │   ├── configs/
│   │   │   └── solution_sdg/
│   │   │       ├── demo.yaml
│   │   │       └── r1.yaml
│   │   ├── pipeline/
│   │   │   ├── prepare_questions.py
│   │   │   └── prepare_solutions.py
│   │   ├── prompts/
│   │   │   ├── generate_cpp_soln.yaml
│   │   │   └── generate_python_soln.yaml
│   │   └── scripts/
│   │       ├── filter_questions.py
│   │       ├── functional_helpers.py
│   │       ├── output_processing.py
│   │       └── prepare_questions.py
│   ├── openmathreasoning/
│   │   ├── configs/
│   │   │   ├── genselect_sdg/
│   │   │   │   └── qwq.yaml
│   │   │   ├── problem_sdg/
│   │   │   │   ├── demo.yaml
│   │   │   │   ├── example-data.txt
│   │   │   │   └── qwen-instruct.yaml
│   │   │   └── solution_sdg/
│   │   │       ├── demo.yaml
│   │   │       ├── qwq.yaml
│   │   │       ├── r1.yaml
│   │   │       ├── tir-limo.yaml
│   │   │       └── tir-openmath.yaml
│   │   ├── pipeline/
│   │   │   ├── genselect_generation.py
│   │   │   ├── problem_generation.py
│   │   │   └── solution_generation.py
│   │   ├── prompts/
│   │   │   ├── classify-if-binary.yaml
│   │   │   ├── classify-if-invalid.yaml
│   │   │   ├── classify-if-mcq.yaml
│   │   │   ├── classify-if-proof.yaml
│   │   │   ├── classify-tir-novelty.yaml
│   │   │   ├── classify-tir-significance.yaml
│   │   │   ├── convert-proofs.yaml
│   │   │   ├── extract-answers.yaml
│   │   │   ├── extract-problems.yaml
│   │   │   ├── math-tir-detailed.yaml
│   │   │   ├── summarize-genselect.yaml
│   │   │   └── summarize-solution.yaml
│   │   └── scripts/
│   │       ├── extract_python_fragments.py
│   │       ├── filter_novelty_significance.py
│   │       ├── genselect/
│   │       │   ├── extract_judgment.py
│   │       │   ├── merge_new_summary.py
│   │       │   ├── prepare_labeling_data.py
│   │       │   └── utils.py
│   │       ├── merge_new_summary.py
│   │       ├── postprocess_answer_extraction.py
│   │       ├── postprocess_classification.py
│   │       ├── postprocess_problem_extraction.py
│   │       ├── postprocess_proof_conversion.py
│   │       ├── postprocess_tir_generations.py
│   │       ├── prepare_raw_data.py
│   │       └── simplified_recipe.py
│   ├── openreasoning/
│   │   ├── eval.py
│   │   ├── prompts/
│   │   │   ├── science_question_augmentation_prompt.yaml
│   │   │   └── science_question_generation_prompt.yaml
│   │   └── scripts/
│   │       └── use_majority_if_no_answer.py
│   ├── opensciencereasoning/
│   │   ├── openscience_dataset_collection/
│   │   │   ├── README.md
│   │   │   ├── prompts/
│   │   │   │   ├── mcq_augment_inspired_by.yaml
│   │   │   │   ├── mcq_augment_similar.yaml
│   │   │   │   ├── mcq_four_options.yaml
│   │   │   │   ├── mcq_ten_options.yaml
│   │   │   │   └── subtopic_expansion.yaml
│   │   │   └── scripts/
│   │   │       └── filter_mcq_solutions.py
│   │   └── sdg_pipeline/
│   │       ├── README.md
│   │       ├── configs/
│   │       │   ├── pipelines/
│   │       │   │   └── base.yaml
│   │       │   └── settings/
│   │       │       ├── kimi_k2.yaml
│   │       │       ├── mcq_10_options.yaml
│   │       │       ├── mcq_4_options.yaml
│   │       │       ├── multiple_prompts.yaml
│   │       │       ├── python_enabled.yaml
│   │       │       ├── seed_data.yaml
│   │       │       ├── seed_data_postprocess.yaml
│   │       │       └── without_gt.yaml
│   │       ├── prompt/
│   │       │   ├── __init__.py
│   │       │   ├── configs/
│   │       │   │   ├── default_problem.yaml
│   │       │   │   └── topics_labeling.yaml
│   │       │   └── few_shots/
│   │       │       ├── __init__.py
│   │       │       └── topics.py
│   │       ├── run_pipeline.py
│   │       └── scripts/
│   │           ├── aggregate_difficulty.py
│   │           ├── aggregate_metadata.py
│   │           ├── aggregate_solutions.py
│   │           ├── aggregate_topics.py
│   │           ├── decontaminate.py
│   │           ├── extract_predictions.py
│   │           ├── filter_problems.py
│   │           ├── filter_solutions.py
│   │           ├── map_diversity_prompts.py
│   │           ├── prepare_topics.py
│   │           ├── process_messages_and_bucket.py
│   │           ├── remove_redundant_fields.py
│   │           ├── utils/
│   │           │   ├── constants.py
│   │           │   └── regex_constants.py
│   │           └── validate_pipeline.py
│   ├── proof-gen-verification/
│   │   ├── README.md
│   │   ├── configs/
│   │   │   └── judge-eval.yaml
│   │   ├── pipeline/
│   │   │   └── eval_judge.py
│   │   ├── prompts/
│   │   │   ├── genselect/
│   │   │   │   ├── default.yaml
│   │   │   │   ├── opc_instructions.yaml
│   │   │   │   └── proof_genselect_default.yaml
│   │   │   ├── math_judge/
│   │   │   │   ├── gemini_imo_judge_summary.yaml
│   │   │   │   ├── general.yaml
│   │   │   │   ├── general_summary.yaml
│   │   │   │   ├── general_summary_rubric.yaml
│   │   │   │   ├── judge_prompt_ablation/
│   │   │   │   │   ├── gemini1.yaml
│   │   │   │   │   ├── gemini2.yaml
│   │   │   │   │   ├── prompt1.yaml
│   │   │   │   │   ├── prompt2.yaml
│   │   │   │   │   ├── prompt3.yaml
│   │   │   │   │   ├── prompt4.yaml
│   │   │   │   │   ├── prompt5.yaml
│   │   │   │   │   ├── prompt5_rubric.yaml
│   │   │   │   │   └── prompt6_rubric.yaml
│   │   │   │   ├── lemma_break.yaml
│   │   │   │   ├── opc_judge.yaml
│   │   │   │   ├── opc_judge_summary.yaml
│   │   │   │   ├── opc_judge_summary_gt_proof.yaml
│   │   │   │   ├── opc_judge_summary_rubric.yaml
│   │   │   │   ├── proofbench_ms_ref.yaml
│   │   │   │   ├── proofbench_none.yaml
│   │   │   │   ├── proofbench_none_binary.yaml
│   │   │   │   ├── step_break.yaml
│   │   │   │   ├── step_judge_v2.yaml
│   │   │   │   ├── true_false_break.yaml
│   │   │   │   └── true_false_judge.yaml
│   │   │   ├── prover.yaml
│   │   │   └── prover_final_ans.yaml
│   │   └── scripts/
│   │       ├── build_final_ans_dataset.py
│   │       ├── combine_judgements.py
│   │       ├── final_answer_qs.py
│   │       ├── generate_generic_bon_dspy.py
│   │       ├── generate_generic_bon_generation.py
│   │       ├── generic_eval_bon.py
│   │       ├── genselect_judge_generation.py
│   │       ├── make_metrics_fa_qs.py
│   │       ├── make_rubric_generation.py
│   │       ├── script_generation.py
│   │       ├── sol_selection_generation.py
│   │       └── step_judgement_generation.py
│   └── translation/
│       ├── config/
│       │   └── qwen25.yaml
│       └── translate_jsonl.py
├── requirements/
│   ├── audio.txt
│   ├── code_execution.txt
│   ├── common-dev.txt
│   ├── common-tests.txt
│   ├── docs.txt
│   ├── pipeline.txt
│   └── stem.txt
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── data/
│   │   ├── code-output.test
│   │   ├── contamination-example.test
│   │   ├── dummy_external_benchmark/
│   │   │   ├── benchmark_map.json
│   │   │   ├── my_benchmarks/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── dataset/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── my_simple_bench/
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   └── prepare.py
│   │   │   │   │   └── word_count/
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       └── prepare.py
│   │   │   │   ├── evaluation/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── word_count.py
│   │   │   │   ├── inference/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── word_count.py
│   │   │   │   ├── metrics/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── word_count.py
│   │   │   │   └── prompt/
│   │   │   │       └── eval/
│   │   │   │           └── word_count/
│   │   │   │               └── default.yaml
│   │   │   └── pyproject.toml
│   │   ├── eval_outputs/
│   │   │   ├── eval-results/
│   │   │   │   ├── answer-judge/
│   │   │   │   │   ├── output-rs0.jsonl-test
│   │   │   │   │   ├── output-rs1.jsonl-test
│   │   │   │   │   ├── output-rs2.jsonl-test
│   │   │   │   │   └── output-rs3.jsonl-test
│   │   │   │   ├── arena-hard/
│   │   │   │   │   └── output.jsonl-test
│   │   │   │   ├── gpqa/
│   │   │   │   │   ├── output-rs0.jsonl-test
│   │   │   │   │   ├── output-rs1.jsonl-test
│   │   │   │   │   ├── output-rs2.jsonl-test
│   │   │   │   │   └── output-rs3.jsonl-test
│   │   │   │   ├── hendrycks_math/
│   │   │   │   │   ├── output-rs0.jsonl-test
│   │   │   │   │   ├── output-rs1.jsonl-test
│   │   │   │   │   └── output-rs2.jsonl-test
│   │   │   │   ├── human-eval/
│   │   │   │   │   ├── output-rs0.jsonl-test
│   │   │   │   │   └── output-rs1.jsonl-test
│   │   │   │   ├── ifeval/
│   │   │   │   │   ├── output-rs0.jsonl-test
│   │   │   │   │   ├── output-rs1.jsonl-test
│   │   │   │   │   └── output-rs2.jsonl-test
│   │   │   │   ├── metrics-ms8192.json-test
│   │   │   │   ├── metrics.json-test
│   │   │   │   └── minif2f/
│   │   │   │       ├── output-rs0.jsonl-test
│   │   │   │       ├── output-rs1.jsonl-test
│   │   │   │       ├── output-rs2.jsonl-test
│   │   │   │       └── output-rs3.jsonl-test
│   │   │   ├── summarize_results_output-ms8192.txt
│   │   │   └── summarize_results_output.txt
│   │   ├── multi_model_eval_smoke.py
│   │   ├── nemo_evaluator/
│   │   │   ├── example-eval-config.yaml
│   │   │   └── example-gpu-test-config.yaml
│   │   ├── openai-input-dict.test
│   │   ├── openai-input-list.test
│   │   ├── openmathinstruct2.test
│   │   ├── output-rs0.test
│   │   ├── output-rs1.test
│   │   ├── output-rs2.test
│   │   ├── small-grpo-data.test
│   │   ├── small-sft-data-messages.test
│   │   └── small-sft-data.test
│   ├── gpu-tests/
│   │   ├── __init__.py
│   │   ├── make_tiny_llm.py
│   │   ├── run_qwen.sh
│   │   ├── test-local.yaml
│   │   ├── test_contamination.py
│   │   ├── test_context_retry.py
│   │   ├── test_eval.py
│   │   ├── test_external_benchmark_eval.py
│   │   ├── test_generate.py
│   │   ├── test_judge.py
│   │   ├── test_nemo_evaluator.py
│   │   ├── test_nemo_gym_rollouts.py
│   │   ├── test_run_cmd_llm_infer.py
│   │   ├── test_sandbox_mounts.py
│   │   ├── test_tool_calling.py
│   │   ├── test_train.py
│   │   ├── test_vllm_audio.py
│   │   └── utils.py
│   ├── scripts/
│   │   └── run_cmd_llm_infer_check.py
│   ├── slurm-tests/
│   │   ├── README.md
│   │   ├── asr_nim/
│   │   │   ├── README.md
│   │   │   ├── asr.test
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── clone_and_run.sh
│   │   ├── gpt_oss_python_aime25/
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── nano_30b_tool_calling/
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── omr_simple_recipe/
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── qwen3_4b_evals/
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── qwen3_4b_ray_executor/
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── qwen3coder_30b_swebench/
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── run_all.sh
│   │   ├── stem_sdg_pipeline/
│   │   │   └── run_test.py
│   │   ├── super_120b_aime25/
│   │   │   ├── check_results.py
│   │   │   ├── run_test.py
│   │   │   └── trtllm-extra-llm-api-config.yml
│   │   ├── super_49b_evals/
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── tts_nim/
│   │   │   ├── README.md
│   │   │   ├── check_results.py
│   │   │   ├── run_test.py
│   │   │   └── tts.test
│   │   ├── unified_asr/
│   │   │   ├── asr_openai.test
│   │   │   ├── check_results.py
│   │   │   └── run_test.py
│   │   ├── unified_tts/
│   │   │   ├── README.md
│   │   │   ├── check_results.py
│   │   │   ├── run_test.py
│   │   │   └── tts_openai.test
│   │   ├── utils.py
│   │   └── wmt24pp_gym_topology/
│   │       ├── README.md
│   │       ├── check_results.py
│   │       └── run_test.py
│   ├── test_arena_metrics.py
│   ├── test_base_metrics.py
│   ├── test_code_execution.py
│   ├── test_configs.py
│   ├── test_data_preparation.py
│   ├── test_datasets.py
│   ├── test_declarative_pipeline.py
│   ├── test_default_args.py
│   ├── test_dependency_isolation.py
│   ├── test_eval.py
│   ├── test_external_benchmarks.py
│   ├── test_generation.py
│   ├── test_magpie_tts_backend.py
│   ├── test_math_equal.py
│   ├── test_mcp_clients.py
│   ├── test_metrics.py
│   ├── test_nemo_asr_backend.py
│   ├── test_nemo_evaluator_pipeline.py
│   ├── test_nvidia_inference_api.py
│   ├── test_pipeline_utils.py
│   ├── test_prompts.py
│   ├── test_prover.py
│   ├── test_ray_executor.py
│   ├── test_sandbox_fork_exc_leak.py
│   ├── test_sandbox_network_blocking.py
│   ├── test_session_affinity.py
│   ├── test_streaming_tool_calling.py
│   ├── test_unified_server_audio_parser.py
│   ├── test_unified_server_batcher.py
│   ├── test_unified_server_error_handling.py
│   ├── test_vllm_audio.py
│   └── test_vlm.py
└── tools/
    ├── pyproject.toml
    └── requirements.txt
SYMBOL INDEX (3360 symbols across 473 files)

FILE: dataset_explorer_demo/visualize_similar.py
  function load_jsonl (line 27) | def load_jsonl(file_path):
  function render_latex (line 33) | def render_latex(text):
  function display_entry (line 98) | def display_entry(index, test_set):
  function random_entry (line 137) | def random_entry(data):
  function load_test_sets (line 142) | def load_test_sets(test_set):
  function update_test_set (line 201) | def update_test_set(test_set):
  function display_entry_wrapper (line 217) | def display_entry_wrapper(index, current_test_set):
  function random_entry_wrapper (line 223) | def random_entry_wrapper(current_test_set):

FILE: dockerfiles/sandbox/block_network.c
  function socket (line 41) | int socket(int domain, int type, int protocol) {

FILE: nemo_skills/_cli_stub.py
  function main (line 18) | def main():

FILE: nemo_skills/code_execution/local_sandbox/local_sandbox_server.py
  function shell_worker (line 78) | def shell_worker(conn):
  class ShellManager (line 174) | class ShellManager:
    method __init__ (line 175) | def __init__(self):
    method start_shell (line 182) | def start_shell(self, shell_id):
    method stop_shell (line 197) | def stop_shell(self, shell_id):
    method _finish_restart (line 215) | def _finish_restart(self, shell_id):
    method _cleanup_shell_resources (line 229) | def _cleanup_shell_resources(self, proc, conn):
    method run_cell (line 247) | def run_cell(self, shell_id, code, timeout=1.0, grace=2.0, traceback_v...
  function log_session_count (line 405) | def log_session_count(prefix: str = "") -> None:
  function cleanup_expired_sessions (line 419) | def cleanup_expired_sessions():
  function postprocess_output (line 440) | def postprocess_output(output, traceback_verbosity):
  function cleanup_session (line 464) | def cleanup_session(session_id):
  function execute_ipython_session (line 470) | def execute_ipython_session(generated_code, session_id, timeout=30, trac...
  function _after_log_session_count (line 547) | def _after_log_session_count(response):
  function kill_process_tree (line 561) | def kill_process_tree(proc):
  function set_limits (line 596) | def set_limits(mem_bytes: int = MEM_LIMIT_BYTES) -> None:
  function execute_python (line 607) | def execute_python(generated_code, std_input, timeout, language):
  function execute_lean4 (line 631) | def execute_lean4(generated_code, timeout):
  function execute_shell (line 688) | def execute_shell(command, timeout):
  function execute (line 716) | def execute():
  function list_sessions (line 747) | def list_sessions():
  function delete_session (line 771) | def delete_session(session_id):
  function health (line 787) | def health():

FILE: nemo_skills/code_execution/proof_utils.py
  class ProofBuildConfig (line 30) | class ProofBuildConfig:
  function extract_proof_only (line 39) | def extract_proof_only(lean_code: str) -> str:
  function build_lean4_proof (line 97) | def build_lean4_proof(
  function determine_proof_status (line 140) | def determine_proof_status(compiler_output: Dict[str, Any]) -> str:
  function prepare_predicted_proof_from_line_dict (line 169) | def prepare_predicted_proof_from_line_dict(
  function remove_comments (line 207) | def remove_comments(text):
  function move_imports_to_beginning (line 223) | def move_imports_to_beginning(input_string):
  function return_theorem_to_prove (line 230) | def return_theorem_to_prove(text):
  function return_theorem_to_replace (line 237) | def return_theorem_to_replace(text):
  function replace_statement_in_proof (line 244) | def replace_statement_in_proof(statement, proof):
  function refine_by_sorry (line 260) | def refine_by_sorry(text):
  function extract_code (line 281) | def extract_code(inputs):
  function parse_error (line 300) | def parse_error(log_string):
  function get_error_str (line 322) | def get_error_str(code, errors, error_thres=True):

FILE: nemo_skills/code_execution/sandbox.py
  class Sandbox (line 36) | class Sandbox(abc.ABC):
    method __init__ (line 55) | def __init__(
    method close (line 76) | async def close(self):
    method _send_request (line 80) | async def _send_request(self, request, timeout):
    method _parse_request_output (line 115) | def _parse_request_output(self, output):
    method _get_execute_url (line 119) | def _get_execute_url(self):
    method _prepare_request (line 123) | def _prepare_request(
    method delete_session (line 135) | async def delete_session(self, session_id: str) -> None:
    method execute_code (line 139) | async def execute_code(
    method is_proof_correct (line 279) | async def is_proof_correct(self, pred_output, timeout=30.0):
    method _check_ready (line 290) | def _check_ready(self, timeout: float = 5.0) -> bool:
    method wait_for_sandbox (line 307) | def wait_for_sandbox(self, wait_timeout: int = 240, http_timeout: int ...
  class LocalSandbox (line 315) | class LocalSandbox(Sandbox):
    method _get_execute_url (line 318) | def _get_execute_url(self):
    method _parse_request_output (line 321) | def _parse_request_output(self, output):
    method _prepare_request (line 328) | def _prepare_request(
    method delete_session (line 346) | async def delete_session(self, session_id: str) -> None:
  function get_sandbox (line 396) | def get_sandbox(sandbox_type: str = "local", **kwargs):
  function sandbox_params (line 402) | def sandbox_params():

FILE: nemo_skills/code_execution/utils.py
  function format_code_output (line 24) | def format_code_output(
  function _extract_between_separators (line 70) | def _extract_between_separators(generation: str, separators: Tuple[str, ...
  function extract_code_to_execute (line 82) | def extract_code_to_execute(generation: str, code_begin: str, code_end: ...
  function extract_code_output (line 86) | def extract_code_output(generation: str, code_output_begin: str, code_ou...
  function extract_code_block (line 90) | def extract_code_block(text: str, languages=None, extract_code_mode: str...
  function clean_formal_generation (line 101) | def clean_formal_generation(

FILE: nemo_skills/conversion/hf_to_nemo_llama.py
  function get_args (line 39) | def get_args():
  function load_config (line 65) | def load_config(llama_config):
  function load_state_dict_helper (line 119) | def load_state_dict_helper(cls, cfg, trainer: Trainer, state_dict):
  function convert (line 138) | def convert(args):

FILE: nemo_skills/conversion/hf_to_nemo_qwen.py
  function get_args (line 38) | def get_args():
  function load_config (line 61) | def load_config(args, qwen_config):
  function convert (line 89) | def convert(args):

FILE: nemo_skills/conversion/nemo_to_hf_llama.py
  function get_args (line 33) | def get_args():
  function create_hf_config (line 68) | def create_hf_config(hf_model_name, nemo_config):
  function convert (line 95) | def convert(

FILE: nemo_skills/conversion/nemo_to_hf_qwen.py
  function get_args (line 30) | def get_args():
  function convert (line 66) | def convert(

FILE: nemo_skills/dataset/aai/aai_score.py
  function compute_score (line 18) | def compute_score(metrics: dict):

FILE: nemo_skills/dataset/aalcr/prepare.py
  function construct_prompt (line 66) | def construct_prompt(docs, question, prompt_template=prompt_template):
  function count_n_tokens (line 72) | def count_n_tokens(prompt: str, tokenizer_name: str) -> int:
  function find_actual_file (line 80) | def find_actual_file(base_path, target_filename):
  function write_data_to_file (line 153) | def write_data_to_file(output_file, data, txt_file_folder, max_context_w...
  function prepare_aalcr_data (line 206) | def prepare_aalcr_data(max_context_window, setup, tokenizer_name):

FILE: nemo_skills/dataset/aime24-x/prepare.py
  function _load_utils (line 24) | def _load_utils():
  function format_entry (line 38) | def format_entry(entry, lang, prompt_language):
  function main (line 49) | def main(args):

FILE: nemo_skills/dataset/aime25-x/prepare.py
  function _load_utils (line 24) | def _load_utils():
  function format_entry (line 38) | def format_entry(entry, lang, prompt_language):
  function main (line 49) | def main(args):

FILE: nemo_skills/dataset/aime26/prepare.py
  function format_entry (line 23) | def format_entry(entry):
  function write_data_to_file (line 31) | def write_data_to_file(output_file, data):
  function main (line 38) | def main(args):

FILE: nemo_skills/dataset/apex-shortlist/prepare.py
  function write_data_to_file (line 22) | def write_data_to_file(output_file, data):

FILE: nemo_skills/dataset/arena-hard-v2/prepare.py
  function extract_answer_text (line 31) | def extract_answer_text(data):

FILE: nemo_skills/dataset/asr-leaderboard/prepare.py
  function save_audio_and_format_entry (line 55) | def save_audio_and_format_entry(
  function prepare_dataset (line 99) | def prepare_dataset(dataset_name, output_dir, with_audio=True):
  function main (line 139) | def main():

FILE: nemo_skills/dataset/audiobench/prepare.py
  function get_audio_duration (line 109) | def get_audio_duration(audio_array: np.ndarray, sampling_rate: int) -> f...
  function save_audio_file (line 116) | def save_audio_file(audio_array: np.ndarray, sampling_rate: int, output_...
  function extract_audio_dict (line 122) | def extract_audio_dict(sample: Dict) -> Dict | None:
  function create_manifest_entry (line 136) | def create_manifest_entry(
  function process_dataset (line 206) | def process_dataset(
  function main (line 485) | def main():

FILE: nemo_skills/dataset/beyond-aime/prepare.py
  function save_data (line 22) | def save_data():

FILE: nemo_skills/dataset/bfcl_v3/bfcl_score.py
  function calculate_combined_accuracy (line 52) | def calculate_combined_accuracy(accuracy_dict_list: list[dict], weighted...
  function get_accuracy_dict (line 77) | def get_accuracy_dict(metrics, category):
  function calculate_non_live_single_turn_accuracy (line 120) | def calculate_non_live_single_turn_accuracy(metrics):
  function calculate_live_single_turn_accuracy (line 145) | def calculate_live_single_turn_accuracy(metrics):
  function calculate_multi_turn_accuracy (line 164) | def calculate_multi_turn_accuracy(metrics):
  function compute_score (line 173) | def compute_score(metrics: dict):

FILE: nemo_skills/dataset/bfcl_v3/prepare.py
  function ensure_bfcl_eval_installed (line 33) | def ensure_bfcl_eval_installed():
  function process_multi_turn_test_case (line 92) | def process_multi_turn_test_case(instance):
  function load_dataset_entry (line 106) | def load_dataset_entry(
  function download_and_process_bfcl_data (line 156) | def download_and_process_bfcl_data(repo_url, subfolder_path, output_dir,...
  function main (line 205) | def main(args):

FILE: nemo_skills/dataset/bfcl_v3/utils.py
  function _get_language_specific_hint (line 39) | def _get_language_specific_hint(test_category):
  function func_doc_language_specific_pre_processing (line 48) | def func_doc_language_specific_pre_processing(function, test_category):
  function _cast_to_openai_type (line 102) | def _cast_to_openai_type(properties, mapping):
  function convert_to_tool (line 137) | def convert_to_tool(functions):

FILE: nemo_skills/dataset/bfcl_v4/bfcl_score.py
  function calculate_non_live_single_turn_accuracy (line 69) | def calculate_non_live_single_turn_accuracy(metrics):
  function calculate_live_single_turn_accuracy (line 86) | def calculate_live_single_turn_accuracy(metrics):
  function calculate_agentic_accuracy (line 98) | def calculate_agentic_accuracy(metrics):
  function calculate_hallucination_measurement (line 115) | def calculate_hallucination_measurement(metrics):
  function compute_score (line 124) | def compute_score(metrics: dict):

FILE: nemo_skills/dataset/bfcl_v4/prepare.py
  function main (line 32) | def main():

FILE: nemo_skills/dataset/bigcodebench/prepare.py
  function parse_data (line 26) | def parse_data(split="hard"):
  function extract_prefix (line 32) | def extract_prefix(text: str, delimiter: str) -> str:
  function clean_data (line 38) | def clean_data(dataset, subset):
  function wrap_in_code_tag (line 74) | def wrap_in_code_tag(text):

FILE: nemo_skills/dataset/birdbench/prepare.py
  function download_data (line 26) | def download_data(data_dir):
  function read_tables_file (line 45) | def read_tables_file(base_dir):
  function format_entries (line 82) | def format_entries(file_path, tables_info, out_file):
  function main (line 105) | def main():

FILE: nemo_skills/dataset/brumo25/prepare.py
  function write_data_to_file (line 22) | def write_data_to_file(output_file, data):

FILE: nemo_skills/dataset/challenge19/prepare.py
  function process_row (line 21) | def process_row(row, source):
  function load_jsonl_problems (line 29) | def load_jsonl_problems(file_path, target_ids):
  function load_ids_from_file (line 62) | def load_ids_from_file(file_path):
  function main (line 68) | def main():

FILE: nemo_skills/dataset/compute-eval/prepare.py
  function _fence_for_path (line 29) | def _fence_for_path(path: str) -> str:
  function _format_context_files_block (line 43) | def _format_context_files_block(context_files: list[dict[str, str]]) -> ...

FILE: nemo_skills/dataset/contextasr-bench/contextasr_score.py
  function compute_score (line 16) | def compute_score(combined_metrics: dict) -> dict:

FILE: nemo_skills/dataset/contextasr-bench/prepare.py
  function download_dataset (line 59) | def download_dataset(download_dir):
  function build_messages (line 131) | def build_messages(prompt_text, audio_path, duration):
  function format_entry (line 145) | def format_entry(sample, mode, audio_prefix):
  function main (line 173) | def main():

FILE: nemo_skills/dataset/covost2/prepare.py
  function load_tsv (line 86) | def load_tsv(path: Path) -> list[dict]:
  function download_covost_tsv (line 91) | def download_covost_tsv(src_lang: str, tgt_lang: str, local_dir: Path) -...
  function load_validated_sentences (line 105) | def load_validated_sentences(path: Path) -> dict:
  function load_covost2 (line 112) | def load_covost2(
  function get_audio_duration (line 142) | def get_audio_duration(audio_file: str) -> float:
  function get_container_audio_path (line 147) | def get_container_audio_path(src_lang: str, split: str, audio_id: str) -...
  function copy_audio_file (line 151) | def copy_audio_file(src_wav: Path, audio_dir: Path, src_lang: str, split...
  function get_ast_instruction (line 159) | def get_ast_instruction(target_lang: str) -> str:
  function get_asr_instruction (line 164) | def get_asr_instruction() -> str:
  function _build_record (line 168) | def _build_record(
  function prepare_covost2 (line 199) | def prepare_covost2(
  function main (line 286) | def main():

FILE: nemo_skills/dataset/dsbench_da/prepare.py
  function read_excel_to_text (line 23) | def read_excel_to_text(excel_path: Path) -> str:
  function format_paths_for_prompt (line 42) | def format_paths_for_prompt(paths: list[Path], actual_root: Path, displa...
  function save_data (line 65) | def save_data(split: str, data_dir: str | Path, display_root: str | Path...

FILE: nemo_skills/dataset/fleurs/prepare.py
  function load_fleurs_module (line 29) | def load_fleurs_module():
  function parse_tsv (line 56) | def parse_tsv(tsv_path: str) -> dict[str, dict]:
  function load_fleurs (line 74) | def load_fleurs(locale: str, split: str, local_dir: str) -> list[dict]:
  function index_by_id (line 106) | def index_by_id(rows: list[dict]) -> dict[int, dict]:
  function build_translation_pairs (line 110) | def build_translation_pairs(languages: list[str]) -> list[tuple[str, str]]:
  function prepare_audio (line 121) | def prepare_audio(item: dict) -> tuple[np.ndarray, int, float]:
  function get_container_audio_path (line 128) | def get_container_audio_path(locale: str, wav_filename: str) -> str:
  function save_audio (line 132) | def save_audio(y: np.ndarray, sr: int, wav_path: Path) -> None:
  function get_ast_instruction (line 137) | def get_ast_instruction(target_locale: str) -> str:
  function get_asr_instruction (line 142) | def get_asr_instruction() -> str:
  function _build_record (line 146) | def _build_record(
  function prepare_fleurs (line 177) | def prepare_fleurs(data_dir: Path, split: str, languages: list[str], no_...
  function main (line 276) | def main():

FILE: nemo_skills/dataset/flores200/prepare.py
  function write_data_to_file (line 23) | def write_data_to_file(output_file, datasets, src_languages, tgt_languag...
  function main (line 41) | def main(args):

FILE: nemo_skills/dataset/frontierscience-olympiad/prepare.py
  function format_entry (line 32) | def format_entry(entry, problem_index):
  function write_data_to_file (line 49) | def write_data_to_file(output_file, data, subject_filter=None):

FILE: nemo_skills/dataset/global_piqa/global_piqa_utils.py
  function supported_languages (line 18) | def supported_languages() -> list[str]:
  function load_global_piqa_datasets (line 22) | def load_global_piqa_datasets(languages: list[str], split: str = "test")...
  function digit_to_letter (line 26) | def digit_to_letter(digit: int) -> str:
  class Schema (line 30) | class Schema:
  function get_mcq_fields (line 65) | def get_mcq_fields(entry: dict) -> dict:

FILE: nemo_skills/dataset/global_piqa/prepare.py
  function format_entry (line 29) | def format_entry(entry: dict, language: str) -> dict:
  function main (line 40) | def main(args):

FILE: nemo_skills/dataset/gpqa-x/prepare.py
  function _load_utils (line 24) | def _load_utils():
  function format_entry (line 39) | def format_entry(entry, lang, prompt_language):
  function main (line 57) | def main(args):

FILE: nemo_skills/dataset/gpqa/prepare.py
  function preprocess (line 31) | def preprocess(text):
  function format_entry (line 40) | def format_entry(entry):
  function write_data_to_file (line 63) | def write_data_to_file(output_file, data):
  function save_data (line 70) | def save_data(split, random_seed):

FILE: nemo_skills/dataset/gsm8k/prepare.py
  function save_data (line 41) | def save_data(split):

FILE: nemo_skills/dataset/hendrycks_math/fix_ref_solns.py
  function _post_fix (line 18) | def _post_fix(problem_id, soln_string):
  function _post_fix_multi_answer (line 53) | def _post_fix_multi_answer(problem_id, results):
  function _fix_solution (line 149) | def _fix_solution(problem_id, ref_soln):

FILE: nemo_skills/dataset/hle/prepare.py
  function format_entry (line 37) | def format_entry(entry):
  function write_data_to_file (line 51) | def write_data_to_file(output_file, data, split):

FILE: nemo_skills/dataset/hle_verified/prepare.py
  function load_dataset_from_hub (line 48) | def load_dataset_from_hub():
  function format_entry (line 63) | def format_entry(entry):
  function write_data_to_file (line 78) | def write_data_to_file(output_file, data, split):

FILE: nemo_skills/dataset/hmmt_feb25/prepare.py
  function write_data_to_file (line 22) | def write_data_to_file(output_file, data):

FILE: nemo_skills/dataset/hmmt_nov25/prepare.py
  function write_data_to_file (line 22) | def write_data_to_file(output_file, data):

FILE: nemo_skills/dataset/hotpotqa/prepare_utils.py
  function format_context (line 28) | def format_context(context: dict) -> str:
  function format_entry (line 48) | def format_entry(entry: dict) -> dict:
  function prepare_validation (line 63) | def prepare_validation(output_path: Path) -> int:

FILE: nemo_skills/dataset/human-eval-infilling/prepare.py
  function parse_data (line 28) | def parse_data(split):
  function clean_data (line 33) | def clean_data(dataset, split):

FILE: nemo_skills/dataset/librispeech-pc/prepare.py
  function download_with_progress (line 37) | def download_with_progress(url: str, output_path: Path, desc: str):
  function download_manifests (line 60) | def download_manifests(output_dir: Path) -> Path:
  function download_audio (line 86) | def download_audio(split: str, audio_dir: Path):
  function process_split (line 103) | def process_split(split: str, data_dir: Path, audio_dir: Path, with_audi...
  function main (line 165) | def main():

FILE: nemo_skills/dataset/livebench-coding/prepare.py
  function parse_data (line 23) | def parse_data():
  function clean_data (line 32) | def clean_data(dataset):

FILE: nemo_skills/dataset/livecodebench-cpp/prepare.py
  class PromptConstants (line 22) | class PromptConstants:
  function parse_data (line 35) | def parse_data(split):
  function clean_data (line 53) | def clean_data(dataset, keep_all_columns=False):
  function prepare (line 89) | def prepare(output_dir, split):

FILE: nemo_skills/dataset/livecodebench-pro/prepare.py
  function download_testcases (line 38) | def download_testcases(local_dir, token):
  function process_problem_splits (line 51) | def process_problem_splits(output_dir, token):

FILE: nemo_skills/dataset/livecodebench-x/prepare.py
  function _load_utils (line 24) | def _load_utils():
  function format_entry (line 40) | def format_entry(entry, lang, prompt_language):
  function main (line 52) | def main(args):

FILE: nemo_skills/dataset/livecodebench/prepare.py
  class PromptConstants (line 25) | class PromptConstants:
  function parse_data (line 33) | def parse_data(release_version="release_latest"):
  function get_first_last_day (line 57) | def get_first_last_day(year_month_str):
  function parse_month_range (line 67) | def parse_month_range(start_date, end_date):
  function clean_data (line 76) | def clean_data(dataset, keep_all_columns=False):
  function prepare (line 111) | def prepare(start_date, end_date, release_version, output_dir, keep_all_...

FILE: nemo_skills/dataset/longbench-v2/prepare.py
  function count_n_tokens (line 56) | def count_n_tokens(prompt: str, tokenizer_name: str) -> int:
  function write_data_to_file (line 71) | def write_data_to_file(output_file: Path, data, difficulty, length, toke...
  function prepare_longbenchv2_data (line 106) | def prepare_longbenchv2_data(setup: str, difficulty, length, tokenizer_n...

FILE: nemo_skills/dataset/longcodebench/prepare.py
  function count_n_tokens (line 26) | def count_n_tokens(prompt: str, tokenizer_name: str) -> int:
  function write_data_to_file (line 41) | def write_data_to_file(output_file, data, tokenizer_name):
  function prepare_longcodebench_data (line 57) | def prepare_longcodebench_data(setup, tokenizer_name):

FILE: nemo_skills/dataset/m-arena-hard-v2/prepare.py
  function format_entry (line 25) | def format_entry(row: dict, language: str) -> dict:
  function main (line 38) | def main(args):

FILE: nemo_skills/dataset/m-arena-hard/prepare.py
  function format_entry (line 25) | def format_entry(row: dict, language: str) -> dict:
  function main (line 40) | def main(args):

FILE: nemo_skills/dataset/math-odyssey/prepare.py
  function identify_label (line 23) | def identify_label(answer_endings, answer):

FILE: nemo_skills/dataset/minif2f/prepare.py
  function download_dataset (line 25) | def download_dataset(output_path):
  function _ensure_header_ends_with_by (line 30) | def _ensure_header_ends_with_by(text: str) -> str:
  function clean_lean_snippet (line 39) | def clean_lean_snippet(text: str | None) -> str | None:
  function _split_header_and_theorem (line 49) | def _split_header_and_theorem(text: str) -> tuple[str, str]:
  function process_entry (line 72) | def process_entry(entry: dict) -> dict:
  function split_data (line 96) | def split_data(input_file):
  function save_data (line 117) | def save_data(data, output_file):
  function delete_file (line 123) | def delete_file(file_path):
  function main (line 128) | def main(split):

FILE: nemo_skills/dataset/mmau-pro/mmau_pro_score.py
  function compute_score (line 16) | def compute_score(combined_metrics: dict) -> dict:

FILE: nemo_skills/dataset/mmau-pro/prepare.py
  function download_mmau_data (line 28) | def download_mmau_data(download_dir, hf_token):
  function format_entry (line 59) | def format_entry(entry, with_audio=False):
  function main (line 97) | def main():

FILE: nemo_skills/dataset/mmlu-pro/prepare.py
  function format_entry (line 27) | def format_entry(entry):
  function write_data_to_file (line 38) | def write_data_to_file(output_file, data):
  function main (line 45) | def main(args):

FILE: nemo_skills/dataset/mmlu-prox/prepare.py
  function download_and_parse_lang_libs (line 28) | def download_and_parse_lang_libs():
  function format_entry (line 84) | def format_entry(entry, language, lang_libs, lang_subjects):
  function write_data_to_file (line 120) | def write_data_to_file(output_file, datasets, languages, lang_libs, lang...
  function main (line 131) | def main(args):

FILE: nemo_skills/dataset/mmlu-redux/prepare.py
  function format_entry (line 87) | def format_entry(entry, category):
  function write_data_to_file (line 105) | def write_data_to_file(output_file, data, category):
  function main (line 113) | def main(args):

FILE: nemo_skills/dataset/mmlu/prepare.py
  function read_csv_files_from_tar (line 90) | def read_csv_files_from_tar(tar_file_path, split):
  function save_data (line 132) | def save_data(split):

FILE: nemo_skills/dataset/mmmlu/mmmlu_utils.py
  class Schema (line 158) | class Schema:
  function download_mmmlu_datasets (line 165) | def download_mmmlu_datasets(languages: list[str]) -> dict[str, list[dict]]:
  function format_multichoice_question (line 186) | def format_multichoice_question(row):
  function get_mcq_fields (line 190) | def get_mcq_fields(entry: dict):

FILE: nemo_skills/dataset/mmmlu/prepare.py
  function format_entry (line 30) | def format_entry(entry: dict, language: str) -> dict:
  function main (line 50) | def main(args):

FILE: nemo_skills/dataset/mmmu-pro/prepare.py
  function format_entry (line 26) | def format_entry(entry, images_dir: Path) -> dict | None:
  function save_data (line 48) | def save_data(split: str):

FILE: nemo_skills/dataset/mobench/prepare.py
  function download_dataset (line 24) | def download_dataset(output_path: str):
  function load_jsonl (line 29) | def load_jsonl(path: str):
  function write_jsonl (line 36) | def write_jsonl(path: str, rows):
  function strip_trailing_sorry (line 42) | def strip_trailing_sorry(text: str) -> str:
  function split_prelude_and_theorem (line 52) | def split_prelude_and_theorem(code: str):
  function extract_theorem_by (line 65) | def extract_theorem_by(theorem_block: str) -> str:
  function ensure_fields (line 83) | def ensure_fields(entry: dict, lean_header: str) -> dict:
  function get_lean4_header (line 118) | def get_lean4_header() -> str:
  function main (line 123) | def main():

FILE: nemo_skills/dataset/mrcr/prepare.py
  function count_n_tokens (line 33) | def count_n_tokens(messages: list[dict]) -> int:
  function write_data_to_file (line 42) | def write_data_to_file(output_file, data, max_context_window, needles_su...
  function get_mrcr_data (line 65) | def get_mrcr_data(needles_subset, setup, max_context_window):

FILE: nemo_skills/dataset/musan/prepare.py
  function download_from_kaggle (line 55) | def download_from_kaggle(output_dir: Path) -> Path:
  function download_from_openslr (line 72) | def download_from_openslr(output_dir: Path) -> Path:
  function load_dataset_from_source (line 110) | def load_dataset_from_source(source: str, output_dir: Path):
  function get_audio_duration (line 150) | def get_audio_duration(audio_array: np.ndarray, sampling_rate: int) -> f...
  function save_audio_file (line 157) | def save_audio_file(audio_array: np.ndarray, sampling_rate: int, output_...
  function create_manifest_entry (line 163) | def create_manifest_entry(
  function process_category_from_files (line 203) | def process_category_from_files(
  function process_category (line 278) | def process_category(
  function main (line 397) | def main():

FILE: nemo_skills/dataset/numb3rs/prepare.py
  function build_messages_with_prompt (line 64) | def build_messages_with_prompt(audio_metadata, prompt_text):
  function save_audio_and_format_entry (line 75) | def save_audio_and_format_entry(entry, category, audio_dir, sample_idx, ...
  function prepare_category (line 140) | def prepare_category(category, dataset, output_dir, with_audio=True, aud...
  function main (line 220) | def main():

FILE: nemo_skills/dataset/omniscience/prepare.py
  function parse_args (line 32) | def parse_args() -> argparse.Namespace:
  function format_entry (line 44) | def format_entry(entry) -> dict:
  function write_jsonl (line 54) | def write_jsonl(data: list[dict], path: str):

FILE: nemo_skills/dataset/open-proof-corpus-judge/prepare.py
  function load_jsonl (line 23) | def load_jsonl(file_path):
  function prepare_bon_binary_data (line 32) | def prepare_bon_binary_data(output_path):

FILE: nemo_skills/dataset/physics/prepare.py
  function strip_boxed (line 22) | def strip_boxed(s):
  function process_answer (line 29) | def process_answer(answer):
  function format_entry (line 35) | def format_entry(entry):
  function write_data_to_file (line 47) | def write_data_to_file(output_file, data):
  function save_data (line 54) | def save_data(split_data, split_name):

FILE: nemo_skills/dataset/polymath/prepare.py
  function _load_instructions (line 25) | def _load_instructions(url: str) -> tuple[dict, dict, dict]:
  function format_entry (line 46) | def format_entry(entry: dict, language: str, difficulty: str, language_c...
  function main (line 62) | def main(args):

FILE: nemo_skills/dataset/prepare.py
  function parse_prepare_cli_arguments (line 23) | def parse_prepare_cli_arguments(args=None, datasets_nargs="+"):
  function prepare_datasets (line 41) | def prepare_datasets(

FILE: nemo_skills/dataset/proof-arena-judge/prepare.py
  function prepare_data (line 40) | def prepare_data(output_path):
  function load_jsonl (line 87) | def load_jsonl(file_path):
  function grading_scheme_to_rubric (line 96) | def grading_scheme_to_rubric(grading_scheme, desc_key="grading_scheme_de...
  function load_openai_imo_proofs (line 104) | def load_openai_imo_proofs():
  function load_gemini_imo_proofs (line 136) | def load_gemini_imo_proofs():
  function process_imo_usamo_data (line 168) | def process_imo_usamo_data(raw_data, source):
  function process_imc_data (line 201) | def process_imc_data(raw_data):

FILE: nemo_skills/dataset/proof-bench-judge/prepare.py
  function prepare_verification_data (line 31) | def prepare_verification_data(output_path):
  function prepare_bon_binary_data (line 63) | def prepare_bon_binary_data(output_path):
  function load_hf_data (line 103) | def load_hf_data(split: str):

FILE: nemo_skills/dataset/proofnet/prepare.py
  function download_dataset (line 24) | def download_dataset(output_path):
  function split_data (line 29) | def split_data(input_file):
  function save_data (line 44) | def save_data(data, output_file):
  function delete_file (line 50) | def delete_file(file_path):
  function main (line 55) | def main(split):

FILE: nemo_skills/dataset/putnam-bench/prepare.py
  function parse_lean_file (line 36) | def parse_lean_file(path: Path) -> dict:
  function download_dataset_and_process (line 82) | def download_dataset_and_process(output_path):
  function delete_file (line 124) | def delete_file(file_path):
  function main (line 131) | def main():

FILE: nemo_skills/dataset/ruler/prepare.py
  function prepare_task_for_ns (line 45) | def prepare_task_for_ns(task, data_dir, setup, data_format):
  function get_ruler_data (line 79) | def get_ruler_data(tasks, setup, template_tokens, max_seq_length, data_f...

FILE: nemo_skills/dataset/ruler/ruler_score.py
  function compute_score (line 16) | def compute_score(metrics: dict):

FILE: nemo_skills/dataset/ruler2/prepare.py
  function prepare_mk_niah_basic (line 31) | def prepare_mk_niah_basic(output_folder, tokenizer_type, tokenizer_path,...
  function prepare_mk_niah_easy (line 68) | def prepare_mk_niah_easy(output_folder, tokenizer_type, tokenizer_path, ...
  function prepare_mk_niah_medium (line 103) | def prepare_mk_niah_medium(output_folder, tokenizer_type, tokenizer_path...
  function prepare_mk_niah_hard (line 138) | def prepare_mk_niah_hard(output_folder, tokenizer_type, tokenizer_path, ...
  function prepare_mv_niah_basic (line 173) | def prepare_mv_niah_basic(output_folder, tokenizer_type, tokenizer_path,...
  function prepare_mv_niah_easy (line 210) | def prepare_mv_niah_easy(output_folder, tokenizer_type, tokenizer_path, ...
  function prepare_mv_niah_medium (line 245) | def prepare_mv_niah_medium(output_folder, tokenizer_type, tokenizer_path...
  function prepare_mv_niah_hard (line 280) | def prepare_mv_niah_hard(output_folder, tokenizer_type, tokenizer_path, ...
  function prepare_qa_basic (line 315) | def prepare_qa_basic(output_folder, tokenizer_type, tokenizer_path, leng...
  function prepare_qa_easy (line 348) | def prepare_qa_easy(output_folder, tokenizer_type, tokenizer_path, lengt...
  function prepare_qa_medium (line 381) | def prepare_qa_medium(output_folder, tokenizer_type, tokenizer_path, len...
  function prepare_qa_hard (line 414) | def prepare_qa_hard(output_folder, tokenizer_type, tokenizer_path, lengt...
  function prepare_task_for_ns (line 447) | def prepare_task_for_ns(output_folder, task):
  function prepare_dataset (line 468) | def prepare_dataset(tasks, setup, max_seq_length, tokenizer_type, tokeni...

FILE: nemo_skills/dataset/ruler2/prepare_mmlu.py
  function generate_random_number (line 287) | def generate_random_number(num_digits=7):
  function generate_input_output (line 293) | def generate_input_output(index, num_qs):
  function generate_samples (line 400) | def generate_samples(max_seq_length: int, incremental: int = 10):
  function main (line 466) | def main():

FILE: nemo_skills/dataset/ruler2/prepare_niah.py
  function generate_random_number (line 91) | def generate_random_number(num_digits=7):
  function generate_random_word (line 97) | def generate_random_word():
  function generate_random_uuid (line 102) | def generate_random_uuid():
  function generate_random (line 106) | def generate_random(type_needle: str, digits: int | None = None):
  function generate_input_output (line 119) | def generate_input_output(num_haystack):
  function generate_samples (line 193) | def generate_samples(num_samples: int, max_seq_length: int, incremental:...
  function main (line 263) | def main():

FILE: nemo_skills/dataset/ruler2/prepare_qa.py
  function read_squad (line 97) | def read_squad():
  function read_hotpotqa (line 123) | def read_hotpotqa():
  function read_musique (line 152) | def read_musique():
  function generate_random_number (line 189) | def generate_random_number(num_digits=7):
  function generate_input_output (line 195) | def generate_input_output(index, num_docs):
  function generate_samples (line 309) | def generate_samples(num_samples: int, max_seq_length: int, incremental:...
  function main (line 374) | def main():

FILE: nemo_skills/dataset/ruler2/ruler2_score.py
  function compute_score (line 16) | def compute_score(metrics: dict):

FILE: nemo_skills/dataset/ruler2/tokenizer.py
  function select_tokenizer (line 27) | def select_tokenizer(tokenizer_type, tokenizer_path):
  class HFTokenizer (line 38) | class HFTokenizer:
    method __init__ (line 43) | def __init__(self, model_path) -> None:
    method text_to_tokens (line 48) | def text_to_tokens(self, text: str) -> List[str]:
    method tokens_to_text (line 52) | def tokens_to_text(self, tokens: List[int]) -> str:
  class OpenAITokenizer (line 57) | class OpenAITokenizer:
    method __init__ (line 62) | def __init__(self, model_path="cl100k_base") -> None:
    method text_to_tokens (line 67) | def text_to_tokens(self, text: str) -> List[int]:
    method tokens_to_text (line 71) | def tokens_to_text(self, tokens: List[int]) -> str:
  class GeminiTokenizer (line 76) | class GeminiTokenizer:
    method __init__ (line 81) | def __init__(self, model_path="gemini-1.5-pro-latest") -> None:
    method text_to_tokens (line 88) | def text_to_tokens(self, text: str) -> List[int]:
    method tokens_to_text (line 92) | def tokens_to_text(self, tokens: List[int]) -> str:

FILE: nemo_skills/dataset/simpleqa/prepare.py
  function format_entry (line 27) | def format_entry(entry: dict, idx: int) -> dict:
  function format_entry_verified (line 37) | def format_entry_verified(entry: dict, idx: int) -> dict:
  function write_data_to_file (line 47) | def write_data_to_file(output_file, examples: List[dict]):

FILE: nemo_skills/dataset/speed-bench/prepare.py
  class BenchmarkDataset (line 35) | class BenchmarkDataset(str, Enum):
  function _get_external_dataset (line 118) | def _get_external_dataset(dataset_name: str, config_name: str = "default"):
  function _generate_stackselect_prompt (line 130) | def _generate_stackselect_prompt(question: str, answers: list[str], answ...
  function _generate_textsort_prompt (line 210) | def _generate_textsort_prompt(prompt: str) -> str:
  function _generate_writing_prompt (line 265) | def _generate_writing_prompt(contents: list[str]) -> str:
  function _pad_or_truncate_prompt (line 281) | def _pad_or_truncate_prompt(prompt: str, target_num_tokens: int, padding...
  function _generate_bamboo_prompt (line 305) | def _generate_bamboo_prompt(external_dataset: "Dataset", num_tokens: int...
  function _generate_chatrag_bench_prompt (line 310) | def _generate_chatrag_bench_prompt(external_dataset: "Dataset") -> str:
  function _generate_coser_prompt (line 320) | def _generate_coser_prompt(external_dataset: "Dataset") -> str:
  function _generate_mmlu_pro_prompt (line 367) | def _generate_mmlu_pro_prompt(external_dataset: "Dataset", subject: str)...
  function _generate_hle_prompt (line 384) | def _generate_hle_prompt(
  function _get_num_tokens_from_config (line 407) | def _get_num_tokens_from_config(speed_config: DATASET_CONFIG | str) -> int:
  function _fetch_all_turns_data (line 415) | def _fetch_all_turns_data(example: dict[str, Any], speed_config: DATASET...
  function _resolve_external_data (line 573) | def _resolve_external_data(dataset: Dataset, speed_config: DATASET_CONFI...
  function prepare_data (line 592) | def prepare_data(args: argparse.Namespace) -> None:

FILE: nemo_skills/dataset/supergpqa/prepare.py
  function preprocess (line 30) | def preprocess(text):
  function format_entry (line 38) | def format_entry(entry):
  function write_data_to_file (line 76) | def write_data_to_file(output_file, data):
  function save_data (line 83) | def save_data(split, random_seed):

FILE: nemo_skills/dataset/swe-bench-multilingual/prepare.py
  function get_language (line 69) | def get_language(row):

FILE: nemo_skills/dataset/swe-bench/dump_images.py
  function read_container_names (line 23) | def read_container_names(jsonl_file):
  function convert_to_sif (line 40) | def convert_to_sif(container_name, output_dir):
  function main (line 72) | def main():

FILE: nemo_skills/dataset/swe-bench/dump_repos.py
  function read_repos (line 24) | def read_repos(jsonl_file):
  function clone_repo (line 36) | def clone_repo(repo, output_dir, force):
  function main (line 69) | def main():

FILE: nemo_skills/dataset/swe-rebench/prepare.py
  function get_date_range (line 22) | def get_date_range(start_str, end_str):

FILE: nemo_skills/dataset/ugphysics/prepare.py
  function get_prompt_sentence (line 50) | def get_prompt_sentence(answer_type, is_multiple_answer):
  function get_boxed_answer_example (line 64) | def get_boxed_answer_example(is_multiple_answer):
  function format_entry (line 71) | def format_entry(entry):
  function load_data (line 88) | def load_data(lang_split):
  function save_data (line 96) | def save_data(data, output_path):

FILE: nemo_skills/dataset/utils.py
  function locate (line 30) | def locate(path):
  function add_rounding_instruction (line 56) | def add_rounding_instruction(data: Dict) -> Dict:
  function import_from_path (line 73) | def import_from_path(file_path, module_name=None):
  function add_to_path (line 84) | def add_to_path(p):
  function get_dataset_name (line 94) | def get_dataset_name(dataset):
  function get_dataset_path (line 101) | def get_dataset_path(dataset, extra_benchmark_map=None):
  function get_extra_benchmark_map (line 122) | def get_extra_benchmark_map(extra_benchmark_map=None):
  function _load_external_dataset (line 150) | def _load_external_dataset(dataset_path):
  function get_default_dataset_module (line 162) | def get_default_dataset_module(dataset):
  function get_dataset_module (line 169) | def get_dataset_module(dataset, data_dir=None, extra_benchmark_map=None):
  function get_lean4_header (line 234) | def get_lean4_header():
  function download_with_retries (line 239) | def download_with_retries(url, output_file, max_retries=3, retry_delay=1):
  function save_data_from_qwen (line 252) | def save_data_from_qwen(dataset, split="test"):
  function get_mcq_fields (line 295) | def get_mcq_fields(question, choices):
  function get_question_hash (line 306) | def get_question_hash(question, options=None):
  function load_subset_ids (line 317) | def load_subset_ids(ids_file):
  function filter_by_subset (line 323) | def filter_by_subset(dataset, subset_ids, question_key="question", optio...

FILE: nemo_skills/dataset/wmt24pp/prepare.py
  function write_data_to_file (line 23) | def write_data_to_file(output_file, datasets, tgt_languages):
  function main (line 39) | def main(args):

FILE: nemo_skills/evaluation/aggregate_answers.py
  class ProcessTopAnswerConfig (line 34) | class ProcessTopAnswerConfig:
    method __post_init__ (line 73) | def __post_init__(self):
  function map_to_output_path (line 86) | def map_to_output_path(file_path, input_dir, output_dir):
  class ProcessMode (line 103) | class ProcessMode(Enum):
  class TopAnswerProcessor (line 108) | class TopAnswerProcessor:
    method __init__ (line 109) | def __init__(self, cfg: ProcessTopAnswerConfig):
    method _validate_cfg (line 113) | def _validate_cfg(self):
    method __enter__ (line 134) | def __enter__(self):
    method __exit__ (line 179) | def __exit__(self, exc_type, exc_val, exc_tb):
    method process (line 186) | def process(self):
    method _read_predictions (line 191) | def _read_predictions(self) -> Tuple[List, List]:
    method _write_results (line 251) | def _write_results(self, all_predictions: List, new_answers: List):
    method _write_results_fill (line 258) | def _write_results_fill(self, all_predictions: List, new_answers: List):
    method _write_results_extract (line 301) | def _write_results_extract(self, all_predictions: List, new_answers: L...
  function process_top_answer (line 319) | def process_top_answer(cfg: ProcessTopAnswerConfig):

FILE: nemo_skills/evaluation/compute_group_score.py
  function load_metric_files (line 22) | def load_metric_files(metric_files: List[str]) -> Dict[str, Any]:
  function import_score_module (line 34) | def import_score_module(score_module: str):
  function main (line 48) | def main():

FILE: nemo_skills/evaluation/evaluator/__init__.py
  function _resolve (line 76) | def _resolve(dotted: str):
  function _get_evaluator_fn (line 83) | def _get_evaluator_fn(eval_type: str) -> Callable:
  function _get_evaluator_cls (line 89) | def _get_evaluator_cls(eval_type: str) -> type:
  function _resolve_eval_type (line 104) | def _resolve_eval_type(eval_type: str):
  function is_evaluator_registered (line 127) | def is_evaluator_registered(eval_type: str):
  function register_evaluator (line 132) | def register_evaluator(eval_type: str, eval_fn: Callable[[Dict[str, Any]...
  function get_evaluator_class (line 142) | def get_evaluator_class(eval_type: str, config: Dict[str, Any]) -> BaseE...
  function supports_single_eval (line 156) | def supports_single_eval(eval_type: str, config: Dict[str, Any]) -> bool:
  function evaluate (line 166) | def evaluate(eval_type, eval_config):

FILE: nemo_skills/evaluation/evaluator/arena.py
  function compute_mle_elo (line 35) | def compute_mle_elo(df, SCALE=400, BASE=10, INIT_RATING=1000):
  function get_bootstrap_result (line 69) | def get_bootstrap_result(battles, func_compute_elo, num_round):
  function predict_win_rate (line 80) | def predict_win_rate(elo_ratings, SCALE=400, BASE=10, INIT_RATING=1000):
  function get_win_rate_column (line 97) | def get_win_rate_column(df, column):
  function get_battles_from_judgment (line 103) | def get_battles_from_judgment(scores, WEIGHT=3):
  function get_aggregate_score (line 161) | def get_aggregate_score(scores, weight=3):

FILE: nemo_skills/evaluation/evaluator/audio.py
  class AudioEvaluatorConfig (line 32) | class AudioEvaluatorConfig(BaseEvaluatorConfig):
  function remove_symbols_and_diacritics (line 69) | def remove_symbols_and_diacritics(s: str, keep: str = ""):
  function remove_symbols (line 92) | def remove_symbols(s: str):
  function normalize_compound_pairs (line 99) | def normalize_compound_pairs(ref_text: str, pred_text: str) -> tuple[str...
  class MultilingualTextNormalizer (line 129) | class MultilingualTextNormalizer:
    method __init__ (line 136) | def __init__(self, remove_diacritics: bool = True):
    method _normalize_numbers (line 139) | def _normalize_numbers(self, text, lang):
    method __call__ (line 154) | def __call__(self, s: str, lang=None):
  function extract_asr_text (line 179) | def extract_asr_text(generation: str) -> str:
  function strip_helpful_prefixes (line 194) | def strip_helpful_prefixes(text: str) -> str:
  function normalize_whitespace (line 234) | def normalize_whitespace(text: str) -> str:
  function split_tokens (line 239) | def split_tokens(text: str) -> list[str]:
  function extract_punctuation (line 244) | def extract_punctuation(text: str) -> list[str]:
  function calculate_per (line 249) | def calculate_per(reference: str, hypothesis: str) -> float:
  function evaluate_asr_pc (line 286) | def evaluate_asr_pc(
  function _normalize_digits_to_words (line 328) | def _normalize_digits_to_words(text: str) -> str:
  function _expand_contractions (line 365) | def _expand_contractions(text: str) -> str:
  function _remove_non_speech_elements (line 396) | def _remove_non_speech_elements(text: str) -> str:
  function resolve_asr_normalization_mode (line 406) | def resolve_asr_normalization_mode(config: AudioEvaluatorConfig) -> str:
  function preprocess_asr_text (line 417) | def preprocess_asr_text(text: str, mode: str = "standard", **kwargs) -> ...
  function _wer_with_counts (line 487) | def _wer_with_counts(ref: str, hyp: str) -> dict[str, Any]:
  function _cer_with_counts (line 506) | def _cer_with_counts(ref: str, hyp: str, key_prefix: str = "cer") -> dic...
  function evaluate_asr (line 527) | def evaluate_asr(
  function resolve_bleu_tokenize (line 570) | def resolve_bleu_tokenize(tgt_lang: str | None) -> str:
  function evaluate_translation (line 578) | def evaluate_translation(
  function evaluate_cer (line 611) | def evaluate_cer(
  function evaluate_hallucination (line 634) | def evaluate_hallucination(reference: str, hypothesis: str, audio_contex...
  function evaluate_pc_rate (line 668) | def evaluate_pc_rate(reference: str, hypothesis: str) -> dict[str, Any]:
  class AudioEvaluator (line 717) | class AudioEvaluator(BaseEvaluator):
    method __init__ (line 720) | def __init__(self, config: dict, num_parallel_requests=10):
    method eval_single (line 724) | async def eval_single(self, data_point: dict[str, any]) -> dict[str, a...
  function eval_audio (line 732) | def eval_audio(cfg):
  function evaluate_sample (line 738) | def evaluate_sample(sample: dict[str, Any], config: AudioEvaluatorConfig...

FILE: nemo_skills/evaluation/evaluator/base.py
  class BaseEvaluatorConfig (line 27) | class BaseEvaluatorConfig:
  class BaseEvaluator (line 34) | class BaseEvaluator(ABC):
    method __init__ (line 37) | def __init__(self, config: Dict[str, Any], num_parallel_requests=10):
    method eval_full (line 42) | async def eval_full(self) -> None:
    method eval_single (line 74) | async def eval_single(self, data_point: Dict[str, Any]) -> Dict[str, A...
    method supports_single_eval (line 89) | def supports_single_eval(self) -> bool:

FILE: nemo_skills/evaluation/evaluator/bfcl.py
  class BFCLEvaluatorConfig (line 38) | class BFCLEvaluatorConfig(BaseEvaluatorConfig):
  function eval_bfcl (line 44) | def eval_bfcl(cfg):
  function _convert_to_bfcl_format (line 103) | def _convert_to_bfcl_format(jsonl_file, output_dir, test_category):
  function _merge_bfcl_results (line 124) | def _merge_bfcl_results(generation_file, bfcl_fmted_file, score_file):

FILE: nemo_skills/evaluation/evaluator/bird.py
  function execute_sql (line 53) | def execute_sql(predicted_sql, ground_truth, db_path):
  class BirdEvaluatorConfig (line 71) | class BirdEvaluatorConfig(BaseEvaluatorConfig):
  class BirdEvaluator (line 81) | class BirdEvaluator(BaseEvaluator):
    method __init__ (line 82) | def __init__(self, config: dict, num_parallel_requests=10):
    method _extract_answer (line 88) | def _extract_answer(self, text):
    method eval_single (line 129) | async def eval_single(self, data_point: dict):

FILE: nemo_skills/evaluation/evaluator/ccc.py
  class CCCEvaluatorConfig (line 19) | class CCCEvaluatorConfig(BaseEvaluatorConfig):
  function _sandbox_exec_sync (line 33) | def _sandbox_exec_sync(sandbox: LocalSandbox, cmd: str, *, language: str...
  function _test_exec_sync (line 42) | def _test_exec_sync(sandbox: LocalSandbox, cmd: str, *, language: str = ...
  function _get_thread_test_sandbox (line 51) | def _get_thread_test_sandbox() -> LocalSandbox:
  function wait_for_sandbox (line 60) | def wait_for_sandbox(sandbox, timeout: int = 240, poll: float = 1.0):
  function _precompile_problem (line 74) | def _precompile_problem(problem_id: str, grader_files, compile_code: str...
  function run_test_case (line 102) | def run_test_case(task_args: dict, worker_id: int) -> dict:
  function extract_final_cpp_block (line 162) | def extract_final_cpp_block(text):
  function extract_final_text_block (line 169) | def extract_final_text_block(text):
  function extract_task_config (line 176) | def extract_task_config(problem_metadata: dict) -> dict:
  function add_includes (line 187) | def add_includes(code: str, problem_header_include: str | None = None, p...
  class CCCEvaluator (line 209) | class CCCEvaluator(BaseEvaluator):
    method __init__ (line 212) | def __init__(self, config: dict, num_parallel_requests: int = 10):
    method _initialize_runtime (line 221) | async def _initialize_runtime(self):
    method _get_precompiled_dir (line 240) | def _get_precompiled_dir(self, problem_id: str, problem_metadata: dict):
    method _build_test_task (line 256) | def _build_test_task(
    method _aggregate_subtask_score (line 270) | def _aggregate_subtask_score(self, subtask_meta: dict, outputs: list[d...
    method _evaluate_entry (line 286) | async def _evaluate_entry(self, entry: dict) -> dict:
    method eval_full (line 369) | async def eval_full(self):  # type: ignore[override]
    method eval_single (line 398) | async def eval_single(self, data_point: dict):

FILE: nemo_skills/evaluation/evaluator/code.py
  class CodeExecEvaluatorConfig (line 39) | class CodeExecEvaluatorConfig:
  class CodeExecEvaluator (line 47) | class CodeExecEvaluator(BaseEvaluator):
    method __init__ (line 48) | def __init__(self, config: dict, num_parallel_requests: int = 12):
    method eval_single (line 58) | async def eval_single(self, data: dict):
    method eval_full (line 95) | async def eval_full(self):  # type: ignore[override]
  function preprocess_code (line 118) | def preprocess_code(generation_dict: dict, language: str = "python", str...
  function install_from_git (line 176) | def install_from_git(git_url):
  class EvalPlusEvaluatorConfig (line 185) | class EvalPlusEvaluatorConfig(BaseEvaluatorConfig):
  function eval_evalplus (line 190) | def eval_evalplus(cfg):
  function install_requirements (line 231) | def install_requirements(url):
  class LiveCodeBenchProEvaluatorConfig (line 240) | class LiveCodeBenchProEvaluatorConfig(BaseEvaluatorConfig):
  function eval_livecodebench_pro (line 249) | def eval_livecodebench_pro(cfg):
  function eval_livebench_coding (line 297) | def eval_livebench_coding(cfg):
  function install_or_upgrade_package (line 349) | def install_or_upgrade_package(package_name):
  function eval_bigcodebench (line 358) | def eval_bigcodebench(cfg):
  function eval_human_eval_infilling (line 415) | def eval_human_eval_infilling(cfg):

FILE: nemo_skills/evaluation/evaluator/comet.py
  function load_comet_model (line 37) | def load_comet_model(model_path: str):
  function process_file (line 49) | def process_file(
  function main (line 106) | def main():

FILE: nemo_skills/evaluation/evaluator/compute_eval.py
  class ComputeEvalEvaluator (line 31) | class ComputeEvalEvaluator(BaseEvaluator):
    method __init__ (line 32) | def __init__(self, config: dict, num_parallel_requests=10):
    method eval_single (line 40) | async def eval_single(self, data_point: dict[str, Any]) -> dict[str, A...

FILE: nemo_skills/evaluation/evaluator/contextasr.py
  function _merge_single_letters (line 50) | def _merge_single_letters(text):
  function simple_tokenize (line 78) | def simple_tokenize(text):
  function extract_entities (line 103) | def extract_entities(text, entities_list, entity2count=None):
  function extract_entities_fuzzy (line 127) | def extract_entities_fuzzy(text, entities_list):
  function calculate_wer (line 174) | def calculate_wer(hyp_tokens, ref_tokens):
  function evaluate_contextasr_sample (line 223) | def evaluate_contextasr_sample(data_point):
  class ContextASREvaluatorConfig (line 314) | class ContextASREvaluatorConfig(BaseEvaluatorConfig):
  class ContextASREvaluator (line 320) | class ContextASREvaluator(BaseEvaluator):
    method __init__ (line 323) | def __init__(self, config: dict, num_parallel_requests=10):
    method eval_single (line 327) | async def eval_single(self, data_point: dict) -> dict:

FILE: nemo_skills/evaluation/evaluator/critpt.py
  class CritPtEvaluatorConfig (line 31) | class CritPtEvaluatorConfig(BaseEvaluatorConfig):
  class CritPtEvaluator (line 43) | class CritPtEvaluator(BaseEvaluator):
    method __init__ (line 53) | def __init__(self, config: dict, num_parallel_requests: int = 10):
    method _extract_code_from_generation (line 65) | def _extract_code_from_generation(self, generation: str) -> str:
    method _format_submission (line 81) | def _format_submission(self, data_point: dict) -> dict:
    method eval_full (line 105) | async def eval_full(self) -> None:
    method _submit_to_api (line 190) | def _submit_to_api(self, submissions: list[dict]) -> dict:

FILE: nemo_skills/evaluation/evaluator/dsbench.py
  function relaxed_equal (line 29) | def relaxed_equal(gt_answer: Any, predicted_answer: Any) -> bool:
  class DSBenchEvaluator (line 82) | class DSBenchEvaluator(MathEvaluator):
    method __init__ (line 83) | def __init__(self, config: dict, num_parallel_requests=10):
    method eval_single (line 87) | async def eval_single(self, data_point: dict[str, Any]) -> dict[str, A...

FILE: nemo_skills/evaluation/evaluator/icpc.py
  function sha256_hex (line 31) | def sha256_hex(text: str) -> str:
  class ICPCEvaluatorConfig (line 36) | class ICPCEvaluatorConfig(BaseEvaluatorConfig):
  function _sandbox_exec_sync (line 48) | def _sandbox_exec_sync(sandbox: LocalSandbox, cmd: str, *, language: str...
  function init_worker (line 65) | def init_worker():
  function _precompile_grader (line 73) | def _precompile_grader(
  function run_test_case (line 117) | def run_test_case(task_args: dict, worker_id: int) -> dict:
  function run_input_case (line 193) | def run_input_case(task_args: dict, worker_id: int) -> dict:
  function extract_final_cpp_block (line 267) | def extract_final_cpp_block(text):
  function add_includes (line 273) | def add_includes(code: str, problem_id: str) -> str:
  class ICPCEvaluator (line 289) | class ICPCEvaluator(BaseEvaluator):
    method __init__ (line 290) | def __init__(self, config: dict, num_parallel_requests: int = 10):
    method _initialize_runtime (line 300) | async def _initialize_runtime(self):
    method _evaluate_entry (line 338) | async def _evaluate_entry(self, entry: dict) -> dict:
    method eval_full (line 448) | async def eval_full(self, input_files):  # type: ignore[override]
    method eval_single (line 467) | async def eval_single(self, data_point: dict):

FILE: nemo_skills/evaluation/evaluator/ifbench.py
  function eval_ifbench (line 27) | def eval_ifbench(cfg):

FILE: nemo_skills/evaluation/evaluator/ifeval.py
  function eval_if (line 27) | def eval_if(cfg):

FILE: nemo_skills/evaluation/evaluator/ioi.py
  class IOIEvaluatorConfig (line 31) | class IOIEvaluatorConfig(BaseEvaluatorConfig):
  function sha256_hex (line 45) | def sha256_hex(text: str) -> str:
  function _sandbox_exec_sync (line 49) | def _sandbox_exec_sync(sandbox: LocalSandbox, cmd: str, *, language: str...
  function wait_for_sandbox (line 66) | def wait_for_sandbox(sandbox, timeout: int = 240, poll: float = 1.0):
  function init_worker (line 79) | def init_worker():
  function _precompile_grader (line 87) | def _precompile_grader(
  function run_test_case (line 127) | def run_test_case(task_args: dict, worker_id: int) -> dict:
  function run_input_case (line 202) | def run_input_case(task_args: dict, worker_id: int) -> dict:
  function extract_final_cpp_block (line 273) | def extract_final_cpp_block(text):
  function add_includes (line 279) | def add_includes(code: str, problem_id: str) -> str:
  class IOIEvaluator (line 306) | class IOIEvaluator(BaseEvaluator):
    method __init__ (line 307) | def __init__(self, config: dict, num_parallel_requests: int = 10):
    method _initialize_runtime (line 318) | async def _initialize_runtime(self):
    method _evaluate_entry (line 357) | async def _evaluate_entry(self, entry: dict) -> dict:
    method eval_full (line 476) | async def eval_full(self, input_files):  # type: ignore[override]
    method eval_single (line 494) | async def eval_single(self, data_point: dict):

FILE: nemo_skills/evaluation/evaluator/livecodebench.py
  class LiveCodeBenchEvaluatorConfig (line 40) | class LiveCodeBenchEvaluatorConfig(BaseEvaluatorConfig):
  function sandbox_context (line 52) | async def sandbox_context(config: dict):
  function execute_in_sandbox_with_retries (line 62) | async def execute_in_sandbox_with_retries(
  function is_sandbox_available (line 93) | async def is_sandbox_available(sandbox_config: dict) -> bool:
  function _preprocess_and_validate_file (line 124) | def _preprocess_and_validate_file(jsonl_file: str, language: str) -> Tup...
  function _postprocess_results (line 153) | def _postprocess_results(jsonl_file: str, samples: List[Dict[str, Any]]):
  function _install_packages_in_sandbox (line 172) | async def _install_packages_in_sandbox(sandbox: Sandbox, eval_config: Li...
  function _install_packages_locally (line 189) | def _install_packages_locally(interpreter: str):
  function eval_livecodebench_async (line 210) | async def eval_livecodebench_async(eval_config: LiveCodeBenchEvaluatorCo...
  function eval_livecodebench_without_sandbox (line 255) | def eval_livecodebench_without_sandbox(eval_config: LiveCodeBenchEvaluat...
  function eval_livecodebench (line 284) | def eval_livecodebench(cfg):

FILE: nemo_skills/evaluation/evaluator/math.py
  class MathEvaluatorConfig (line 32) | class MathEvaluatorConfig(BaseEvaluatorConfig):
  class LeanEvaluatorConfig (line 47) | class LeanEvaluatorConfig(BaseEvaluatorConfig):
  class MathEvaluator (line 57) | class MathEvaluator(BaseEvaluator):
    method __init__ (line 58) | def __init__(self, config: dict, num_parallel_requests=10):
    method eval_single (line 62) | async def eval_single(self, data_point: dict[str, any]) -> dict[str, a...
  class Lean4ProofEvaluator (line 90) | class Lean4ProofEvaluator(BaseEvaluator):
    method __init__ (line 93) | def __init__(self, config: dict, num_parallel_requests=10):
    method eval_single (line 99) | async def eval_single(self, data_point: dict[str, any]) -> dict[str, a...

FILE: nemo_skills/evaluation/evaluator/mcq.py
  function normalize_extracted_answer (line 28) | def normalize_extracted_answer(extracted_answer: str) -> str:
  class MCQEvaluatorConfig (line 50) | class MCQEvaluatorConfig(BaseEvaluatorConfig):
  function eval_mcq (line 62) | def eval_mcq(cfg):

FILE: nemo_skills/evaluation/evaluator/mmau_pro.py
  function eval_mmau_pro (line 28) | def eval_mmau_pro(cfg):
  function evaluate_instruction_following_sample (line 57) | def evaluate_instruction_following_sample(sample: dict[str, Any]) -> dic...
  function evaluate_aif_constraints (line 73) | def evaluate_aif_constraints(

FILE: nemo_skills/evaluation/evaluator/mrcr.py
  function eval_mrcr (line 27) | def eval_mrcr(cfg):

FILE: nemo_skills/evaluation/evaluator/nvembed_judge.py
  function install_packages (line 41) | def install_packages():
  function load_nvembed_model (line 60) | def load_nvembed_model(model_name: str = "nvidia/NV-Embed-v2"):
  function evaluate_with_nvembed_similarity (line 86) | def evaluate_with_nvembed_similarity(
  function evaluate_sample_with_nvembed (line 116) | def evaluate_sample_with_nvembed(sample: dict[str, Any], model_name: str...
  function process_file (line 150) | def process_file(input_file: Path, output_file: Path, model_name: str = ...
  function main (line 193) | def main():

FILE: nemo_skills/evaluation/evaluator/ruler.py
  class RulerEvaluatorConfig (line 30) | class RulerEvaluatorConfig(BaseEvaluatorConfig):
  function eval_ruler (line 35) | def eval_ruler(cfg):
  function eval_ruler2 (line 87) | def eval_ruler2(cfg):

FILE: nemo_skills/evaluation/evaluator/scicode.py
  class ScicodeEvaluatorConfig (line 30) | class ScicodeEvaluatorConfig(BaseEvaluatorConfig):
  function _execute_single_test (line 36) | async def _execute_single_test(args):
  function test_code (line 74) | def test_code(eval_config, scicode_data):
  function eval_scicode (line 111) | def eval_scicode(cfg):

FILE: nemo_skills/evaluation/evaluator/specdec.py
  class SpecdecEvaluatorConfig (line 27) | class SpecdecEvaluatorConfig(BaseEvaluatorConfig):
    method __post_init__ (line 39) | def __post_init__(self):
  function eval_specdec (line 44) | def eval_specdec(cfg: dict[str, Any]) -> None:

FILE: nemo_skills/evaluation/math_grader.py
  function _additional_normalization (line 26) | def _additional_normalization(expr):
  function math_equal (line 37) | def math_equal(gt_answer, predicted_answer, take_modulo: int | None = No...
  function extract_answer (line 102) | def extract_answer(
  function search_regex (line 117) | def search_regex(string: str, regex: str):
  function search_boxed (line 124) | def search_boxed(string: str):

FILE: nemo_skills/evaluation/metrics/aalcr_metrics.py
  class AALCRMetrics (line 20) | class AALCRMetrics(BaseMetrics):
    method __init__ (line 27) | def __init__(self):
    method reset (line 40) | def reset(self):
    method is_aalcr_correct (line 48) | def is_aalcr_correct(judgement: str) -> bool:
    method _get_score_dict (line 58) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method _get_token_bucket (line 72) | def _get_token_bucket(self, input_tokens: int) -> str:
    method _update_token_bucket_metrics (line 85) | def _update_token_bucket_metrics(self, prediction: dict, score_dict: d...
    method get_incorrect_sample (line 101) | def get_incorrect_sample(cls, prediction: dict) -> dict:
    method _update_category_metrics (line 108) | def _update_category_metrics(self, prediction: dict, score_dict: dict):
    method _update_token_stats (line 117) | def _update_token_stats(self, prediction: dict):
    method update (line 124) | def update(self, predictions):
    method get_metrics (line 148) | def get_metrics(self):
    method _print_category_table (line 188) | def _print_category_table(self, category_results):
    method _print_token_length_analysis (line 233) | def _print_token_length_analysis(self):
    method evaluations_to_print (line 303) | def evaluations_to_print(self):
    method metrics_to_print (line 310) | def metrics_to_print(self):

FILE: nemo_skills/evaluation/metrics/answer_judgement_metrics.py
  class AnswerJudgementMetrics (line 24) | class AnswerJudgementMetrics(BaseMetrics):
    method __init__ (line 25) | def __init__(self):
    method reset (line 31) | def reset(self):
    method _get_score_dict (line 35) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method get_incorrect_sample (line 41) | def get_incorrect_sample(self, prediction: dict) -> dict:
    method _store_individual_metrics (line 49) | def _store_individual_metrics(self, agg_key, pred_judgement, gt_judgem...
    method _update_fp_fn (line 66) | def _update_fp_fn(self, metrics_dict, pred_judgement, gt_judgement, di...
    method _update_score_metrics_for_majority (line 73) | def _update_score_metrics_for_majority(
    method _update_score_metrics_for_pass (line 90) | def _update_score_metrics_for_pass(
    method update (line 121) | def update(self, predictions):
    method _compute_precision_recall_f1 (line 134) | def _compute_precision_recall_f1(self, datapoint_metrics):
    method get_metrics (line 182) | def get_metrics(self):

FILE: nemo_skills/evaluation/metrics/arena_metrics.py
  class ArenaMetrics (line 21) | class ArenaMetrics(BaseMetrics):
    method __init__ (line 22) | def __init__(self):
    method _get_judge_score (line 25) | def _get_judge_score(self, judgment):
    method get_incorrect_sample (line 37) | def get_incorrect_sample(self, prediction: dict) -> dict:
    method update (line 43) | def update(self, predictions):
    method get_metrics (line 92) | def get_metrics(self):
    method reset (line 119) | def reset(self):

FILE: nemo_skills/evaluation/metrics/audio_metrics.py
  function compute_corpus_bleu (line 43) | def compute_corpus_bleu(
  class AudioMetrics (line 76) | class AudioMetrics(BaseMetrics):
    method __init__ (line 85) | def __init__(self, compute_no_answer: bool = True, max_k: int = 1):
    method _extract_judge_result (line 127) | def _extract_judge_result(self, judgement_text: str) -> tuple[bool, fl...
    method _get_score_dict (line 162) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method get_incorrect_sample (line 192) | def get_incorrect_sample(self, prediction: dict) -> dict:
    method update_common_metrics (line 210) | def update_common_metrics(self, agg_dict):
    method update (line 223) | def update(self, predictions):
    method get_metrics (line 304) | def get_metrics(self):
    method evaluations_to_print (line 380) | def evaluations_to_print(self):
    method metrics_to_print (line 391) | def metrics_to_print(self):
  function compute_score (line 457) | def compute_score(combined_metrics: dict) -> dict:

FILE: nemo_skills/evaluation/metrics/base.py
  class BaseMetrics (line 23) | class BaseMetrics(abc.ABC):
    method __init__ (line 24) | def __init__(self, compute_no_answer: bool = True):
    method update_common_metrics (line 28) | def update_common_metrics(self, agg_dict):
    method get_metrics (line 35) | def get_metrics(self):
    method _add_std_metrics (line 49) | def _add_std_metrics(self, metrics_dict):
    method _get_score_dict (line 124) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method update (line 145) | def update(self, predictions):
    method reset (line 191) | def reset(self):
    method get_incorrect_sample (line 200) | def get_incorrect_sample(self, predictions: list[dict]) -> list[dict]:
    method _update_score_metrics_for_majority (line 208) | def _update_score_metrics_for_majority(
    method _update_metrics_for_majority (line 228) | def _update_metrics_for_majority(
    method _compute_majority_at_k (line 246) | def _compute_majority_at_k(
    method _update_score_metrics_for_pass (line 315) | def _update_score_metrics_for_pass(
    method _update_metrics_for_pass (line 334) | def _update_metrics_for_pass(
    method _compute_pass_at_k (line 352) | def _compute_pass_at_k(
    method setup (line 425) | def setup(self, input_files):
    method metrics_to_print (line 428) | def metrics_to_print(self):
    method evaluations_to_print (line 432) | def evaluations_to_print(self):
  function as_percentage (line 437) | def as_percentage(metric_key: str, metric_value: float, all_metrics: dict):
  function as_int (line 443) | def as_int(metric_key: str, metric_value: float, all_metrics: dict):
  function as_float (line 449) | def as_float(metric_key: str, metric_value: float, all_metrics: dict):
  function default_formatting (line 454) | def default_formatting(metric_key: str, metric_value, all_metrics: dict)...

FILE: nemo_skills/evaluation/metrics/bfcl_metrics.py
  class BFCLMetrics (line 18) | class BFCLMetrics(BaseMetrics):
    method _get_score_dict (line 24) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method update (line 27) | def update(self, predictions):

FILE: nemo_skills/evaluation/metrics/bird_metrics.py
  class BirdMetrics (line 18) | class BirdMetrics(BaseMetrics):
    method __init__ (line 21) | def __init__(self):
    method reset (line 25) | def reset(self):
    method update (line 33) | def update(self, predictions):
    method get_metrics (line 47) | def get_metrics(self):
    method evaluations_to_print (line 67) | def evaluations_to_print(self):
    method metrics_to_print (line 70) | def metrics_to_print(self):

FILE: nemo_skills/evaluation/metrics/ccc_metrics.py
  class CCCMetrics (line 11) | class CCCMetrics(BaseMetrics):
    method __init__ (line 14) | def __init__(self, **kwargs):
    method reset (line 21) | def reset(self):
    method setup (line 27) | def setup(self, input_files):
    method update (line 38) | def update(self, predictions):
    method _get_score_dict (line 55) | def _get_score_dict(self, submission):
    method _aggregate_row_group (line 64) | def _aggregate_row_group(self, submissions, mode: str, subtask_name: s...
    method _build_problem_reports (line 165) | def _build_problem_reports(self, mode: str):
    method _select_minimal_solutions (line 338) | def _select_minimal_solutions(self, problem_id: str, problem_name: str...
    method _sanitize_filename_component (line 429) | def _sanitize_filename_component(value):
    method _extract_solution_code (line 435) | def _extract_solution_code(solution_text: str) -> str:
    method _write_selected_solutions (line 442) | def _write_selected_solutions(self, report: dict):
    method get_metrics (line 499) | def get_metrics(self):
    method evaluations_to_print (line 594) | def evaluations_to_print(self):

FILE: nemo_skills/evaluation/metrics/code_metrics.py
  class EvalPlusMetrics (line 18) | class EvalPlusMetrics(BaseMetrics):
    method _get_score_dict (line 19) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method get_incorrect_sample (line 25) | def get_incorrect_sample(self, prediction: dict) -> dict:
    method update (line 28) | def update(self, predictions):
  class LiveCodeBenchMetrics (line 33) | class LiveCodeBenchMetrics(BaseMetrics):
    method _get_score_dict (line 34) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method get_incorrect_sample (line 39) | def get_incorrect_sample(self, prediction: dict) -> dict:
    method update (line 42) | def update(self, predictions):
  class SweBenchMetrics (line 47) | class SweBenchMetrics(BaseMetrics):
    method _get_score_dict (line 48) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method get_incorrect_sample (line 55) | def get_incorrect_sample(self, prediction: dict) -> dict:
    method update (line 58) | def update(self, predictions):
  class SciCodeMetrics (line 63) | class SciCodeMetrics(BaseMetrics):
    method _get_score_dict (line 64) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method get_incorrect_sample (line 72) | def get_incorrect_sample(self, prediction: dict) -> dict:
    method update (line 80) | def update(self, predictions):
    method get_metrics (line 85) | def get_metrics(self):
    method reset (line 95) | def reset(self):
  class BigCodeBenchMetrics (line 100) | class BigCodeBenchMetrics(BaseMetrics):
    method _get_score_dict (line 101) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method get_incorrect_sample (line 106) | def get_incorrect_sample(self, prediction: dict) -> dict:
    method update (line 109) | def update(self, predictions):
  class HumanEvalInfillingMetrics (line 114) | class HumanEvalInfillingMetrics(BaseMetrics):
    method _get_score_dict (line 115) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method get_incorrect_sample (line 118) | def get_incorrect_sample(self, prediction: dict) -> dict:
    method update (line 121) | def update(self, predictions):
  class ComputeEvalMetrics (line 126) | class ComputeEvalMetrics(BaseMetrics):
    method _get_score_dict (line 127) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method get_incorrect_sample (line 130) | def get_incorrect_sample(self, prediction: dict) -> dict:
    method update (line 133) | def update(self, predictions):

FILE: nemo_skills/evaluation/metrics/compute_metrics.py
  class ComputeMetrics (line 24) | class ComputeMetrics:
    method __init__ (line 25) | def __init__(
    method get_metrics_calculator (line 44) | def get_metrics_calculator(self):
    method compute_metrics (line 49) | def compute_metrics(self, input_files):
    method metrics_to_print (line 89) | def metrics_to_print(self):
    method evaluations_to_print (line 92) | def evaluations_to_print(self):

FILE: nemo_skills/evaluation/metrics/contextasr_metrics.py
  class ContextASRMetrics (line 26) | class ContextASRMetrics(BaseMetrics):
    method __init__ (line 29) | def __init__(self, compute_no_answer: bool = True, max_k: int = 1):
    method _get_score_dict (line 43) | def _get_score_dict(self, prediction):
    method get_incorrect_sample (line 52) | def get_incorrect_sample(self, prediction):
    method update_common_metrics (line 58) | def update_common_metrics(self, agg_dict):
    method update (line 65) | def update(self, predictions):
    method get_metrics (line 96) | def get_metrics(self):
    method evaluations_to_print (line 115) | def evaluations_to_print(self):
    method metrics_to_print (line 122) | def metrics_to_print(self):

FILE: nemo_skills/evaluation/metrics/critpt_metrics.py
  class CritPtMetrics (line 23) | class CritPtMetrics(BaseMetrics):
    method _get_score_dict (line 33) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method update (line 42) | def update(self, predictions):
    method metrics_to_print (line 51) | def metrics_to_print(self):

FILE: nemo_skills/evaluation/metrics/gradingbench_metrics.py
  class GradingBenchMetrics (line 24) | class GradingBenchMetrics(BaseMetrics):
    method __init__ (line 54) | def __init__(self):
    method _extract_grade (line 58) | def _extract_grade(self, text: str) -> str | None:
    method _get_grades (line 89) | def _get_grades(self, prediction: dict) -> tuple[str | None, str | None]:
    method _get_score_dict (line 108) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method update (line 123) | def update(self, predictions):
    method get_metrics (line 137) | def get_metrics(self):
    method reset (line 149) | def reset(self):
    method metrics_to_print (line 154) | def metrics_to_print(self):
    method evaluations_to_print (line 162) | def evaluations_to_print(self):

FILE: nemo_skills/evaluation/metrics/hleaa_metrics.py
  class HLEAAMetrics (line 24) | class HLEAAMetrics(MathMetrics):
    method _postprocess_judgement (line 27) | def _postprocess_judgement(self, prediction: dict) -> dict:
    method update (line 37) | def update(self, predictions):

FILE: nemo_skills/evaluation/metrics/hotpotqa_filtering.py
  function _normalize_unicode (line 109) | def _normalize_unicode(s: str) -> str:
  function _gt_alternatives (line 122) | def _gt_alternatives(gt: str) -> tuple[list[str], list[str]]:
  function _is_multi_word_name (line 207) | def _is_multi_word_name(gt: str) -> bool:
  function _should_remove (line 219) | def _should_remove(gt: str) -> tuple[bool, str]:
  function normalize_gt (line 228) | def normalize_gt(gt_answer: str) -> dict:
  function is_correct (line 252) | def is_correct(alternatives: list[str], model_answer: str) -> bool:
  function is_correct_strict (line 263) | def is_correct_strict(alternatives: list[str], model_answer: str) -> bool:

FILE: nemo_skills/evaluation/metrics/hotpotqa_metrics.py
  function normalize_answer (line 36) | def normalize_answer(s: str) -> str:
  function answer_f1_score (line 55) | def answer_f1_score(prediction: str, ground_truth: str) -> tuple[float, ...
  function answer_exact_match (line 82) | def answer_exact_match(prediction: str, ground_truth: str) -> float:
  function sp_scores (line 87) | def sp_scores(prediction: list, gold: list) -> tuple[float, float, float...
  function _try_parse_answer_json (line 113) | def _try_parse_answer_json(text: str) -> tuple[str, list] | None:
  function _extract_json_candidates (line 135) | def _extract_json_candidates(text: str) -> list[str]:
  function parse_generation (line 158) | def parse_generation(generation: str) -> tuple[str, list]:
  class HotpotQAMetrics (line 187) | class HotpotQAMetrics(BaseMetrics):
    method __init__ (line 203) | def __init__(self, compute_no_answer: bool = False, closed_book: bool ...
    method reset (line 207) | def reset(self):
    method _get_score_dict (line 214) | def _get_score_dict(self, prediction: dict) -> dict[str, float]:
    method _update_score_metrics_for_pass (line 254) | def _update_score_metrics_for_pass(
    method update (line 272) | def update(self, predictions):
    method get_metrics (line 284) | def get_metrics(self):
    method evaluations_to_print (line 300) | def evaluations_to_print(self):
    method metrics_to_print (line 306) | def metrics_to_print(self):

FILE: nemo_skills/evaluation/metrics/icpc_metrics.py
  function extract_final_cpp_block (line 24) | def extract_final_cpp_block(text):
  class ICPCMetrics (line 30) | class ICPCMetrics(BaseMetrics):
    method __init__ (line 31) | def __init__(self, **kwargs):
    method update (line 37) | def update(self, predictions):
    method _get_score_dict (line 43) | def _get_score_dict(self, p):
    method get_problem_score (line 46) | def get_problem_score(self, submissions) -> bool:
    method get_problem_sample_score (line 52) | def get_problem_sample_score(self, submissions) -> bool:
    method extract_info (line 58) | def extract_info(self, submission) -> dict:
    method get_clusters (line 66) | def get_clusters(self, submissions) -> dict:
    method get_metrics (line 100) | def get_metrics(self):
    method evaluations_to_print (line 157) | def evaluations_to_print(self):
    method metrics_to_print (line 161) | def metrics_to_print(self):
    method reset (line 171) | def reset(self):
    method print_problem_scores (line 176) | def print_problem_scores(self):

FILE: nemo_skills/evaluation/metrics/if_metrics.py
  class IFMetrics (line 20) | class IFMetrics(BaseMetrics):
    method _get_score_dict (line 24) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method get_incorrect_sample (line 30) | def get_incorrect_sample(self, prediction: dict) -> dict:
    method update (line 35) | def update(self, predictions):
    method get_metrics (line 50) | def get_metrics(self):
    method reset (line 70) | def reset(self):

FILE: nemo_skills/evaluation/metrics/ioi_metrics.py
  function extract_final_cpp_block (line 22) | def extract_final_cpp_block(text):
  class IOIMetrics (line 28) | class IOIMetrics(BaseMetrics):
    method __init__ (line 29) | def __init__(self, **kwargs):
    method update (line 35) | def update(self, predictions):
    method _get_score_dict (line 41) | def _get_score_dict(self, p):
    method extract_info (line 44) | def extract_info(self, submission) -> dict:
    method get_clusters (line 53) | def get_clusters(self, submissions) -> dict:
    method get_problem_score (line 92) | def get_problem_score(self, submissions) -> float:
    method get_metrics (line 107) | def get_metrics(self):
    method reset (line 158) | def reset(self):
    method evaluations_to_print (line 164) | def evaluations_to_print(self):
    method print_problem_scores (line 167) | def print_problem_scores(self):

FILE: nemo_skills/evaluation/metrics/lean4_metrics.py
  class Lean4Metrics (line 19) | class Lean4Metrics(BaseMetrics):
    method __init__ (line 20) | def __init__(self):
    method _get_score_dict (line 23) | def _get_score_dict(self, prediction):
    method get_incorrect_sample (line 26) | def get_incorrect_sample(self, prediction: dict) -> dict:
    method _update_score_metrics_for_pass (line 31) | def _update_score_metrics_for_pass(
    method update (line 46) | def update(self, predictions):

FILE: nemo_skills/evaluation/metrics/map_metrics.py
  function get_metrics (line 109) | def get_metrics(metric_type: str, **kwargs):

FILE: nemo_skills/evaluation/metrics/math_metrics.py
  class MathMetrics (line 25) | class MathMetrics(BaseMetrics):
    method __init__ (line 28) | def __init__(
    method _compute_reward_at_k (line 35) | def _compute_reward_at_k(self, predictions: list[dict]):
    method _get_score_dict (line 70) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method is_correct_judgement (line 84) | def is_correct_judgement(self, judgement: str) -> bool:
    method get_incorrect_sample (line 88) | def get_incorrect_sample(self, prediction: dict) -> dict:
    method update (line 97) | def update(self, predictions):
    method evaluations_to_print (line 129) | def evaluations_to_print(self):
    method metrics_to_print (line 139) | def metrics_to_print(self):

FILE: nemo_skills/evaluation/metrics/mcq_multilingual_metrics.py
  class MCQMultilingualMetrics (line 41) | class MCQMultilingualMetrics(MathMetrics):
    method __init__ (line 42) | def __init__(
    method _get_score_dict (line 50) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method metrics_to_print (line 99) | def metrics_to_print(self):
    method _detect_language (line 104) | def _detect_language(self, text):

FILE: nemo_skills/evaluation/metrics/mmau_pro_metrics.py
  function extract_multicriteria_scores (line 26) | def extract_multicriteria_scores(judgement_text: str) -> dict[str, float]:
  class MMAUProMetrics (line 68) | class MMAUProMetrics(BaseMetrics):
    method __init__ (line 71) | def __init__(self, compute_no_answer: bool = True, max_k: int = 1):
    method _get_score_dict (line 84) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method get_incorrect_sample (line 100) | def get_incorrect_sample(self, prediction: dict) -> dict:
    method update (line 110) | def update(self, predictions):
    method get_metrics (line 125) | def get_metrics(self):
    method metrics_to_print (line 170) | def metrics_to_print(self):

FILE: nemo_skills/evaluation/metrics/mrcr_metrics.py
  class MRCRMetrics (line 18) | class MRCRMetrics(BaseMetrics):
    method _get_score_dict (line 21) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method update (line 24) | def update(self, predictions):

FILE: nemo_skills/evaluation/metrics/omni_metrics.py
  class OmniMetrics (line 20) | class OmniMetrics(BaseMetrics):
    method __init__ (line 21) | def __init__(self, compute_no_answer: bool = True, answer_key: str = "...
    method _compute_reward_at_k (line 26) | def _compute_reward_at_k(self, predictions: list[dict]):
    method _get_score_dict (line 61) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method get_metrics (line 76) | def get_metrics(self):
    method get_incorrect_sample (line 107) | def get_incorrect_sample(self, prediction: dict) -> dict:
    method update (line 118) | def update(self, predictions):
    method evaluations_to_print (line 125) | def evaluations_to_print(self):
    method metrics_to_print (line 131) | def metrics_to_print(self):

FILE: nemo_skills/evaluation/metrics/physics_metrics.py
  class PhysicsMetrics (line 24) | class PhysicsMetrics(MathMetrics):
    method __init__ (line 25) | def __init__(self, compute_no_answer: bool = False, answer_key: str = ...
    method is_correct_judgement (line 29) | def is_correct_judgement(self, judgement: str, return_none: bool = Fal...
    method get_incorrect_sample (line 41) | def get_incorrect_sample(self, prediction: dict) -> dict:

FILE: nemo_skills/evaluation/metrics/ruler2_metrics.py
  class Ruler2Metrics (line 18) | class Ruler2Metrics(BaseMetrics):
    method _get_score_dict (line 28) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method update (line 33) | def update(self, predictions):
    method get_incorrect_sample (line 37) | def get_incorrect_sample(self, prediction: dict) -> dict:

FILE: nemo_skills/evaluation/metrics/ruler_metrics.py
  class RulerMetrics (line 18) | class RulerMetrics(BaseMetrics):
    method _get_score_dict (line 19) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method update (line 22) | def update(self, predictions):
    method get_incorrect_sample (line 26) | def get_incorrect_sample(self, prediction: dict) -> dict:

FILE: nemo_skills/evaluation/metrics/simpleqa_metrics.py
  function is_correct_judgement_label_matching (line 24) | def is_correct_judgement_label_matching(judgement: str, correct_label: s...
  class SimpleQAMetrics (line 38) | class SimpleQAMetrics(BaseMetrics):
    method __init__ (line 41) | def __init__(self, compute_no_answer: bool = False, answer_key: str = ...
    method update (line 45) | def update(self, predictions):
    method _get_score_dict (line 60) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method _to_bool_or_none (line 90) | def _to_bool_or_none(j):
    method get_metrics (line 103) | def get_metrics(self):

FILE: nemo_skills/evaluation/metrics/specdec_metrics.py
  class SpecdecMetrics (line 23) | class SpecdecMetrics(BaseMetrics):
    method __init__ (line 42) | def __init__(self):
    method _get_score_dict (line 45) | def _get_score_dict(self, prediction: dict) -> dict[str, bool | int | ...
    method update (line 54) | def update(self, predictions: list[dict]) -> None:
    method get_metrics (line 67) | def get_metrics(self) -> dict:
    method metrics_to_print (line 87) | def metrics_to_print(self) -> dict:

FILE: nemo_skills/evaluation/metrics/translation_metrics.py
  function install_packages (line 24) | def install_packages(lang):
  class TranslationMetrics (line 34) | class TranslationMetrics(BaseMetrics):
    method get_metrics (line 37) | def get_metrics(self):
    method _add_std_metrics (line 89) | def _add_std_metrics(self, metrics_dict):
    method update (line 101) | def update(self, predictions):
    method reset (line 130) | def reset(self):
    method evaluations_to_print (line 136) | def evaluations_to_print(self):
    method metrics_to_print (line 140) | def metrics_to_print(self):

FILE: nemo_skills/evaluation/metrics/ugphysics_metrics.py
  class UGPhysicsMetrics (line 24) | class UGPhysicsMetrics(MathMetrics):
    method __init__ (line 25) | def __init__(self, compute_no_answer: bool = False, answer_key: str = ...
    method is_correct_judgement (line 29) | def is_correct_judgement(self, judgement: str, return_none: bool = Fal...
    method get_incorrect_sample (line 44) | def get_incorrect_sample(self, prediction: dict) -> dict:

FILE: nemo_skills/evaluation/metrics/utils.py
  function read_predictions (line 24) | def read_predictions(predictions, line_idx, file_handles):
  function is_correct_judgement (line 37) | def is_correct_judgement(judgement, return_none=False) -> Union[bool, No...

FILE: nemo_skills/evaluation/metrics/weighted_math_metrics.py
  class WeightedMathMetrics (line 24) | class WeightedMathMetrics(MathMetrics):
    method reset (line 27) | def reset(self) -> None:
    method _get_sample_weight (line 33) | def _get_sample_weight(self, prediction: dict) -> float:
    method _update_pass1_avg_of_k (line 37) | def _update_pass1_avg_of_k(self, score_method: str, attempt_scores: li...
    method _update_pass_at_k (line 43) | def _update_pass_at_k(self, score_method: str, attempt_scores: list[bo...
    method _update_majority_at_k (line 54) | def _update_majority_at_k(
    method update (line 70) | def update(self, predictions: list[dict]) -> None:
    method _add_weighted_std_metrics (line 89) | def _add_weighted_std_metrics(self, metrics_dict: dict) -> None:
    method get_metrics (line 116) | def get_metrics(self) -> dict:
    method metrics_to_print (line 127) | def metrics_to_print(self) -> dict:

FILE: nemo_skills/evaluation/utils.py
  function load_config (line 20) | def load_config(config: str, config_dir: str | None = None) -> dict:
  function get_eval_group (line 48) | def get_eval_group(eval_config: str | dict, eval_group_dir: str | None =...

FILE: nemo_skills/file_utils.py
  function unroll_files (line 21) | def unroll_files(input_files, parent_dir: str | None = None):
  function _make_w_io_base (line 35) | def _make_w_io_base(f, mode: str):
  function _make_r_io_base (line 53) | def _make_r_io_base(f, mode: str):
  function jdump (line 68) | def jdump(obj, f, mode="w", indent=None, default=str):
  function jload (line 103) | def jload(filepath, mode="r", verbose=False):
  function count_newlines (line 141) | def count_newlines(fname, verbose: bool = False):
  function calculate_chunk_indices (line 172) | def calculate_chunk_indices(num_samples: int, num_chunks: int, chunk_id:...
  function jload_chunk (line 214) | def jload_chunk(filepath, num_chunks: int, chunk_id: int, mode="r", verb...
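
Note on the chunking entries above: `calculate_chunk_indices` and `jload_chunk` take `num_chunks` and `chunk_id`, which suggests the input is split into contiguous chunks. The index does not show the bodies, so the following is only a minimal sketch of one common way to compute such a split. The helper name `chunk_bounds`, the half-open range convention, and the rule that earlier chunks absorb the remainder are all assumptions, not the repository's implementation.

```python
def chunk_bounds(num_samples: int, num_chunks: int, chunk_id: int) -> tuple[int, int]:
    """Return a half-open [start, end) range for one chunk.

    Hypothetical helper for illustration only; it is not taken from
    nemo_skills/file_utils.py.
    """
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    if not 0 <= chunk_id < num_chunks:
        raise ValueError("chunk_id must be in [0, num_chunks)")
    base, remainder = divmod(num_samples, num_chunks)
    # Assumption: the first `remainder` chunks each take one extra sample.
    start = chunk_id * base + min(chunk_id, remainder)
    end = start + base + (1 if chunk_id < remainder else 0)
    return start, end
```

With 10 samples and 3 chunks, this sketch yields the ranges (0, 4), (4, 7), and (7, 10), which together cover every sample exactly once.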

FILE: nemo_skills/inference/autoformalize.py
  class AutoformalizeConfig (line 49) | class AutoformalizeConfig(GenerationTaskConfig):
  class AutoformalizeTask (line 72) | class AutoformalizeTask(GenerationTask):
    method __init__ (line 73) | def __init__(self, cfg: AutoformalizeConfig):
    method setup_llm (line 87) | def setup_llm(self):
    method setup_refine_prompt (line 101) | def setup_refine_prompt(self):
    method setup_judge_prompt (line 116) | def setup_judge_prompt(self):
    method _extract_code_sync (line 126) | def _extract_code_sync(self, completion: str):
    method _extract_code (line 140) | async def _extract_code(self, completion: str):
    method _backtranslate_code (line 144) | async def _backtranslate_code(self, code: str) -> str:
    method _judge_backtranslation (line 149) | async def _judge_backtranslation(self, backtranslation_result: str, da...
    method _judge_code (line 159) | async def _judge_code(self, code: str | None, data_point) -> dict:
    method _construct_refine_prompt (line 212) | def _construct_refine_prompt(self, results_dict):
    method _generate_single_completion (line 226) | async def _generate_single_completion(self, prompt: List[str]):
    method _single_data_point_generate (line 267) | async def _single_data_point_generate(self, data_point, data):
    method process_single_datapoint (line 308) | async def process_single_datapoint(self, data_point, all_data, prompt_...
  function generate (line 319) | def generate(cfg: AutoformalizeConfig):

FILE: nemo_skills/inference/chat_interface/chat_service.py
  class ChatService (line 26) | class ChatService:
    method __init__ (line 29) | def __init__(self, loader: ModelLoader, prompts: PromptManager):
    method stream_chat (line 33) | def stream_chat(
  class AppContext (line 77) | class AppContext:
    method __init__ (line 80) | def __init__(self, cfg: AppConfig):

FILE: nemo_skills/inference/chat_interface/core.py
  class AppConfig (line 42) | class AppConfig:
    method __post_init__ (line 71) | def __post_init__(self):
  class CodeExecStatus (line 116) | class CodeExecStatus(Enum):
  class PromptManager (line 129) | class PromptManager:
    method __init__ (line 132) | def __init__(self, cfg: AppConfig):
    method get (line 136) | def get(self, use_code: bool, prompt_config_override: str | None = Non...
  class ModelLoader (line 159) | class ModelLoader:
    method __init__ (line 162) | def __init__(self, cfg: AppConfig):
    method generic_llm (line 169) | def generic_llm(self) -> Any | None:  # noqa: D401
    method code_llm (line 173) | def code_llm(self) -> Any | None:  # noqa: D401
    method sandbox (line 177) | def sandbox(self):  # noqa: D401
    method cfg (line 181) | def cfg(self):  # noqa: D401
    method load_generic (line 184) | def load_generic(self) -> Tuple[bool, str]:
    method load_code_and_sandbox (line 206) | def load_code_and_sandbox(self) -> Tuple[bool, str]:
    method get_code_execution_status (line 246) | def get_code_execution_status(self, requested: bool) -> CodeExecStatus:
    method _is_sandbox_alive (line 256) | def _is_sandbox_alive(self) -> bool:
    method supports_code_toggle (line 267) | def supports_code_toggle(self) -> bool:

FILE: nemo_skills/inference/chat_interface/launch.py
  function launch (line 33) | def launch(cfg: AppConfig):

FILE: nemo_skills/inference/chat_interface/ui.py
  function _format_output (line 48) | def _format_output(text: str) -> str:
  class ChatUI (line 67) | class ChatUI:
    method __init__ (line 70) | def __init__(self, ctx: AppContext):
    method _get_default_prompt_config (line 104) | def _get_default_prompt_config(self, use_code: bool) -> str:
    method _get_current_prompt_config (line 108) | def _get_current_prompt_config(self, use_code: bool) -> str:
    method _build_chat_panel (line 117) | def _build_chat_panel(self):
    method on_prompt_config_change (line 195) | def on_prompt_config_change(self, prompt_config_value: str):
    method on_toggle_code_exec (line 210) | def on_toggle_code_exec(self, checkbox_val: bool):
    method on_cancel (line 253) | def on_cancel(self):
    method on_clear_chat (line 267) | def on_clear_chat(self):
    method on_reset_params (line 273) | def on_reset_params(self):
    method handle_chat_submit (line 326) | def handle_chat_submit(self, user_msg: str, max_tokens: int, temperatu...
    method launch (line 422) | def launch(self):
    method _banner_from_code_status (line 425) | def _banner_from_code_status(self, code_status: CodeExecStatus):

FILE: nemo_skills/inference/check_contamination.py
  class CheckContaminationConfig (line 40) | class CheckContaminationConfig(GenerationTaskConfig):
    method _get_disallowed_params (line 62) | def _get_disallowed_params(self):
  class CheckContaminationTask (line 74) | class CheckContaminationTask(GenerationTask):
    method __init__ (line 75) | def __init__(self, cfg: CheckContaminationConfig):
    method load_data (line 78) | def load_data(self):
    method log_example_prompt (line 88) | def log_example_prompt(self, data):
    method _create_query_data (line 102) | def _create_query_data(self, data_point):
    method prefill_generation (line 123) | def prefill_generation(self, data_point):
    method process_single_datapoint (line 130) | async def process_single_datapoint(self, data_point, all_data, prompt_...
    method postprocess (line 152) | def postprocess(self):
  function check_contamination (line 171) | def check_contamination(cfg: CheckContaminationConfig):

FILE: nemo_skills/inference/eval/arena_judge.py
  function sanitize_generation (line 42) | def sanitize_generation(generation: str) -> str:
  class ArenaJudgeConfig (line 51) | class ArenaJudgeConfig(GenerationTaskConfig):
  class ArenaJudgeTask (line 79) | class ArenaJudgeTask(GenerationTask):
    method __init__ (line 80) | def __init__(self, cfg: ArenaJudgeConfig):
    method setup_prompt (line 83) | def setup_prompt(self):
    method fill_prompt (line 113) | def fill_prompt(self, data_point, data, prompt_format=None):
    method log_example_prompt (line 141) | def log_example_prompt(self, all_data):
    method process_single_datapoint (line 159) | async def process_single_datapoint(self, data_point, all_data, prompt_...
  function generate (line 196) | def generate(cfg: ArenaJudgeConfig):

FILE: nemo_skills/inference/eval/bfcl.py
  class BFCLGenerationConfig (line 89) | class BFCLGenerationConfig(GenerationTaskConfig):
    method _post_init_validate_params (line 100) | def _post_init_validate_params(self):
    method _get_disallowed_params (line 113) | def _get_disallowed_params(self):
  class ClientMessageParser (line 124) | class ClientMessageParser:
    method __init__ (line 127) | def __init__(self, cfg: BFCLGenerationConfig):
    method _validate_and_setup_client_parsing (line 131) | def _validate_and_setup_client_parsing(self):
    method create_response_parser (line 172) | def create_response_parser(self, native_response_parser):
    method construct_input_dict (line 203) | def construct_input_dict(self, messages: list[dict], tools: list[dict]):
    method parse_output_dict (line 221) | def parse_output_dict(self, output_dict: dict):
    method get_response_text (line 263) | def get_response_text(self, message):
    method set_response_text (line 266) | def set_response_text(self, message, response_text):
  class ServerMessageParser (line 270) | class ServerMessageParser:
    method __init__ (line 273) | def __init__(self, cfg: BFCLGenerationConfig):
    method construct_input_dict (line 276) | def construct_input_dict(self, messages: list[dict], tools: list[dict]):
    method parse_output_dict (line 284) | def parse_output_dict(self, output_dict: dict):
    method get_response_text (line 324) | def get_response_text(self, message):
    method set_response_text (line 329) | def set_response_text(self, message, response_text):
  class BFCLGenerationTask (line 336) | class BFCLGenerationTask(GenerationTask):
    method get_generation_requirements (line 338) | def get_generation_requirements(cls) -> list[str] | None:
    method __init__ (line 341) | def __init__(self, cfg: BFCLGenerationConfig):
    method log_example_prompt (line 348) | def log_example_prompt(self, data):
    method setup_prompt (line 352) | def setup_prompt(self):
    method load_data (line 355) | def load_data(self):
    method _generate_single_assistant_turn (line 382) | async def _generate_single_assistant_turn(self, inference_state_dict):
    method _generate_single_data_point_single_turn (line 418) | async def _generate_single_data_point_single_turn(self, data_point):
    method _generate_single_data_point_multi_turn (line 437) | async def _generate_single_data_point_multi_turn(self, data_point):
    method _parse_reasoning_from_message_content (line 600) | def _parse_reasoning_from_message_content(self, model_response_text: s...
    method process_single_datapoint (line 611) | async def process_single_datapoint(self, data_point, all_data, prompt_...
  function bfcl_generation (line 624) | def bfcl_generation(cfg: BFCLGenerationConfig):

FILE: nemo_skills/inference/eval/bfcl_utils.py
  function convert_to_function_call (line 79) | def convert_to_function_call(function_call_list):
  function execute_multi_turn_func_call (line 93) | def execute_multi_turn_func_call(
  function is_empty_execute_response (line 195) | def is_empty_execute_response(input_list: list):
  function _process_method_calls (line 203) | def _process_method_calls(function_call_string: str, instance_mapping: d...

FILE: nemo_skills/inference/eval/bfcl_web_search.py
  class WebSearchBackendUnavailable (line 44) | class WebSearchBackendUnavailable(RuntimeError):
  class WebSearchAPI (line 48) | class WebSearchAPI:
    method __init__ (line 49) | def __init__(self):
    method _load_scenario (line 55) | def _load_scenario(self, initial_config: dict, long_context: bool = Fa...
    method _get_serp_api_key (line 63) | def _get_serp_api_key() -> Optional[str]:
    method _has_module (line 70) | def _has_module(module_name: str) -> bool:
    method _validate_backends_available (line 74) | def _validate_backends_available(self):
    method _warn_no_serp_api_key_once (line 105) | def _warn_no_serp_api_key_once(self):
    method _format_results (line 121) | def _format_results(self, results: list[dict]) -> list[dict]:
    method _search_with_serpapi_duckduckgo (line 131) | def _search_with_serpapi_duckduckgo(
    method _search_with_ddgs (line 215) | def _search_with_ddgs(self, *, keywords: str, max_results: int, region...
    method search_engine_query (line 242) | def search_engine_query(
    method fetch_url_content (line 399) | def fetch_url_content(self, url: str, mode: str = "raw") -> str:

FILE: nemo_skills/inference/eval/compute_eval.py
  class ComputeEvalGenerationTask (line 30) | class ComputeEvalGenerationTask(GenerationTask):
    method __init__ (line 31) | def __init__(self, cfg: GenerationTaskConfig):
    method process_single_datapoint (line 34) | async def process_single_datapoint(self, data_point, data, prompt_form...
  function run_compute_eval (line 65) | def run_compute_eval(cfg: GenerationTaskConfig):

FILE: nemo_skills/inference/eval/critpt.py
  class CritPtInferenceConfig (line 41) | class CritPtInferenceConfig:
  class CritPtGenerationConfig (line 57) | class CritPtGenerationConfig(GenerationTaskConfig):
  class CritPtGenerationTask (line 77) | class CritPtGenerationTask(GenerationTask):
    method __init__ (line 80) | def __init__(self, cfg: GenerationTaskConfig):
    method fill_prompt (line 89) | def fill_prompt(self, data_point, data, prompt_format=None):
    method process_single_datapoint (line 100) | async def process_single_datapoint(self, data_point, all_data):
  function generate (line 146) | def generate(cfg: CritPtGenerationConfig):

FILE: nemo_skills/inference/eval/scicode.py
  class SciCodeGenerationConfig (line 45) | class SciCodeGenerationConfig(GenerationTaskConfig):
  class SciCodeGenerationTask (line 63) | class SciCodeGenerationTask(GenerationTask):
    method log_example_prompt (line 64) | def log_example_prompt(self, data):
    method process_single_datapoint (line 68) | async def process_single_datapoint(self, data_point, all_data, prompt_...
  function scicode_generation (line 138) | def scicode_generation(cfg: SciCodeGenerationConfig):

FILE: nemo_skills/inference/eval/scicode_utils.py
  function process_problem_code (line 25) | def process_problem_code(prob_data: dict, num_steps: int) -> str:
  function process_problem_steps (line 32) | def process_problem_steps(problem_data: dict, num_steps: int, previous_l...
  function extract_python_script (line 63) | def extract_python_script(response: str):

FILE: nemo_skills/inference/eval/specdec.py
  class SpecDecodeMetricsError (line 37) | class SpecDecodeMetricsError(Exception):
    method __init__ (line 40) | def __init__(self, message: str):
    method __str__ (line 44) | def __str__(self):
  class SpecDecodeMetrics (line 54) | class SpecDecodeMetrics:
  function _fetch_metrics_text (line 70) | def _fetch_metrics_text(base_url: str) -> str | None:
  function fetch_vllm_spec_decode_metrics (line 84) | def fetch_vllm_spec_decode_metrics(base_url: str) -> SpecDecodeMetrics:
  function find_sglang_metrics_file (line 144) | def find_sglang_metrics_file(metrics_dir: str) -> str | None:
  function fetch_sglang_spec_decode_metrics (line 171) | def fetch_sglang_spec_decode_metrics(base_url: str) -> SpecDecodeMetrics:
  function _build_specdec_stats (line 221) | def _build_specdec_stats(
  function _compute_weighted_delta (line 241) | def _compute_weighted_delta(
  function compute_sglang_spec_decode_delta (line 259) | def compute_sglang_spec_decode_delta(
  function compute_vllm_spec_decode_delta (line 330) | def compute_vllm_spec_decode_delta(
  class SpecdecGenerationConfig (line 396) | class SpecdecGenerationConfig(GenerationTaskConfig):
    method _post_init_validate_server (line 414) | def _post_init_validate_server(self):
  class SpecdecGenerationTask (line 425) | class SpecdecGenerationTask(GenerationTask):
    method __init__ (line 436) | def __init__(self, cfg: SpecdecGenerationConfig):
    method _ensure_sglang_metrics_dir (line 441) | def _ensure_sglang_metrics_dir(cls) -> str:
    method get_generation_default_args (line 451) | def get_generation_default_args(cls) -> str:
    method get_server_command_fn (line 462) | def get_server_command_fn(cls) -> callable:
    method inject_sglang_metrics (line 500) | def inject_sglang_metrics(
    method process_single_datapoint (line 601) | async def process_single_datapoint(self, data_point, all_data, prompt_...
    method _get_server_base_address (line 635) | def _get_server_base_address(self) -> str:
    method wait_for_server (line 643) | def wait_for_server(self):
    method run_batch_evaluation (line 684) | def run_batch_evaluation(self):
  function specdec_generation (line 766) | def specdec_generation(cfg: SpecdecGenerationConfig):

FILE: nemo_skills/inference/eval/swebench.py
  class SupportedAgentFrameworks (line 45) | class SupportedAgentFrameworks(str, Enum):
  class SupportedDatasetTypes (line 52) | class SupportedDatasetTypes(str, Enum):
  class SweBenchInferenceConfig (line 60) | class SweBenchInferenceConfig:
  class SweBenchGenerationConfig (line 105) | class SweBenchGenerationConfig:
  class SweBenchGenerationTask (line 192) | class SweBenchGenerationTask(GenerationTask):
    method __init__ (line 193) | def __init__(self, cfg: SweBenchGenerationConfig):
    method log_example_prompt (line 383) | def log_example_prompt(self, data):
    method setup_prompt (line 386) | def setup_prompt(self):
    method setup_llm (line 389) | def setup_llm(self):
    method setup_litellm_cache (line 392) | def setup_litellm_cache(self):
    method cleanup_litellm_cache (line 395) | def cleanup_litellm_cache(self):
    method evaluate_single_datapoint (line 398) | async def evaluate_single_datapoint(self, data_point):
    method _execute_local_command (line 402) | async def _execute_local_command(self, command, timeout=None):
    method _execute_container_command (line 436) | async def _execute_container_command(self, data_point, command, expect...
    method _run_swe_agent (line 589) | async def _run_swe_agent(self, data_point, api_base):
    method _run_mini_swe_agent (line 661) | async def _run_mini_swe_agent(self, data_point, api_base):
    method _run_openhands (line 757) | async def _run_openhands(self, data_point, api_base):
    method _get_gold_patch (line 891) | async def _get_gold_patch(self, data_point):
    method process_single_datapoint (line 910) | async def process_single_datapoint(self, data_point, data, prompt_form...
    method _process_single_datapoint_impl (line 915) | async def _process_single_datapoint_impl(self, data_point, data):
  function swebench_generation (line 1035) | def swebench_generation(cfg: SweBenchGenerationConfig):

FILE: nemo_skills/inference/factory.py
  class GenerationType (line 18) | class GenerationType(str, Enum):

FILE: nemo_skills/inference/generate.py
  class InferenceConfig (line 66) | class InferenceConfig:
  class GenerationTaskConfig (line 90) | class GenerationTaskConfig:
    method __post_init__ (line 229) | def __post_init__(self):
    method _post_init_validate_data (line 235) | def _post_init_validate_data(self):
    method _post_init_validate_server (line 247) | def _post_init_validate_server(self):
    method _post_init_validate_params (line 251) | def _post_init_validate_params(self):
    method _post_init_deprecated_params (line 263) | def _post_init_deprecated_params(self):
    method _get_disallowed_params (line 267) | def _get_disallowed_params(self):
  class GenerationTask (line 276) | class GenerationTask:
    method get_generation_default_args (line 278) | def get_generation_default_args(cls) -> str:
    method get_server_command_fn (line 289) | def get_server_command_fn(cls) -> callable:
    method get_generation_requirements (line 302) | def get_generation_requirements(cls) -> list[str] | None:
    method __init__ (line 306) | def __init__(self, cfg: GenerationTaskConfig):
    method setup_prompt (line 413) | def setup_prompt(self):
    method setup_llm (line 430) | def setup_llm(self):
    method log_example_prompt (line 519) | def log_example_prompt(self, data):
    method load_data (line 524) | def load_data(self):
    method preprocess_data (line 543) | def preprocess_data(self, data):
    method postprocess (line 547) | def postprocess(self):
    method run_batch_evaluation (line 554) | def run_batch_evaluation(self):
    method skip_completed_samples (line 559) | def skip_completed_samples(self, data):
    method _merge_audio_from_data (line 591) | def _merge_audio_from_data(self, template_filled_messages, data_point):
    method _set_message_text_content (line 619) | def _set_message_text_content(message: dict, text: str) -> None:
    method _append_message_text_suffix (line 635) | def _append_message_text_suffix(message: dict, suffix: str) -> None:
    method fill_prompt (line 651) | def fill_prompt(self, data_point, data, prompt_format=None):
    method dump_outputs (line 717) | def dump_outputs(self, outputs, data_points, fout):
    method drop_fields_from_messages (line 721) | def drop_fields_from_messages(self, output):
    method postprocess_single_output (line 740) | async def postprocess_single_output(self, output, original_data_point):
    method prefill_generation (line 776) | def prefill_generation(self, data_point) -> dict | None:
    method process_single_datapoint (line 781) | async def process_single_datapoint(self, data_point, all_data, prompt_...
    method generate_with_semaphore (line 811) | async def generate_with_semaphore(self, **generation_params):
    method evaluate_single_datapoint (line 832) | async def evaluate_single_datapoint(self, data_point):
    method _generate_and_save_datapoint (line 840) | async def _generate_and_save_datapoint(self, data_point, all_data, fou...
    method async_loop (line 863) | async def async_loop(self, data):
    method restore_async_order (line 910) | def restore_async_order(self):
    method wait_for_server (line 927) | def wait_for_server(self):
    method wait_for_sandbox (line 939) | def wait_for_sandbox(self):
    method setup_litellm_cache (line 943) | def setup_litellm_cache(self):
    method cleanup_litellm_cache (line 952) | def cleanup_litellm_cache(self):
    method generate (line 957) | def generate(self):
  function generate (line 994) | def generate(cfg: GenerationTaskConfig):
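
Note on the async entries above: the names `generate_with_semaphore`, `async_loop`, and `restore_async_order` suggest that requests run concurrently under a semaphore and that the outputs are put back into input order afterwards. The index does not show the bodies, so the following is only a rough sketch of that general pattern using the standard library. `run_bounded`, `worker`, and `max_concurrent` are hypothetical names, not part of `GenerationTask`.

```python
import asyncio


async def run_bounded(items, worker, max_concurrent=2):
    """Run `worker` over `items` with bounded concurrency, returning results in input order.

    Hypothetical illustration; not the repository's implementation.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    finished = []  # (index, result) pairs, appended in completion order

    async def run_one(index, item):
        # At most `max_concurrent` workers hold the semaphore at any time.
        async with semaphore:
            result = await worker(item)
        finished.append((index, result))

    await asyncio.gather(*(run_one(i, item) for i, item in enumerate(items)))
    # Completion order can differ from input order, so sort by the saved index.
    return [result for _, result in sorted(finished, key=lambda pair: pair[0])]
```

Tagging each result with its original index is one simple way to restore order; the actual code may instead write results as they finish and reorder the output file later.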

FILE: nemo_skills/inference/litellm_hybrid_cache.py
  class HybridCache (line 36) | class HybridCache:
    method __init__ (line 37) | def __init__(
    method _check_no_ttl (line 56) | def _check_no_ttl(self, **kwargs):
    method _load_from_disk (line 61) | def _load_from_disk(self):
    method _save_to_disk (line 68) | def _save_to_disk(self):
    method _start_background_save_thread (line 84) | def _start_background_save_thread(self):
    method _shutdown (line 94) | def _shutdown(self):
    method set_cache (line 101) | def set_cache(self, key, value, **kwargs):
    method async_set_cache (line 108) | async def async_set_cache(self, key, value, **kwargs):
    method async_set_cache_pipeline (line 112) | async def async_set_cache_pipeline(self, cache_list, **kwargs):
    method get_cache (line 117) | def get_cache(self, key, **kwargs):
    method async_get_cache (line 130) | async def async_get_cache(self, key, **kwargs):
    method batch_get_cache (line 134) | def batch_get_cache(self, keys: list, **kwargs):
    method async_batch_get_cache (line 138) | async def async_batch_get_cache(self, keys: list, **kwargs):
    method increment_cache (line 142) | def increment_cache(self, key, value: int, **kwargs) -> int:
    method async_increment (line 150) | async def async_increment(self, key, value: float, **kwargs) -> float:
    method flush_cache (line 154) | def flush_cache(self):
    method delete_cache (line 160) | def delete_cache(self, key):
    method disconnect (line 166) | async def disconnect(self):
    method async_set_cache_sadd (line 170) | async def async_set_cache_sadd(self, key, value: List):
    method force_save (line 179) | def force_save(self):
  class StableLiteLLMCache (line 185) | class StableLiteLLMCache(LiteLLMCache):
    method __init__ (line 194) | def __init__(self, cache_file_path: str, save_interval_seconds: float ...
    method _stable_str (line 201) | def _stable_str(self, value) -> str:
    method get_cache_key (line 207) | def get_cache_key(self, **kwargs) -> str:
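
Note on the cache-key entries above: the names `_stable_str` and `get_cache_key` on `StableLiteLLMCache` suggest that the cache key is derived from a deterministic string form of the request arguments. The index does not show the bodies, so the following is only a minimal sketch of one way to get a key that does not depend on argument order. `stable_cache_key` is a hypothetical function, and the use of canonical JSON plus SHA-256 is an assumption.

```python
import hashlib
import json


def stable_cache_key(**kwargs) -> str:
    """Build an order-independent key from keyword arguments.

    Hypothetical illustration; not the repository's implementation.
    """
    # sort_keys gives a canonical ordering; default=str covers values
    # that json cannot serialize natively.
    canonical = json.dumps(kwargs, sort_keys=True, default=str, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the keys are sorted before hashing, `stable_cache_key(a=1, b=2)` and `stable_cache_key(b=2, a=1)` produce the same key, while any change in a value produces a different one.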

FILE: nemo_skills/inference/llm_math_judge.py
  class LlmMathJudgeConfig (line 40) | class LlmMathJudgeConfig(GenerationTaskConfig):
  class LLMMathJudgeTask (line 61) | class LLMMathJudgeTask(GenerationTask):
    method __init__ (line 62) | def __init__(self, cfg: LlmMathJudgeConfig):
    method preprocess_data (line 65) | def preprocess_data(self, data):
    method prefill_generation (line 73) | def prefill_generation(self, data_point):
  function generate (line 87) | def generate(cfg: LlmMathJudgeConfig):

FILE: nemo_skills/inference/log_samples_wandb.py
  function _process_and_log_samples (line 25) | def _process_and_log_samples(jsonl_file, num_samples, output_name, tmpdi...
  function log_random_samples (line 44) | def log_random_samples(jsonl_file, num_samples, project, name, group=None):

FILE: nemo_skills/inference/merge_chunks.py
  function unescape_shell_command (line 23) | def unescape_shell_command(command: str) -> str:

FILE: nemo_skills/inference/model/__init__.py
  function get_model (line 72) | def get_model(server_type, tokenizer=None, model_class: str | None = Non...
  function get_code_execution_model (line 95) | def get_code_execution_model(server_type, tokenizer=None, code_execution...
  function get_parallel_thinking_model (line 104) | def get_parallel_thinking_model(
  function get_tool_calling_model (line 131) | def get_tool_calling_model(
  function server_params (line 153) | def server_params():

FILE: nemo_skills/inference/model/asr_nim.py
  class ASRNIMModel (line 44) | class ASRNIMModel:
    method __init__ (line 69) | def __init__(
    method generate_async (line 129) | async def generate_async(self, prompt: str, **kwargs):
    method _generate_single (line 149) | def _generate_single(
    method __del__ (line 301) | def __del__(self):

FILE: nemo_skills/inference/model/audio_utils.py
  function audio_file_to_base64 (line 30) | def audio_file_to_base64(audio_file_path: str) -> str:
  function load_audio_file (line 44) | def load_audio_file(audio_file_path: str):
  function chunk_audio (line 59) | def chunk_audio(audio_array, sampling_rate, chunk_duration_sec=30, min_c...
  function save_audio_chunk_to_base64 (line 99) | def save_audio_chunk_to_base64(audio_chunk, sampling_rate) -> str:
  function make_audio_content_block (line 133) | def make_audio_content_block(base64_audio: str, audio_format: str = "aud...

FILE: nemo_skills/inference/model/azure.py
  class AzureOpenAIModel (line 20) | class AzureOpenAIModel(OpenAIModel):
    method __init__ (line 23) | def __init__(
    method _get_api_key (line 32) | def _get_api_key(self, api_key: str | None, api_key_env_var: str | Non...

FILE: nemo_skills/inference/model/base.py
  class EndpointType (line 53) | class EndpointType(str, Enum):
  class BaseModel (line 59) | class BaseModel:
    method __init__ (line 76) | def __init__(
    method _get_api_key (line 174) | def _get_api_key(self, api_key: str | None, api_key_env_var: str | Non...
    method __del__ (line 187) | def __del__(self):
    method _maybe_apply_stop_phrase_removal (line 191) | def _maybe_apply_stop_phrase_removal(
    method _get_tokenizer (line 197) | def _get_tokenizer(self, tokenizer: str | None) -> Union[ServerTokeniz...
    method _get_tokenizer_endpoint (line 217) | def _get_tokenizer_endpoint(self) -> str | None:
    method _initialize_tokenizer (line 221) | def _initialize_tokenizer(self, tokenizer: str | None) -> WrapperAutoT...
    method _build_chat_request_params (line 232) | def _build_chat_request_params(self, **kwargs) -> dict:
    method _build_completion_request_params (line 236) | def _build_completion_request_params(self, **kwargs) -> dict:
    method _build_responses_request_params (line 239) | def _build_responses_request_params(self, **kwargs) -> dict:
    method generate_async (line 243) | async def generate_async(
    method _parse_completion_response (line 354) | def _parse_completion_response(
    method _parse_chat_completion_response (line 387) | def _parse_chat_completion_response(self, response, include_response: ...
    method _process_completion_chunk (line 428) | def _process_completion_chunk(self, chunk, emitted_so_far: list):
    method _process_chat_chunk (line 462) | def _process_chat_chunk(self, chunk):
    method _stream_completion_chunks_async (line 495) | async def _stream_completion_chunks_async(self, response):
    method _parse_responses_completion_response (line 502) | def _parse_responses_completion_response(self, response, include_respo...
    method _serialize_output (line 548) | def _serialize_output(self, response):
    method _stream_chat_chunks_async (line 562) | async def _stream_chat_chunks_async(self, response):

FILE: nemo_skills/inference/model/code_execution.py
  class CodeExecutionConfig (line 31) | class CodeExecutionConfig:
  class CodeExecutionWrapper (line 41) | class CodeExecutionWrapper:
    method __init__ (line 42) | def __init__(self, model: BaseModel, sandbox: Sandbox, config: CodeExe...
    method _generate_single (line 47) | async def _generate_single(
    method execute_generated_code (line 234) | async def execute_generated_code(self, input_prompt, code_begin, code_...
    method generate_async (line 250) | async def generate_async(
    method _stream_single (line 321) | async def _stream_single(

FILE: nemo_skills/inference/model/context_retry.py
  function parse_context_window_exceeded_error (line 30) | def parse_context_window_exceeded_error(error) -> Union[Dict[str, int], ...
  class ContextLimitRetryConfig (line 114) | class ContextLimitRetryConfig:
    method __post_init__ (line 123) | def __post_init__(self):
    method reduce_generate_tokens (line 133) | def reduce_generate_tokens(self):
    method reduce_prompt_from_start (line 142) | def reduce_prompt_from_start(self):
    method reduce_prompt_from_end (line 151) | def reduce_prompt_from_end(self):
  function with_context_retry (line 160) | def with_context_retry(func: Callable) -> Callable:
  function handle_context_retries_async (line 183) | async def handle_context_retries_async(
  function handle_context_retries_sync (line 217) | def handle_context_retries_sync(
  function _prepare_context_error_retry (line 251) | def _prepare_context_error_retry(
  function _try_reduce_generation_tokens (line 291) | def _try_reduce_generation_tokens(
  function _try_reduce_prompt_tokens (line 320) | def _try_reduce_prompt_tokens(
  function _trim_string_prompt (line 365) | def _trim_string_prompt(
  function _trim_list_prompt (line 385) | def _trim_list_prompt(
  function _trim_messages_from_end (line 411) | def _trim_messages_from_end(
  function _trim_messages_from_start (line 453) | def _trim_messages_from_start(
  function get_trimmed_content (line 503) | def get_trimmed_content(
  function return_empty_generation_with_error (line 525) | def return_empty_generation_with_error(detailed_error: str, error_reason...

FILE: nemo_skills/inference/model/gemini.py
  class GeminiModel (line 20) | class GeminiModel(BaseModel):
    method __init__ (line 23) | def __init__(self, base_url: str | None = None, *args, **kwargs):
    method _get_api_key (line 34) | def _get_api_key(self, api_key: str | None, api_key_env_var: str | Non...
    method _build_chat_request_params (line 43) | def _build_chat_request_params(

FILE: nemo_skills/inference/model/megatron.py
  class MegatronModel (line 20) | class MegatronModel(BaseModel):
    method __init__ (line 21) | def __init__(self, **kwargs):
    method _build_chat_request_params (line 25) | def _build_chat_request_params(
    method _build_completion_request_params (line 74) | def _build_completion_request_params(
    method _parse_completion_response (line 122) | def _parse_completion_response(
    method _parse_chat_completion_response (line 158) | def _parse_chat_completion_response(

FILE: nemo_skills/inference/model/nim_utils.py
  class TTSExtraConfig (line 26) | class TTSExtraConfig:
  class ASRExtraConfig (line 49) | class ASRExtraConfig:
  function setup_ssh_tunnel (line 84) | def setup_ssh_tunnel(
  function validate_unsupported_params (line 141) | def validate_unsupported_params(kwargs: dict, model_name: str = "NIM mod...

FILE: nemo_skills/inference/model/openai.py
  class OpenAIModel (line 22) | class OpenAIModel(BaseModel):
    method __init__ (line 23) | def __init__(
    method _get_api_key (line 47) | def _get_api_key(self, api_key: str | None, api_key_env_var: str | Non...
    method _is_reasoning_model (line 61) | def _is_reasoning_model(self, model_name: str) -> bool:
    method _build_completion_request_params (line 66) | def _build_completion_request_params(self, **kwargs) -> dict:
    method _build_chat_request_params (line 91) | def _build_chat_request_params(
    method _build_responses_request_params (line 168) | def _build_responses_request_params(self, input, **kwargs) -> dict:

FILE: nemo_skills/inference/model/parallel_thinking.py
  class GenSelectSpecificConfig (line 38) | class GenSelectSpecificConfig:
  class GenSynthesisSpecificConfig (line 44) | class GenSynthesisSpecificConfig:
  class ParallelThinkingConfig (line 50) | class ParallelThinkingConfig:
  class ParallelThinkingTask (line 82) | class ParallelThinkingTask:
    method __init__ (line 88) | def __init__(self, model: BaseModel, tokenizer: str | None, orig_promp...
    method hash_prompt (line 126) | def hash_prompt(cls, prompt: Union[str, List[dict]]) -> str:
    method generate_solutions (line 130) | async def generate_solutions(
    method _load_solutions (line 182) | def _load_solutions(self, input_dir: str) -> Dict[str, List[Dict]]:
    method _get_multiple_solutions (line 230) | async def _get_multiple_solutions(
    method _generate_parallel_thinking_contraction (line 267) | async def _generate_parallel_thinking_contraction(self, prompt: str, s...
    method _extract_selected_solution (line 316) | def _extract_selected_solution(self, generation: str, max_idx: int) ->...
    method _extract_synthesized_solution (line 333) | def _extract_synthesized_solution(self, generation: str) -> str:
    method _run_genselect (line 341) | async def _run_genselect(
    method _run_gensynthesis (line 365) | async def _run_gensynthesis(
    method generate_async (line 389) | async def generate_async(self, prompt: Union[str, List], **kwargs):

FILE: nemo_skills/inference/model/sglang.py
  class SGLangModel (line 18) | class SGLangModel(VLLMModel):
    method _build_chat_request_params (line 25) | def _build_chat_request_params(

FILE: nemo_skills/inference/model/tool_call.py
  class ToolCallingWrapper (line 37) | class ToolCallingWrapper:
    method __init__ (line 44) | def __init__(
    method _execute_tool_call (line 67) | async def _execute_tool_call(self, tool_call, request_id: str, endpoin...
    method _execute_tool_calls (line 100) | async def _execute_tool_calls(self, tool_calls: List, request_id: str,...
    method _count_tool_response_tokens (line 111) | def _count_tool_response_tokens(self, tool_response_messages: list) ->...
    method _coerce_tool_call_dict (line 133) | def _coerce_tool_call_dict(self, tool_call: object) -> dict:
    method _duplicate_reasoning_content_keys (line 142) | def _duplicate_reasoning_content_keys(self, value):
    method _merge_tool_call_delta (line 155) | def _merge_tool_call_delta(self, tool_call_delta: object, tool_call_ac...
    method _finalize_tool_calls (line 183) | def _finalize_tool_calls(self, tool_call_accumulator: dict) -> list[di...
    method generate_async (line 201) | async def generate_async(
    method _stream_single (line 309) | async def _stream_single(

FILE: nemo_skills/inference/model/tts_nim.py
  class TTSNIMModel (line 29) | class TTSNIMModel:
    method __init__ (line 37) | def __init__(
    method _get_available_voices (line 100) | def _get_available_voices(self):
    method _generate_audio_filename (line 125) | def _generate_audio_filename(self, text: str, voice: str, idx: int) ->...
    method _save_audio (line 135) | def _save_audio(self, audio_data: bytes, output_file: Path, sample_rat...
    method generate_async (line 148) | async def generate_async(self, prompt: str, **kwargs):
    method _generate_single (line 168) | def _generate_single(
    method __del__ (line 285) | def __del__(self):

FILE: nemo_skills/inference/model/utils.py
  function trim_after_stop_phrases (line 27) | def trim_after_stop_phrases(text: str, stop_phrases: list[str]) -> str:
  function is_context_window_exceeded_error (line 36) | def is_context_window_exceeded_error(error: Exception) -> bool:
  class ServerTokenizer (line 54) | class ServerTokenizer:
    method __init__ (line 57) | def __init__(self, url):
    method encode (line 61) | def encode(self, prompt: str | list[dict], tools=None) -> list[int]:
    method decode (line 76) | def decode(self, tokens: list) -> str:
  class WrapperAutoTokenizer (line 86) | class WrapperAutoTokenizer:
    method __init__ (line 89) | def __init__(self, model_name: str):
    method encode (line 93) | def encode(self, prompt: str | list[dict], tools=None) -> list[int]:
    method decode (line 104) | def decode(self, tokens: list[int]) -> str:
  class RequestException (line 109) | class RequestException(RuntimeError):

FILE: nemo_skills/inference/model/vllm.py
  function encode_image_to_base64 (line 31) | def encode_image_to_base64(image_path: str) -> str:
  function process_image_content (line 48) | def process_image_content(content: list | str | None, data_dir: str = ""...
  class VLLMModel (line 93) | class VLLMModel(BaseModel):
    method __init__ (line 94) | def __init__(self, **kwargs):
    method _get_tokenizer_endpoint (line 97) | def _get_tokenizer_endpoint(self):
    method _build_request_body (line 114) | def _build_request_body(self, top_k, min_p, repetition_penalty, extra_...
    method _build_completion_request_params (line 129) | def _build_completion_request_params(
    method _build_chat_request_params (line 172) | def _build_chat_request_params(
    method _build_responses_request_params (line 224) | def _build_responses_request_params(self, input, **kwargs) -> dict:

FILE: nemo_skills/inference/model/vllm_multimodal.py
  class VLLMMultimodalModel (line 47) | class VLLMMultimodalModel(VLLMModel):
    method __init__ (line 72) | def __init__(
    method _is_local_url (line 116) | def _is_local_url(self, base_url: str | None) -> bool:
    method _get_api_key (line 130) | def _get_api_key(self, api_key: str | None, api_key_env_var: str | Non...
    method _build_request_body (line 183) | def _build_request_body(self, top_k, min_p, repetition_penalty, extra_...
    method _parse_chat_completion_response (line 218) | def _parse_chat_completion_response(self, response, include_response: ...
    method _process_audio_response (line 251) | def _process_audio_response(self, audio_data, response_id: str) -> dict:
    method _preprocess_messages_for_model (line 287) | def _preprocess_messages_for_model(self, messages: list[dict]) -> list...
    method content_text_to_list (line 300) | def content_text_to_list(self, message: dict) -> dict:
    method _needs_audio_chunking (line 345) | def _needs_audio_chunking(self, messages: list[dict], task_type: str =...
    method _generate_with_chunking (line 388) | async def _generate_with_chunking(
    method generate_async (line 471) | async def generate_async(

FILE: nemo_skills/inference/patch_litellm_logging.py
  class NoOpLoggingWorker (line 28) | class NoOpLoggingWorker:
    method __init__ (line 31) | def __init__(self, *args, **kwargs):
    method _ensure_queue (line 34) | def _ensure_queue(self) -> None:
    method start (line 37) | def start(self) -> None:
    method _worker_loop (line 40) | async def _worker_loop(self) -> None:
    method enqueue (line 43) | def enqueue(self, coroutine: Coroutine) -> None:
    method ensure_initialized_and_enqueue (line 47) | def ensure_initialized_and_enqueue(self, async_coroutine: Coroutine):
    method stop (line 51) | async def stop(self) -> None:
    method flush (line 54) | async def flush(self) -> None:
    method clear_queue (line 57) | async def clear_queue(self):
  function patch_litellm_logging_worker (line 61) | def patch_litellm_logging_worker():

FILE: nemo_skills/inference/prover.py
  class ProverConfig (line 55) | class ProverConfig(GenerationTaskConfig):
    method _post_init_validate_params (line 72) | def _post_init_validate_params(self):
  class ProverTask (line 97) | class ProverTask(GenerationTask):
    method __init__ (line 98) | def __init__(self, cfg: ProverConfig):
    method log_example_prompt (line 118) | def log_example_prompt(self, data):
    method setup_llm (line 121) | def setup_llm(self):
    method setup_refine_prompt (line 126) | def setup_refine_prompt(self):
    method _generate_single_completion (line 138) | async def _generate_single_completion(self, prompt: str, **kwargs):
    method _extract_and_replace_code (line 184) | async def _extract_and_replace_code(self, formal_statement, generation):
    method _transform_for_nemotron_refinement (line 189) | def _transform_for_nemotron_refinement(self, proof_attempt: str, error...
    method _parse_gpt_oss_output (line 198) | def _parse_gpt_oss_output(self, content: str) -> tuple[str, str | None]:
    method _make_assistant_message (line 236) | def _make_assistant_message(self, content: str, reasoning_content: str...
    method _single_data_point_generate (line 254) | async def _single_data_point_generate(self, data_point, data):
    method pass_at_N (line 440) | async def pass_at_N(self, data_point, data, N=None):
    method process_single_datapoint (line 457) | async def process_single_datapoint(self, data_point, all_data, prompt_...
  function generate (line 469) | def generate(cfg: ProverConfig):

FILE: nemo_skills/inference/retrieve_similar.py
  function top_k_similarity (line 33) | def top_k_similarity(from_emb, to_emb, top_k, chunk_size):
  function encode (line 51) | def encode(model, data, batch_size):
  function read_data (line 55) | def read_data(file_paths, retrieve_key) -> list:
  class RetrieveSimilarConfig (line 64) | class RetrieveSimilarConfig:
    method __post_init__ (line 86) | def __post_init__(self):
  function retrieve_similar (line 105) | def retrieve_similar(cfg: RetrieveSimilarConfig):

FILE: nemo_skills/inference/server/serve_riva_nim.py
  function main (line 20) | def main():

FILE: nemo_skills/inference/server/serve_sglang.py
  function main (line 20) | def main():

FILE: nemo_skills/inference/server/serve_unified.py
  function setup_pythonpath (line 68) | def setup_pythonpath(code_path: Optional[str] = None):
  function apply_safetensors_patch (line 103) | def apply_safetensors_patch(hack_path: Optional[str]):
  function load_yaml_config (line 119) | def load_yaml_config(config_path: str) -> dict:
  function _coerce_value (line 127) | def _coerce_value(value: str):
  function parse_extra_args (line 144) | def parse_extra_args(extra_args: list) -> dict:
  function main (line 186) | def main():

FILE: nemo_skills/inference/server/serve_vllm.py
  function main (line 20) | def main():

FILE: nemo_skills/inference/server/serve_vllm_dp_ray.py
  function _apply_vllm_patches (line 98) | def _apply_vllm_patches() -> None:
  function _reserve_head_placement_group (line 280) | def _reserve_head_placement_group(
  function _patch_signal_for_thread_safety (line 337) | def _patch_signal_for_thread_safety() -> None:
  function _build_vllm_argv (line 353) | def _build_vllm_argv(args: argparse.Namespace, extra: Sequence[str]) -> ...
  function main (line 411) | def main() -> None:

FILE: nemo_skills/inference/structured_outputs.py
  class HLEJudgeAAResponseFormat (line 20) | class HLEJudgeAAResponseFormat(BaseModel):

FILE: nemo_skills/inference/tournament_utils.py
  class KnockoutTournamentManager (line 25) | class KnockoutTournamentManager:
    method __init__ (line 26) | def __init__(
    method load_prompt_template (line 40) | def load_prompt_template(self, prompt_config_path: str) -> str:
    method _llm_call (line 45) | async def _llm_call(self, prompt: str, req_seed: int) -> Tuple[str, int]:
    method format_participants (line 57) | def format_participants(self, participants: List[Tuple[int, str]], com...
    method extract_winner_from_result (line 61) | def extract_winner_from_result(
    method validate_participant (line 67) | def validate_participant(self, participant: str) -> bool:
    method run_single_game (line 71) | async def run_single_game(
    method run_tournament (line 103) | async def run_tournament(
  class ProofKnockoutTournamentManager (line 186) | class ProofKnockoutTournamentManager(KnockoutTournamentManager):
    method format_participants (line 192) | def format_participants(self, participants: List[Tuple[int, str]], com...
    method extract_winner_from_result (line 198) | def extract_winner_from_result(
    method validate_participant (line 227) | def validate_participant(self, participant: str) -> bool:

FILE: nemo_skills/mcp/adapters.py
  class ToolSchemaAdapter (line 29) | class ToolSchemaAdapter(ABC):
    method convert (line 31) | def convert(self, tools: list[dict]) -> list[dict]:
  class ToolCallInterpreter (line 36) | class ToolCallInterpreter(ABC):
    method parse (line 38) | def parse(self, raw_call: dict) -> dict:
  class ToolResponseFormatter (line 42) | class ToolResponseFormatter(ABC):
    method format (line 44) | def format(self, tool_call: ChatCompletionMessageToolCall, result: dic...
  function load_schema_overrides (line 54) | def load_schema_overrides(schema_overrides: dict | None) -> Dict[str, Di...
  function apply_schema_overrides (line 92) | def apply_schema_overrides(
  function remap_tool_call (line 131) | def remap_tool_call(tool_name: str, args: dict, mappings: dict) -> tuple...
  function format_tool_list_by_endpoint_type (line 139) | def format_tool_list_by_endpoint_type(
  class OpenAICallInterpreter (line 198) | class OpenAICallInterpreter(ToolCallInterpreter):
    method parse (line 199) | def parse(self, tool_call):
  class CompletionResponseFormatter (line 205) | class CompletionResponseFormatter(ToolResponseFormatter):
    method format (line 207) | def format(self, tool_call: ChatCompletionMessageToolCall, result):
  function format_tool_response_by_endpoint_type (line 215) | def format_tool_response_by_endpoint_type(tool_call, result, endpoint_ty...
  function get_tool_details_by_endpoint_type (line 233) | def get_tool_details_by_endpoint_type(tool_call, endpoint_type: Endpoint...

FILE: nemo_skills/mcp/clients.py
  function _process_hide_args (line 31) | def _process_hide_args(result, hide_args):
  function _filter_tools (line 49) | def _filter_tools(result, disabled_tools, enabled_tools):
  function async_wrapper (line 76) | def async_wrapper(method):
  function _sanitize_input_args_for_tool (line 95) | def _sanitize_input_args_for_tool(args_dict, tool_name, hide_args):
  function _extract_item (line 109) | def _extract_item(item) -> Any:
  function _extract_tool_result (line 124) | def _extract_tool_result(result) -> Any:
  function _wrap_call_tool_output_formatter (line 156) | def _wrap_call_tool_output_formatter(method):
  function inject_hide_args (line 185) | def inject_hide_args(init_func):
  class MCPClientMeta (line 217) | class MCPClientMeta(type):
    method __new__ (line 269) | def __new__(mcls, name, bases, namespace):
    method __call__ (line 286) | def __call__(cls, *args, **kwargs):
  class MCPClient (line 299) | class MCPClient(metaclass=MCPClientMeta):
    method sanitize (line 333) | def sanitize(self, tool: str, args: dict) -> dict:
    method list_tools (line 338) | async def list_tools(self):
    method call_tool (line 342) | async def call_tool(self, tool: str, args: dict) -> Any:
    method _assert_tool_allowed (line 346) | def _assert_tool_allowed(self, tool: str):
  class MCPStreamableHttpClient (line 354) | class MCPStreamableHttpClient(MCPClient):
    method __init__ (line 378) | def __init__(self, base_url: str):
    method list_tools (line 382) | async def list_tools(self):
    method call_tool (line 404) | async def call_tool(self, tool: str, args: dict) -> Any:
  class MCPStdioClient (line 413) | class MCPStdioClient(MCPClient):
    method __init__ (line 437) | def __init__(self, command: str, args: list[str] | None = None):
    method list_tools (line 444) | async def list_tools(self):
    method call_tool (line 464) | async def call_tool(self, tool: str, args: dict) -> Any:

FILE: nemo_skills/mcp/config.py
  class MCPAdaptersConfig (line 37) | class MCPAdaptersConfig:
  class MCPClientParamsBase (line 44) | class MCPClientParamsBase:
  class MCPStdioClientParams (line 53) | class MCPStdioClientParams(MCPClientParamsBase):
  class MCPStreamableHttpClientParams (line 59) | class MCPStreamableHttpClientParams(MCPClientParamsBase):
  class MCPToolConfig (line 64) | class MCPToolConfig:
  class MCPConfig (line 71) | class MCPConfig:
  function _is_locate_mapping (line 83) | def _is_locate_mapping(value: Any) -> bool:
  function _resolve_special (line 94) | def _resolve_special(value: Any, full_cfg: DictConfig) -> Any:
  function _resolve_locate_mapping (line 100) | def _resolve_locate_mapping(spec: Mapping, full_cfg: DictConfig) -> Any:
  function resolve_value (line 109) | def resolve_value(value: Any, full_cfg: DictConfig) -> Any:
  function resolve_adapters (line 115) | def resolve_adapters(cfg: DictConfig):

FILE: nemo_skills/mcp/servers/chemistry/periodictable_tool.py
  function _resolve_element (line 37) | def _resolve_element(name_or_symbol: str):
  function element_info (line 50) | def element_info(
  function isotope_info (line 86) | def isotope_info(
  class PeriodictableTool (line 120) | class PeriodictableTool(Tool):
    method __init__ (line 121) | def __init__(self) -> None:
    method default_config (line 124) | def default_config(self) -> dict[str, Any]:
    method configure (line 127) | def configure(self, overrides: dict[str, Any] | None = None, context: ...
    method list_tools (line 131) | async def list_tools(self) -> list[dict[str, Any]]:
    method execute (line 158) | async def execute(self, tool_name: str, arguments: dict[str, Any], ext...

FILE: nemo_skills/mcp/servers/exa_tool.py
  class ExecutionResult (line 30) | class ExecutionResult:
  function exa_websearch (line 42) | async def exa_websearch(
  function main (line 67) | def main():
  class ExaTool (line 86) | class ExaTool(MCPClientTool):
    method __init__ (line 87) | def __init__(self) -> None:
  class ExaMCPTool (line 103) | class ExaMCPTool(MCPClientTool):
    method __init__ (line 104) | def __init__(self) -> None:

FILE: nemo_skills/mcp/servers/physics/coolprop_tool.py
  function fluid_property (line 54) | def fluid_property(
  function fluid_list (line 86) | def fluid_list() -> str:
  class CoolPropTool (line 94) | class CoolPropTool(Tool):
    method __init__ (line 95) | def __init__(self) -> None:
    method default_config (line 98) | def default_config(self) -> dict[str, Any]:
    method configure (line 101) | def configure(self, overrides: dict[str, Any] | None = None, context: ...
    method list_tools (line 105) | async def list_tools(self) -> list[dict[str, Any]]:
    method execute (line 131) | async def execute(self, tool_name: str, arguments: dict[str, Any], ext...

FILE: nemo_skills/mcp/servers/physics/particle_tool.py
  function _format_particle (line 39) | def _format_particle(p) -> str:
  function particle_lookup (line 63) | def particle_lookup(
  function particle_search (line 91) | def particle_search(
  class ParticleTool (line 112) | class ParticleTool(Tool):
    method __init__ (line 113) | def __init__(self) -> None:
    method default_config (line 116) | def default_config(self) -> dict[str, Any]:
    method configure (line 119) | def configure(self, overrides: dict[str, Any] | None = None, context: ...
    method list_tools (line 124) | async def list_tools(self) -> list[dict[str, Any]]:
    method execute (line 146) | async def execute(self, tool_name: str, arguments: dict[str, Any], ext...

FILE: nemo_skills/mcp/servers/physics/radioactivedecay_tool.py
  function nuclide_info (line 40) | def nuclide_info(
  function decay_chain (line 81) | def decay_chain(
  class RadioactivedecayTool (line 116) | class RadioactivedecayTool(Tool):
    method __init__ (line 117) | def __init__(self) -> None:
    method default_config (line 120) | def default_config(self) -> dict[str, Any]:
    method configure (line 123) | def configure(self, overrides: dict[str, Any] | None = None, context: ...
    method list_tools (line 137) | async def list_tools(self) -> list[dict[str, Any]]:
    method execute (line 164) | async def execute(self, tool_name: str, arguments: dict[str, Any], ext...

FILE: nemo_skills/mcp/servers/python_tool.py
  class ExecutionResult (line 36) | class ExecutionResult:
  function stateful_python_code_exec (line 54) | async def stateful_python_code_exec(
  function main (line 71) | def main():
  class PythonTool (line 107) | class PythonTool(MCPClientTool):
    method __init__ (line 108) | def __init__(self) -> None:
    method execute (line 128) | async def execute(self, tool_name: str, arguments: Dict[str, Any], ext...
    method shutdown (line 143) | async def shutdown(self) -> None:
  class DirectPythonTool (line 147) | class DirectPythonTool(Tool):
    method __init__ (line 162) | def __init__(self) -> None:
    method default_config (line 173) | def default_config(self) -> Dict[str, Any]:
    method configure (line 176) | def configure(self, overrides: Dict[str, Any] | None = None, context: ...
    method list_tools (line 192) | async def list_tools(self) -> List[Dict[str, Any]]:
    method execute (line 207) | async def execute(
    method shutdown (line 258) | async def shutdown(self) -> None:
    method cleanup_request (line 274) | async def cleanup_request(self, request_id: str) -> None:

FILE: nemo_skills/mcp/servers/tavily_search_tool.py
  class ExecutionResult (line 33) | class ExecutionResult:
  function answer (line 61) | async def answer(
  function _parse_exclude_domains (line 128) | def _parse_exclude_domains(exclude_config: dict) -> list[str]:
  class TavilySearchTool (line 139) | class TavilySearchTool(MCPClientTool):
    method __init__ (line 140) | def __init__(self) -> None:
    method post_configure (line 156) | def post_configure(self) -> None:
    method execute (line 165) | async def execute(self, tool_name: str, arguments: dict[str, Any], ext...
  function main (line 183) | def main():

FILE: nemo_skills/mcp/servers/web/arxiv_tool.py
  function _cache_key (line 92) | def _cache_key(*args: Any) -> str:
  function _cache_get (line 97) | def _cache_get(key: str) -> str | None:
  function _cache_set (line 102) | def _cache_set(key: str, value: str) -> None:
  function _paper_cache_get (line 110) | def _paper_cache_get(key: str) -> tuple[str, str] | None:
  function _paper_cache_set (line 118) | def _paper_cache_set(key: str, value: tuple[str, str]) -> None:
  function _reconstruct_abstract (line 126) | def _reconstruct_abstract(inv_idx: dict[str, list[int]] | None) -> str:
  function _truncate (line 139) | def _truncate(text: str, limit: int = ABSTRACT_LIMIT) -> str:
  class _ArxivHTMLTextParser (line 148) | class _ArxivHTMLTextParser(HTMLParser):
    method __init__ (line 155) | def __init__(self) -> None:
    method handle_starttag (line 162) | def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]...
    method handle_endtag (line 176) | def handle_endtag(self, tag: str) -> None:
    method handle_data (line 193) | def handle_data(self, data: str) -> None:
    method text (line 203) | def text(self) -> str:
  function _normalize_id (line 210) | def _normalize_id(paper_id: str) -> str:
  function _extract_arxiv_id (line 234) | def _extract_arxiv_id(paper_id: str) -> str | None:
  function _fetch_paper_text (line 251) | async def _fetch_paper_text(paper_id: str) -> tuple[str, str]:
  function _section_offsets (line 287) | def _section_offsets(text: str) -> list[tuple[int, int, str]]:
  function _format_openalex_work (line 297) | def _format_openalex_work(work: dict[str, Any], include_abstract: bool =...
  function _format_arxiv_entry (line 336) | def _format_arxiv_entry(entry: dict[str, Any], include_abstract: bool = ...
  function _parse_arxiv_atom (line 368) | def _parse_arxiv_atom(feed_text: str) -> list[dict[str, Any]]:
  function _arxiv_api_search (line 416) | async def _arxiv_api_search(query: str, max_results: int) -> str:
  function _http_get_json (line 436) | async def _http_get_json(client: httpx.AsyncClient, url: str, params: di...
  function _arxiv_rate_limit (line 485) | async def _arxiv_rate_limit() -> None:
  function arxiv_search (line 542) | async def arxiv_search(
  function arxiv_get (line 578) | async def arxiv_get(
  function arxiv_sections (line 625) | async def arxiv_sections(
  function arxiv_read_chunk (line 667) | async def arxiv_read_chunk(
  function _arxiv_api_get (line 717) | async def _arxiv_api_get(arxiv_id: str) -> str:
  class ArxivSearchTool (line 756) | class ArxivSearchTool(Tool):
    method __init__ (line 759) | def __init__(self) -> None:
    method default_config (line 766) | def default_config(self) -> dict[str, Any]:
    method configure (line 769) | def configure(self, overrides: dict[str, Any] | None = None, context: ...
    method list_tools (line 785) | async def list_tools(self) -> list[dict[str, Any]]:
    method execute (line 829) | async def execute(self, tool_name: str, arguments: dict[str, Any], ext...

FILE: nemo_skills/mcp/servers/web/wikipedia_tool.py
  function _cache_key (line 84) | def _cache_key(*args: Any) -> str:
  function _cache_get (line 89) | def _cache_get(key: str) -> str | None:
  function _cache_set (line 94) | def _cache_set(key: str, value: str) -> None:
  function _strip_html (line 101) | def _strip_html(s: str) -> str:
  function _truncate (line 109) | def _truncate(text: str, limit: int) -> str:
  function _page_url (line 118) | def _page_url(title: str) -> str:
  function _sentence_split (line 123) | def _sentence_split(text: str) -> list[str]:
  function _page_extract (line 131) | async def _page_extract(title: str) -> tuple[str, str, str] | tuple[None...
  function _http_get_json (line 165) | async def _http_get_json(client: httpx.AsyncClient, url: str, params: di...
  function _retry_after_seconds (line 195) | def _retry_after_seconds(response: httpx.Response) -> float | None:
  function _rate_limit (line 206) | async def _rate_limit() -> None:
  function wikipedia_search (line 242) | async def wikipedia_search(
  function wikipedia_page (line 297) | async def wikipedia_page(
  function wikipedia_summary (line 357) | async def wikipedia_summary(
  function wikipedia_sections (line 401) | async def wikipedia_sections(
  function wikipedia_query_summary (line 447) | async def wikipedia_query_summary(
  function wikipedia_key_facts (line 484) | async def wikipedia_key_facts(
  function wikipedia_section (line 519) | async def wikipedia_section(
  function _suggest_titles (line 606) | async def _suggest_titles(query: str, n: int = 5) -> list[str]:
  class WikipediaSearchTool (line 630) | class WikipediaSearchTool(Tool):
    method __init__ (line 633) | def __init__(self) -> None:
    method default_config (line 641) | def default_config(self) -> dict[str, Any]:
    method configure (line 644) | def configure(self, overrides: dict[str, Any] | None = None, context: ...
    method list_tools (line 664) | async def list_tools(self) -> list[dict[str, Any]]:
    method execute (line 737) | async def execute(self, tool_name: str, arguments: dict[str, Any], ext...

FILE: nemo_skills/mcp/tool_manager.py
  class FatalToolError (line 34) | class FatalToolError(Exception):
  class Tool (line 44) | class Tool(ABC):
    method default_config (line 53) | def default_config(self) -> Dict[str, Any]:
    method configure (line 57) | def configure(self, overrides: Dict[str, Any] | None = None, context: ...
    method list_tools (line 61) | async def list_tools(self) -> List[Dict[str, Any]]:
    method execute (line 65) | async def execute(
    method cleanup_request (line 70) | async def cleanup_request(self, request_id: str) -> None:  # Optional ...
    method shutdown (line 73) | async def shutdown(self) -> None:  # Optional hook
    method post_configure (line 76) | def post_configure(self) -> None:
  class ToolManager (line 80) | class ToolManager:
    method __init__ (line 89) | def __init__(
    method shutdown (line 120) | async def shutdown(self) -> None:
    method cleanup_request (line 128) | async def cleanup_request(self, request_id: str) -> None:
    method list_all_tools (line 132) | async def list_all_tools(self, use_cache: bool = True) -> List[Dict[st...
    method _resolve (line 174) | def _resolve(self, qualified_name: str) -> tuple[Tool, str]:
    method execute_tool (line 183) | async def execute_tool(self, raw_name: str, args: Dict[str, Any], extr...

FILE: nemo_skills/mcp/tool_providers.py
  class MCPClientTool (line 26) | class MCPClientTool(Tool):
    method __init__ (line 40) | def __init__(self) -> None:
    method apply_config_updates (line 54) | def apply_config_updates(self, updates: Dict[str, Any] | None) -> None:
    method default_config (line 60) | def default_config(self) -> Dict[str, Any]:
    method _resolve_maybe_callable (line 63) | def _resolve_maybe_callable(self, value: Any):
    method post_configure (line 74) | def post_configure(self) -> None:
    method configure (line 77) | def configure(self, overrides: Dict[str, Any] | None = None, context: ...
    method list_tools (line 123) | async def list_tools(self) -> List[Dict[str, Any]]:
    method execute (line 126) | async def execute(self, tool_name: str, arguments: Dict[str, Any], ext...

FILE: nemo_skills/mcp/utils.py
  function exa_auth_connector (line 34) | def exa_auth_connector(client: MCPStreamableHttpClient):
  function exa_stdio_connector (line 38) | def exa_stdio_connector(client: MCPStdioClient):
  function exa_output_formatter (line 45) | def exa_output_formatter(result: CallToolResult):
  function hydra_config_connector_factory (line 52) | def hydra_config_connector_factory(config_obj):
  function load_mcp_config (line 73) | def load_mcp_config(
  function add_config_args (line 107) | def add_config_args(parser):

FILE: nemo_skills/pipeline/app.py
  function typer_unpacker (line 25) | def typer_unpacker(f: Callable):

FILE: nemo_skills/pipeline/cli.py
  function wrap_arguments (line 44) | def wrap_arguments(arguments: str):

FILE: nemo_skills/pipeline/convert.py
  function get_hf_to_trtllm_cmd (line 37) | def get_hf_to_trtllm_cmd(
  function get_hf_to_megatron_cmd (line 99) | def get_hf_to_megatron_cmd(
  class SupportedTypes (line 126) | class SupportedTypes(str, Enum):
  class SupportedFormatsTo (line 132) | class SupportedFormatsTo(str, Enum):
  class SupportedFormatsFrom (line 138) | class SupportedFormatsFrom(str, Enum):
  class SupportedDtypes (line 142) | class SupportedDtypes(str, Enum):
  function convert (line 151) | def convert(

FILE: nemo_skills/pipeline/dataset.py
  function _get_dataset_module_from_cluster (line 36) | def _get_dataset_module_from_cluster(cluster_config, mounted_path):
  function get_dataset_module (line 50) | def get_dataset_module(dataset, data_dir=None, cluster_config=None, extr...

FILE: nemo_skills/pipeline/eval.py
  class SingleNodeMode (line 44) | class SingleNodeMode(str, enum.Enum):
  function _resolve_child_sbatch_kwargs (line 49) | def _resolve_child_sbatch_kwargs(sbatch_kwargs, child_sbatch_kwargs):
  function _create_llm_judge_tasks (line 55) | def _create_llm_judge_tasks(
  function eval (line 136) | def eval(

FILE: nemo_skills/pipeline/generate.py
  function _create_job_unified (line 50) | def _create_job_unified(
  function generate (line 216) | def generate(

FILE: nemo_skills/pipeline/judges/comet_judge.py
  function create_judge_tasks (line 26) | def create_judge_tasks(

FILE: nemo_skills/pipeline/judges/nvembed_judge.py
  function create_judge_tasks (line 26) | def create_judge_tasks(

FILE: nemo_skills/pipeline/megatron_lm/train.py
  function get_training_cmd (line 38) | def get_training_cmd(
  function train_megatron_lm (line 96) | def train_megatron_lm(

FILE: nemo_skills/pipeline/nemo_evaluator.py
  function nemo_evaluator (line 113) | def nemo_evaluator(
  function _create_serving_command_obj (line 439) | def _create_serving_command_obj(
  class _TaskCreationContext (line 509) | class _TaskCreationContext:
  function _hardware_for_group (line 560) | def _hardware_for_group(
  function _build_main_server_if_needed (line 594) | def _build_main_server_if_needed(ctx: _TaskCreationContext) -> Optional[...
  function _build_judge_server_if_needed (line 619) | def _build_judge_server_if_needed(ctx: _TaskCreationContext) -> Optional...
  function _build_client_command (line 644) | def _build_client_command(
  function _build_task_cmd (line 671) | def _build_task_cmd(
  class EvaluatorClientScript (line 749) | class EvaluatorClientScript(BaseJobScript):
    method __post_init__ (line 757) | def __post_init__(self):

FILE: nemo_skills/pipeline/nemo_gym_rollouts.py
  function nemo_gym_rollouts (line 77) | def nemo_gym_rollouts(

FILE: nemo_skills/pipeline/nemo_rl/average_checkpoints.py
  class SupportedBackends (line 29) | class SupportedBackends(str, Enum):
  function list_candidate_model_dirs (line 34) | def list_candidate_model_dirs(checkpoint_dir, steps):
  function find_index_json (line 46) | def find_index_json(model_dir):
  function build_key_to_shard_map (line 54) | def build_key_to_shard_map(model_dir):
  function copy_side_files (line 103) | def copy_side_files(src_model_dir, dst_dir):
  function convert_fsdp_bin_to_safetensors (line 121) | def convert_fsdp_bin_to_safetensors(model_dir):
  function main (line 170) | def main():

FILE: nemo_skills/pipeline/nemo_rl/grpo.py
  class SupportedBackends (line 52) | class SupportedBackends(str, Enum):
  class NemoRLTask (line 58) | class NemoRLTask:
    method format_train_args (line 76) | def format_train_args(self):
    method format_data_args (line 93) | def format_data_args(self):
    method format_wandb_args (line 99) | def format_wandb_args(self):
    method get_cmd (line 135) | def get_cmd(self):
  function get_training_cmd (line 152) | def get_training_cmd(
  function get_checkpoint_convert_cmd (line 195) | def get_checkpoint_convert_cmd(output_dir, final_hf_path, step, backend,...
  function get_checkpoint_average_cmd (line 220) | def get_checkpoint_average_cmd(output_dir, average_steps, backend, remov...
  function grpo_nemo_rl (line 242) | def grpo_nemo_rl(

FILE: nemo_skills/pipeline/nemo_rl/sft.py
  class SupportedBackends (line 49) | class SupportedBackends(str, Enum):
  class NemoRLTask (line 55) | class NemoRLTask:
    method format_train_args (line 73) | def format_train_args(self):
    method format_data_args (line 88) | def format_data_args(self):
    method format_wandb_args (line 94) | def format_wandb_args(self):
    method get_cmd (line 114) | def get_cmd(self):
  function get_training_cmd (line 131) | def get_training_cmd(
  function get_checkpoint_convert_cmd (line 174) | def get_checkpoint_convert_cmd(output_dir, final_hf_path, step, backend,...
  function get_checkpoint_average_cmd (line 199) | def get_checkpoint_average_cmd(output_dir, average_steps, backend, remov...
  function sft_nemo_rl (line 221) | def sft_nemo_rl(

FILE: nemo_skills/pipeline/prepare_data.py
  function _parse_prepare_cli_arguments (line 44) | def _parse_prepare_cli_arguments(args: list[str]) -> tuple[list[str], li...
  function _is_external_dataset (line 52) | def _is_external_dataset(dataset: str, extra_benchmark_map: dict[str, st...
  function _get_container_dataset_path (line 56) | def _get_container_dataset_path(dataset: str, extra_benchmark_map: dict[...
  function _build_command (line 62) | def _build_command(
  function prepare_data (line 111) | def prepare_data(

FILE: nemo_skills/pipeline/robust_eval.py
  class PromptConfig (line 33) | class PromptConfig:
  function robust_eval (line 40) | def robust_eval(

FILE: nemo_skills/pipeline/run_cmd.py
  function get_cmd (line 34) | def get_cmd(command):
  function run_cmd (line 46) | def run_cmd(

FILE: nemo_skills/pipeline/setup.py
  function is_docker_available (line 29) | def is_docker_available():
  function pull_docker_containers (line 38) | def pull_docker_containers(containers):
  function setup (line 57) | def setup():

FILE: nemo_skills/pipeline/start_server.py
  function get_gradio_chat_cmd (line 42) | def get_gradio_chat_cmd(model, server_type, extra_args):
  function create_job_tunnel (line 52) | def create_job_tunnel(
  function launch_server (line 112) | def launch_server(
  function stop_server (line 199) | def stop_server(exp):
  function start_server (line 207) | def start_server(

FILE: nemo_skills/pipeline/summarize_results.py
  function get_subset_name (line 43) | def get_subset_name(benchmark: str, subset: str) -> str:
  function _set_asr_leaderboard_macro_wer (line 50) | def _set_asr_leaderboard_macro_wer(metrics: dict):
  function add_benchmark_groups (line 61) | def add_benchmark_groups(results, metrics_to_print, evaluations_to_print):
  function summarize_results (line 148) | def summarize_results(

FILE: nemo_skills/pipeline/summarize_robustness.py
  function get_metrics (line 43) | def get_metrics(prediction_files: List[str]) -> List[float] | List[float]:
  function summarize_robustness (line 81) | def summarize_robustness(

FILE: nemo_skills/pipeline/utils/cluster.py
  function _parse_slurm_timeout (line 43) | def _parse_slurm_timeout(value: str) -> timedelta:
  function _get_timeout (line 77) | def _get_timeout(cluster_config, partition, with_save_delay: bool = True...
  function get_slurm_timeout_str (line 93) | def get_slurm_timeout_str(cluster_config, partition, with_save_delay: bo...
  function get_timeout_str (line 102) | def get_timeout_str(cluster_config, partition, with_save_delay: bool = T...
  function kwargs_to_string (line 109) | def kwargs_to_string(kwargs: str | dict) -> dict:
  function parse_kwargs (line 121) | def parse_kwargs(kwargs: str | dict | None, **extra_kwargs) -> dict | None:
  function get_env_variables (line 163) | def get_env_variables(cluster_config):
  function temporary_env_update (line 281) | def temporary_env_update(cluster_config, updates):
  function read_config (line 293) | def read_config(config_file):
  function get_cluster_config (line 315) | def get_cluster_config(cluster=None, config_dir=None):
  function update_ssh_tunnel_config (line 372) | def update_ssh_tunnel_config(cluster_config: dict):
  function _get_tunnel_cached (line 416) | def _get_tunnel_cached(
  function tunnel_hash (line 446) | def tunnel_hash(tunnel):
  function get_tunnel (line 452) | def get_tunnel(cluster_config):
  class OutputWatcher (line 461) | class OutputWatcher(StreamWatcher):
    method submit (line 464) | def submit(self, stream):
  function progress_callback (line 470) | def progress_callback(transferred: int, total: int) -> None:
  function cluster_download_file (line 481) | def cluster_download_file(cluster_config: dict, remote_file: str, local_...
  function cluster_path_exists (line 486) | def cluster_path_exists(cluster_config: dict, remote_path: str):
  function cluster_download_dir (line 492) | def cluster_download_dir(
  function cluster_upload (line 566) | def cluster_upload(cluster_config: dict, local_file: str, remote_dir: st...

FILE: nemo_skills/pipeline/utils/commands.py
  function vllm_server_command (line 28) | def vllm_server_command(
  function sandbox_command (line 77) | def sandbox_command(cluster_config: Dict, port: int, **kwargs) -> Tuple[...
  function wrap_command (line 114) | def wrap_command(command: str, working_dir: str = "/nemo_run/code", env_...

FILE: nemo_skills/pipeline/utils/declarative.py
  class Command (line 212) | class Command:
    method prepare_for_execution (line 240) | def prepare_for_execution(self, cluster_config: Dict) -> Tuple[run.Scr...
    method get_name (line 328) | def get_name(self) -> str:
  class HardwareConfig (line 333) | class HardwareConfig:
  class CommandGroup (line 344) | class CommandGroup:
    method __init__ (line 347) | def __init__(
  class Pipeline (line 360) | class Pipeline:
    method __init__ (line 370) | def __init__(
    method _validate (line 398) | def _validate(self):
    method run (line 427) | def run(self, dry_run: bool = False, log_dir: Optional[str] = None, _r...
    method _prepare_command (line 566) | def _prepare_command(self, command, cluster_config: Dict) -> Tuple[run...
    method _rewrite_local_paths (line 581) | def _rewrite_local_paths(self, script: run.Script) -> run.Script:
    method _resolve_container (line 610) | def _resolve_container(self, exec_config: Dict, command, cluster_confi...
    method _create_executor (line 617) | def _create_executor(
    method _plan_and_add_job (line 719) | def _plan_and_add_job(
    method _add_single_group_job (line 938) | def _add_single_group_job(
    method _add_multi_group_job (line 959) | def _add_multi_group_job(

FILE: nemo_skills/pipeline/utils/docker_images.py
  function _sanitize_image_component (line 29) | def _sanitize_image_component(value: str) -> str:
  function _resolve_dockerfile_path (line 34) | def _resolve_dockerfile_path(dockerfile_path_str: str) -> Path:
  function _build_local_docker_image (line 55) | def _build_local_docker_image(dockerfile_spec: str) -> str:
  function resolve_container_image (line 102) | def resolve_container_image(container: str, cluster_config: dict) -> str:

FILE: nemo_skills/pipeline/utils/eval.py
  class BenchmarkArgs (line 34) | class BenchmarkArgs:
    method requires_judge (line 55) | def requires_judge(self):
  class EvalGenerationUnit (line 60) | class EvalGenerationUnit:
  function get_arg_from_module_or_dict (line 79) | def get_arg_from_module_or_dict(module, arg_name, default_value=None, ov...
  function get_benchmark_args_from_module (line 90) | def get_benchmark_args_from_module(
  function _resolve_data_path (line 217) | def _resolve_data_path(data_path):
  function add_default_args (line 226) | def add_default_args(
  function prepare_eval_commands (line 297) | def prepare_eval_commands(

FILE: nemo_skills/pipeline/utils/exp.py
  function get_exp_handles (line 70) | def get_exp_handles(expname: str, ignore_finished=True, ignore_exp_not_e...
  function get_sandbox_command (line 118) | def get_sandbox_command(cluster_config):
  class CustomJobDetails (line 125) | class CustomJobDetails(SlurmJobDetails):
    method stdout (line 131) | def stdout(self) -> Path:
    method srun_stdout (line 135) | def srun_stdout(self) -> Path:
    method stderr (line 139) | def stderr(self) -> Path:
    method srun_stderr (line 143) | def srun_stderr(self) -> Path:
    method ls_term (line 147) | def ls_term(self) -> str:
  class CustomJobDetailsRay (line 157) | class CustomJobDetailsRay(CustomJobDetails):
    method ls_term (line 162) | def ls_term(self) -> str:
  function get_executor (line 167) | def get_executor(
  function install_packages_wrap (line 421) | def install_packages_wrap(cmd, installation_command: str | None = None):
  function add_task (line 469) | def add_task(
  function run_exp (line 889) | def run_exp(exp, cluster_config, sequential=False, dry_run=False):
  function get_exp (line 937) | def get_exp(expname, cluster_config, _reuse_exp=None):
  function get_nsight_cmd (line 956) | def get_nsight_cmd(profile_step_range):

FILE: nemo_skills/pipeline/utils/generation.py
  function normalize_models_config (line 32) | def normalize_models_config(
  function normalize_parameter (line 64) | def normalize_parameter(
  function build_requirements_venv_cmd (line 107) | def build_requirements_venv_cmd(requirements: list[str]) -> str:
  function get_chunked_rs_filename (line 152) | def get_chunked_rs_filename(
  function get_expected_done_files (line 171) | def get_expected_done_files(output_dir, random_seeds, chunk_ids):
  function get_remaining_jobs (line 183) | def get_remaining_jobs(cluster_config, output_dir, random_seeds, chunk_i...
  function separate_hydra_args (line 301) | def separate_hydra_args(extra_arguments: str) -> tuple[str, str]:
  function get_generation_cmd (line 407) | def get_generation_cmd(
  function wrap_cmd (line 551) | def wrap_cmd(cmd, preprocess_cmd, postprocess_cmd, random_seed=None, wan...
  function configure_client (line 573) | def configure_client(

FILE: nemo_skills/pipeline/utils/mounts.py
  function is_mounted_filepath (line 27) | def is_mounted_filepath(cluster_config: dict | None, path: str, mounts: ...
  function check_if_mounted (line 49) | def check_if_mounted(cluster_config, path_to_check):
  function _resolve_path_placeholders (line 59) | def _resolve_path_placeholders(path: str) -> str:
  function check_mounts (line 71) | def check_mounts(
  function get_mounted_path (line 165) | def get_mounted_path(cluster_config: dict, path: str):
  function get_unmounted_path (line 213) | def get_unmounted_path(cluster_config: dict, path: str):
  function add_mount_path (line 261) | def add_mount_path(mount_source: str, mount_dest: str, cluster_config):
  function create_remote_directory (line 284) | def create_remote_directory(directory: str | list, cluster_config: dict):
  function resolve_mount_paths (line 317) | def resolve_mount_paths(cluster_config: dict, mount_paths: str | list | ...
  function check_remote_mount_directories (line 362) | def check_remote_mount_directories(directories: list, cluster_config: di...
  function normalize_mounts_list (line 399) | def normalize_mounts_list(mounts: list[str], allow_rw_mode: bool = False):
  function get_mounts_from_config (line 473) | def get_mounts_from_config(cluster_config: dict):

FILE: nemo_skills/pipeline/utils/packager.py
  class RepoMetadata (line 30) | class RepoMetadata:
    method __post_init__ (line 36) | def __post_init__(self):
  function register_external_repo (line 52) | def register_external_repo(metadata: RepoMetadata, ignore_if_registered:...
  function get_registered_external_repo (line 67) | def get_registered_external_repo(name: str) -> Optional[RepoMetadata]:
  function resolve_external_data_path (line 82) | def resolve_external_data_path(local_data_path: str | Path) -> str:
  function get_git_repo_path (line 132) | def get_git_repo_path(path: str | Path = None):
  function get_packager (line 164) | def get_packager(extra_package_dirs: tuple[str] | None = None):

FILE: nemo_skills/pipeline/utils/ray_executor.py
  function _import_ray (line 71) | def _import_ray():
  class RayJobConfig (line 87) | class RayJobConfig:
  class RayJobClient (line 101) | class RayJobClient:
    method __init__ (line 104) | def __init__(self, ray_address: str = "auto", namespace: str = "nemo"):
    method _connect (line 117) | def _connect(self):
    method submit_job (line 138) | def submit_job(self, config: RayJobConfig) -> str:
    method _wait_for_dependencies (line 195) | def _wait_for_dependencies(self, job_ids: List[str], poll_interval: in...
    method get_job_status (line 242) | def get_job_status(self, job_id: str) -> str:
    method get_job_logs (line 246) | def get_job_logs(self, job_id: str) -> str:
    method cancel_job (line 254) | def cancel_job(self, job_id: str):
    method list_jobs (line 262) | def list_jobs(self) -> List[Dict[str, Any]]:
  function get_ray_client (line 271) | def get_ray_client(cluster_config: Dict[str, Any]) -> RayJobClient:
  class RayExecutor (line 283) | class RayExecutor(Executor):
    method assign (line 328) | def assign(
    method nnodes (line 345) | def nnodes(self) -> int:
    method nproc_per_node (line 349) | def nproc_per_node(self) -> int:

FILE: nemo_skills/pipeline/utils/scripts/base.py
  class BaseJobScript (line 26) | class BaseJobScript(run.Script):
    method __post_init__ (line 54) | def __post_init__(self):
    method set_inline (line 73) | def set_inline(self, command: Union[str, Callable, run.Script]) -> None:
    method hostname_ref (line 77) | def hostname_ref(self) -> str:

FILE: nemo_skills/pipeline/utils/scripts/eval.py
  function _combine_cmds (line 24) | def _combine_cmds(cmds: List[str], single_node_mode: str) -> str:
  function _inject_if_missing (line 38) | def _inject_if_missing(extra_arguments: str, needle: str, insertion: str...
  function _inject_single_server_overrides (line 45) | def _inject_single_server_overrides(
  class EvalClientScript (line 76) | class EvalClientScript(BaseJobScript):
    method __post_init__ (line 98) | def __post_init__(self):

FILE: nemo_skills/pipeline/utils/scripts/generation.py
  class GenerationClientScript (line 26) | class GenerationClientScript(BaseJobScript):
    method __post_init__ (line 79) | def __post_init__(self):

FILE: nemo_skills/pipeline/utils/scripts/nemo_gym.py
  class NemoGymRolloutsScript (line 26) | class NemoGymRolloutsScript(BaseJobScript):
    method __post_init__ (line 62) | def __post_init__(self):

FILE: nemo_skills/pipeline/utils/scripts/server.py
  class ServerScript (line 30) | class ServerScript(BaseJobScript):
    method __post_init__ (line 79) | def __post_init__(self):
    method get_address (line 100) | def get_address(self) -> str:
  class SandboxScript (line 106) | class SandboxScript(BaseJobScript):
    method __post_init__ (line 128) | def __post_init__(self):

FILE: nemo_skills/pipeline/utils/server.py
  class SupportedServersSelfHosted (line 25) | class SupportedServersSelfHosted(str, Enum):
  class SupportedServers (line 35) | class SupportedServers(str, Enum):
  function get_free_port (line 48) | def get_free_port(exclude: list[int] | None = None, strategy: int | str ...
  function should_get_random_port (line 67) | def should_get_random_port(server_gpus, exclusive):
  function wrap_python_path (line 71) | def wrap_python_path(cmd):
  function set_python_path_and_wait_for_server (line 75) | def set_python_path_and_wait_for_server(server_address, generation_comma...
  function _parse_last_flag (line 85) | def _parse_last_flag(tokens: list[str], *names: str) -> str | None:
  function _compute_vllm_dp_ray_serving_nodes (line 114) | def _compute_vllm_dp_ray_serving_nodes(server_args: str, num_gpus: int, ...
  function get_ray_server_cmd (line 151) | def get_ray_server_cmd(start_cmd, serving_nodes: int | None = None, num_...
  function get_server_command (line 229) | def get_server_command(

FILE: nemo_skills/pipeline/verl/ppo.py
  class PPOVerlTask (line 38) | class PPOVerlTask:
    method get_ray_launch_cmd (line 55) | def get_ray_launch_cmd(self):
    method format_train_args (line 59) | def format_train_args(self):
    method format_data_args (line 116) | def format_data_args(self):
    method format_wandb_args (line 125) | def format_wandb_args(self, disable_wandb, wandb_project, expname):
    method get_preamble_cmd (line 139) | def get_preamble_cmd(self):
    method get_script_module (line 143) | def get_script_module(self):
    method get_job_cmd (line 146) | def get_job_cmd(self):
    method get_cmd (line 158) | def get_cmd(self):
  function get_training_cmd (line 178) | def get_training_cmd(
  class SupportedServers (line 225) | class SupportedServers(str, Enum):
  function ppo_verl (line 234) | def ppo_verl(

FILE: nemo_skills/prompt/utils.py
  class BM25Retriever (line 34) | class BM25Retriever:
    method __init__ (line 35) | def __init__(self, data_path: str, field: str):
    method retrieve (line 45) | def retrieve(self, query: str, top_k: int = 1):
  class FewShotExamplesConfig (line 51) | class FewShotExamplesConfig:
    method __post_init__ (line 67) | def __post_init__(self):
  class CodeTags (line 85) | class CodeTags:
  class PromptConfig (line 99) | class PromptConfig:
  class Prompt (line 114) | class Prompt:
    method __init__ (line 115) | def __init__(self, config, tokenizer):
    method build_filled_example (line 125) | def build_filled_example(self, example_dict: Dict[str, Any]) -> str:
    method build_examples_dict (line 154) | def build_examples_dict(self, input_dict):
    method build_user_message (line 192) | def build_user_message(self, input_dict: Dict[str, str]) -> str:
    method get_code_execution_args (line 204) | def get_code_execution_args(self):
    method format_assistant_response (line 218) | def format_assistant_response(
    method fill (line 250) | def fill(
    method __str__ (line 351) | def __str__(self):
  function get_token_count (line 355) | def get_token_count(
  function get_config_path (line 423) | def get_config_path(config: str, config_dir: str | None = None, config_e...
  function load_config (line 439) | def load_config(config: str, config_dir: str | None = None) -> dict:
  function get_prompt (line 458) | def get_prompt(

FILE: nemo_skills/training/data_preparation_utils/arithmetic_utils.py
  function get_eval_func (line 40) | def get_eval_func(op):
  function get_op_counts (line 45) | def get_op_counts(counter):
  function extract_expressions (line 49) | def extract_expressions(text: str):
  function tokenize (line 84) | def tokenize(expression):
  function infix_to_postfix (line 95) | def infix_to_postfix(tokens):
  function evaluate_postfix_once (line 120) | def evaluate_postfix_once(postfix):
  function solve_expression (line 141) | def solve_expression(expression):
  function merge_solution_steps (line 163) | def merge_solution_steps(solution_steps):

FILE: nemo_skills/training/data_preparation_utils/filters.py
  class BaseFilter (line 45) | class BaseFilter(BaseParallelProcessor):
    method __init__ (line 46) | def __init__(self, **kwargs):
    method finalize (line 55) | def finalize(self, metrics: List):
    method _chunk_manifest (line 69) | def _chunk_manifest(self):
  class DropIfRegexMatch (line 83) | class DropIfRegexMatch(BaseFilter):
    method __init__ (line 86) | def __init__(
    method process_dataset_entry (line 96) | def process_dataset_entry(self, data_entry) -> List:
  class DropIfRegexNotMatch (line 103) | class DropIfRegexNotMatch(BaseFilter):
    method __init__ (line 106) | def __init__(
    method process_dataset_entry (line 116) | def process_dataset_entry(self, data_entry) -> List:
  class DropIfEqual (line 123) | class DropIfEqual(BaseFilter):
    method __init__ (line 126) | def __init__(
    method process_dataset_entry (line 136) | def process_dataset_entry(self, data_entry) -> List:
  class DropMultiBoxed (line 143) | class DropMultiBoxed(BaseFilter):
    method __init__ (line 144) | def __init__(self, solution_key: str = "generation", **kwargs):
    method process_dataset_entry (line 148) | def process_dataset_entry(self, data_entry) -> List:
  class DropIncorrectCodeBlocks (line 154) | class DropIncorrectCodeBlocks(BaseFilter):
    method __init__ (line 155) | def __init__(self, solution_key: str = "generation", **kwargs):
    method process_dataset_entry (line 159) | def process_dataset_entry(self, data_entry) -> List:
  class AddCodeExecutionsCounts (line 165) | class AddCodeExecutionsCounts(BaseFilter):
    method __init__ (line 166) | def __init__(self, solution_key: str = "generation", ce_counter_key: s...
    method process_dataset_entry (line 171) | def process_dataset_entry(self, data_entry) -> List:
  class DropIncorrectArithmetic (line 185) | class DropIncorrectArithmetic(BaseFilter):
    method __init__ (line 186) | def __init__(self, solution_key: str = "generation", tolerance=1e-4, *...
    method process_dataset_entry (line 191) | def process_dataset_entry(self, data_entry: str) -> List:
  class MajorityFilter (line 214) | class MajorityFilter(BaseFilter):
    method __init__ (line 215) | def __init__(
    method process_dataset_entry (line 225) | def process_dataset_entry(self, data_entry) -> List:
  class RemoveContaminated (line 236) | class RemoveContaminated(BaseFilter):
    method __init__ (line 237) | def __init__(self, contamination_file, check_key="problem", **kwargs):
    method process_dataset_entry (line 249) | def process_dataset_entry(self, data_entry) -> List:
  class RemoveLenOutliers (line 256) | class RemoveLenOutliers(BaseFilter):
    method __init__ (line 259) | def __init__(
    method process_dataset_entry (line 280) | def process_dataset_entry(self, data_entry):
  class TrimPrefix (line 297) | class TrimPrefix(BaseFilter):
    method __init__ (line 300) | def __init__(self, solution_key: str = "generation", **kwargs):
    method process_dataset_entry (line 304) | def process_dataset_entry(self, data_entry) -> List:
  class TrimSolutions (line 312) | class TrimSolutions(BaseFilter):
    method __init__ (line 315) | def __init__(self, solution_key: str = "generation", **kwargs):
    method process_dataset_entry (line 319) | def process_dataset_entry(self, data_entry) -> List:
  class SplitArithmetic (line 333) | class SplitArithmetic(BaseFilter):
    method __init__ (line 334) | def __init__(self, solution_key: str = "generation", **kwargs):
    method process_dataset_entry (line 338) | def process_dataset_entry(self, data_entry: str) -> List:
  class CodeTextFilter (line 389) | class CodeTextFilter(BaseParallelProcessor):
    method __init__ (line 390) | def __init__(self, filter_type, code_tags, solution_key="generation", ...
    method process_dataset_entry (line 400) | def process_dataset_entry(self, grouped_samples: List, code_begin_toke...
    method process (line 435) | def process(self):
    method finalize (line 464) | def finalize(self, metrics: List):

FILE: nemo_skills/training/data_preparation_utils/merge_processor.py
  class MergeProcessor (line 26) | class MergeProcessor(BaseFilter):
    method __init__ (line 27) | def __init__(self, processor_configs: list[dict], **kwargs):
    method process_dataset_entry (line 41) | def process_dataset_entry(self, data_entry: dict) -> list[DataEntry]:
    method finalize (line 54) | def finalize(self, metrics: list):

FILE: nemo_skills/training/data_preparation_utils/preprocessing.py
  class ReadData (line 33) | class ReadData(BaseProcessor):
    method __init__ (line 34) | def __init__(
    method _read_preprocessed_data (line 89) | def _read_preprocessed_data(self, file_handle) -> int:
    method _parallel_read_file (line 103) | def _parallel_read_file(self, args):
    method _read_raw_data (line 109) | def _read_raw_data(self, file_handle) -> int:
    method _get_sample_hash (line 154) | def _get_sample_hash(self, sample):
    method _batch_deduplicate (line 159) | def _batch_deduplicate(self, batch):
    method _chunks (line 171) | def _chunks(self, lst, n):
    method process (line 176) | def process(self):
  class GroupSamples (line 222) | class GroupSamples(BaseProcessor):
    method __init__ (line 223) | def __init__(self, group_key="input", **kwargs):
    method process (line 227) | def process(self):
  class ShuffleAndDownsampleData (line 239) | class ShuffleAndDownsampleData(BaseProcessor):
    method __init__ (line 240) | def __init__(
    method process (line 265) | def process(self):
  class WriteFinalSftManifest (line 309) | class WriteFinalSftManifest(BaseProcessor):
    method __init__ (line 310) | def __init__(
    method process (line 359) | def process(self):
  class WriteFinalConversationManifest (line 417) | class WriteFinalConversationManifest(WriteFinalSftManifest):
    method process (line 418) | def process(self):
  class WriteFinalRLManifest (line 455) | class WriteFinalRLManifest(BaseProcessor):
    method __init__ (line 456) | def __init__(
    method process (line 504) | def process(self):

FILE: nemo_skills/training/nemo_rl/convert_dcp_to_hf.py
  function parse_args (line 28) | def parse_args():
  function find_max_step_folder (line 58) | def find_max_step_folder(training_folder, step_override=None):
  function is_safetensors_checkpoint (line 85) | def is_safetensors_checkpoint(weights_path):
  function copy_tokenizer_files (line 91) | def copy_tokenizer_files(tokenizer_path, hf_ckpt_path):
  function convert_safetensors_to_hf (line 114) | def convert_safetensors_to_hf(weights_path, hf_ckpt_path, model_name, to...
  function main (line 160) | def main():

FILE: nemo_skills/training/nemo_rl/convert_megatron_to_hf.py
  function parse_args (line 26) | def parse_args():
  function find_max_step_folder (line 62) | def find_max_step_folder(training_folder, step_override=None):
  function main (line 89) | def main():

FILE: nemo_skills/training/nemo_rl/environments/math_environment.py
  class MathEnvConfig (line 36) | class MathEnvConfig(TypedDict):
  function _mute_output (line 42) | def _mute_output():
  class HFVerifyWorker (line 52) | class HFVerifyWorker:
    method __init__ (line 53) | def __init__(self) -> None:
    method verify (line 56) | def verify(self, pred_responses: list[str], ground_truths: list[str]) ...
  class MathEnvironmentMetadata (line 83) | class MathEnvironmentMetadata(TypedDict):
  class MathEnvironment (line 88) | class MathEnvironment(EnvironmentInterface):
    method __init__ (line 89) | def __init__(self, cfg: MathEnvConfig):
    method shutdown (line 99) | def shutdown(self) -> None:
    method step (line 104) | def step(  # type: ignore[override]
    method global_post_process_and_metrics (line 173) | def global_post_process_and_metrics(

FILE: nemo_skills/training/nemo_rl/offline_hf_consolidation.py
  function copy_metadata_files (line 50) | def copy_metadata_files(input_dir, output_dir):
  function parse_args (line 63) | def parse_args() -> argparse.Namespace:
  function main (line 107) | def main() -> None:

FILE: nemo_skills/training/nemo_rl/start_grpo.py
  function parse_args (line 46) | def parse_args() -> tuple[argparse.Namespace, list[str]]:
  function load_jsonl_as_dataset (line 62) | def load_jsonl_as_dataset(
  function extract_dataset (line 89) | def extract_dataset(split, dataset_path):
  function format_passthrough (line 99) | def format_passthrough(data):
  function prepare_math_dataset (line 107) | def prepare_math_dataset(split_ds):
  class NeMoSkillsDataset (line 119) | class NeMoSkillsDataset:
    method __init__ (line 122) | def __init__(self, training_data, validation_data):
  class NSTaskDataSpec (line 143) | class NSTaskDataSpec(TaskDataSpec):
  function ns_data_processor (line 148) | def ns_data_processor(
  function setup_data (line 196) | def setup_data(
  function main (line 272) | def main() -> None:

FILE: nemo_skills/training/nemo_rl/start_sft.py
  function detect_data_format (line 45) | def detect_data_format(data_path: str) -> str:
  class PromptResponseDataset (line 82) | class PromptResponseDataset:
    method __init__ (line 83) | def __init__(
    method load_or_process_split (line 115) | def load_or_process_split(self, path: str, split_name: str) -> Dataset:
    method add_messages_key (line 149) | def add_messages_key(self, examples: dict[str, list[Any]]) -> dict[str...
  function parse_args (line 161) | def parse_args():
  function sft_preprocessor (line 175) | def sft_preprocessor(
  function setup_data (line 229) | def setup_data(tokenizer: AutoTokenizer, data_config: DataConfig):
  function main (line 278) | def main():

FILE: nemo_skills/training/prepare_data.py
  function main (line 22) | def main(cfg):

FILE: nemo_skills/training/train_redrafter.py
  class ReDrafterTrainer (line 64) | class ReDrafterTrainer(Trainer):
    method __init__ (line 65) | def __init__(self, *args, **kwargs):
    method compute_loss (line 69) | def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
  class ModelArguments (line 105) | class ModelArguments:
  class TrainingArguments (line 111) | class TrainingArguments(transformers.TrainingArguments):
  function get_tokenizer (line 172) | def get_tokenizer(model_args, training_args):
  function generate_drafter_config_from_base (line 183) | def generate_drafter_config_from_base(llm, training_args):
  function get_compute_metrics (line 193) | def get_compute_metrics(training_args):
  function record_to_training_instance (line 207) | def record_to_training_instance(
  function train (line 245) | def train(model_args, training_args):
  function eval (line 299) | def eval(model_args, training_args):

FILE: nemo_skills/training/verl/prepare_data.py
  function parse_args (line 21) | def parse_args():
  function transform_data (line 32) | def transform_data(input_file, data_source, ability):
  function save_to_parquet (line 52) | def save_to_parquet(df, output_file):
  function main (line 56) | def main():

FILE: nemo_skills/utils.py
  function get_logger_name (line 37) | def get_logger_name(file):
  function parse_reasoning (line 47) | def parse_reasoning(sample: dict, generation_key: str = "generation", en...
  function nested_dataclass (line 67) | def nested_dataclass(*args, **kwargs):
  function setup_logging (line 103) | def setup_logging(disable_hydra_logs: bool = True, log_level: int = logg...
  function remove_handlers (line 141) | def remove_handlers():
  function get_skills_root_dir (line 148) | def get_skills_root_dir():
  function init_wandb (line 153) | def init_wandb(project, name, exp_dir=None, verbose=False):
  function validate_wandb_project_name (line 204) | def validate_wandb_project_name(wandb_project=None, wandb_name=None, wan...
  function extract_comments (line 232) | def extract_comments(code: str):
  function type_to_str (line 244) | def type_to_str(type_hint):
  function extract_comments_above_fields (line 270) | def extract_comments_above_fields(dataclass_obj, prefix: str = "", level...
  function get_fields_docstring (line 333) | def get_fields_docstring(dataclass_obj, **kwargs):
  function get_help_message (line 339) | def get_help_message(dataclass_obj, help_message="", **kwargs):
  function python_doc_to_cmd_help (line 362) | def python_doc_to_cmd_help(doc_class, docs_prefix="", arg_prefix=""):
  function get_chunked_filename (line 383) | def get_chunked_filename(chunk_id, output_filename):
  function chunk_data (line 388) | def chunk_data(data: List[Any], output_filename: str, chunk_id: Optional...
  function str_ids_to_list (line 426) | def str_ids_to_list(ids: str) -> list[int]:
  function compute_chunk_ids (line 454) | def compute_chunk_ids(chunk_ids: list[int] | str, num_chunks: int) -> li...
  function prefill_judgement (line 485) | def prefill_judgement(data_point: dict) -> str | None:
  function check_no_extra_args_fire (line 496) | def check_no_extra_args_fire():
  function resolve_python_module_from_file (line 557) | def resolve_python_module_from_file(py_filepath: str, root_module: str =...
  function maybe_get_env (line 580) | def maybe_get_env(value: Union[Any, List[Any]], env_name, default=None, ...
  function get_server_wait_cmd (line 621) | def get_server_wait_cmd(server_address):
  function setup_make_sequence_length_divisible_by (line 630) | def setup_make_sequence_length_divisible_by(tensor_model_parallel_size: ...

FILE: recipes/asr_tts/riva_generate.py
  class RivaGenerateConfig (line 36) | class RivaGenerateConfig(GenerationTaskConfig):
  class RivaGenerationTask (line 55) | class RivaGenerationTask(GenerationTask):
    method __init__ (line 56) | def __init__(self, cfg: RivaGenerateConfig):
    method wait_for_server (line 59) | def wait_for_server(self):
    method setup_llm (line 102) | def setup_llm(self):
    method setup_prompt (line 131) | def setup_prompt(self):
    method fill_prompt (line 134) | def fill_prompt(self, data_point, all_data, prompt_format=None):
    method log_example_prompt (line 140) | def log_example_prompt(self, data):
    method process_single_datapoint (line 144) | async def process_single_datapoint(self, data_point, all_data, prompt_...
  function generate (line 177) | def generate(cfg: RivaGenerateConfig):

FILE: recipes/data-integrity/model_comparison/analyses/length_analysis.py
  function analyze_response_lengths (line 28) | def analyze_response_lengths(df, subdirs):

FILE: recipes/data-integrity/model_comparison/analyses/similarity_analysis.py
  function analyze_semantic_similarity (line 32) | def analyze_semantic_similarity(df, subdirs, sentence_model=None):
  function _fallback_similarity_analysis (line 205) | def _fallback_similarity_analysis(df, subdirs):

FILE: recipes/data-integrity/model_comparison/analyses/umap_analysis.py
  function analyze_response_embeddings_umap (line 52) | def analyze_response_embeddings_umap(df, subdirs, sentence_model):
  function analyze_input_response_mapping_umap (line 154) | def analyze_input_response_mapping_umap(df, subdirs, sentence_model):
  function analyze_multimodal_space_umap (line 299) | def analyze_multimodal_space_umap(df, subdirs, sentence_model):

FILE: recipes/data-integrity/model_comparison/analyses/vocabulary_analysis.py
  function analyze_vocabulary_diversity (line 30) | def analyze_vocabulary_diversity(df, subdirs):

FILE: recipes/data-integrity/model_comparison/analyzer.py
  class OrganizedModelAnalyzer (line 56) | class OrganizedModelAnalyzer:
    method __init__ (line 59) | def __init__(self, json_file_path, results_base_dir="model_comparison_...
    method setup_results_directory (line 71) | def setup_results_directory(self):
    method load_data (line 96) | def load_data(self):
    method initialize_models (line 103) | def initialize_models(self):
    method generate_final_report (line 118) | def generate_final_report(self):

FILE: recipes/data-integrity/model_comparison/data_loader.py
  function load_json_data (line 26) | def load_json_data(json_file_path):
  function json_to_dataframe (line 33) | def json_to_dataframe(data):
  function load_and_prepare_data (line 56) | def load_and_prepare_data(json_file_path):

FILE: recipes/data-integrity/model_comparison/main.py
  function main (line 27) | def main():

FILE: recipes/data-integrity/model_comparison/report_generator.py
  function generate_analysis_report (line 26) | def generate_analysis_report(df, results_dir, subdirs, length_stats, div...
  function generate_index_file (line 106) | def generate_index_file(results_dir, subdirs, df):

FILE: recipes/data-integrity/model_comparison/setup.py
  function install_requirements (line 25) | def install_requirements():
  function download_nltk_data (line 31) | def download_nltk_data():
  function download_spacy_model (line 41) | def download_spacy_model():
  function verify_installation (line 48) | def verify_installation():
  function main (line 98) | def main():

FILE: recipes/data-integrity/model_comparison/utils/file_utils.py
  function get_model_comparison_name (line 28) | def get_model_comparison_name(df):
  function save_plot (line 53) | def save_plot(subdirs, df, filename_suffix, title=""):
  function save_data (line 64) | def save_data(subdirs, df, data, filename_suffix, format="csv"):

FILE: recipes/data-integrity/model_comparison/utils/model_utils.py
  function shorten_model_name (line 18) | def shorten_model_name(model_name):

FILE: recipes/data-integrity/model_comparison/utils/text_utils.py
  function calculate_rouge_l (line 32) | def calculate_rouge_l(text1, text2):
  function basic_rouge_l (line 46) | def basic_rouge_l(text1, text2):

FILE: recipes/data-integrity/model_comparison/visualization/interactive_plots.py
  function create_response_embeddings_umap (line 47) | def create_response_embeddings_umap(df, subdirs, sentence_model):
  function create_input_response_mapping_umap (line 57) | def create_input_response_mapping_umap(df, subdirs, sentence_model):
  function create_multimodal_space_umap (line 67) | def create_multimodal_space_umap(df, subdirs, sentence_model):
  function create_interactive_explorer (line 77) | def create_interactive_explorer(df, subdirs, sentence_model):

FILE: recipes/data-integrity/model_comparison/visualization/static_plots.py
  function plot_response_lengths (line 22) | def plot_response_lengths(df, subdirs):
  function plot_vocabulary_diversity (line 32) | def plot_vocabulary_diversity(df, subdirs):
  function plot_similarity_heatmaps (line 42) | def plot_similarity_heatmaps(df, subdirs, sentence_model=None):
  function plot_similarity_histograms (line 52) | def plot_similarity_histograms(df, subdirs, sentence_model=None):

FILE: recipes/data-integrity/postprocess_data.py
  function process_data (line 23) | def process_data(elem, target_model):

FILE: recipes/data-integrity/prepare_data.py
  function process_data (line 24) | def process_data(elem, split):
  function get_from_iterable (line 33) | def get_from_iterable(dataset: IterableDataset):

FILE: recipes/data-integrity/run_integrity_pipeline.py
  function download (line 24) | def download(workspace, cluster, num_gpus, expname_prefix, target_model,...
  function gen_answer (line 41) | def gen_answer(workspace, cluster, num_gpus, expname_prefix, target_mode...
  function postprocess (line 59) | def postprocess(workspace, cluster, num_gpus, expname_prefix, target_mod...
  function compare (line 75) | def compare(workspace, cluster, num_gpus, expname_prefix, target_model, ...

FILE: recipes/gencluster/pipeline/run_inter_tournament.py
  function tournament_schedule_file_exists (line 24) | def tournament_schedule_file_exists(
  function main (line 41) | def main():

FILE: recipes/gencluster/pipeline/run_intra_tournament.py
  function tournament_schedule_file_exists (line 24) | def tournament_schedule_file_exists(
  function main (line 41) | def main():

FILE: recipes/gencluster/pipeline/solution_generation.py
  function parse_generation_benchmark (line 23) | def parse_generation_benchmark(benchmark: str, split: str | None = None)...
  function main (line 48) | def main():

FILE: recipes/gencluster/pipeline/test_case_generation.py
  function main (line 20) | def main() -> None:

FILE: recipes/gencluster/scripts/compute_tournament_score.py
  function parse_tail_scores_and_winner (line 32) | def parse_tail_scores_and_winner(generation_text: str) -> Tuple[float, f...
  function try_get_numeric (line 62) | def try_get_numeric(value) -> Optional[float]:
  function extract_cluster_base_score (line 77) | def extract_cluster_base_score(obj: dict, side: str, explicit_key: Optio...
  function extract_cluster_grade (line 109) | def extract_cluster_grade(obj: dict, side: str, explicit_key: Optional[s...
  function main (line 135) | def main():

FILE: recipes/gencluster/scripts/extract_cpp_code.py
  function extract_final_cpp_block (line 29) | def extract_final_cpp_block(text):
  function wait_for_sandbox (line 36) | def wait_for_sandbox(sandbox, loop, timeout: int = 240, poll: float = 1.0):
  function compile_cpp_file (line 51) | def compile_cpp_file(cpp_file_path, binary_dir, sandbox, loop):
  function process_jsonl_file (line 73) | def process_jsonl_file(jsonl_path, output_dir, binary_dir, folder_name, ...
  function main (line 163) | def main():

FILE: recipes/gencluster/scripts/filter_clusters.py
  function filter_cluster (line 28) | def filter_cluster(cluster_data):
  function filter_file (line 68) | def filter_file(input_file, output_file):
  function main (line 115) | def main():

FILE: recipes/gencluster/scripts/generate_datasets_json.py
  function collect_datasets (line 22) | def collect_datasets(root_dir: Path):
  function main (line 69) | def main() -> None:

FILE: recipes/gencluster/scripts/generate_test_cases.py
  function _get_thread_context (line 35) | def _get_thread_context():
  function wait_for_sandbox (line 48) | def wait_for_sandbox(sandbox, loop, timeout: int = 240, poll: float = 1.0):
  function run_generator (line 62) | def run_generator(gen_binary_path, timeout=10, *, loop=None, sandbox: Lo...
  function run_generator_to_sandbox_file (line 95) | def run_generator_to_sandbox_file(gen_binary_path, timeout=10, *, loop=N...
  function run_validator (line 117) | def run_validator(
  function validate_dataset (line 173) | def validate_dataset(
  function generate_datasets_for_problem (line 207) | def generate_datasets_for_problem(
  function main (line 455) | def main():

FILE: recipes/gencluster/scripts/merge_tournament_scores.py
  function load_clusters (line 24) | def load_clusters(path: str) -> Dict[str, dict]:
  function write_clusters (line 30) | def write_clusters(path: str, clusters: Dict[str, dict]) -> None:
  function read_scores_by_problem (line 37) | def read_scores_by_problem(csv_path: str, include_solution: bool = False):
  function main (line 160) | def main():

FILE: recipes/gencluster/scripts/run_tournament_all.py
  function derive_output_path (line 26) | def derive_output_path(input_file: str, output_dir: str) -> str:
  function build_directed_edges (line 35) | def build_directed_edges(n: int, edges: List[Tuple[int, int]], k: int, r...
  function build_simple_schedule (line 85) | def build_simple_schedule(n: int, games_per_player: int, rng: random.Ran...
  function write_schedule_jsonl (line 122) | def write_schedule_jsonl(
  function write_intracluster_schedule_jsonl (line 169) | def write_intracluster_schedule_jsonl(
  function main (line 203) | def main():

FILE: recipes/gencluster/scripts/submission_ICPC.py
  function to_bool (line 39) | def to_bool(value: Any) -> bool:
  function to_int (line 50) | def to_int(value: Any, default: int = 0) -> int:
  function to_float (line 58) | def to_float(value: Any, default: float = 0.0) -> float:
  function load_clusters (line 66) | def load_clusters(path: Path) -> Dict[str, Any]:
  function extract_problem_number (line 76) | def extract_problem_number(filename: str) -> int:
  function any_solution_true (line 84) | def any_solution_true(clusters_payload: Dict[str, Any]) -> bool:
  function build_sorted_clusters (line 98) | def build_sorted_clusters(
  function compute_submission_count_for_problem (line 205) | def compute_submission_count_for_problem(
  function cluster_has_any_true (line 242) | def cluster_has_any_true(cluster_val: Dict[str, Any]) -> bool:
  function compute_oracle_inside_cluster_submission_count (line 253) | def compute_oracle_inside_cluster_submission_count(
  function main (line 272) | def main() -> int:

FILE: recipes/gencluster/scripts/submission_IOI.py
  function get_max_score_for_subtask (line 107) | def get_max_score_for_subtask(subtask_number, dataset="ioi24"):
  function get_grade_slice_for_problem (line 115) | def get_grade_slice_for_problem(problem_name, dataset="ioi24"):
  function load_cluster_data (line 126) | def load_cluster_data(filepath):
  function apply_blind_cluster_filtering (line 132) | def apply_blind_cluster_filtering(clusters, strategy="balanced"):
  function get_solution_iterator (line 164) | def get_solution_iterator(clusters):
  function run_submission (line 190) | def run_submission(
  function calculate_theoretical_max_score (line 527) | def calculate_theoretical_max_score(submission_scores=None, dataset="ioi...

FILE: recipes/gencluster/scripts/tournament_schedule.py
  function load_clusters (line 24) | def load_clusters(cluster_file: str) -> Dict[str, Any]:
  function remove_empty_output_clusters (line 30) | def remove_empty_output_clusters(clusters: Dict[str, Any]) -> Dict[str, .

Condensed preview: 1174 files, each showing path, character count, and a content snippet (7,884K chars in the full structured content).
[
  {
    "path": ".coderabbit.yaml",
    "chars": 4983,
    "preview": "# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.\n# SPDX-License-Identi"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/config.yml",
    "chars": 27,
    "preview": "blank_issues_enabled: true\n"
  },
  {
    "path": ".github/workflows/copyright-check.yml",
    "chars": 763,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": ".github/workflows/docs.yml",
    "chars": 1500,
    "preview": "name: Build docs\n\non:\n  push:\n    branches: [\"main\"]\n\n  # Allows you to run this workflow manually from the Actions tab\n"
  },
  {
    "path": ".github/workflows/gpu_tests.yml",
    "chars": 3027,
    "preview": "name: Integration tests\n\non:\n  pull_request:\n    branches: [ \"main\" ]\n    types: [opened, synchronize, reopened, labeled"
  },
  {
    "path": ".github/workflows/lint.yml",
    "chars": 1223,
    "preview": "name: Lint and Format\n\non:\n  pull_request:\n    branches: [ \"main\" ]\n\n  # Allows you to run this workflow manually from t"
  },
  {
    "path": ".github/workflows/tests.yml",
    "chars": 3951,
    "preview": "name: CPU tests\n\non:\n  pull_request:\n    branches: [ \"main\" ]\n\n  # Allows you to run this workflow manually from the Act"
  },
  {
    "path": ".gitignore",
    "chars": 1103,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 1748,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 9154,
    "preview": "# Contributing To Nemo-Skills\n\nThanks for your interest in contributing to Nemo-Skills!\n\n## General guidelines\n\nApplies "
  },
  {
    "path": "LICENSE",
    "chars": 11357,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "MANIFEST.in",
    "chars": 110,
    "preview": "recursive-include nemo_skills *.yaml\nrecursive-include nemo_skills *.txt\ngraft dockerfiles\ngraft requirements\n"
  },
  {
    "path": "README.md",
    "chars": 8368,
    "preview": "# Nemo Skills\n\nNemo-Skills is a collection of pipelines to improve \"skills\" of large language models (LLMs). We support "
  },
  {
    "path": "__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "cluster_configs/example-local.yaml",
    "chars": 1776,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "cluster_configs/example-ray.yaml",
    "chars": 3454,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "cluster_configs/example-slurm.yaml",
    "chars": 2425,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "core/README.md",
    "chars": 3614,
    "preview": "# Core / Pipeline Dependency Boundary\n\nNeMo Skills is split into **Core** (agent runtime) and **Pipeline** (orchestratio"
  },
  {
    "path": "core/pyproject.toml",
    "chars": 1784,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "core/requirements.txt",
    "chars": 859,
    "preview": "# Core dependencies for inference, evaluation, tool calling, and all benchmark evaluators.\n# No cluster orchestration de"
  },
  {
    "path": "dataset_explorer_demo/README.md",
    "chars": 967,
    "preview": "# Dataset Explorer Demo\n\n1. Download data TBD\n2. Retrieve similar questions from OpenMathInstruct2. Do it for all benchm"
  },
  {
    "path": "dataset_explorer_demo/visualize_similar.py",
    "chars": 9825,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "dockerfiles/Dockerfile.megatron",
    "chars": 653,
    "preview": "FROM nvcr.io/nvidia/pytorch:25.04-py3\n\n# Set working directory\nWORKDIR /opt\n\n# Install megatron-lm\nENV MEGATRON_COMMIT=d"
  },
  {
    "path": "dockerfiles/Dockerfile.nemo-rl",
    "chars": 6231,
    "preview": "# syntax=docker/dockerfile:1\n# copied and edited from https://github.com/NVIDIA/NeMo-RL/blob/main/docker/Dockerfile\n# TO"
  },
  {
    "path": "dockerfiles/Dockerfile.nemo-skills",
    "chars": 3773,
    "preview": "# using ubuntu instead of debian for easier apptainer installation on arm64\nFROM ubuntu:22.04\n\n# Install Python and othe"
  },
  {
    "path": "dockerfiles/Dockerfile.sandbox",
    "chars": 7057,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "dockerfiles/Dockerfile.verl",
    "chars": 624,
    "preview": "FROM whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6-mcore0.12.0-te2.3\n# Set working directory\nWORKDIR /opt\n\n# Instal"
  },
  {
    "path": "dockerfiles/Dockerfile.vllm",
    "chars": 198,
    "preview": "FROM vllm/vllm-openai:v0.18.1\nRUN pip install ray\nRUN pip install \"vllm[audio]\"\n# Required by vLLM for Qwen-VL model fam"
  },
  {
    "path": "dockerfiles/README.md",
    "chars": 1440,
    "preview": "# Building Docker Images\n\nSome dockerfiles are directly included in this folder and for some others the instructions to "
  },
  {
    "path": "dockerfiles/build.sh",
    "chars": 4169,
    "preview": "#!/usr/bin/env bash\n\n# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache Licen"
  },
  {
    "path": "dockerfiles/ifbench.patch",
    "chars": 2380,
    "preview": "diff --git a/evaluation_lib.py b/evaluation_lib.py\nindex a0db9e7..912a26e 100644\n--- a/evaluation_lib.py\n+++ b/evaluatio"
  },
  {
    "path": "dockerfiles/sandbox/block_network.c",
    "chars": 2231,
    "preview": "/*\n * Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n *\n * Licensed under the Apache License, Version 2.0"
  },
  {
    "path": "dockerfiles/sandbox/nginx-worker-proxy.conf.template",
    "chars": 1242,
    "preview": "events {\n    worker_connections 1024;\n}\n\nhttp {\n    # Proxy all requests to the master node's nginx load balancer.\n    #"
  },
  {
    "path": "dockerfiles/sandbox/nginx.conf.template",
    "chars": 2662,
    "preview": "events {\n    worker_connections 1024;\n}\n\nhttp {\n    # Add custom log format for load monitoring\n    log_format worker_lo"
  },
  {
    "path": "dockerfiles/sandbox/start-with-nginx.sh",
    "chars": 26317,
    "preview": "#!/bin/bash\n# Start nginx load balancer with multiple uwsgi workers\n# Uses TCP sockets for workers, supporting both sing"
  },
  {
    "path": "dockerfiles/swe-bench/Dockerfile.nemo-skills.alpine",
    "chars": 1355,
    "preview": "# using the oldest version of alpine among swe-bench pro containers for maximum compatibility\nFROM alpine:3.17.1\n\n# inst"
  },
  {
    "path": "dockerfiles/swe-bench/Dockerfile.swe-zero",
    "chars": 2485,
    "preview": "# Docker image for SWE-Zero (v2).\n# In the SWE-Zero setup, any instance can be run inside this image.\n# The only require"
  },
  {
    "path": "docs/agentic_inference/parallel_thinking.md",
    "chars": 6893,
    "preview": "# Parallel Thinking\n\nParallel thinking encompasses methods that scale inference time via parallel sampling. The approach"
  },
  {
    "path": "docs/agentic_inference/tool_calling.md",
    "chars": 11718,
    "preview": "# Tool Calling\n\nTool calling enables LLMs to execute external functions and use their results in generation. NeMo-Skills"
  },
  {
    "path": "docs/basics/chat_interface.md",
    "chars": 2774,
    "preview": "# Chat Interface\n\nThe chat interface provides a web UI where you can interactively chat with a deployed model. It suppor"
  },
  {
    "path": "docs/basics/cluster-configs.md",
    "chars": 3930,
    "preview": "# Cluster configs\n\nAll of the [pipeline scripts](../pipelines/index.md) accept `--cluster` argument which you can use\nto"
  },
  {
    "path": "docs/basics/code-packaging.md",
    "chars": 4552,
    "preview": "# Code packaging\n\nWe use [NeMo-Run](https://github.com/NVIDIA-NeMo/Run) for managing our experiments with local and slur"
  },
  {
    "path": "docs/basics/index.md",
    "chars": 19089,
    "preview": "# Getting Started\n\nLet's walk through a little tutorial to get started working with nemo-skills.\n\nWe will use a simple g"
  },
  {
    "path": "docs/basics/inference.md",
    "chars": 6815,
    "preview": "# Inference\n\nHere are the instructions on how to run inference with our repo.\n\n## Download/convert the model\n\nGet the mo"
  },
  {
    "path": "docs/basics/installation.md",
    "chars": 2558,
    "preview": "# Installation & Dependency Groups\n\nNeMo Skills provides three installable packages:\n\n- **`nemo-skills`** (root) -- full"
  },
  {
    "path": "docs/basics/prompt-format.md",
    "chars": 7535,
    "preview": "# Prompt utilities\n\nOur prompts are configured via two yaml files:\n\n1. **Prompt config** - contains the actual prompt te"
  },
  {
    "path": "docs/basics/sandbox.md",
    "chars": 1580,
    "preview": "# Sandbox for code execution\n\nOur pipeline relies on Python interpreter to execute code generated by LLMs. This creates "
  },
  {
    "path": "docs/css/extra.css",
    "chars": 254,
    "preview": "/* Target only inline code */\np code, li code, td code {\n  word-break: keep-all;\n  white-space: nowrap;\n}\n\n/* Preserve f"
  },
  {
    "path": "docs/evaluation/code.md",
    "chars": 40075,
    "preview": "# Code\n\nMore details are coming soon!\n\n## Supported benchmarks\n\n### swe-bench\n\n!!! note\n    While swe-bench evaluation w"
  },
  {
    "path": "docs/evaluation/external-benchmarks.md",
    "chars": 11127,
    "preview": "# External benchmarks\n\nNeMo-Skills supports defining benchmarks in external repositories. This lets you\nkeep proprietary"
  },
  {
    "path": "docs/evaluation/formal-math.md",
    "chars": 5503,
    "preview": "# Math (formal language)\n\nWe support formal-math evaluation in Lean 4. The task is to generate a Lean 4 proof of a given"
  },
  {
    "path": "docs/evaluation/index.md",
    "chars": 13037,
    "preview": "# Evaluation\n\nWe support many popular benchmarks and it's easy to add new in the future. The following categories of ben"
  },
  {
    "path": "docs/evaluation/instruction-following.md",
    "chars": 628,
    "preview": "# Instruction following\n\nMore details are coming soon!\n\n## Supported benchmarks\n\n### ifbench\n\n- Benchmark is defined in "
  },
  {
    "path": "docs/evaluation/long-context.md",
    "chars": 5898,
    "preview": "# Long-context\n\nMore details are coming soon!\n\n## Supported benchmarks\n\n### ruler\n\n- Benchmark is defined in [`nemo_skil"
  },
  {
    "path": "docs/evaluation/multilingual.md",
    "chars": 13368,
    "preview": "# Multilingual\n\nOur multilingual benchmarks cover things like multilingual reasoning as well as machine translation.\n\nAl"
  },
  {
    "path": "docs/evaluation/natural-math.md",
    "chars": 12894,
    "preview": "# Math (natural language)\n\nThis section details how to evaluate natural language math benchmarks. For all benchmarks in "
  },
  {
    "path": "docs/evaluation/other-benchmarks.md",
    "chars": 11769,
    "preview": "# Other benchmarks\n\nMore details are coming soon!\n\n## Supported benchmarks\n\n### arena-hard\n\n- Benchmark is defined in [`"
  },
  {
    "path": "docs/evaluation/robustness.md",
    "chars": 6212,
    "preview": "# Robustness Evaluation\n`robust_eval` is built on top of `ns eval` to evaluate the model on multiple benchmarks using di"
  },
  {
    "path": "docs/evaluation/scientific-knowledge.md",
    "chars": 4731,
    "preview": "# Scientific Knowledge\n\nNemo-Skills can be used to evaluate an LLM on various STEM datasets.\n\n## Dataset Overview\n\n| <di"
  },
  {
    "path": "docs/evaluation/speculative-decoding.md",
    "chars": 4878,
    "preview": "# Speculative Decoding\n\nThis section details how to evaluate speculative decoding (SD) benchmarks.\nSD has emerged as a l"
  },
  {
    "path": "docs/evaluation/speech-audio.md",
    "chars": 26421,
    "preview": "# Speech & Audio\n\nThis section details how to evaluate speech and audio benchmarks, including understanding tasks that t"
  },
  {
    "path": "docs/evaluation/tool-calling.md",
    "chars": 5560,
    "preview": "# Tool-calling\n\n## Supported benchmarks\n\n## bfcl_v3\n\nBFCL v3 consists of seventeen distinct evaluation subsets that comp"
  },
  {
    "path": "docs/evaluation/vlm.md",
    "chars": 4958,
    "preview": "# Vision-Language Models (VLM)\n\nThis section details how to evaluate Vision-Language Model (VLM) benchmarks that require"
  },
  {
    "path": "docs/index.md",
    "chars": 3727,
    "preview": "---\nhide:\n  - navigation\n  - toc\n---\n\n[Nemo-Skills](https://github.com/NVIDIA-NeMo/Skills) is a collection of pipelines "
  },
  {
    "path": "docs/pipelines/decontamination.md",
    "chars": 3236,
    "preview": "# LLM-based data decontamination\n\n!!! info\n\n    This pipeline starting script is [nemo_skills/pipeline/generate.py](http"
  },
  {
    "path": "docs/pipelines/evaluation.md",
    "chars": 495,
    "preview": "# Model evaluation\n\n!!! info\n\n    This pipeline starting script is [nemo_skills/pipeline/eval.py](https://github.com/NVI"
  },
  {
    "path": "docs/pipelines/generation.md",
    "chars": 19581,
    "preview": "# Generation\n\n!!! info\n\n    This pipeline starting script is [nemo_skills/pipeline/generate.py](https://github.com/NVIDI"
  },
  {
    "path": "docs/pipelines/index.md",
    "chars": 9644,
    "preview": "# Pipelines\n\n## Basics\n\nNemo-Skills has a large collection of building blocks that you can use to construct various pipe"
  },
  {
    "path": "docs/pipelines/llm-as-a-judge.md",
    "chars": 3885,
    "preview": "# LLM-as-a-judge for math evaluation\n\n!!! info\n\n    This pipeline starting script is [nemo_skills/pipeline/generate.py]("
  },
  {
    "path": "docs/pipelines/run-cmd.md",
    "chars": 2786,
    "preview": "# Running arbitrary commands\n\n!!! info\n\n    This pipeline starting script is [nemo_skills/pipeline/run_cmd.py](https://g"
  },
  {
    "path": "docs/pipelines/start-server.md",
    "chars": 3226,
    "preview": "# Starting a Model Server\n\n!!! info\n\n    This pipeline starting script is [nemo_skills/pipeline/start_server.py](https:/"
  },
  {
    "path": "docs/pipelines/training-verl.md",
    "chars": 1312,
    "preview": "# Training using verl\n\n!!! info\n\n    The pipeline starting script is\n\n    * [nemo_skills/pipeline/verl/ppo.py](https://g"
  },
  {
    "path": "docs/pipelines/training.md",
    "chars": 5159,
    "preview": "# Training using NeMo-RL\n\n!!! info\n\n    This pipeline starting script is [nemo_skills/pipeline/nemo_rl/sft.py](https://g"
  },
  {
    "path": "docs/recipes/libtrace.md",
    "chars": 1291,
    "preview": "# LibTrace\n\nLibTrace is a recipe for building domain-specific reasoning data from library\nAPIs. It harvests docstrings, "
  },
  {
    "path": "docs/releases/index.md",
    "chars": 1734,
    "preview": "---\ntitle: Papers & Releases\nhide:\n  - toc\n---\n\nOn this page you can find a list of papers, model and dataset releases t"
  },
  {
    "path": "docs/releases/nemotron-math-v2/dataset.md",
    "chars": 9210,
    "preview": "# Dataset construction\n\nNemotron-Math-v2 dataset consists of mathematical problems collected from [AoPS forums](https://"
  },
  {
    "path": "docs/releases/nemotron-math-v2/evaluation.md",
    "chars": 5140,
    "preview": "# Model evaluation\n\nHere are the commands you can run to reproduce our evaluation numbers.\n\n\n## Prepare evaluation data\n"
  },
  {
    "path": "docs/releases/nemotron-math-v2/index.md",
    "chars": 1209,
    "preview": "---\ndate: 2025-12-15\n---\n\n# Nemotron-Math-v2\n\n## Nemotron-Math-v2 Dataset\n\nUsing our pipelines we created [Nemotron-Math"
  },
  {
    "path": "docs/releases/nemotron-math-v2/training.md",
    "chars": 3574,
    "preview": "# Model training\n\nWe assume you have `/workspace` defined in your [cluster config](../../basics/cluster-configs.md) and\n"
  },
  {
    "path": "docs/releases/nemotronmathproofs/index.md",
    "chars": 14939,
    "preview": "---\ndate: 2025-12-15\n---\n\n# Nemotron-Math-Proofs\n\n## Dataset Overview\n\nUsing our pipelines we created [Nemotron-Math-Pro"
  },
  {
    "path": "docs/releases/opencodereasoning/dataset.md",
    "chars": 4638,
    "preview": "# Dataset construction\n\n[OpenCodeReasoning-1](https://huggingface.co/datasets/nvidia/OpenCodeReasoning) and [OpenCodeRea"
  },
  {
    "path": "docs/releases/opencodereasoning/evaluation.md",
    "chars": 1167,
    "preview": "# Model evaluation\n\nHere are the commands you can run to reproduce our evaluation numbers.\nThe commands below are for [n"
  },
  {
    "path": "docs/releases/opencodereasoning/index.md",
    "chars": 370,
    "preview": "# OpenCodeReasoning\n\nThis section has instructions for training a model that attains results similar to\n[OpenCodeReasoni"
  },
  {
    "path": "docs/releases/openmathinstruct2/dataset.md",
    "chars": 10372,
    "preview": "# Dataset construction\n\nHere are the commands you can run to re-create [OpenMathInstruct-2 dataset](https://huggingface."
  },
  {
    "path": "docs/releases/openmathinstruct2/evaluation.md",
    "chars": 5692,
    "preview": "# Model evaluation\n\nHere are the commands you can run to reproduce our evaluation numbers.\nThe commands below are for Op"
  },
  {
    "path": "docs/releases/openmathinstruct2/index.md",
    "chars": 3748,
    "preview": "# OpenMathInstruct-2\n\nUsing our pipelines we created [OpenMathInstruct-2 dataset](https://huggingface.co/datasets/nvidia"
  },
  {
    "path": "docs/releases/openmathinstruct2/training.md",
    "chars": 3975,
    "preview": "# Model training\n\nWe assume you have `/workspace` defined in your [cluster config](../../basics/cluster-configs.md) and "
  },
  {
    "path": "docs/releases/openmathreasoning/dataset.md",
    "chars": 12111,
    "preview": "# Dataset construction\n\nOpenMathReasoning-1 dataset consists of mathematical problems collected from [AoPS community for"
  },
  {
    "path": "docs/releases/openmathreasoning/evaluation.md",
    "chars": 5756,
    "preview": "# Model evaluation\n\nHere are the commands you can run to reproduce our evaluation numbers.\nThe commands below are for [O"
  },
  {
    "path": "docs/releases/openmathreasoning/index.md",
    "chars": 6919,
    "preview": "---\ndate: 2025-04-23\n---\n\n# OpenMathReasoning\n\n## OpenMathReasoning Dataset\n\nUsing our pipelines we created [OpenMathRea"
  },
  {
    "path": "docs/releases/openmathreasoning/training.md",
    "chars": 8870,
    "preview": "# Model training\n\nWe assume you have `/workspace` defined in your [cluster config](../../basics/cluster-configs.md) and\n"
  },
  {
    "path": "docs/releases/openreasoning/dataset.md",
    "chars": 8202,
    "preview": "# Dataset construction\n\n!!! note\n\n    This page has instructions for how to re-generate datasets from scratch. If you ju"
  },
  {
    "path": "docs/releases/openreasoning/evaluation.md",
    "chars": 1910,
    "preview": "# Model evaluation\n\nHere are the commands you can run to reproduce our evaluation numbers.\nWe assume you have `/workspac"
  },
  {
    "path": "docs/releases/openreasoning/index.md",
    "chars": 4042,
    "preview": "---\ndate: 2025-07-18\n---\n\n# OpenReasoning\n\nWe released OpenReasoning-Nemotrons: a suite of reasoning-capable large langu"
  },
  {
    "path": "docs/releases/openreasoning/training.md",
    "chars": 6181,
    "preview": "# Model training\n\n## Download data and convert to SFT format\n\nOpenReasoning dataset consists of 5 independent parts:\n\n* "
  },
  {
    "path": "docs/tutorials/index.md",
    "chars": 39,
    "preview": "---\ntitle: Tutorials\nhide:\n  - toc\n---\n"
  },
  {
    "path": "docs/tutorials/notebooks/demo_aimo_inference.ipynb",
    "chars": 18135,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"id\": \"62aa155e-9613-4ca9-a88d-d11ce4ac8b0f\",\n   \""
  },
  {
    "path": "docs/tutorials/notebooks/prepare_calibration_data.py",
    "chars": 1970,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "docs/tutorials/posts/gpt-oss-python.md",
    "chars": 11907,
    "preview": "---\ndate: 2025-08-29\nreadtime: 5\nhide:\n  - toc\n---\n\n# Inference with gpt-oss-120b using stateful Python code execution\n\n"
  },
  {
    "path": "docs/tutorials/posts/llama-nemotron-super-v1.5-evals.md",
    "chars": 25501,
    "preview": "---\ndate: 2025-08-15\nreadtime: 15\nhide:\n  - toc\n---\n\n# Reproducing Llama-Nemotron-Super-49B-V1.5 Evals\n\nIn this tutorial"
  },
  {
    "path": "docs/tutorials/posts/nemotron-nano-v2-evals.md",
    "chars": 18389,
    "preview": "---\ndate: 2025-08-22\nreadtime: 10\nhide:\n  - toc\n---\n\n# Reproducing NVIDIA-Nemotron-Nano-9B-v2 Evals\n\nIn this tutorial, w"
  },
  {
    "path": "docs/tutorials/posts/noc-reasoning-agent.md",
    "chars": 25711,
    "preview": "---\ndate: 2026-02-27\nreadtime: 30\nhide:\n  - toc\n---\n\n# Teaching a Model to Reason Over Telecom Network Incidents\n\nThis t"
  },
  {
    "path": "docs/tutorials/posts/omr-simple-recipe.md",
    "chars": 13712,
    "preview": "---\ndate: 2025-07-10\nreadtime: 20\n---\n\n# A Simple Pipeline to Improve Math Reasoning Accuracy\n\nThis tutorial walks you t"
  },
  {
    "path": "greptile.json",
    "chars": 671,
    "preview": "{\n  \"commentTypes\": [\"logic\", \"syntax\", \"style\"],\n  \"instructions\": \"Review code following the guidelines in CONTRIBUTIN"
  },
  {
    "path": "mkdocs.yml",
    "chars": 4350,
    "preview": "strict: true\nsite_name: Nemo-Skills\nsite_url: https://nvidia-nemo.github.io/Skills\nextra_css:\n  - css/extra.css\nplugins:"
  },
  {
    "path": "nemo_skills/__init__.py",
    "chars": 1201,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/_cli_stub.py",
    "chars": 766,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/code_execution/__init__.py",
    "chars": 721,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/code_execution/local_sandbox/__init__.py",
    "chars": 610,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/code_execution/local_sandbox/local_sandbox_server.py",
    "chars": 32532,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/code_execution/local_sandbox/start_local_sandbox.sh",
    "chars": 1204,
    "preview": "#!/bin/bash\n\n# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Vers"
  },
  {
    "path": "nemo_skills/code_execution/proof_utils.py",
    "chars": 13985,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/code_execution/sandbox.py",
    "chars": 18173,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/code_execution/utils.py",
    "chars": 4805,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/conversion/__init__.py",
    "chars": 610,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/conversion/hf_to_nemo_llama.py",
    "chars": 15566,
    "preview": "# Copyright (c) 2023, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/conversion/hf_to_nemo_qwen.py",
    "chars": 14667,
    "preview": "# Copyright (c) 2023, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/conversion/hf_to_trtllm_quantize.py",
    "chars": 8354,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/conversion/nemo_config_llama.yaml",
    "chars": 14143,
    "preview": "# Copyright (c) 2023, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/conversion/nemo_config_qwen.yaml",
    "chars": 13917,
    "preview": "name: megatron_qwen2\nrestore_from_path: null # used when starting from a .nemo file\n\ntrainer:\n  devices: 1\n  num_nodes: "
  },
  {
    "path": "nemo_skills/conversion/nemo_to_hf_llama.py",
    "chars": 11781,
    "preview": "# Copyright (c) 2023, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/conversion/nemo_to_hf_qwen.py",
    "chars": 11229,
    "preview": "# Copyright (c) 2023, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/__init__.py",
    "chars": 610,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/aai/__init__.py",
    "chars": 2291,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/aai/aai_score.py",
    "chars": 1835,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/aai/prepare.py",
    "chars": 869,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/aalcr/__init__.py",
    "chars": 1295,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/aalcr/prepare.py",
    "chars": 10203,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/aime24/__init__.py",
    "chars": 797,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/aime24/prepare.py",
    "chars": 899,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/aime24/test.txt",
    "chars": 61514,
    "preview": "{\"id\": \"aime24-0\", \"problem\": \"Among the 900 residents of Aimeville, there are 195 who own a diamond ring, 367 who own a"
  },
  {
    "path": "nemo_skills/dataset/aime24-x/__init__.py",
    "chars": 728,
    "preview": "# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.\n#\n# Licensed under the Apache License, Vers"
  },
  {
    "path": "nemo_skills/dataset/aime24-x/aime24_x_utils.py",
    "chars": 2798,
    "preview": "# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.\n#\n# Licensed under the Apache License, Vers"
  },
  {
    "path": "nemo_skills/dataset/aime24-x/prepare.py",
    "chars": 2668,
    "preview": "# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.\n#\n# Licensed under the Apache License, Vers"
  },
  {
    "path": "nemo_skills/dataset/aime25/__init__.py",
    "chars": 797,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/aime25/prepare.py",
    "chars": 899,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/aime25/test.txt",
    "chars": 61949,
    "preview": "{\"id\": \"aime25-0\", \"problem\": \"Find the sum of all integer bases  $b>9$  for which  $17_b$  is a divisor of  $97_b.$\", \""
  },
  {
    "path": "nemo_skills/dataset/aime25-x/__init__.py",
    "chars": 728,
    "preview": "# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.\n#\n# Licensed under the Apache License, Vers"
  },
  {
    "path": "nemo_skills/dataset/aime25-x/aime25_x_utils.py",
    "chars": 2798,
    "preview": "# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.\n#\n# Licensed under the Apache License, Vers"
  },
  {
    "path": "nemo_skills/dataset/aime25-x/prepare.py",
    "chars": 2668,
    "preview": "# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.\n#\n# Licensed under the Apache License, Vers"
  },
  {
    "path": "nemo_skills/dataset/aime26/__init__.py",
    "chars": 797,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/aime26/prepare.py",
    "chars": 1665,
    "preview": "# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.\n#\n# Licensed under the Apache License, Vers"
  },
  {
    "path": "nemo_skills/dataset/algebra222/__init__.py",
    "chars": 797,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/algebra222/prepare.py",
    "chars": 1719,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/amc23/__init__.py",
    "chars": 797,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/amc23/prepare.py",
    "chars": 730,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/answer-judge/__init__.py",
    "chars": 887,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/answer-judge/prepare.py",
    "chars": 7434,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/apex-shortlist/__init__.py",
    "chars": 797,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/apex-shortlist/prepare.py",
    "chars": 1277,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.\n#\n# Licensed under the Apache License, Vers"
  },
  {
    "path": "nemo_skills/dataset/arena-hard/__init__.py",
    "chars": 1051,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/arena-hard/prepare.py",
    "chars": 2530,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/arena-hard-v2/__init__.py",
    "chars": 1051,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/arena-hard-v2/prepare.py",
    "chars": 3069,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/asdiv/__init__.py",
    "chars": 797,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/asdiv/prepare.py",
    "chars": 1963,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/asr-leaderboard/__init__.py",
    "chars": 991,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/asr-leaderboard/prepare.py",
    "chars": 7475,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/audiobench/__init__.py",
    "chars": 1380,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.\n#\n# Licensed under the Apache License, Vers"
  },
  {
    "path": "nemo_skills/dataset/audiobench/judge/__init__.py",
    "chars": 1577,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.\n#\n# Licensed under the Apache License, Vers"
  },
  {
    "path": "nemo_skills/dataset/audiobench/nonjudge/__init__.py",
    "chars": 1198,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.\n#\n# Licensed under the Apache License, Vers"
  },
  {
    "path": "nemo_skills/dataset/audiobench/prepare.py",
    "chars": 22268,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.\n#\n# Licensed under the Apache License, Vers"
  },
  {
    "path": "nemo_skills/dataset/beyond-aime/__init__.py",
    "chars": 797,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/beyond-aime/prepare.py",
    "chars": 1221,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.\n#\n# Licensed under the Apache License, Vers"
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/__init__.py",
    "chars": 1990,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/bfcl_score.py",
    "chars": 6457,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/constants.py",
    "chars": 2995,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/irrelevance/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/java/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/javascript/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/live_irrelevance/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/live_multiple/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/live_parallel/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/live_parallel_multiple/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/live_relevance/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/live_simple/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/multi_turn_base/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/multi_turn_long_context/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/multi_turn_miss_func/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/multi_turn_miss_param/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/multiple/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/parallel/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/parallel_multiple/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/prepare.py",
    "chars": 8571,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/simple/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/simple_java/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/simple_javascript/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/simple_python/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v3/utils.py",
    "chars": 6527,
    "preview": "# Copyright 2023 https://github.com/ShishirPatil/gorilla\n#\n# Licensed under the Apache License, Version 2.0 (the \"Licens"
  },
  {
    "path": "nemo_skills/dataset/bfcl_v4/__init__.py",
    "chars": 2344,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v4/bfcl_score.py",
    "chars": 5314,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v4/irrelevance/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v4/live_irrelevance/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v4/live_multiple/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v4/live_parallel/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v4/live_parallel_multiple/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v4/live_relevance/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v4/live_simple/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v4/memory_kv/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "nemo_skills/dataset/bfcl_v4/memory_rec_sum/__init__.py",
    "chars": 724,
    "preview": "# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  }
]

// ... and 974 more files (download for full content)

About this extraction

This page contains the full source code of the Kipok/NeMo-Skills GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 1174 files (7.1 MB), approximately 1.9M tokens, and a symbol index with 3360 extracted functions, classes, methods, constants, and types.

