Copy disabled (too large)
Download .txt
Showing preview only (14,076K chars total). Download the full file to get everything.
Repository: hpcaitech/ColossalAI
Branch: main
Commit: b1915d288954
Files: 2185
Total size: 13.0 MB
Directory structure:
gitextract_ft07ahjp/
├── .clang-format
├── .compatibility
├── .coveragerc
├── .cuda_ext.json
├── .github/
│ ├── CODEOWNERS
│ ├── ISSUE_TEMPLATE/
│ │ ├── bug-report.yml
│ │ ├── config.yml
│ │ ├── documentation.yml
│ │ ├── feature_request.yml
│ │ └── proposal.yml
│ ├── pull_request_template.md
│ └── workflows/
│ ├── README.md
│ ├── build_on_pr.yml
│ ├── build_on_schedule.yml
│ ├── close_inactive.yml
│ ├── compatiblity_test_on_dispatch.yml
│ ├── compatiblity_test_on_pr.yml
│ ├── compatiblity_test_on_schedule.yml
│ ├── cuda_ext_check_before_merge.yml
│ ├── doc_build_on_schedule_after_release.yml
│ ├── doc_check_on_pr.yml
│ ├── doc_test_on_pr.yml
│ ├── doc_test_on_schedule.yml
│ ├── draft_github_release_post_after_merge.yml
│ ├── example_check_on_dispatch.yml
│ ├── example_check_on_pr.yml
│ ├── example_check_on_schedule.yml
│ ├── release_docker_after_publish.yml
│ ├── release_nightly_on_schedule.yml
│ ├── release_pypi_after_merge.yml
│ ├── release_test_pypi_before_merge.yml
│ ├── report_leaderboard_to_lark.yml
│ ├── report_test_coverage.yml
│ ├── run_chatgpt_examples.yml
│ ├── run_chatgpt_unit_tests.yml
│ ├── run_colossalqa_unit_tests.yml
│ ├── scripts/
│ │ ├── check_doc_i18n.py
│ │ ├── example_checks/
│ │ │ ├── check_dispatch_inputs.py
│ │ │ ├── check_example_weekly.py
│ │ │ └── detect_changed_example.py
│ │ ├── generate_leaderboard_and_send_to_lark.py
│ │ ├── generate_release_draft.py
│ │ ├── send_message_to_lark.py
│ │ └── update_setup_for_nightly.py
│ ├── submodule.yml
│ └── translate_comment.yml
├── .gitignore
├── .gitmodules
├── .isort.cfg
├── .pre-commit-config.yaml
├── CHANGE_LOG.md
├── CONTRIBUTING.md
├── LICENSE
├── MANIFEST.in
├── README.md
├── applications/
│ ├── Colossal-LLaMA/
│ │ ├── README.md
│ │ ├── colossal_llama/
│ │ │ ├── __init__.py
│ │ │ ├── dataset/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── conversation.py
│ │ │ │ ├── dummy_dataset.py
│ │ │ │ ├── loader.py
│ │ │ │ └── spliced_and_tokenized_dataset.py
│ │ │ ├── model/
│ │ │ │ └── init_model.py
│ │ │ ├── tokenizer/
│ │ │ │ └── init_tokenizer.py
│ │ │ └── utils/
│ │ │ ├── __init__.py
│ │ │ ├── ckpt_io.py
│ │ │ ├── froze.py
│ │ │ ├── neftune_patch.py
│ │ │ ├── stream_chat_patch.py
│ │ │ └── utils.py
│ │ ├── dataset/
│ │ │ ├── prepare_pretrain_dataset.py
│ │ │ └── prepare_sft_dataset.py
│ │ ├── docs/
│ │ │ ├── example_13b.md
│ │ │ └── example_7b.md
│ │ ├── hostfile.example
│ │ ├── inference/
│ │ │ ├── inference_example.py
│ │ │ └── stream_chat_example.py
│ │ ├── requirements.txt
│ │ ├── setup.py
│ │ ├── train.example.sh
│ │ ├── train.py
│ │ ├── train_sft.example.sh
│ │ └── version.txt
│ ├── ColossalChat/
│ │ ├── .gitignore
│ │ ├── LICENSE
│ │ ├── README.md
│ │ ├── benchmarks/
│ │ │ ├── Opt.json
│ │ │ ├── README.md
│ │ │ ├── benchmark_dpo.sh
│ │ │ ├── benchmark_kto.sh
│ │ │ ├── benchmark_memory_consumption.txt
│ │ │ ├── benchmark_orpo.sh
│ │ │ ├── benchmark_performance_summarization.txt
│ │ │ ├── benchmark_ppo.py
│ │ │ ├── benchmark_ppo.sh
│ │ │ ├── benchmark_sft.sh
│ │ │ ├── benchmark_simpo.sh
│ │ │ ├── data_preparation.sh
│ │ │ ├── dummy_dataset.py
│ │ │ ├── prepare_dummy_test_dataset.py
│ │ │ └── ray/
│ │ │ ├── 1mmt_dummy.py
│ │ │ └── mmmt_dummy.py
│ │ ├── coati/
│ │ │ ├── __init__.py
│ │ │ ├── dataset/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── conversation.py
│ │ │ │ ├── loader.py
│ │ │ │ ├── tokenization_utils.py
│ │ │ │ └── utils.py
│ │ │ ├── distributed/
│ │ │ │ ├── README.md
│ │ │ │ ├── __init__.py
│ │ │ │ ├── comm.py
│ │ │ │ ├── consumer.py
│ │ │ │ ├── grpo_consumer.py
│ │ │ │ ├── inference_backend.py
│ │ │ │ ├── launch.py
│ │ │ │ ├── launch_zero_bubble.py
│ │ │ │ ├── loss.py
│ │ │ │ ├── producer.py
│ │ │ │ ├── profiling_utils.py
│ │ │ │ ├── reward/
│ │ │ │ │ ├── code_reward/
│ │ │ │ │ │ ├── testing_util.py
│ │ │ │ │ │ └── utils.py
│ │ │ │ │ ├── reward_fn.py
│ │ │ │ │ ├── reward_utils.py
│ │ │ │ │ └── verifiable_reward.py
│ │ │ │ ├── utils.py
│ │ │ │ └── zero_bubble/
│ │ │ │ ├── README.md
│ │ │ │ ├── __init__.py
│ │ │ │ ├── consumer.py
│ │ │ │ ├── distributor.py
│ │ │ │ ├── grpo_consumer.py
│ │ │ │ ├── producer.py
│ │ │ │ └── requirements.txt
│ │ │ ├── experience_buffer/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── base.py
│ │ │ │ ├── naive.py
│ │ │ │ └── utils.py
│ │ │ ├── experience_maker/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── base.py
│ │ │ │ └── naive.py
│ │ │ ├── models/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── base.py
│ │ │ │ ├── critic.py
│ │ │ │ ├── generation.py
│ │ │ │ ├── lora.py
│ │ │ │ ├── loss.py
│ │ │ │ ├── reward_model.py
│ │ │ │ ├── rlvr_reward_model.py
│ │ │ │ └── utils.py
│ │ │ ├── quant/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── llama_gptq/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── loader.py
│ │ │ │ │ ├── model_utils.py
│ │ │ │ │ └── quant.py
│ │ │ │ └── utils.py
│ │ │ ├── ray/
│ │ │ │ ├── README.md
│ │ │ │ ├── __init__.py
│ │ │ │ ├── callbacks/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── base.py
│ │ │ │ │ └── performance_evaluator.py
│ │ │ │ ├── detached_replay_buffer.py
│ │ │ │ ├── detached_trainer_base.py
│ │ │ │ ├── detached_trainer_ppo.py
│ │ │ │ ├── experience_maker_holder.py
│ │ │ │ ├── lora_constructor.py
│ │ │ │ └── utils.py
│ │ │ ├── trainer/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── base.py
│ │ │ │ ├── callbacks/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── base.py
│ │ │ │ │ └── performance_evaluator.py
│ │ │ │ ├── dpo.py
│ │ │ │ ├── grpo.py
│ │ │ │ ├── kto.py
│ │ │ │ ├── orpo.py
│ │ │ │ ├── ppo.py
│ │ │ │ ├── rm.py
│ │ │ │ ├── sft.py
│ │ │ │ └── utils.py
│ │ │ └── utils/
│ │ │ ├── __init__.py
│ │ │ ├── accumulative_meter.py
│ │ │ ├── ckpt_io.py
│ │ │ └── reward_score/
│ │ │ ├── __init__.py
│ │ │ ├── competition.py
│ │ │ ├── gsm8k.py
│ │ │ └── utils.py
│ │ ├── conversation_template/
│ │ │ ├── 01-ai_Yi-1.5-9B-Chat.json
│ │ │ ├── MiniCPM-2b.json
│ │ │ ├── Qwen_Qwen1.5-110B-Chat.json
│ │ │ ├── Qwen_Qwen1.5-32B-Chat.json
│ │ │ ├── Qwen_Qwen2.5-3B.json
│ │ │ ├── THUDM_chatglm2-6b.json
│ │ │ ├── THUDM_chatglm3-6b.json
│ │ │ ├── baichuan-inc_Baichuan2-13B-Chat.json
│ │ │ ├── colossal-llama2.json
│ │ │ ├── deepseek-ai_DeepSeek-V2-Lite.json
│ │ │ ├── llama2.json
│ │ │ ├── microsoft_phi-2.json
│ │ │ ├── mistralai_Mixtral-8x7B-Instruct-v0.1.json
│ │ │ └── tiny-llama.json
│ │ ├── examples/
│ │ │ ├── README.md
│ │ │ ├── community/
│ │ │ │ ├── README.md
│ │ │ │ ├── peft/
│ │ │ │ │ ├── README.md
│ │ │ │ │ ├── easy_dataset.py
│ │ │ │ │ ├── easy_models.py
│ │ │ │ │ ├── train_peft_prompts.py
│ │ │ │ │ └── train_peft_sft.py
│ │ │ │ └── ray/
│ │ │ │ ├── README.md
│ │ │ │ ├── ray_job_script.py
│ │ │ │ └── train_prompts_on_ray.py
│ │ │ ├── data_preparation_scripts/
│ │ │ │ ├── prepare_dataset.py
│ │ │ │ ├── prepare_kto_dataset.sh
│ │ │ │ ├── prepare_preference_dataset.sh
│ │ │ │ ├── prepare_prompt_dataset.sh
│ │ │ │ └── prepare_sft_dataset.sh
│ │ │ ├── inference/
│ │ │ │ ├── chatio.py
│ │ │ │ ├── inference.py
│ │ │ │ └── web_chatbot/
│ │ │ │ ├── README.md
│ │ │ │ ├── locustfile.py
│ │ │ │ ├── requirements.txt
│ │ │ │ ├── server.py
│ │ │ │ └── utils.py
│ │ │ ├── requirements.txt
│ │ │ └── training_scripts/
│ │ │ ├── hostfile
│ │ │ ├── lora_config.json
│ │ │ ├── lora_finetune.py
│ │ │ ├── lora_sft_data.jsonl
│ │ │ ├── train_dpo.py
│ │ │ ├── train_dpo.sh
│ │ │ ├── train_grpo.py
│ │ │ ├── train_grpo.sh
│ │ │ ├── train_kto.py
│ │ │ ├── train_kto.sh
│ │ │ ├── train_orpo.py
│ │ │ ├── train_orpo.sh
│ │ │ ├── train_ppo.py
│ │ │ ├── train_ppo.sh
│ │ │ ├── train_rm.py
│ │ │ ├── train_rm.sh
│ │ │ ├── train_sft.py
│ │ │ └── train_sft.sh
│ │ ├── profiling.sh
│ │ ├── pytest.ini
│ │ ├── rl_example.py
│ │ ├── rl_example_zero_bubble.py
│ │ ├── setup.py
│ │ ├── start_code_verifier.py
│ │ ├── tests/
│ │ │ ├── __init__.py
│ │ │ ├── generate_dummy_datasets_for_testing.py
│ │ │ ├── llama.json
│ │ │ ├── opt.json
│ │ │ ├── prepare_test_env.sh
│ │ │ ├── test_data/
│ │ │ │ ├── dpo/
│ │ │ │ │ └── test_dpo_data.jsonl
│ │ │ │ ├── kto/
│ │ │ │ │ └── test_kto_data.jsonl
│ │ │ │ └── sft/
│ │ │ │ └── test_sft_data.jsonl
│ │ │ ├── test_data_preparation.sh
│ │ │ ├── test_lora.py
│ │ │ ├── test_templating.sh
│ │ │ ├── test_train.sh
│ │ │ └── verify_chat_data.py
│ │ └── visualization.py
│ ├── ColossalEval/
│ │ ├── README.md
│ │ ├── colossal_eval/
│ │ │ ├── __init__.py
│ │ │ ├── dataset/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── agieval.py
│ │ │ │ ├── base.py
│ │ │ │ ├── ceval.py
│ │ │ │ ├── cmmlu.py
│ │ │ │ ├── colossalai.py
│ │ │ │ ├── cvalues.py
│ │ │ │ ├── gaokaobench.py
│ │ │ │ ├── gsm.py
│ │ │ │ ├── longbench.py
│ │ │ │ ├── mmlu.py
│ │ │ │ ├── mtbench.py
│ │ │ │ ├── safetybench_en.py
│ │ │ │ └── safetybench_zh.py
│ │ │ ├── evaluate/
│ │ │ │ ├── GPT Evaluation.md
│ │ │ │ ├── __init__.py
│ │ │ │ ├── dataset_evaluator/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── dataset_evaluator.py
│ │ │ │ │ ├── gpt_judge.py
│ │ │ │ │ └── metrics.py
│ │ │ │ ├── evaluator.py
│ │ │ │ ├── gpt_evaluate.py
│ │ │ │ └── utils.py
│ │ │ ├── models/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── base.py
│ │ │ │ ├── chatglm.py
│ │ │ │ ├── huggingface.py
│ │ │ │ └── vllm.py
│ │ │ └── utils/
│ │ │ ├── __init__.py
│ │ │ ├── conversation.py
│ │ │ └── utilities.py
│ │ ├── configs/
│ │ │ └── gpt_evaluation/
│ │ │ ├── config/
│ │ │ │ ├── config_cn.json
│ │ │ │ └── config_en.json
│ │ │ ├── data/
│ │ │ │ ├── eval_cn_examples.json
│ │ │ │ └── eval_en_examples.json
│ │ │ └── prompt/
│ │ │ ├── battle_prompt/
│ │ │ │ ├── battle_prompt_cn.json
│ │ │ │ └── battle_prompt_en.json
│ │ │ └── evaluation_prompt/
│ │ │ ├── evaluation_prompt_cn.json
│ │ │ └── evaluation_prompt_en.json
│ │ ├── examples/
│ │ │ ├── dataset_evaluation/
│ │ │ │ ├── config/
│ │ │ │ │ ├── evaluation/
│ │ │ │ │ │ └── config.json
│ │ │ │ │ └── inference/
│ │ │ │ │ └── config.json
│ │ │ │ ├── eval_dataset.py
│ │ │ │ ├── eval_dataset.sh
│ │ │ │ ├── inference.py
│ │ │ │ └── inference.sh
│ │ │ └── gpt_evaluation/
│ │ │ ├── config/
│ │ │ │ ├── evaluation/
│ │ │ │ │ └── config.json
│ │ │ │ └── inference/
│ │ │ │ └── config.json
│ │ │ ├── eval.py
│ │ │ ├── eval.sh
│ │ │ ├── inference.py
│ │ │ └── inference.sh
│ │ ├── requirements.txt
│ │ └── setup.py
│ ├── ColossalMoE/
│ │ ├── README.md
│ │ ├── infer.py
│ │ ├── infer.sh
│ │ ├── requirements.txt
│ │ ├── setup.py
│ │ ├── tests/
│ │ │ └── __init__.py
│ │ ├── train.py
│ │ ├── train.sh
│ │ ├── utils.py
│ │ └── version.txt
│ ├── ColossalQA/
│ │ ├── .gitignore
│ │ ├── README.md
│ │ ├── colossalqa/
│ │ │ ├── __init__.py
│ │ │ ├── chain/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── memory/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ └── summary.py
│ │ │ │ └── retrieval_qa/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── base.py
│ │ │ │ ├── load_chain.py
│ │ │ │ └── stuff.py
│ │ │ ├── data_loader/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── document_loader.py
│ │ │ │ └── table_dataloader.py
│ │ │ ├── local/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── colossalcloud_llm.py
│ │ │ │ ├── llm.py
│ │ │ │ ├── pangu_llm.py
│ │ │ │ └── utils.py
│ │ │ ├── memory.py
│ │ │ ├── mylogging.py
│ │ │ ├── prompt/
│ │ │ │ ├── README.md
│ │ │ │ └── prompt.py
│ │ │ ├── retrieval_conversation_en.py
│ │ │ ├── retrieval_conversation_universal.py
│ │ │ ├── retrieval_conversation_zh.py
│ │ │ ├── retriever.py
│ │ │ ├── text_splitter/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── chinese_text_splitter.py
│ │ │ │ └── utils.py
│ │ │ └── utils.py
│ │ ├── data/
│ │ │ ├── data_sample/
│ │ │ │ ├── companies.txt
│ │ │ │ ├── companies_zh.txt
│ │ │ │ ├── csv_organization_100.csv
│ │ │ │ ├── custom_service.json
│ │ │ │ ├── custom_service_classification.json
│ │ │ │ ├── custom_service_preprocessed.json
│ │ │ │ └── luchen_zh.txt
│ │ │ └── tests/
│ │ │ ├── 64KB.json
│ │ │ ├── companies.csv
│ │ │ ├── test.html
│ │ │ ├── test.md
│ │ │ └── test.txt
│ │ ├── examples/
│ │ │ ├── conversation_agent_chatgpt.py
│ │ │ ├── retrieval_conversation_chatgpt.py
│ │ │ ├── retrieval_conversation_en.py
│ │ │ ├── retrieval_conversation_en_customer_service.py
│ │ │ ├── retrieval_conversation_universal.py
│ │ │ ├── retrieval_conversation_zh.py
│ │ │ ├── retrieval_intent_classification_zh_customer_service.py
│ │ │ └── webui_demo/
│ │ │ ├── RAG_ChatBot.py
│ │ │ ├── README.md
│ │ │ ├── config.py
│ │ │ ├── requirements.txt
│ │ │ ├── server.py
│ │ │ ├── utils.py
│ │ │ └── webui.py
│ │ ├── pytest.ini
│ │ ├── requirements.txt
│ │ ├── setup.py
│ │ ├── tests/
│ │ │ ├── __init__.py
│ │ │ ├── test_document_loader.py
│ │ │ ├── test_memory.py
│ │ │ ├── test_retrieval_qa.py
│ │ │ └── test_text_splitter.py
│ │ └── version.txt
│ └── README.md
├── colossalai/
│ ├── _C/
│ │ └── __init__.py
│ ├── __init__.py
│ ├── _analyzer/
│ │ ├── README.md
│ │ ├── __init__.py
│ │ ├── _subclasses/
│ │ │ ├── __init__.py
│ │ │ ├── _meta_registration.py
│ │ │ ├── _monkey_patch.py
│ │ │ ├── flop_tensor.py
│ │ │ └── meta_tensor.py
│ │ ├── envs.py
│ │ └── fx/
│ │ ├── __init__.py
│ │ ├── codegen.py
│ │ ├── graph_module.py
│ │ ├── node_util.py
│ │ ├── passes/
│ │ │ ├── __init__.py
│ │ │ ├── graph_profile.py
│ │ │ └── shape_prop.py
│ │ ├── symbolic_profile.py
│ │ └── tracer/
│ │ ├── __init__.py
│ │ ├── bias_addition.py
│ │ ├── custom_leaf_module.py
│ │ ├── proxy.py
│ │ ├── symbolic_trace.py
│ │ └── tracer.py
│ ├── accelerator/
│ │ ├── README.md
│ │ ├── __init__.py
│ │ ├── api.py
│ │ ├── base_accelerator.py
│ │ ├── cpu_accelerator.py
│ │ ├── cuda_accelerator.py
│ │ └── npu_accelerator.py
│ ├── amp/
│ │ ├── __init__.py
│ │ └── naive_amp/
│ │ ├── __init__.py
│ │ ├── grad_scaler/
│ │ │ ├── __init__.py
│ │ │ ├── base_grad_scaler.py
│ │ │ ├── constant_grad_scaler.py
│ │ │ └── dynamic_grad_scaler.py
│ │ ├── mixed_precision_mixin/
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ ├── bf16.py
│ │ │ └── fp16.py
│ │ └── mixed_precision_optimizer.py
│ ├── auto_parallel/
│ │ ├── README.md
│ │ ├── __init__.py
│ │ ├── checkpoint/
│ │ │ ├── __init__.py
│ │ │ ├── build_c_ext.py
│ │ │ ├── ckpt_solver_base.py
│ │ │ ├── ckpt_solver_chen.py
│ │ │ ├── ckpt_solver_rotor.c
│ │ │ ├── ckpt_solver_rotor.py
│ │ │ └── operation.py
│ │ ├── meta_profiler/
│ │ │ ├── __init__.py
│ │ │ ├── constants.py
│ │ │ ├── meta_registry/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── activation.py
│ │ │ │ ├── binary_elementwise_ops.py
│ │ │ │ ├── conv.py
│ │ │ │ ├── embedding.py
│ │ │ │ ├── linear.py
│ │ │ │ ├── non_spmd.py
│ │ │ │ ├── norm.py
│ │ │ │ ├── pooling.py
│ │ │ │ ├── tensor.py
│ │ │ │ └── where.py
│ │ │ ├── registry.py
│ │ │ └── shard_metainfo.py
│ │ ├── offload/
│ │ │ ├── __init__.py
│ │ │ ├── amp_optimizer.py
│ │ │ ├── base_offload_module.py
│ │ │ ├── mem_optimize.py
│ │ │ ├── region.py
│ │ │ ├── region_manager.py
│ │ │ ├── runtime.py
│ │ │ ├── solver.py
│ │ │ ├── training_simulator.py
│ │ │ └── util.py
│ │ ├── passes/
│ │ │ ├── __init__.py
│ │ │ ├── comm_metainfo_pass.py
│ │ │ ├── constants.py
│ │ │ ├── meta_info_prop.py
│ │ │ ├── runtime_apply_pass.py
│ │ │ └── runtime_preparation_pass.py
│ │ ├── pipeline_shard/
│ │ │ └── __init__.py
│ │ └── tensor_shard/
│ │ ├── __init__.py
│ │ ├── constants.py
│ │ ├── initialize.py
│ │ ├── node_handler/
│ │ │ ├── __init__.py
│ │ │ ├── addmm_handler.py
│ │ │ ├── batch_norm_handler.py
│ │ │ ├── binary_elementwise_handler.py
│ │ │ ├── bmm_handler.py
│ │ │ ├── conv_handler.py
│ │ │ ├── default_reshape_handler.py
│ │ │ ├── embedding_handler.py
│ │ │ ├── getattr_handler.py
│ │ │ ├── getitem_handler.py
│ │ │ ├── layer_norm_handler.py
│ │ │ ├── linear_handler.py
│ │ │ ├── matmul_handler.py
│ │ │ ├── node_handler.py
│ │ │ ├── normal_pooling_handler.py
│ │ │ ├── output_handler.py
│ │ │ ├── permute_handler.py
│ │ │ ├── placeholder_handler.py
│ │ │ ├── registry.py
│ │ │ ├── softmax_handler.py
│ │ │ ├── split_handler.py
│ │ │ ├── strategy/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── batch_norm_generator.py
│ │ │ │ ├── binary_elementwise_generator.py
│ │ │ │ ├── conv_strategy_generator.py
│ │ │ │ ├── embedding_generator.py
│ │ │ │ ├── getattr_generator.py
│ │ │ │ ├── getitem_generator.py
│ │ │ │ ├── layer_norm_generator.py
│ │ │ │ ├── matmul_strategy_generator.py
│ │ │ │ ├── normal_pooling_generator.py
│ │ │ │ ├── output_generator.py
│ │ │ │ ├── placeholder_generator.py
│ │ │ │ ├── reshape_generator.py
│ │ │ │ ├── softmax_generator.py
│ │ │ │ ├── strategy_generator.py
│ │ │ │ ├── sum_generator.py
│ │ │ │ ├── tensor_constructor_generator.py
│ │ │ │ ├── unary_elementwise_generator.py
│ │ │ │ └── where_generator.py
│ │ │ ├── sum_handler.py
│ │ │ ├── tensor_constructor_handler.py
│ │ │ ├── transpose_handler.py
│ │ │ ├── unary_elementwise_handler.py
│ │ │ ├── view_handler.py
│ │ │ └── where_handler.py
│ │ ├── options.py
│ │ ├── sharding_strategy.py
│ │ ├── solver/
│ │ │ ├── __init__.py
│ │ │ ├── cost_graph.py
│ │ │ ├── graph_analysis.py
│ │ │ ├── solver.py
│ │ │ └── strategies_constructor.py
│ │ └── utils/
│ │ ├── __init__.py
│ │ ├── broadcast.py
│ │ ├── factory.py
│ │ ├── misc.py
│ │ ├── reshape.py
│ │ └── sharding.py
│ ├── autochunk/
│ │ ├── autochunk_codegen.py
│ │ ├── estimate_memory.py
│ │ ├── reorder_graph.py
│ │ ├── search_chunk.py
│ │ ├── select_chunk.py
│ │ ├── trace_flow.py
│ │ ├── trace_indice.py
│ │ └── utils.py
│ ├── booster/
│ │ ├── __init__.py
│ │ ├── accelerator.py
│ │ ├── booster.py
│ │ ├── mixed_precision/
│ │ │ ├── __init__.py
│ │ │ ├── bf16.py
│ │ │ ├── fp16_apex.py
│ │ │ ├── fp16_naive.py
│ │ │ ├── fp16_torch.py
│ │ │ ├── fp8.py
│ │ │ └── mixed_precision_base.py
│ │ └── plugin/
│ │ ├── __init__.py
│ │ ├── dp_plugin_base.py
│ │ ├── gemini_plugin.py
│ │ ├── hybrid_parallel_plugin.py
│ │ ├── low_level_zero_plugin.py
│ │ ├── moe_hybrid_parallel_plugin.py
│ │ ├── plugin_base.py
│ │ ├── pp_plugin_base.py
│ │ ├── torch_ddp_plugin.py
│ │ └── torch_fsdp_plugin.py
│ ├── checkpoint_io/
│ │ ├── __init__.py
│ │ ├── checkpoint_io_base.py
│ │ ├── general_checkpoint_io.py
│ │ ├── hybrid_parallel_checkpoint_io.py
│ │ ├── index_file.py
│ │ ├── moe_checkpoint.py
│ │ └── utils.py
│ ├── cli/
│ │ ├── __init__.py
│ │ ├── check/
│ │ │ ├── __init__.py
│ │ │ └── check_installation.py
│ │ ├── cli.py
│ │ └── launcher/
│ │ ├── __init__.py
│ │ ├── hostinfo.py
│ │ ├── multinode_runner.py
│ │ └── run.py
│ ├── cluster/
│ │ ├── __init__.py
│ │ ├── device_mesh_manager.py
│ │ ├── dist_coordinator.py
│ │ ├── process_group_manager.py
│ │ └── process_group_mesh.py
│ ├── context/
│ │ ├── __init__.py
│ │ ├── config.py
│ │ └── singleton_meta.py
│ ├── device/
│ │ ├── __init__.py
│ │ ├── alpha_beta_profiler.py
│ │ ├── calc_pipeline_strategy.py
│ │ └── device_mesh.py
│ ├── fx/
│ │ ├── __init__.py
│ │ ├── _compatibility.py
│ │ ├── _meta_regist_12.py
│ │ ├── _meta_regist_13.py
│ │ ├── codegen/
│ │ │ ├── __init__.py
│ │ │ └── activation_checkpoint_codegen.py
│ │ ├── graph_module.py
│ │ ├── passes/
│ │ │ ├── __init__.py
│ │ │ ├── adding_split_node_pass.py
│ │ │ ├── concrete_info_prop.py
│ │ │ ├── experimental/
│ │ │ │ └── adding_shape_consistency_pass.py
│ │ │ ├── meta_info_prop.py
│ │ │ ├── passes_for_gpt2_test.py
│ │ │ ├── shard_1d_pass.py
│ │ │ ├── split_module.py
│ │ │ └── utils.py
│ │ ├── profiler/
│ │ │ ├── __init__.py
│ │ │ ├── constants.py
│ │ │ ├── dataflow.py
│ │ │ ├── experimental/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── constants.py
│ │ │ │ ├── profiler.py
│ │ │ │ ├── profiler_function/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── activation_function.py
│ │ │ │ │ ├── arithmetic.py
│ │ │ │ │ ├── embedding.py
│ │ │ │ │ ├── linear.py
│ │ │ │ │ ├── normalization.py
│ │ │ │ │ ├── pooling.py
│ │ │ │ │ ├── python_ops.py
│ │ │ │ │ └── torch_ops.py
│ │ │ │ ├── profiler_module/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── activation_function.py
│ │ │ │ │ ├── attention.py
│ │ │ │ │ ├── convolution.py
│ │ │ │ │ ├── dropout.py
│ │ │ │ │ ├── embedding.py
│ │ │ │ │ ├── linear.py
│ │ │ │ │ ├── normalization.py
│ │ │ │ │ ├── pooling.py
│ │ │ │ │ ├── rnn.py
│ │ │ │ │ └── torch_op.py
│ │ │ │ ├── registry.py
│ │ │ │ └── shard_utils.py
│ │ │ ├── memory_utils.py
│ │ │ ├── opcount.py
│ │ │ ├── profiler.py
│ │ │ ├── shard_utils.py
│ │ │ └── tensor.py
│ │ ├── proxy.py
│ │ └── tracer/
│ │ ├── __init__.py
│ │ ├── _meta_trace.py
│ │ ├── _symbolic_trace.py
│ │ ├── _tracer_utils.py
│ │ ├── bias_addition_patch/
│ │ │ ├── __init__.py
│ │ │ ├── patched_bias_addition_function/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── addbmm.py
│ │ │ │ ├── addmm.py
│ │ │ │ ├── bias_addition_function.py
│ │ │ │ └── linear.py
│ │ │ └── patched_bias_addition_module/
│ │ │ ├── __init__.py
│ │ │ ├── bias_addition_module.py
│ │ │ ├── conv.py
│ │ │ └── linear.py
│ │ ├── experimental.py
│ │ ├── meta_patch/
│ │ │ ├── __init__.py
│ │ │ ├── patched_function/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── activation_function.py
│ │ │ │ ├── arithmetic.py
│ │ │ │ ├── convolution.py
│ │ │ │ ├── embedding.py
│ │ │ │ ├── normalization.py
│ │ │ │ ├── python_ops.py
│ │ │ │ └── torch_ops.py
│ │ │ └── patched_module/
│ │ │ ├── __init__.py
│ │ │ ├── activation_function.py
│ │ │ ├── convolution.py
│ │ │ ├── embedding.py
│ │ │ ├── linear.py
│ │ │ ├── normalization.py
│ │ │ ├── pooling.py
│ │ │ └── rnn.py
│ │ ├── registry.py
│ │ └── tracer.py
│ ├── inference/
│ │ ├── README.md
│ │ ├── __init__.py
│ │ ├── batch_bucket.py
│ │ ├── config.py
│ │ ├── core/
│ │ │ ├── __init__.py
│ │ │ ├── async_engine.py
│ │ │ ├── base_engine.py
│ │ │ ├── diffusion_engine.py
│ │ │ ├── engine.py
│ │ │ ├── llm_engine.py
│ │ │ ├── plugin.py
│ │ │ ├── request_handler.py
│ │ │ └── rpc_engine.py
│ │ ├── executor/
│ │ │ ├── __init__.py
│ │ │ └── rpc_worker.py
│ │ ├── flash_decoding_utils.py
│ │ ├── graph_runner.py
│ │ ├── kv_cache/
│ │ │ ├── __init__.py
│ │ │ ├── block_cache.py
│ │ │ └── kvcache_manager.py
│ │ ├── logit_processors.py
│ │ ├── modeling/
│ │ │ ├── __init__.py
│ │ │ ├── backends/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── attention_backend.py
│ │ │ │ └── pre_attention_backend.py
│ │ │ ├── layers/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── attention.py
│ │ │ │ ├── baichuan_tp_linear.py
│ │ │ │ ├── diffusion.py
│ │ │ │ └── distrifusion.py
│ │ │ ├── models/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── glide_llama.py
│ │ │ │ ├── nopadding_baichuan.py
│ │ │ │ ├── nopadding_llama.py
│ │ │ │ ├── pixart_alpha.py
│ │ │ │ └── stablediffusion3.py
│ │ │ └── policy/
│ │ │ ├── __init__.py
│ │ │ ├── glide_llama.py
│ │ │ ├── nopadding_baichuan.py
│ │ │ ├── nopadding_llama.py
│ │ │ ├── pixart_alpha.py
│ │ │ └── stablediffusion3.py
│ │ ├── sampler.py
│ │ ├── server/
│ │ │ ├── __init__.py
│ │ │ ├── api_server.py
│ │ │ ├── chat_service.py
│ │ │ ├── completion_service.py
│ │ │ └── utils.py
│ │ ├── spec/
│ │ │ ├── __init__.py
│ │ │ ├── drafter.py
│ │ │ └── struct.py
│ │ ├── struct.py
│ │ └── utils.py
│ ├── initialize.py
│ ├── interface/
│ │ ├── __init__.py
│ │ ├── model.py
│ │ ├── optimizer.py
│ │ └── pretrained.py
│ ├── kernel/
│ │ ├── __init__.py
│ │ ├── jit/
│ │ │ ├── __init__.py
│ │ │ ├── bias_dropout_add.py
│ │ │ ├── bias_gelu.py
│ │ │ └── option.py
│ │ ├── kernel_loader.py
│ │ └── triton/
│ │ ├── __init__.py
│ │ ├── context_attn_unpad.py
│ │ ├── flash_decoding.py
│ │ ├── fused_rotary_embedding.py
│ │ ├── kvcache_copy.py
│ │ ├── llama_act_combine_kernel.py
│ │ ├── no_pad_rotary_embedding.py
│ │ ├── qkv_matmul_kernel.py
│ │ ├── rms_layernorm.py
│ │ ├── rotary_cache_copy.py
│ │ └── softmax.py
│ ├── lazy/
│ │ ├── __init__.py
│ │ ├── construction.py
│ │ ├── lazy_init.py
│ │ └── pretrained.py
│ ├── legacy/
│ │ ├── __init__.py
│ │ ├── amp/
│ │ │ ├── __init__.py
│ │ │ ├── amp_type.py
│ │ │ ├── apex_amp/
│ │ │ │ ├── __init__.py
│ │ │ │ └── apex_amp.py
│ │ │ ├── naive_amp/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── _fp16_optimizer.py
│ │ │ │ ├── _utils.py
│ │ │ │ └── naive_amp.py
│ │ │ └── torch_amp/
│ │ │ ├── __init__.py
│ │ │ ├── _grad_scaler.py
│ │ │ └── torch_amp.py
│ │ ├── builder/
│ │ │ ├── __init__.py
│ │ │ └── builder.py
│ │ ├── communication/
│ │ │ ├── __init__.py
│ │ │ ├── collective.py
│ │ │ ├── p2p.py
│ │ │ ├── p2p_v2.py
│ │ │ ├── ring.py
│ │ │ └── utils.py
│ │ ├── constants.py
│ │ ├── context/
│ │ │ ├── __init__.py
│ │ │ ├── parallel_context.py
│ │ │ ├── parallel_mode.py
│ │ │ ├── process_group_initializer/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── initializer_1d.py
│ │ │ │ ├── initializer_2d.py
│ │ │ │ ├── initializer_2p5d.py
│ │ │ │ ├── initializer_3d.py
│ │ │ │ ├── initializer_data.py
│ │ │ │ ├── initializer_model.py
│ │ │ │ ├── initializer_pipeline.py
│ │ │ │ ├── initializer_sequence.py
│ │ │ │ ├── initializer_tensor.py
│ │ │ │ └── process_group_initializer.py
│ │ │ └── random/
│ │ │ ├── __init__.py
│ │ │ ├── _helper.py
│ │ │ └── seed_manager.py
│ │ ├── core.py
│ │ ├── engine/
│ │ │ ├── __init__.py
│ │ │ ├── _base_engine.py
│ │ │ ├── gradient_accumulation/
│ │ │ │ ├── __init__.py
│ │ │ │ └── _gradient_accumulation.py
│ │ │ ├── gradient_handler/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── _base_gradient_handler.py
│ │ │ │ ├── _data_parallel_gradient_handler.py
│ │ │ │ ├── _moe_gradient_handler.py
│ │ │ │ ├── _pipeline_parallel_gradient_handler.py
│ │ │ │ ├── _sequence_parallel_gradient_handler.py
│ │ │ │ ├── _zero_gradient_handler.py
│ │ │ │ └── utils.py
│ │ │ └── schedule/
│ │ │ ├── __init__.py
│ │ │ ├── _base_schedule.py
│ │ │ ├── _non_pipeline_schedule.py
│ │ │ ├── _pipeline_schedule.py
│ │ │ └── _pipeline_schedule_v2.py
│ │ ├── global_variables.py
│ │ ├── inference/
│ │ │ ├── README.md
│ │ │ ├── __init__.py
│ │ │ ├── async_engine.py
│ │ │ ├── async_manager.py
│ │ │ ├── dynamic_batching/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── get_tokenizer.py
│ │ │ │ ├── infer_batch.py
│ │ │ │ ├── io_struct.py
│ │ │ │ ├── ray_dist_init.py
│ │ │ │ ├── ray_init_config.py
│ │ │ │ ├── req_queue.py
│ │ │ │ ├── sampling_params.py
│ │ │ │ └── stats.py
│ │ │ ├── hybridengine/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── engine.py
│ │ │ │ ├── modeling/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── _utils.py
│ │ │ │ │ └── llama.py
│ │ │ │ └── polices/
│ │ │ │ ├── __init__.py
│ │ │ │ └── llama.py
│ │ │ ├── manager.py
│ │ │ ├── pipeline/
│ │ │ │ ├── README.md
│ │ │ │ ├── __init__.py
│ │ │ │ ├── benchmark/
│ │ │ │ │ ├── benchmark.py
│ │ │ │ │ └── run.sh
│ │ │ │ └── microbatch_manager.py
│ │ │ ├── quant/
│ │ │ │ ├── gptq/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ └── cai_gptq/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── cai_quant_linear.py
│ │ │ │ │ └── gptq_op.py
│ │ │ │ └── smoothquant/
│ │ │ │ ├── __init__.py
│ │ │ │ └── models/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── base_model.py
│ │ │ │ ├── linear.py
│ │ │ │ └── llama.py
│ │ │ ├── serving/
│ │ │ │ ├── ray_serve/
│ │ │ │ │ ├── Colossal_Inference_rayserve.py
│ │ │ │ │ ├── README.md
│ │ │ │ │ ├── send_request.py
│ │ │ │ │ └── send_requests.py
│ │ │ │ ├── test_ci.sh
│ │ │ │ └── torch_serve/
│ │ │ │ ├── Colossal_Inference_Handler.py
│ │ │ │ ├── README.md
│ │ │ │ ├── config.properties
│ │ │ │ ├── docker/
│ │ │ │ │ └── Dockerfile
│ │ │ │ ├── model-config.yaml
│ │ │ │ └── sample_text.txt
│ │ │ └── tensor_parallel/
│ │ │ ├── __init__.py
│ │ │ ├── batch_infer_state.py
│ │ │ ├── engine.py
│ │ │ ├── kvcache_manager.py
│ │ │ ├── modeling/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── _utils.py
│ │ │ │ ├── bloom.py
│ │ │ │ ├── chatglm2.py
│ │ │ │ └── llama.py
│ │ │ └── policies/
│ │ │ ├── __init__.py
│ │ │ ├── bloom.py
│ │ │ ├── chatglm2.py
│ │ │ └── llama.py
│ │ ├── initialize.py
│ │ ├── moe/
│ │ │ ├── layer/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── experts.py
│ │ │ │ ├── layers.py
│ │ │ │ └── routers.py
│ │ │ ├── load_balance.py
│ │ │ ├── manager.py
│ │ │ ├── openmoe/
│ │ │ │ ├── README.md
│ │ │ │ ├── benchmark/
│ │ │ │ │ ├── benchmark_cai.py
│ │ │ │ │ ├── benchmark_cai.sh
│ │ │ │ │ ├── benchmark_cai_dist.sh
│ │ │ │ │ ├── benchmark_fsdp.py
│ │ │ │ │ ├── benchmark_fsdp.sh
│ │ │ │ │ ├── hostfile.txt
│ │ │ │ │ └── utils.py
│ │ │ │ ├── infer.py
│ │ │ │ ├── infer.sh
│ │ │ │ ├── model/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── convert_openmoe_ckpt.py
│ │ │ │ │ ├── convert_openmoe_ckpt.sh
│ │ │ │ │ ├── modeling_openmoe.py
│ │ │ │ │ ├── openmoe_8b_config.json
│ │ │ │ │ ├── openmoe_base_config.json
│ │ │ │ │ └── openmoe_policy.py
│ │ │ │ ├── requirements.txt
│ │ │ │ ├── test_ci.sh
│ │ │ │ ├── train.py
│ │ │ │ └── train.sh
│ │ │ └── utils.py
│ │ ├── nn/
│ │ │ ├── __init__.py
│ │ │ ├── _ops/
│ │ │ │ ├── __init__.py
│ │ │ │ └── _utils.py
│ │ │ ├── layer/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── base_layer.py
│ │ │ │ ├── colossalai_layer/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── _utils.py
│ │ │ │ │ ├── dropout.py
│ │ │ │ │ ├── embedding.py
│ │ │ │ │ ├── linear.py
│ │ │ │ │ └── normalization.py
│ │ │ │ ├── parallel_1d/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── _operation.py
│ │ │ │ │ ├── _utils.py
│ │ │ │ │ └── layers.py
│ │ │ │ ├── parallel_2d/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── _operation.py
│ │ │ │ │ ├── _utils.py
│ │ │ │ │ └── layers.py
│ │ │ │ ├── parallel_2p5d/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── _operation.py
│ │ │ │ │ ├── _utils.py
│ │ │ │ │ └── layers.py
│ │ │ │ ├── parallel_3d/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── _operation.py
│ │ │ │ │ ├── _utils.py
│ │ │ │ │ └── layers.py
│ │ │ │ ├── parallel_sequence/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── _operation.py
│ │ │ │ │ ├── _utils.py
│ │ │ │ │ └── layers.py
│ │ │ │ ├── utils/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ └── common.py
│ │ │ │ ├── vanilla/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ └── layers.py
│ │ │ │ └── wrapper/
│ │ │ │ ├── __init__.py
│ │ │ │ └── pipeline_wrapper.py
│ │ │ ├── loss/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── loss_1d.py
│ │ │ │ ├── loss_2d.py
│ │ │ │ ├── loss_2p5d.py
│ │ │ │ └── loss_3d.py
│ │ │ ├── metric/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── _utils.py
│ │ │ │ ├── accuracy_2d.py
│ │ │ │ ├── accuracy_2p5d.py
│ │ │ │ └── accuracy_3d.py
│ │ │ └── parallel/
│ │ │ ├── __init__.py
│ │ │ ├── data_parallel.py
│ │ │ ├── layers/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── cache_embedding/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── base_embedding.py
│ │ │ │ │ ├── cache_mgr.py
│ │ │ │ │ ├── cached_embedding.py
│ │ │ │ │ ├── copyer.py
│ │ │ │ │ ├── embedding_config.py
│ │ │ │ │ ├── parallel_cached_embedding.py
│ │ │ │ │ ├── parallel_cached_embedding_tablewise.py
│ │ │ │ │ └── parallel_cached_embedding_tablewise_split_cache.py
│ │ │ │ ├── colo_module.py
│ │ │ │ ├── embedding.py
│ │ │ │ ├── linear.py
│ │ │ │ └── module_utils.py
│ │ │ └── reducer.py
│ │ ├── pipeline/
│ │ │ ├── __init__.py
│ │ │ ├── layer_spec.py
│ │ │ ├── middleware/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── adaptor/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ └── fx.py
│ │ │ │ └── topo.py
│ │ │ ├── pipelinable.py
│ │ │ ├── pipeline_process_group.py
│ │ │ ├── rpc/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── _pipeline_base.py
│ │ │ │ ├── _pipeline_schedule.py
│ │ │ │ └── utils.py
│ │ │ └── utils.py
│ │ ├── registry/
│ │ │ ├── __init__.py
│ │ │ └── registry.py
│ │ ├── tensor/
│ │ │ ├── __init__.py
│ │ │ ├── compute_spec.py
│ │ │ ├── const.py
│ │ │ ├── dist_spec_mgr.py
│ │ │ ├── distspec.py
│ │ │ ├── op_wrapper.py
│ │ │ ├── process_group.py
│ │ │ └── tensor_spec.py
│ │ ├── trainer/
│ │ │ ├── __init__.py
│ │ │ ├── _trainer.py
│ │ │ └── hooks/
│ │ │ ├── __init__.py
│ │ │ ├── _base_hook.py
│ │ │ ├── _checkpoint_hook.py
│ │ │ ├── _commons_.py
│ │ │ ├── _log_hook.py
│ │ │ ├── _lr_scheduler_hook.py
│ │ │ └── _metric_hook.py
│ │ ├── utils/
│ │ │ ├── __init__.py
│ │ │ ├── activation_checkpoint.py
│ │ │ ├── checkpoint/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── module_checkpoint.py
│ │ │ │ └── utils.py
│ │ │ ├── checkpointing.py
│ │ │ ├── common.py
│ │ │ ├── data_sampler/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── base_sampler.py
│ │ │ │ └── data_parallel_sampler.py
│ │ │ ├── memory.py
│ │ │ └── profiler/
│ │ │ ├── __init__.py
│ │ │ ├── extention.py
│ │ │ ├── legacy/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── comm_profiler.py
│ │ │ │ ├── pcie_profiler.py
│ │ │ │ └── prof_utils.py
│ │ │ ├── profiler.py
│ │ │ └── stateful_tensor_mem_extention.py
│ │ └── zero/
│ │ ├── __init__.py
│ │ ├── gemini/
│ │ │ ├── __init__.py
│ │ │ ├── colo_init_context.py
│ │ │ ├── gemini_context.py
│ │ │ ├── ophooks/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── _shard_grad_ophook.py
│ │ │ │ ├── _shard_param_ophook.py
│ │ │ │ ├── runtime_mem_tracer_hook.py
│ │ │ │ └── utils.py
│ │ │ ├── paramhooks/
│ │ │ │ ├── __init__.py
│ │ │ │ └── _param_hookmgr.py
│ │ │ ├── stateful_tensor.py
│ │ │ ├── stateful_tensor_mgr.py
│ │ │ ├── tensor_placement_policy.py
│ │ │ └── tensor_utils.py
│ │ ├── init_ctx/
│ │ │ ├── __init__.py
│ │ │ └── init_context.py
│ │ ├── shard_utils/
│ │ │ ├── __init__.py
│ │ │ ├── base_shard_strategy.py
│ │ │ ├── bucket_tensor_shard_strategy.py
│ │ │ ├── commons.py
│ │ │ └── tensor_shard_strategy.py
│ │ ├── sharded_model/
│ │ │ ├── __init__.py
│ │ │ ├── _utils.py
│ │ │ ├── reduce_scatter.py
│ │ │ ├── sharded_model_v2.py
│ │ │ ├── utils.py
│ │ │ └── zero_hook.py
│ │ ├── sharded_optim/
│ │ │ ├── __init__.py
│ │ │ └── sharded_optim_v2.py
│ │ └── sharded_param/
│ │ ├── __init__.py
│ │ ├── sharded_param.py
│ │ └── sharded_tensor.py
│ ├── logging/
│ │ ├── __init__.py
│ │ └── logger.py
│ ├── moe/
│ │ ├── __init__.py
│ │ └── _operation.py
│ ├── nn/
│ │ ├── __init__.py
│ │ ├── init.py
│ │ ├── layer/
│ │ │ ├── __init__.py
│ │ │ ├── layernorm.py
│ │ │ ├── scaled_softmax.py
│ │ │ └── utils.py
│ │ ├── loss/
│ │ │ └── __init__.py
│ │ ├── lr_scheduler/
│ │ │ ├── __init__.py
│ │ │ ├── cosine.py
│ │ │ ├── delayed.py
│ │ │ ├── linear.py
│ │ │ ├── multistep.py
│ │ │ ├── onecycle.py
│ │ │ ├── poly.py
│ │ │ └── torch.py
│ │ └── optimizer/
│ │ ├── README.md
│ │ ├── __init__.py
│ │ ├── adafactor.py
│ │ ├── came.py
│ │ ├── cpu_adam.py
│ │ ├── distributed_adafactor.py
│ │ ├── distributed_came.py
│ │ ├── distributed_galore.py
│ │ ├── distributed_lamb.py
│ │ ├── fused_adam.py
│ │ ├── fused_lamb.py
│ │ ├── fused_sgd.py
│ │ ├── galore.py
│ │ ├── hybrid_adam.py
│ │ ├── lamb.py
│ │ ├── lars.py
│ │ └── nvme_optimizer.py
│ ├── pipeline/
│ │ ├── __init__.py
│ │ ├── p2p.py
│ │ ├── schedule/
│ │ │ ├── __init__.py
│ │ │ ├── _utils.py
│ │ │ ├── base.py
│ │ │ ├── generate.py
│ │ │ ├── interleaved_pp.py
│ │ │ ├── one_f_one_b.py
│ │ │ ├── v_schedule.py
│ │ │ └── zero_bubble_pp.py
│ │ ├── stage_manager.py
│ │ └── weight_grad_store.py
│ ├── quantization/
│ │ ├── __init__.py
│ │ ├── bnb.py
│ │ ├── bnb_config.py
│ │ ├── fp8.py
│ │ ├── fp8_config.py
│ │ ├── fp8_hook.py
│ │ └── utils.py
│ ├── shardformer/
│ │ ├── README.md
│ │ ├── __init__.py
│ │ ├── _utils.py
│ │ ├── examples/
│ │ │ ├── convergence_benchmark.py
│ │ │ ├── convergence_benchmark.sh
│ │ │ ├── data.py
│ │ │ └── performance_benchmark.py
│ │ ├── layer/
│ │ │ ├── __init__.py
│ │ │ ├── _operation.py
│ │ │ ├── attn.py
│ │ │ ├── dropout.py
│ │ │ ├── embedding.py
│ │ │ ├── linear.py
│ │ │ ├── loss.py
│ │ │ ├── normalization.py
│ │ │ ├── parallel_module.py
│ │ │ ├── qkv_fused_linear.py
│ │ │ └── utils.py
│ │ ├── modeling/
│ │ │ ├── __init__.py
│ │ │ ├── bert.py
│ │ │ ├── blip2.py
│ │ │ ├── bloom.py
│ │ │ ├── chatglm2.py
│ │ │ ├── chatglm2_6b/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── configuration_chatglm.py
│ │ │ │ └── modeling_chatglm.py
│ │ │ ├── command.py
│ │ │ ├── deepseek.py
│ │ │ ├── deepseek_v3.py
│ │ │ ├── falcon.py
│ │ │ ├── gpt2.py
│ │ │ ├── gptj.py
│ │ │ ├── jit.py
│ │ │ ├── llama.py
│ │ │ ├── mistral.py
│ │ │ ├── mixtral.py
│ │ │ ├── opt.py
│ │ │ ├── qwen2.py
│ │ │ ├── qwen3.py
│ │ │ ├── sam.py
│ │ │ ├── t5.py
│ │ │ ├── vit.py
│ │ │ └── whisper.py
│ │ ├── policies/
│ │ │ ├── __init__.py
│ │ │ ├── auto_policy.py
│ │ │ ├── base_policy.py
│ │ │ ├── bert.py
│ │ │ ├── blip2.py
│ │ │ ├── bloom.py
│ │ │ ├── chatglm2.py
│ │ │ ├── command.py
│ │ │ ├── deepseek.py
│ │ │ ├── deepseek_v3.py
│ │ │ ├── falcon.py
│ │ │ ├── gpt2.py
│ │ │ ├── gptj.py
│ │ │ ├── llama.py
│ │ │ ├── mistral.py
│ │ │ ├── mixtral.py
│ │ │ ├── opt.py
│ │ │ ├── qwen2.py
│ │ │ ├── qwen3.py
│ │ │ ├── sam.py
│ │ │ ├── t5.py
│ │ │ ├── vit.py
│ │ │ └── whisper.py
│ │ └── shard/
│ │ ├── __init__.py
│ │ ├── grad_ckpt_config.py
│ │ ├── shard_config.py
│ │ ├── sharder.py
│ │ ├── shardformer.py
│ │ └── utils.py
│ ├── tensor/
│ │ ├── __init__.py
│ │ ├── colo_parameter.py
│ │ ├── colo_tensor.py
│ │ ├── comm_spec.py
│ │ ├── d_tensor/
│ │ │ ├── README.md
│ │ │ ├── __init__.py
│ │ │ ├── api.py
│ │ │ ├── comm_spec.py
│ │ │ ├── layout.py
│ │ │ ├── layout_converter.py
│ │ │ ├── misc.py
│ │ │ ├── sharding_spec.py
│ │ │ └── utils.py
│ │ ├── moe_tensor/
│ │ │ ├── __init__.py
│ │ │ ├── api.py
│ │ │ └── moe_info.py
│ │ ├── padded_tensor/
│ │ │ ├── __init__.py
│ │ │ └── api.py
│ │ ├── param_op_hook.py
│ │ ├── shape_consistency.py
│ │ ├── sharding_spec.py
│ │ └── utils.py
│ ├── testing/
│ │ ├── __init__.py
│ │ ├── comparison.py
│ │ ├── pytest_wrapper.py
│ │ ├── random.py
│ │ └── utils.py
│ ├── utils/
│ │ ├── __init__.py
│ │ ├── common.py
│ │ ├── memory.py
│ │ ├── model/
│ │ │ ├── __init__.py
│ │ │ └── utils.py
│ │ ├── multi_tensor_apply/
│ │ │ ├── __init__.py
│ │ │ └── multi_tensor_apply.py
│ │ ├── rank_recorder/
│ │ │ ├── README.md
│ │ │ ├── __init__.py
│ │ │ └── rank_recorder.py
│ │ ├── safetensors.py
│ │ ├── tensor_detector/
│ │ │ ├── __init__.py
│ │ │ ├── readme.md
│ │ │ └── tensor_detector.py
│ │ └── timer.py
│ └── zero/
│ ├── __init__.py
│ ├── gemini/
│ │ ├── __init__.py
│ │ ├── chunk/
│ │ │ ├── __init__.py
│ │ │ ├── chunk.py
│ │ │ ├── manager.py
│ │ │ ├── search_utils.py
│ │ │ └── utils.py
│ │ ├── gemini_ddp.py
│ │ ├── gemini_hook.py
│ │ ├── gemini_mgr.py
│ │ ├── gemini_optimizer.py
│ │ ├── memory_tracer/
│ │ │ ├── __init__.py
│ │ │ ├── chunk_memstats_collector.py
│ │ │ ├── memory_monitor.py
│ │ │ ├── memory_stats.py
│ │ │ ├── memstats_collector.py
│ │ │ ├── param_runtime_order.py
│ │ │ ├── runtime_mem_tracer.py
│ │ │ ├── static_memstats_collector.py
│ │ │ └── utils.py
│ │ ├── placement_policy.py
│ │ └── utils.py
│ ├── low_level/
│ │ ├── __init__.py
│ │ ├── _utils.py
│ │ ├── bookkeeping/
│ │ │ ├── __init__.py
│ │ │ ├── base_store.py
│ │ │ ├── bucket_store.py
│ │ │ ├── gradient_store.py
│ │ │ └── tensor_bucket.py
│ │ ├── low_level_optim.py
│ │ ├── readme.md
│ │ └── zero_hook.py
│ └── wrapper.py
├── docker/
│ └── Dockerfile
├── docs/
│ ├── README-zh-Hans.md
│ ├── README.md
│ ├── REFERENCE.md
│ ├── conda-doc-test-deps.yml
│ ├── requirements-doc-test.txt
│ ├── sidebars.json
│ ├── source/
│ │ ├── en/
│ │ │ ├── Colossal-Auto/
│ │ │ │ ├── feature/
│ │ │ │ │ ├── auto_checkpoint.md
│ │ │ │ │ ├── device_mesh.md
│ │ │ │ │ ├── layout_converting_management.md
│ │ │ │ │ └── tracer.md
│ │ │ │ └── get_started/
│ │ │ │ ├── installation.md
│ │ │ │ ├── introduction.md
│ │ │ │ └── run_demo.md
│ │ │ ├── advanced_tutorials/
│ │ │ │ ├── integrate_mixture_of_experts_into_your_model.md
│ │ │ │ ├── meet_gemini.md
│ │ │ │ ├── opt_service.md
│ │ │ │ ├── train_gpt_using_hybrid_parallelism.md
│ │ │ │ └── train_vit_with_hybrid_parallelism.md
│ │ │ ├── basics/
│ │ │ │ ├── booster_api.md
│ │ │ │ ├── booster_checkpoint.md
│ │ │ │ ├── booster_plugins.md
│ │ │ │ ├── command_line_tool.md
│ │ │ │ └── launch_colossalai.md
│ │ │ ├── concepts/
│ │ │ │ ├── colossalai_overview.md
│ │ │ │ ├── distributed_training.md
│ │ │ │ └── paradigms_of_parallelism.md
│ │ │ ├── features/
│ │ │ │ ├── 1D_tensor_parallel.md
│ │ │ │ ├── 2D_tensor_parallel.md
│ │ │ │ ├── 2p5D_tensor_parallel.md
│ │ │ │ ├── 3D_tensor_parallel.md
│ │ │ │ ├── cluster_utils.md
│ │ │ │ ├── distributed_optimizers.md
│ │ │ │ ├── gradient_accumulation_with_booster.md
│ │ │ │ ├── gradient_clipping_with_booster.md
│ │ │ │ ├── lazy_init.md
│ │ │ │ ├── mixed_precision_training_with_booster.md
│ │ │ │ ├── nvme_offload.md
│ │ │ │ ├── pipeline_parallel.md
│ │ │ │ ├── sequence_parallelism.md
│ │ │ │ ├── shardformer.md
│ │ │ │ ├── zero_with_chunk.md
│ │ │ │ └── zerobubble_pipeline_parallelism.md
│ │ │ ├── get_started/
│ │ │ │ ├── bonus.md
│ │ │ │ ├── installation.md
│ │ │ │ ├── reading_roadmap.md
│ │ │ │ └── run_demo.md
│ │ │ └── sidebar_category_translation.json
│ │ └── zh-Hans/
│ │ ├── Colossal-Auto/
│ │ │ ├── feature/
│ │ │ │ ├── auto_checkpoint.md
│ │ │ │ ├── device_mesh.md
│ │ │ │ ├── layout_converting_management.md
│ │ │ │ └── tracer.md
│ │ │ └── get_started/
│ │ │ ├── installation.md
│ │ │ ├── introduction.md
│ │ │ └── run_demo.md
│ │ ├── advanced_tutorials/
│ │ │ ├── integrate_mixture_of_experts_into_your_model.md
│ │ │ ├── meet_gemini.md
│ │ │ ├── opt_service.md
│ │ │ ├── train_gpt_using_hybrid_parallelism.md
│ │ │ └── train_vit_with_hybrid_parallelism.md
│ │ ├── basics/
│ │ │ ├── booster_api.md
│ │ │ ├── booster_checkpoint.md
│ │ │ ├── booster_plugins.md
│ │ │ ├── command_line_tool.md
│ │ │ └── launch_colossalai.md
│ │ ├── concepts/
│ │ │ ├── colossalai_overview.md
│ │ │ ├── distributed_training.md
│ │ │ └── paradigms_of_parallelism.md
│ │ ├── features/
│ │ │ ├── 1D_tensor_parallel.md
│ │ │ ├── 2D_tensor_parallel.md
│ │ │ ├── 2p5D_tensor_parallel.md
│ │ │ ├── 3D_tensor_parallel.md
│ │ │ ├── cluster_utils.md
│ │ │ ├── distributed_optimizers.md
│ │ │ ├── gradient_accumulation_with_booster.md
│ │ │ ├── gradient_clipping_with_booster.md
│ │ │ ├── lazy_init.md
│ │ │ ├── mixed_precision_training_with_booster.md
│ │ │ ├── nvme_offload.md
│ │ │ ├── pipeline_parallel.md
│ │ │ ├── sequence_parallelism.md
│ │ │ ├── shardformer.md
│ │ │ ├── zero_with_chunk.md
│ │ │ └── zerobubble_pipeline_parallelism.md
│ │ ├── get_started/
│ │ │ ├── bonus.md
│ │ │ ├── installation.md
│ │ │ ├── reading_roadmap.md
│ │ │ └── run_demo.md
│ │ └── sidebar_category_translation.json
│ └── versions.json
├── examples/
│ ├── README.md
│ ├── __init__.py
│ ├── community/
│ │ ├── README.md
│ │ ├── fp8/
│ │ │ └── mnist/
│ │ │ ├── README.md
│ │ │ └── main.py
│ │ └── roberta/
│ │ ├── README.md
│ │ ├── preprocessing/
│ │ │ ├── Makefile
│ │ │ ├── README.md
│ │ │ ├── get_mask.py
│ │ │ ├── mask.cpp
│ │ │ ├── sentence_split.py
│ │ │ └── tokenize_mask.py
│ │ ├── pretraining/
│ │ │ ├── README.md
│ │ │ ├── arguments.py
│ │ │ ├── bert_dataset_provider.py
│ │ │ ├── evaluation.py
│ │ │ ├── hostfile
│ │ │ ├── loss.py
│ │ │ ├── model/
│ │ │ │ ├── bert.py
│ │ │ │ └── deberta_v2.py
│ │ │ ├── nvidia_bert_dataset_provider.py
│ │ │ ├── pretrain_utils.py
│ │ │ ├── run_pretrain.sh
│ │ │ ├── run_pretrain_resume.sh
│ │ │ ├── run_pretraining.py
│ │ │ └── utils/
│ │ │ ├── WandbLog.py
│ │ │ ├── exp_util.py
│ │ │ ├── global_vars.py
│ │ │ └── logger.py
│ │ ├── requirements.txt
│ │ └── test_ci.sh
│ ├── images/
│ │ ├── diffusion/
│ │ │ ├── LICENSE
│ │ │ ├── README.md
│ │ │ ├── configs/
│ │ │ │ ├── Inference/
│ │ │ │ │ ├── v2-inference-v.yaml
│ │ │ │ │ ├── v2-inference.yaml
│ │ │ │ │ ├── v2-inpainting-inference.yaml
│ │ │ │ │ ├── v2-midas-inference.yaml
│ │ │ │ │ └── x4-upscaling.yaml
│ │ │ │ ├── Teyvat/
│ │ │ │ │ ├── README.md
│ │ │ │ │ └── train_colossalai_teyvat.yaml
│ │ │ │ ├── train_colossalai.yaml
│ │ │ │ ├── train_colossalai_cifar10.yaml
│ │ │ │ └── train_ddp.yaml
│ │ │ ├── docker/
│ │ │ │ └── Dockerfile
│ │ │ ├── environment.yaml
│ │ │ ├── ldm/
│ │ │ │ ├── data/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── base.py
│ │ │ │ │ ├── cifar10.py
│ │ │ │ │ ├── imagenet.py
│ │ │ │ │ ├── lsun.py
│ │ │ │ │ └── teyvat.py
│ │ │ │ ├── lr_scheduler.py
│ │ │ │ ├── models/
│ │ │ │ │ ├── autoencoder.py
│ │ │ │ │ └── diffusion/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── classifier.py
│ │ │ │ │ ├── ddim.py
│ │ │ │ │ ├── ddpm.py
│ │ │ │ │ ├── dpm_solver/
│ │ │ │ │ │ ├── __init__.py
│ │ │ │ │ │ ├── dpm_solver.py
│ │ │ │ │ │ └── sampler.py
│ │ │ │ │ ├── plms.py
│ │ │ │ │ └── sampling_util.py
│ │ │ │ ├── modules/
│ │ │ │ │ ├── attention.py
│ │ │ │ │ ├── diffusionmodules/
│ │ │ │ │ │ ├── __init__.py
│ │ │ │ │ │ ├── model.py
│ │ │ │ │ │ ├── openaimodel.py
│ │ │ │ │ │ ├── upscaling.py
│ │ │ │ │ │ └── util.py
│ │ │ │ │ ├── distributions/
│ │ │ │ │ │ ├── __init__.py
│ │ │ │ │ │ └── distributions.py
│ │ │ │ │ ├── ema.py
│ │ │ │ │ ├── encoders/
│ │ │ │ │ │ ├── __init__.py
│ │ │ │ │ │ └── modules.py
│ │ │ │ │ ├── image_degradation/
│ │ │ │ │ │ ├── __init__.py
│ │ │ │ │ │ ├── bsrgan.py
│ │ │ │ │ │ ├── bsrgan_light.py
│ │ │ │ │ │ └── utils_image.py
│ │ │ │ │ └── midas/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── api.py
│ │ │ │ │ ├── midas/
│ │ │ │ │ │ ├── __init__.py
│ │ │ │ │ │ ├── base_model.py
│ │ │ │ │ │ ├── blocks.py
│ │ │ │ │ │ ├── dpt_depth.py
│ │ │ │ │ │ ├── midas_net.py
│ │ │ │ │ │ ├── midas_net_custom.py
│ │ │ │ │ │ ├── transforms.py
│ │ │ │ │ │ └── vit.py
│ │ │ │ │ └── utils.py
│ │ │ │ └── util.py
│ │ │ ├── main.py
│ │ │ ├── requirements.txt
│ │ │ ├── scripts/
│ │ │ │ ├── download_first_stages.sh
│ │ │ │ ├── download_models.sh
│ │ │ │ ├── img2img.py
│ │ │ │ ├── inpaint.py
│ │ │ │ ├── knn2img.py
│ │ │ │ ├── sample_diffusion.py
│ │ │ │ ├── tests/
│ │ │ │ │ ├── test_checkpoint.py
│ │ │ │ │ └── test_watermark.py
│ │ │ │ ├── train_searcher.py
│ │ │ │ ├── txt2img.py
│ │ │ │ ├── txt2img.sh
│ │ │ │ └── utils.py
│ │ │ ├── setup.py
│ │ │ ├── test_ci.sh
│ │ │ ├── train_colossalai.sh
│ │ │ └── train_ddp.sh
│ │ ├── dreambooth/
│ │ │ ├── README.md
│ │ │ ├── colossalai.sh
│ │ │ ├── debug.py
│ │ │ ├── dreambooth.sh
│ │ │ ├── inference.py
│ │ │ ├── requirements.txt
│ │ │ ├── test_ci.sh
│ │ │ ├── train_dreambooth.py
│ │ │ ├── train_dreambooth_colossalai.py
│ │ │ ├── train_dreambooth_colossalai_lora.py
│ │ │ └── train_dreambooth_inpaint.py
│ │ ├── resnet/
│ │ │ ├── .gitignore
│ │ │ ├── README.md
│ │ │ ├── eval.py
│ │ │ ├── requirements.txt
│ │ │ ├── test_ci.sh
│ │ │ └── train.py
│ │ └── vit/
│ │ ├── README.md
│ │ ├── args.py
│ │ ├── data.py
│ │ ├── requirements.txt
│ │ ├── run_benchmark.sh
│ │ ├── run_demo.sh
│ │ ├── test_ci.sh
│ │ ├── vit_benchmark.py
│ │ └── vit_train_demo.py
│ ├── inference/
│ │ ├── benchmark_ops/
│ │ │ ├── benchmark_context_attn_unpad.py
│ │ │ ├── benchmark_decoding_attn.py
│ │ │ ├── benchmark_flash_decoding_attention.py
│ │ │ ├── benchmark_fused_rotary_embdding_unpad.py
│ │ │ ├── benchmark_kv_cache_memcopy.py
│ │ │ ├── benchmark_rmsnorm.py
│ │ │ ├── benchmark_rotary_embedding.py
│ │ │ ├── benchmark_xine_copy.py
│ │ │ └── test_ci.sh
│ │ ├── client/
│ │ │ ├── locustfile.py
│ │ │ ├── run_locust.sh
│ │ │ └── test_ci.sh
│ │ ├── llama/
│ │ │ ├── README.md
│ │ │ ├── benchmark_llama.py
│ │ │ ├── benchmark_llama3.py
│ │ │ ├── llama_generation.py
│ │ │ ├── run_benchmark.sh
│ │ │ └── test_ci.sh
│ │ └── stable_diffusion/
│ │ ├── README.md
│ │ ├── benchmark_sd3.py
│ │ ├── compute_metric.py
│ │ ├── requirements.txt
│ │ ├── run_benchmark.sh
│ │ ├── sd3_generation.py
│ │ └── test_ci.sh
│ ├── language/
│ │ ├── __init__.py
│ │ ├── bert/
│ │ │ ├── README.md
│ │ │ ├── benchmark.py
│ │ │ ├── benchmark.sh
│ │ │ ├── benchmark_utils.py
│ │ │ ├── data.py
│ │ │ ├── finetune.py
│ │ │ ├── requirements.txt
│ │ │ └── test_ci.sh
│ │ ├── commons/
│ │ │ └── utils.py
│ │ ├── data_utils.py
│ │ ├── deepseek/
│ │ │ ├── benchmark.py
│ │ │ └── test_ci.sh
│ │ ├── gpt/
│ │ │ ├── README.md
│ │ │ ├── experiments/
│ │ │ │ ├── auto_offload/
│ │ │ │ │ ├── README.md
│ │ │ │ │ ├── model_zoo.py
│ │ │ │ │ ├── requirements.txt
│ │ │ │ │ ├── run.sh
│ │ │ │ │ └── train_gpt_offload.py
│ │ │ │ ├── auto_parallel/
│ │ │ │ │ ├── README.md
│ │ │ │ │ ├── auto_parallel_with_gpt.py
│ │ │ │ │ ├── gpt_modules.py
│ │ │ │ │ └── requirements.txt
│ │ │ │ └── pipeline_parallel/
│ │ │ │ ├── README.md
│ │ │ │ ├── model_zoo.py
│ │ │ │ ├── requirements.txt
│ │ │ │ ├── run.sh
│ │ │ │ └── train_gpt_pp.py
│ │ │ ├── gemini/
│ │ │ │ ├── benchmark_gemini.sh
│ │ │ │ ├── commons/
│ │ │ │ │ ├── model_zoo.py
│ │ │ │ │ └── utils.py
│ │ │ │ ├── requirements.txt
│ │ │ │ ├── run_gemini.sh
│ │ │ │ ├── test_ci.sh
│ │ │ │ └── train_gpt_demo.py
│ │ │ ├── hybridparallelism/
│ │ │ │ ├── benchmark.py
│ │ │ │ ├── data.py
│ │ │ │ ├── finetune.py
│ │ │ │ └── run.sh
│ │ │ ├── requirements.txt
│ │ │ ├── test_ci.sh
│ │ │ └── titans/
│ │ │ ├── LICENSE
│ │ │ ├── README.md
│ │ │ ├── configs/
│ │ │ │ ├── gpt2_small_zero3_pp1d.py
│ │ │ │ └── gpt3_zero3_pp1d.py
│ │ │ ├── dataset/
│ │ │ │ └── webtext.py
│ │ │ ├── model/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── embed.py
│ │ │ │ ├── gpt1d.py
│ │ │ │ └── pipeline_gpt1d.py
│ │ │ ├── requirements.txt
│ │ │ ├── run.sh
│ │ │ ├── test_ci.sh
│ │ │ └── train_gpt.py
│ │ ├── grok-1/
│ │ │ ├── README.md
│ │ │ ├── grok1_policy.py
│ │ │ ├── inference.py
│ │ │ ├── inference_tp.py
│ │ │ ├── requirements.txt
│ │ │ ├── run_inference_fast.sh
│ │ │ ├── run_inference_slow.sh
│ │ │ ├── test_ci.sh
│ │ │ └── utils.py
│ │ ├── llama/
│ │ │ ├── README.md
│ │ │ ├── benchmark.py
│ │ │ ├── requirements.txt
│ │ │ ├── scripts/
│ │ │ │ ├── benchmark_70B/
│ │ │ │ │ ├── 3d.sh
│ │ │ │ │ ├── gemini.sh
│ │ │ │ │ └── gemini_auto.sh
│ │ │ │ └── benchmark_7B/
│ │ │ │ ├── gemini.sh
│ │ │ │ └── gemini_auto.sh
│ │ │ └── test_ci.sh
│ │ ├── mixtral/
│ │ │ ├── benchmark.py
│ │ │ └── test_ci.sh
│ │ ├── model_utils.py
│ │ ├── opt/
│ │ │ ├── README.md
│ │ │ ├── args.py
│ │ │ ├── data.py
│ │ │ ├── opt_benchmark.py
│ │ │ ├── opt_train_demo.py
│ │ │ ├── requirements.txt
│ │ │ ├── run_benchmark.sh
│ │ │ ├── run_demo.sh
│ │ │ └── test_ci.sh
│ │ ├── palm/
│ │ │ ├── README.md
│ │ │ ├── data/
│ │ │ │ └── README.md
│ │ │ ├── palm_pytorch/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── autoregressive_wrapper.py
│ │ │ │ └── palm_pytorch.py
│ │ │ ├── requirements.txt
│ │ │ ├── run.sh
│ │ │ ├── test_ci.sh
│ │ │ └── train.py
│ │ └── performance_evaluator.py
│ └── tutorial/
│ ├── .gitignore
│ ├── README.md
│ ├── auto_parallel/
│ │ ├── README.md
│ │ ├── auto_ckpt_batchsize_test.py
│ │ ├── auto_ckpt_solver_test.py
│ │ ├── auto_parallel_with_resnet.py
│ │ ├── bench_utils.py
│ │ ├── config.py
│ │ ├── requirements.txt
│ │ ├── setup.py
│ │ └── test_ci.sh
│ ├── download_cifar10.py
│ ├── fastfold/
│ │ └── README.md
│ ├── hybrid_parallel/
│ │ ├── README.md
│ │ ├── config.py
│ │ ├── requirements.txt
│ │ ├── test_ci.sh
│ │ └── train.py
│ ├── large_batch_optimizer/
│ │ ├── README.md
│ │ ├── config.py
│ │ ├── requirements.txt
│ │ ├── test_ci.sh
│ │ └── train.py
│ ├── new_api/
│ │ ├── README.md
│ │ ├── cifar_resnet/
│ │ │ ├── .gitignore
│ │ │ ├── README.md
│ │ │ ├── eval.py
│ │ │ ├── requirements.txt
│ │ │ ├── test_ci.sh
│ │ │ └── train.py
│ │ ├── cifar_vit/
│ │ │ ├── README.md
│ │ │ ├── requirements.txt
│ │ │ ├── test_ci.sh
│ │ │ └── train.py
│ │ ├── glue_bert/
│ │ │ ├── README.md
│ │ │ ├── data.py
│ │ │ ├── finetune.py
│ │ │ ├── requirements.txt
│ │ │ └── test_ci.sh
│ │ └── test_ci.sh
│ ├── opt/
│ │ ├── inference/
│ │ │ ├── README.md
│ │ │ ├── batch.py
│ │ │ ├── benchmark/
│ │ │ │ └── locustfile.py
│ │ │ ├── cache.py
│ │ │ ├── opt_fastapi.py
│ │ │ ├── opt_server.py
│ │ │ ├── requirements.txt
│ │ │ └── script/
│ │ │ ├── process-opt-175b/
│ │ │ │ ├── README.md
│ │ │ │ ├── convert_ckpt.py
│ │ │ │ ├── flat-meta.json
│ │ │ │ └── unflat.sh
│ │ │ └── processing_ckpt_66b.py
│ │ ├── opt/
│ │ │ ├── README.md
│ │ │ ├── benchmark.sh
│ │ │ ├── colossalai_zero.py
│ │ │ ├── context.py
│ │ │ ├── requirements.txt
│ │ │ ├── run_clm.py
│ │ │ ├── run_clm.sh
│ │ │ ├── run_clm_synthetic.sh
│ │ │ └── test_ci.sh
│ │ └── test_ci.sh
│ ├── requirements.txt
│ └── sequence_parallel/
│ ├── README.md
│ ├── config.py
│ ├── data/
│ │ ├── __init__.py
│ │ ├── bert_helper.py
│ │ ├── datasets/
│ │ │ ├── Makefile
│ │ │ ├── __init__.py
│ │ │ ├── bert_dataset.py
│ │ │ ├── blendable_dataset.py
│ │ │ ├── builder.py
│ │ │ ├── data_samplers.py
│ │ │ ├── dataset_utils.py
│ │ │ ├── helpers.cpp
│ │ │ ├── ict_dataset.py
│ │ │ ├── indexed_dataset.py
│ │ │ └── test/
│ │ │ ├── test_indexed_dataset.py
│ │ │ └── test_preprocess_data.sh
│ │ ├── dummy_dataloader.py
│ │ └── tokenizer/
│ │ ├── __init__.py
│ │ ├── bert_tokenization.py
│ │ └── tokenizer.py
│ ├── loss_func/
│ │ ├── __init__.py
│ │ ├── bert_loss.py
│ │ ├── cross_entropy.py
│ │ └── utils.py
│ ├── lr_scheduler/
│ │ ├── __init__.py
│ │ └── annealing_lr.py
│ ├── model/
│ │ ├── __init__.py
│ │ ├── bert.py
│ │ └── layers/
│ │ ├── __init__.py
│ │ ├── bert_layer.py
│ │ ├── dropout.py
│ │ ├── embedding.py
│ │ ├── head.py
│ │ ├── init_method.py
│ │ ├── linear.py
│ │ ├── mlp.py
│ │ ├── pooler.py
│ │ └── preprocess.py
│ ├── requirements.txt
│ ├── test_ci.sh
│ └── train.py
├── extensions/
│ ├── README.md
│ ├── __init__.py
│ ├── base_extension.py
│ ├── cpp_extension.py
│ ├── csrc/
│ │ ├── __init__.py
│ │ ├── common/
│ │ │ ├── data_type.h
│ │ │ ├── micros.h
│ │ │ ├── mp_type_traits.h
│ │ │ ├── target.h
│ │ │ └── vec_type_traits.h
│ │ ├── funcs/
│ │ │ ├── binary_functor.h
│ │ │ ├── cast_functor.h
│ │ │ ├── reduce_function.h
│ │ │ ├── ternary_functor.h
│ │ │ └── unary_functor.h
│ │ └── kernel/
│ │ ├── arm/
│ │ │ ├── cpu_adam_arm.cpp
│ │ │ └── cpu_adam_arm.h
│ │ ├── cuda/
│ │ │ ├── activation_kernel.cu
│ │ │ ├── attention/
│ │ │ │ └── attention_utils.h
│ │ │ ├── context_kv_cache_memcpy_kernel.cu
│ │ │ ├── convert_fp8_kernel.cu
│ │ │ ├── decode_kv_cache_memcpy_kernel.cu
│ │ │ ├── flash_decoding_attention_kernel.cu
│ │ │ ├── fused_rotary_emb_and_cache_kernel.cu
│ │ │ ├── get_cos_and_sin_kernel.cu
│ │ │ ├── layer_norm_kernel.cu
│ │ │ ├── moe_kernel.cu
│ │ │ ├── multi_tensor_adam_kernel.cu
│ │ │ ├── multi_tensor_apply.cuh
│ │ │ ├── multi_tensor_l2norm_kernel.cu
│ │ │ ├── multi_tensor_lamb_kernel.cu
│ │ │ ├── multi_tensor_scale_kernel.cu
│ │ │ ├── multi_tensor_sgd_kernel.cu
│ │ │ ├── rms_layernorm_kernel.cu
│ │ │ ├── scaled_masked_softmax_kernel.cu
│ │ │ ├── scaled_upper_triang_masked_softmax_kernel.cu
│ │ │ └── utils/
│ │ │ ├── gpu_launch_config.h
│ │ │ ├── micros.h
│ │ │ ├── nvgpu_dev_info.h
│ │ │ └── vec_copy.h
│ │ └── x86/
│ │ ├── cpu_adam.cpp
│ │ └── cpu_adam.h
│ ├── cuda_extension.py
│ ├── pybind/
│ │ ├── __init__.py
│ │ ├── cpu_adam/
│ │ │ ├── __init__.py
│ │ │ ├── cpu_adam_arm.py
│ │ │ └── cpu_adam_x86.py
│ │ ├── flash_attention/
│ │ │ ├── __init__.py
│ │ │ ├── flash_attention_dao_cuda.py
│ │ │ ├── flash_attention_npu.py
│ │ │ └── flash_attention_sdpa_cuda.py
│ │ ├── inference/
│ │ │ ├── __init__.py
│ │ │ ├── inference.cpp
│ │ │ └── inference_ops_cuda.py
│ │ ├── layernorm/
│ │ │ ├── __init__.py
│ │ │ ├── layer_norm.cpp
│ │ │ └── layernorm_cuda.py
│ │ ├── moe/
│ │ │ ├── __init__.py
│ │ │ ├── moe.cpp
│ │ │ └── moe_cuda.py
│ │ ├── optimizer/
│ │ │ ├── __init__.py
│ │ │ ├── fused_optimizer_cuda.py
│ │ │ └── optimizer.cpp
│ │ └── softmax/
│ │ ├── __init__.py
│ │ ├── scaled_masked_softmax.cpp
│ │ ├── scaled_masked_softmax_cuda.py
│ │ ├── scaled_upper_triang_masked_softmax.cpp
│ │ └── scaled_upper_triangle_masked_softmax_cuda.py
│ ├── triton_extension.py
│ └── utils.py
├── pytest.ini
├── requirements/
│ ├── requirements-test.txt
│ └── requirements.txt
├── setup.py
├── tests/
│ ├── __init__.py
│ ├── conftest.py
│ ├── kit/
│ │ ├── __init__.py
│ │ └── model_zoo/
│ │ ├── __init__.py
│ │ ├── custom/
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ ├── hanging_param_model.py
│ │ │ ├── nested_model.py
│ │ │ ├── repeated_computed_layers.py
│ │ │ ├── simple_mlp.py
│ │ │ └── simple_net.py
│ │ ├── diffusers/
│ │ │ ├── __init__.py
│ │ │ └── diffusers.py
│ │ ├── executor.py
│ │ ├── registry.py
│ │ ├── timm/
│ │ │ ├── __init__.py
│ │ │ └── timm.py
│ │ ├── torchaudio/
│ │ │ ├── __init__.py
│ │ │ └── torchaudio.py
│ │ ├── torchrec/
│ │ │ ├── __init__.py
│ │ │ └── torchrec.py
│ │ ├── torchvision/
│ │ │ ├── __init__.py
│ │ │ └── torchvision.py
│ │ └── transformers/
│ │ ├── __init__.py
│ │ ├── albert.py
│ │ ├── bert.py
│ │ ├── blip2.py
│ │ ├── bloom.py
│ │ ├── chatglm2.py
│ │ ├── command.py
│ │ ├── deepseek.py
│ │ ├── deepseek_v3.py
│ │ ├── falcon.py
│ │ ├── gpt.py
│ │ ├── gptj.py
│ │ ├── llama.py
│ │ ├── mistral.py
│ │ ├── mixtral.py
│ │ ├── opt.py
│ │ ├── qwen2.py
│ │ ├── qwen3.py
│ │ ├── sam.py
│ │ ├── t5.py
│ │ ├── vit.py
│ │ └── whisper.py
│ ├── test_analyzer/
│ │ ├── __init__.py
│ │ ├── test_fx/
│ │ │ ├── __init__.py
│ │ │ ├── test_bias_addition.py
│ │ │ ├── test_mod_dir.py
│ │ │ ├── test_nested_ckpt.py
│ │ │ ├── test_shape_prop.py
│ │ │ ├── test_symbolic_profile.py
│ │ │ └── zoo.py
│ │ └── test_subclasses/
│ │ ├── __init__.py
│ │ ├── test_aten.py
│ │ ├── test_flop_tensor.py
│ │ └── test_meta_mode.py
│ ├── test_auto_parallel/
│ │ ├── __init__.py
│ │ ├── test_ckpt_solvers/
│ │ │ ├── test_C_solver_consistency.py
│ │ │ ├── test_ckpt_torchvision.py
│ │ │ └── test_linearize.py
│ │ ├── test_offload/
│ │ │ ├── model_utils.py
│ │ │ ├── test_perf.py
│ │ │ └── test_solver.py
│ │ ├── test_pass/
│ │ │ ├── __init__.py
│ │ │ ├── test_node_converting_pass.py
│ │ │ └── test_size_value_converting_pass.py
│ │ └── test_tensor_shard/
│ │ ├── __init__.py
│ │ ├── test_bias_addition_forward.py
│ │ ├── test_broadcast.py
│ │ ├── test_checkpoint.py
│ │ ├── test_compatibility_with_ddp.py
│ │ ├── test_compatibility_with_gemini.py
│ │ ├── test_find_repeat_block.py
│ │ ├── test_gpt/
│ │ │ ├── __init__.py
│ │ │ ├── gpt_modules.py
│ │ │ ├── test_runtime_with_gpt_modules.py
│ │ │ └── test_solver_with_gpt_module.py
│ │ ├── test_liveness_analysis.py
│ │ ├── test_metainfo/
│ │ │ ├── test_activation_metainfo.py
│ │ │ ├── test_binary_elementwise_metainfo.py
│ │ │ ├── test_conv_metainfo.py
│ │ │ ├── test_embedding_metainfo.py
│ │ │ ├── test_linear_metainfo.py
│ │ │ ├── test_matmul_metainfo.py
│ │ │ ├── test_norm_metainfo.py
│ │ │ ├── test_pooling_metainfo.py
│ │ │ ├── test_tensor_metainfo.py
│ │ │ ├── test_where_metainfo.py
│ │ │ └── utils.py
│ │ ├── test_node_handler/
│ │ │ ├── __init__.py
│ │ │ ├── test_addbmm_handler.py
│ │ │ ├── test_addmm_handler.py
│ │ │ ├── test_batch_norm_handler.py
│ │ │ ├── test_bias_linear_function_node.py
│ │ │ ├── test_bias_linear_module_node.py
│ │ │ ├── test_binary_elementwise_handler.py
│ │ │ ├── test_bmm_handler.py
│ │ │ ├── test_conv_handler.py
│ │ │ ├── test_default_reshape_handler.py
│ │ │ ├── test_embedding_handler.py
│ │ │ ├── test_getattr_handler.py
│ │ │ ├── test_getitem_handler.py
│ │ │ ├── test_layer_norm_handler.py
│ │ │ ├── test_linear_handler.py
│ │ │ ├── test_matmul_handler.py
│ │ │ ├── test_norm_pooling_handler.py
│ │ │ ├── test_output_handler.py
│ │ │ ├── test_permute_and_transpose_handler.py
│ │ │ ├── test_placeholder_handler.py
│ │ │ ├── test_shard_option.py
│ │ │ ├── test_softmax_handler.py
│ │ │ ├── test_split_handler.py
│ │ │ ├── test_sum_handler.py
│ │ │ ├── test_tensor_constructor.py
│ │ │ ├── test_unary_element_wise_handler.py
│ │ │ ├── test_view_handler.py
│ │ │ ├── test_where_handler.py
│ │ │ └── utils.py
│ │ └── test_solver_with_resnet_v2.py
│ ├── test_autochunk/
│ │ ├── test_autochunk_alphafold/
│ │ │ ├── benchmark_autochunk_alphafold.py
│ │ │ ├── test_autochunk_alphafold_utils.py
│ │ │ ├── test_autochunk_evoformer_block.py
│ │ │ ├── test_autochunk_evoformer_stack.py
│ │ │ └── test_autochunk_extramsa_block.py
│ │ ├── test_autochunk_diffuser/
│ │ │ ├── benchmark_autochunk_diffuser.py
│ │ │ ├── test_autochunk_diffuser_utils.py
│ │ │ └── test_autochunk_unet.py
│ │ ├── test_autochunk_transformer/
│ │ │ ├── benchmark_autochunk_transformer.py
│ │ │ ├── test_autochunk_gpt.py
│ │ │ └── test_autochunk_transformer_utils.py
│ │ └── test_autochunk_vit/
│ │ ├── test_autochunk_vit.py
│ │ └── test_autochunk_vit_utils.py
│ ├── test_booster/
│ │ ├── test_accelerator.py
│ │ ├── test_mixed_precision/
│ │ │ └── test_fp16_torch.py
│ │ └── test_plugin/
│ │ ├── test_3d_plugin.py
│ │ ├── test_dp_plugin_base.py
│ │ ├── test_gemini_plugin.py
│ │ ├── test_low_level_zero_plugin.py
│ │ ├── test_torch_ddp_plugin.py
│ │ └── test_torch_fsdp_plugin.py
│ ├── test_checkpoint_io/
│ │ ├── test_gemini_checkpoint_io.py
│ │ ├── test_gemini_torch_compability.py
│ │ ├── test_general_checkpoint_io.py
│ │ ├── test_hybrid_parallel_plugin_checkpoint_io.py
│ │ ├── test_low_level_zero_checkpoint_io.py
│ │ ├── test_plugins_huggingface_compatibility.py
│ │ ├── test_safetensors_async_io.py
│ │ ├── test_torch_ddp_checkpoint_io.py
│ │ ├── test_torch_fsdp_checkpoint_io.py
│ │ └── utils.py
│ ├── test_cluster/
│ │ ├── test_device_mesh_manager.py
│ │ └── test_process_group_mesh.py
│ ├── test_config/
│ │ ├── sample_config.py
│ │ └── test_load_config.py
│ ├── test_device/
│ │ ├── test_alpha_beta.py
│ │ ├── test_device_mesh.py
│ │ ├── test_extract_alpha_beta.py
│ │ ├── test_init_logical_pg.py
│ │ └── test_search_logical_device_mesh.py
│ ├── test_fp8/
│ │ ├── test_all_to_all_single.py
│ │ ├── test_fp8_all_to_all.py
│ │ ├── test_fp8_all_to_all_single.py
│ │ ├── test_fp8_allgather.py
│ │ ├── test_fp8_allreduce.py
│ │ ├── test_fp8_cast.py
│ │ ├── test_fp8_ddp_comm_hook.py
│ │ ├── test_fp8_fsdp_comm_hook.py
│ │ ├── test_fp8_hook.py
│ │ ├── test_fp8_linear.py
│ │ └── test_fp8_reduce_scatter.py
│ ├── test_fx/
│ │ ├── test_codegen/
│ │ │ ├── test_activation_checkpoint_codegen.py
│ │ │ ├── test_nested_activation_checkpoint_codegen.py
│ │ │ └── test_offload_codegen.py
│ │ ├── test_coloproxy.py
│ │ ├── test_comm_size_compute.py
│ │ ├── test_graph_manipulation.py
│ │ ├── test_meta/
│ │ │ ├── test_aten.py
│ │ │ ├── test_backward.py
│ │ │ └── test_meta_trace.py
│ │ ├── test_meta_info_prop.py
│ │ ├── test_parallel_1d.py
│ │ ├── test_pipeline/
│ │ │ ├── test_hf_model/
│ │ │ │ ├── hf_utils.py
│ │ │ │ ├── test_albert.py
│ │ │ │ ├── test_bert.py
│ │ │ │ ├── test_gpt.py
│ │ │ │ ├── test_opt.py
│ │ │ │ └── test_t5.py
│ │ │ ├── test_timm_model/
│ │ │ │ ├── test_timm.py
│ │ │ │ └── timm_utils.py
│ │ │ ├── test_topo/
│ │ │ │ ├── test_topo.py
│ │ │ │ └── topo_utils.py
│ │ │ └── test_torchvision/
│ │ │ └── test_torchvision.py
│ │ ├── test_pipeline_passes.py
│ │ ├── test_profiler/
│ │ │ ├── gpt_utils.py
│ │ │ └── test_profiler_meta_info_prop.py
│ │ └── test_tracer/
│ │ ├── test_activation_checkpoint_annotation.py
│ │ ├── test_bias_addition_module.py
│ │ ├── test_control_flow.py
│ │ ├── test_functional_conv.py
│ │ ├── test_hf_model/
│ │ │ ├── hf_tracer_utils.py
│ │ │ ├── test_hf_albert.py
│ │ │ ├── test_hf_bert.py
│ │ │ ├── test_hf_diffuser.py
│ │ │ ├── test_hf_gpt.py
│ │ │ ├── test_hf_opt.py
│ │ │ └── test_hf_t5.py
│ │ ├── test_patched_module.py
│ │ ├── test_patched_op.py
│ │ ├── test_timm_model/
│ │ │ └── test_timm_model.py
│ │ ├── test_torchaudio_model/
│ │ │ ├── test_torchaudio_model.py
│ │ │ └── torchaudio_utils.py
│ │ ├── test_torchrec_model/
│ │ │ ├── test_deepfm_model.py
│ │ │ └── test_dlrm_model.py
│ │ └── test_torchvision_model/
│ │ └── test_torchvision_model.py
│ ├── test_infer/
│ │ ├── __init__.py
│ │ ├── _utils.py
│ │ ├── test_async_engine/
│ │ │ ├── test_async_engine.py
│ │ │ └── test_request_tracer.py
│ │ ├── test_batch_bucket.py
│ │ ├── test_config_and_struct.py
│ │ ├── test_continuous_batching.py
│ │ ├── test_cuda_graph.py
│ │ ├── test_drafter.py
│ │ ├── test_inference_engine.py
│ │ ├── test_kernels/
│ │ │ ├── __init__.py
│ │ │ ├── cuda/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── test_convert_fp8.py
│ │ │ │ ├── test_flash_decoding_attention.py
│ │ │ │ ├── test_get_cos_and_sin.py
│ │ │ │ ├── test_kv_cache_memcpy.py
│ │ │ │ ├── test_rms_layernorm.py
│ │ │ │ ├── test_rotary_embdding_unpad.py
│ │ │ │ └── test_silu_and_mul.py
│ │ │ └── triton/
│ │ │ ├── __init__.py
│ │ │ ├── kernel_utils.py
│ │ │ ├── test_context_attn_unpad.py
│ │ │ ├── test_decoding_attn.py
│ │ │ ├── test_fused_rotary_embedding.py
│ │ │ ├── test_kvcache_copy.py
│ │ │ ├── test_rmsnorm_triton.py
│ │ │ ├── test_rotary_embdding_unpad.py
│ │ │ └── test_xine_copy.py
│ │ ├── test_kvcache_manager.py
│ │ ├── test_models/
│ │ │ ├── test_attention.py
│ │ │ ├── test_baichuan.py
│ │ │ └── test_custom_model.py
│ │ ├── test_request_handler.py
│ │ ├── test_rpc_engine.py
│ │ └── test_streamingllm.py
│ ├── test_lazy/
│ │ ├── lazy_init_utils.py
│ │ ├── test_from_pretrained.py
│ │ ├── test_models.py
│ │ └── test_ops.py
│ ├── test_legacy/
│ │ ├── test_amp/
│ │ │ ├── test_naive_fp16.py
│ │ │ └── test_torch_fp16.py
│ │ ├── test_comm/
│ │ │ ├── test_boardcast_send_recv_v2.py
│ │ │ ├── test_comm.py
│ │ │ ├── test_object_list_p2p.py
│ │ │ └── test_object_list_p2p_v2.py
│ │ ├── test_context/
│ │ │ ├── configs/
│ │ │ │ ├── parallel_2d_init.py
│ │ │ │ ├── parallel_2p5d_init.py
│ │ │ │ └── parallel_3d_init.py
│ │ │ └── test_hybrid_parallel.py
│ │ ├── test_data/
│ │ │ ├── test_cifar10_dataset.py
│ │ │ ├── test_data_parallel_sampler.py
│ │ │ └── test_deterministic_dataloader.py
│ │ ├── test_engine/
│ │ │ ├── test_engine.py
│ │ │ └── test_gradient_accumluation.py
│ │ ├── test_layers/
│ │ │ ├── test_1d/
│ │ │ │ ├── checks_1d/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── check_layer_1d.py
│ │ │ │ │ └── common.py
│ │ │ │ └── test_1d.py
│ │ │ ├── test_2d/
│ │ │ │ ├── checks_2d/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── check_layer_2d.py
│ │ │ │ │ ├── check_operation_2d.py
│ │ │ │ │ └── common.py
│ │ │ │ └── test_2d.py
│ │ │ ├── test_2p5d/
│ │ │ │ ├── checks_2p5d/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── check_layer_2p5d.py
│ │ │ │ │ ├── check_operation_2p5d.py
│ │ │ │ │ └── common.py
│ │ │ │ └── test_2p5d.py
│ │ │ ├── test_3d/
│ │ │ │ ├── checks_3d/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── check_layer_3d.py
│ │ │ │ │ └── common.py
│ │ │ │ └── test_3d.py
│ │ │ ├── test_cache_embedding.py
│ │ │ └── test_sequence/
│ │ │ ├── checks_seq/
│ │ │ │ ├── __init__.py
│ │ │ │ └── check_layer_seq.py
│ │ │ └── test_sequence.py
│ │ ├── test_moe/
│ │ │ ├── moe_utils.py
│ │ │ ├── test_grad_handler.py
│ │ │ ├── test_moe_group.py
│ │ │ ├── test_moe_hybrid_zero.py
│ │ │ └── test_moe_load_balance.py
│ │ ├── test_pipeline/
│ │ │ ├── rpc_test_utils.py
│ │ │ ├── test_cuda_rpc_chimera.py
│ │ │ ├── test_cuda_rpc_optimizer.py
│ │ │ ├── test_cuda_rpc_pipeline.py
│ │ │ ├── test_cuda_rpc_value_correctness.py
│ │ │ ├── test_middleware_1f1b.py
│ │ │ ├── test_pipelinable.py
│ │ │ └── test_pipeline_process_group.py
│ │ ├── test_tensor/
│ │ │ ├── common_utils/
│ │ │ │ ├── __init__.py
│ │ │ │ └── _utils.py
│ │ │ ├── core/
│ │ │ │ └── test_dist_spec_mgr.py
│ │ │ └── test_parameter.py
│ │ ├── test_trainer/
│ │ │ ├── test_pipeline/
│ │ │ │ ├── test_p2p.py
│ │ │ │ └── test_pipeline_schedule.py
│ │ │ ├── test_trainer_with_non_pipe_schedule.py
│ │ │ └── test_trainer_with_pipe_schedule.py
│ │ ├── test_utils/
│ │ │ ├── test_activation_checkpointing.py
│ │ │ ├── test_checkpoint/
│ │ │ │ ├── test_checkpoint_1d.py
│ │ │ │ ├── test_checkpoint_2d.py
│ │ │ │ ├── test_checkpoint_2p5d.py
│ │ │ │ └── test_checkpoint_3d.py
│ │ │ ├── test_memory.py
│ │ │ └── test_norm_gradient_clipping.py
│ │ └── test_zero/
│ │ └── test_commons.py
│ ├── test_lora/
│ │ └── test_lora.py
│ ├── test_moe/
│ │ ├── moe_utils.py
│ │ ├── test_deepseek_layer.py
│ │ ├── test_kernel.py
│ │ ├── test_mixtral_layer.py
│ │ ├── test_moe_checkpoint.py
│ │ ├── test_moe_ep_tp.py
│ │ └── test_moe_ep_zero.py
│ ├── test_optimizer/
│ │ ├── _utils.py
│ │ ├── test_adam_kernel.py
│ │ ├── test_adam_optim.py
│ │ ├── test_dist_adafactor.py
│ │ ├── test_dist_came.py
│ │ ├── test_dist_galore.py
│ │ ├── test_dist_lamb.py
│ │ ├── test_lr_scheduler.py
│ │ └── test_nvme.py
│ ├── test_pipeline/
│ │ ├── test_p2p_communication.py
│ │ ├── test_pipeline_utils/
│ │ │ ├── test_t5_pipeline_utils.py
│ │ │ └── test_whisper_pipeline_utils.py
│ │ ├── test_schedule/
│ │ │ ├── test_interleaved.py
│ │ │ ├── test_oneF_oneB.py
│ │ │ ├── test_pipeline_schedule_utils.py
│ │ │ └── test_zerobubble_pp.py
│ │ └── test_stage_manager.py
│ ├── test_shardformer/
│ │ ├── __init__.py
│ │ ├── test_flash_attention.py
│ │ ├── test_hybrid_parallel_grad_clip_norm/
│ │ │ ├── test_amp_optimizer.py
│ │ │ ├── test_naive_optimizer.py
│ │ │ └── test_zero_optimizer.py
│ │ ├── test_layer/
│ │ │ ├── test_dist_crossentropy.py
│ │ │ ├── test_dist_log_prob.py
│ │ │ ├── test_dropout.py
│ │ │ ├── test_embedding.py
│ │ │ ├── test_gpt2_qkv_fused_linear_1d.py
│ │ │ ├── test_layernorm.py
│ │ │ ├── test_linear_1d.py
│ │ │ ├── test_qkv_fused_linear_1d.py
│ │ │ ├── test_ring_attn.py
│ │ │ ├── test_sequence_parallel.py
│ │ │ └── test_vocab_parallel_embedding_1d.py
│ │ ├── test_model/
│ │ │ ├── __init__.py
│ │ │ ├── _utils.py
│ │ │ ├── test_shard_bert.py
│ │ │ ├── test_shard_blip2.py
│ │ │ ├── test_shard_bloom.py
│ │ │ ├── test_shard_chatglm2.py
│ │ │ ├── test_shard_command.py
│ │ │ ├── test_shard_deepseek.py
│ │ │ ├── test_shard_deepseek_v3.py
│ │ │ ├── test_shard_falcon.py
│ │ │ ├── test_shard_gpt2.py
│ │ │ ├── test_shard_gptj.py
│ │ │ ├── test_shard_llama.py
│ │ │ ├── test_shard_mistral.py
│ │ │ ├── test_shard_mixtral.py
│ │ │ ├── test_shard_opt.py
│ │ │ ├── test_shard_qwen2.py
│ │ │ ├── test_shard_qwen3.py
│ │ │ ├── test_shard_sam.py
│ │ │ ├── test_shard_t5.py
│ │ │ ├── test_shard_vit.py
│ │ │ └── test_shard_whisper.py
│ │ ├── test_shard_utils.py
│ │ └── test_with_torch_ddp.py
│ ├── test_smoothquant/
│ │ ├── test_llama_attention.py
│ │ ├── test_llama_mlp.py
│ │ ├── test_smoothquant_linear.py
│ │ └── test_sq_rotary_embedding.py
│ ├── test_tensor/
│ │ ├── test_comm_spec_apply.py
│ │ ├── test_dtensor/
│ │ │ ├── test_comm_spec.py
│ │ │ ├── test_dtensor.py
│ │ │ ├── test_dtensor_sharding_spec.py
│ │ │ └── test_layout_converter.py
│ │ ├── test_mix_gather.py
│ │ ├── test_padded_tensor.py
│ │ ├── test_shape_consistency.py
│ │ ├── test_shape_consistency_apply.py
│ │ └── test_sharding_spec.py
│ └── test_zero/
│ ├── test_gemini/
│ │ ├── test_chunk_mgrv2.py
│ │ ├── test_chunkv2.py
│ │ ├── test_gemini_use_rmt.py
│ │ ├── test_grad_accum.py
│ │ ├── test_grad_clip.py
│ │ ├── test_inference.py
│ │ ├── test_optim.py
│ │ ├── test_runtime_mem_tracer.py
│ │ ├── test_search.py
│ │ ├── test_zeroddp_state_dict.py
│ │ └── test_zerooptim_state_dict.py
│ └── test_low_level/
│ ├── test_coll_nd.py
│ ├── test_grad_acc.py
│ ├── test_mem_leak.py
│ ├── test_zero1_2.py
│ └── test_zero_ckpt.py
└── version.txt
================================================
FILE CONTENTS
================================================
================================================
FILE: .clang-format
================================================
BasedOnStyle: Google
================================================
FILE: .compatibility
================================================
2.3.0-12.1.0
2.4.0-12.4.1
2.5.1-12.4.1
================================================
FILE: .coveragerc
================================================
[run]
concurrency = multiprocessing
parallel = true
sigterm = true
================================================
FILE: .cuda_ext.json
================================================
{
"build": [
{
"torch_command": "pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121",
"cuda_image": "image-cloud.luchentech.com/hpcaitech/cuda-conda:12.1"
},
{
"torch_command": "pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124",
"cuda_image": "image-cloud.luchentech.com/hpcaitech/cuda-conda:12.4"
}
]
}
================================================
FILE: .github/CODEOWNERS
================================================
* @hpcaitech/colossalai-qa
================================================
FILE: .github/ISSUE_TEMPLATE/bug-report.yml
================================================
name: 🐛 Bug Report
description: Create a report to help us reproduce and fix the bug
title: "[BUG]: "
labels: [bug]
body:
- type: markdown
attributes:
value: >
#### Not suitable for your needs? [Open a blank issue](https://github.com/hpcaitech/ColossalAI/issues/new).
- type: checkboxes
attributes:
label: Is there an existing issue for this bug?
description: Please search [here](https://github.com/hpcaitech/ColossalAI/issues) to see if an open or closed issue already exists for the bug you have encountered.
options:
- label: I have searched the existing issues
required: true
- type: checkboxes
attributes:
label: The bug has not been fixed in the latest main branch
options:
- label: I have checked the latest main branch
required: true
- type: dropdown
id: share_script
attributes:
label: Do you feel comfortable sharing a concise (minimal) script that reproduces the error? :)
description: If not, please share your setting/training config, and/or point to the line in the repo that throws the error.
If the issue is not easily reproducible by us, it will reduce the likelihood of getting responses.
options:
- Yes, I will share a minimal reproducible script.
- No, I prefer not to share.
validations:
required: true
- type: textarea
attributes:
label: 🐛 Describe the bug
description: |
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps or code snippet to reproduce the behavior.
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Optional: Affiliation**
Institution/email information helps better analyze and evaluate users to improve the project. Welcome to establish in-depth cooperation.
placeholder: |
A clear and concise description of what the bug is.
validations:
required: true
- type: textarea
attributes:
label: Environment
description: |
Please provide the environment information, eg. CUDA/cuDNN/NCCL/Python/PyTorch version.
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!
================================================
FILE: .github/ISSUE_TEMPLATE/config.yml
================================================
blank_issues_enabled: true
contact_links:
- name: ❓ Simple question - Slack Chat
url: https://github.com/hpcaitech/public_assets/tree/main/colossalai/contact/slack
about: This issue tracker is not for technical support. Please use our Slack chat, and ask the community for help.
- name: ❓ Simple question - WeChat
url: https://github.com/hpcaitech/ColossalAI/blob/main/docs/images/WeChat.png
about: This issue tracker is not for technical support. Please use WeChat, and ask the community for help.
- name: 😊 Advanced question - GitHub Discussions
url: https://github.com/hpcaitech/ColossalAI/discussions
about: Use GitHub Discussions for advanced and unanswered technical questions, requiring a maintainer's answer.
================================================
FILE: .github/ISSUE_TEMPLATE/documentation.yml
================================================
name: 📚 Documentation
description: Report an issue related to https://www.colossalai.org/
title: "[DOC]: "
labels: [documentation]
body:
- type: markdown
attributes:
value: >
#### Not suitable for your needs? [Open a blank issue](https://github.com/hpcaitech/ColossalAI/issues/new).
- type: textarea
attributes:
label: 📚 The doc issue
description: |
**Description** What content in [Documentation](https://www.colossalai.org/) is an issue?
**Location** Where is the issue location?
**Expectation** What is your expected content about it?
**Screenshots** If applicable, add screenshots to help explain your problem.
**Suggestions** Tell us how we could improve the documentation.
**Optional: Affiliation** Institution/email information helps better analyze and evaluate users to improve the project. Welcome to establish in-depth cooperation.
placeholder: |
A clear and concise description of the issue.
validations:
required: true
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!
================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.yml
================================================
name: 🚀 Feature request
description: Suggest an idea for this project
title: "[FEATURE]: "
labels: [enhancement]
body:
- type: markdown
attributes:
value: >
#### Not suitable for your needs? [Open a blank issue](https://github.com/hpcaitech/ColossalAI/issues/new).
- type: textarea
attributes:
label: Describe the feature
description: |
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Suggest a potential alternative/fix**
Tell us how we could improve this project.
**Optional: Affiliation**
Institution/email information helps better analyze and evaluate users to improve the project. Welcome to establish in-depth cooperation.
placeholder: |
A clear and concise description of your idea.
validations:
required: true
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!
================================================
FILE: .github/ISSUE_TEMPLATE/proposal.yml
================================================
name: 💥 Proposal
description: Propose a non-trivial change to Colossal-AI
title: "[PROPOSAL]: "
labels: [enhancement]
body:
- type: markdown
attributes:
value: |
Common reasons for proposals include:
- Altering the infrastructure;
- Bumping a critical dependency's major version;
- A significant improvement in user-friendliness;
- Significant refactor;
- Optional: Affiliation/email information helps better analyze and evaluate users to improve the project. Welcome to establish in-depth cooperation.
- ...
Please note this template is not for feature requests or bug reports; using it for those could cause us to identify the issue wrongly and close it without doing anything.
We give you maximum freedom to write an elaborated proposal illustrating why you think the change is beneficial for us, and what steps we should take to turn this into reality.
- type: textarea
attributes:
label: Proposal
description: A clear and concise description of what the proposal is.
validations:
required: true
- type: checkboxes
attributes:
label: Self-service
description: |
If you feel like you could contribute to this issue, please check the box below. This would tell us and other people looking for contributions that someone's working on it.
If you do check this box, please send a pull request within 7 days after a maintainer's approval so we can still delegate this to someone else.
Proposals usually involve significant code changes, so please reach consensus with the maintainers before rushing to implement it, and make sure you follow the [Contributing Guidelines](https://github.com/hpcaitech/ColossalAI/blob/main/CONTRIBUTING.md).
This ensures that you don't waste your time and we don't waste ours reading the large diffs.
options:
- label: I'd be willing to do some initial work on this proposal myself.
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!
================================================
FILE: .github/pull_request_template.md
================================================
## 📌 Checklist before creating the PR
- [ ] I have created an issue for this PR for traceability
- [ ] The title follows the standard format: `[doc/gemini/tensor/...]: A concise description`
- [ ] I have added relevant tags if possible for us to better distinguish different PRs
- [ ] I have installed pre-commit: `pip install pre-commit && pre-commit install`
## 🚨 Issue number
> Link this PR to your issue with words like fixed to automatically close the linked issue upon merge
>
> e.g. `fixed #1234`, `closed #1234`, `resolved #1234`
## 📝 What does this PR do?
> Summarize your work here.
> if you have any plots/diagrams/screenshots/tables, please attach them here.
## 💥 Checklist before requesting a review
- [ ] I have linked my PR to an issue ([instruction](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue))
- [ ] My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
- [ ] I have performed a self-review of my code
- [ ] I have added thorough tests.
- [ ] I have added docstrings for all the functions/methods I implemented
## ⭐️ Do you enjoy contributing to Colossal-AI?
- [ ] 🌝 Yes, I do.
- [ ] 🌚 No, I don't.
Tell us more if you don't enjoy contributing to Colossal-AI.
================================================
FILE: .github/workflows/README.md
================================================
# CI/CD
## Table of Contents
- [CI/CD](#cicd)
- [Table of Contents](#table-of-contents)
- [Overview](#overview)
- [Workflows](#workflows)
- [Code Style Check](#code-style-check)
- [Unit Test](#unit-test)
- [Example Test](#example-test)
- [Example Test on Dispatch](#example-test-on-dispatch)
- [Compatibility Test](#compatibility-test)
- [Compatibility Test on Dispatch](#compatibility-test-on-dispatch)
- [Release](#release)
- [User Friendliness](#user-friendliness)
- [Community](#community)
- [Configuration](#configuration)
- [Progress Log](#progress-log)
## Overview
Automation makes our development more efficient as the machine automatically run the pre-defined tasks for the contributors.
This saves a lot of manual work and allows the developer to fully focus on the features and bug fixes.
In Colossal-AI, we use [GitHub Actions](https://github.com/features/actions) to automate a wide range of workflows to ensure the robustness of the software.
In the section below, we will dive into the details of different workflows available.
## Workflows
Refer to this [documentation](https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow) on how to manually trigger a workflow.
I will provide the details of each workflow below.
**A PR which changes the `version.txt` is considered as a release PR in the following context.**
### Code Style Check
| Workflow Name | File name | Description |
| ------------- | ----------------- | -------------------------------------------------------------------------------------------------------------- |
| `post-commit` | `post_commit.yml` | This workflow runs pre-commit checks for changed files to achieve code style consistency after a PR is merged. |
### Unit Test
| Workflow Name | File name | Description |
| ---------------------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Build on PR` | `build_on_pr.yml` | This workflow is triggered when a PR changes essential files and a branch is created/deleted. It will run all the unit tests in the repository with 4 GPUs. |
| `Build on Schedule` | `build_on_schedule.yml` | This workflow will run the unit tests everyday with 8 GPUs. The result is sent to Lark. |
| `Report test coverage` | `report_test_coverage.yml` | This workflow will put up a comment to report the test coverage results when `Build on PR` is done. |
To reduce the average time of the unit test on PR, `Build on PR` workflow manages testmon cache.
1. When creating a new branch, it copies `cache/main/.testmondata*` to `cache/<branch>/`.
2. When creating a new PR or changing the base branch of a PR, it copies `cache/<base_ref>/.testmondata*` to `cache/_pull/<pr_number>/`.
3. When running unit tests for each PR, it restores testmon cache from `cache/_pull/<pr_number>/`. After the test, it stores the cache back to `cache/_pull/<pr_number>/`.
4. When a PR is closed, if it's merged, it copies `cache/_pull/<pr_number>/.testmondata*` to `cache/<base_ref>/`. Otherwise, it just removes `cache/_pull/<pr_number>`.
5. When a branch is deleted, it removes `cache/<ref>`.
### Example Test
| Workflow Name | File name | Description |
| -------------------------- | ------------------------------- | ------------------------------------------------------------------------------ |
| `Test example on PR` | `example_check_on_pr.yml` | The example will be automatically tested if its files are changed in the PR |
| `Test example on Schedule` | `example_check_on_schedule.yml` | This workflow will test all examples every Sunday. The result is sent to Lark. |
| `Example Test on Dispatch` | `example_check_on_dispatch.yml` | Manually test a specified example. |
#### Example Test on Dispatch
This workflow is triggered by manually dispatching the workflow. It has the following input parameters:
- `example_directory`: the example directory to test. Multiple directories are supported and must be separated by commas, e.g. `language/gpt, images/vit`. Inputting only a category such as `language`, or only a name such as `gpt`, does not work.
### Compatibility Test
| Workflow Name | File name | Description |
| -------------------------------- | ------------------------------------ | -------------------------------------------------------------------------------------------------------------------- |
| `Compatibility Test on PR` | `compatiblity_test_on_pr.yml` | Check Colossal-AI's compatibility when `version.txt` is changed in a PR. |
| `Compatibility Test on Schedule` | `compatiblity_test_on_schedule.yml` | This workflow will check the compatibility of Colossal-AI against PyTorch specified in `.compatibility` every Sunday. |
| `Compatibility Test on Dispatch` | `compatiblity_test_on_dispatch.yml` | Test PyTorch Compatibility manually. |
#### Compatibility Test on Dispatch
This workflow is triggered by manually dispatching the workflow. It has the following input parameters:
- `torch version`: torch versions to test against; multiple versions are supported but must be separated by commas. The default value is `all`, which will test all available torch versions listed in this [repository](https://github.com/hpcaitech/public_assets/tree/main/colossalai/torch_build/torch_wheels).
- `cuda version`: cuda versions to test against, multiple versions are supported but must be separated by comma. The CUDA versions must be present in our [DockerHub repository](https://hub.docker.com/r/hpcaitech/cuda-conda).
> It only tests the compatibility of the main branch.
### Release
| Workflow Name | File name | Description |
| ----------------------------------------------- | ------------------------------------------- | ------------------------------------------------------------------------------------------------------------- |
| `Draft GitHub Release Post` | `draft_github_release_post_after_merge.yml` | Compose a GitHub release post draft based on the commit history when a release PR is merged. |
| `Publish to PyPI` | `release_pypi_after_merge.yml` | Build and release the wheel to PyPI when a release PR is merged. The result is sent to Lark. |
| `Publish Nightly Version to PyPI` | `release_nightly_on_schedule.yml` | Build and release the nightly wheel to PyPI as `colossalai-nightly` every Sunday. The result is sent to Lark. |
| `Publish Docker Image to DockerHub after Merge` | `release_docker_after_publish.yml` | Build and release the Docker image to DockerHub when a release PR is merged. The result is sent to Lark. |
| `Check CUDA Extension Build Before Merge` | `cuda_ext_check_before_merge.yml` | Build CUDA extensions with different CUDA versions when a release PR is created. |
| `Publish to Test-PyPI Before Merge` | `release_test_pypi_before_merge.yml` | Release to test-pypi to simulate user installation when a release PR is created. |
### User Friendliness
| Workflow Name | File name | Description |
| ----------------------- | ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| `issue-translate` | `translate_comment.yml` | This workflow is triggered when a new issue comment is created. The comment will be translated into English if not written in English. |
| `Synchronize submodule` | `submodule.yml` | This workflow will check if any git submodule is updated. If so, it will create a PR to update the submodule pointers. |
| `Close inactive issues` | `close_inactive.yml` | This workflow will mark issues as stale after they have been inactive for 14 days. |
### Community
| Workflow Name | File name | Description |
| -------------------------------------------- | -------------------------------- | -------------------------------------------------------------------------------- |
| `Generate Community Report and Send to Lark` | `report_leaderboard_to_lark.yml` | Collect contribution and user engagement stats and share with Lark every Friday. |
## Configuration
This section lists the files used to configure the workflow.
1. `.compatibility`
This `.compatibility` file is to tell GitHub Actions which PyTorch and CUDA versions to test against. Each line in the file is in the format `${torch-version}-${cuda-version}`, which is a tag for Docker image. Thus, this tag must be present in the [docker registry](https://hub.docker.com/r/pytorch/conda-cuda) so as to perform the test.
2. `.cuda_ext.json`
This file controls which CUDA versions the CUDA extension build will be checked against. You can add a new entry according to the JSON schema below to check the AOT build of PyTorch extensions before release.
```json
{
"build": [
{
"torch_command": "",
"cuda_image": ""
},
]
}
```
## Progress Log
- [x] Code style check
- [x] post-commit check
- [x] unit testing
- [x] test on PR
- [x] report test coverage
- [x] regular test
- [x] release
- [x] pypi release
- [x] test-pypi simulation
- [x] nightly build
- [x] docker build
- [x] draft release post
- [x] example check
- [x] check on PR
- [x] regular check
- [x] manual dispatch
- [x] compatibility check
- [x] check on PR
- [x] manual dispatch
- [x] auto test when release
- [x] community
- [x] contribution report
- [x] user engagement report
- [x] helpers
- [x] comment translation
- [x] submodule update
- [x] close inactive issue
================================================
FILE: .github/workflows/build_on_pr.yml
================================================
name: Build on PR
on:
pull_request:
types: [synchronize, opened, reopened, ready_for_review, closed]
branches:
- "main"
- "develop"
- "feature/**"
paths:
- ".github/workflows/build_on_pr.yml" # run command & env variables change
- "colossalai/**" # source code change
- "!colossalai/**.md" # ignore doc change
- "op_builder/**" # cuda extension change
- "!op_builder/**.md" # ignore doc change
- "requirements/**" # requirements change
- "tests/**" # test change
- "!tests/**.md" # ignore doc change
- "pytest.ini" # test config change
- "setup.py" # install command change
create:
delete:
jobs:
detect:
name: Detect file change
if: |
github.event_name == 'pull_request' &&
(github.event.action == 'synchronize' || github.event.action == 'opened' || github.event.action == 'reopened' || github.event.action == 'ready_for_review') &&
github.event.pull_request.draft == false &&
github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI'
outputs:
changedExtenisonFiles: ${{ steps.find-extension-change.outputs.all_changed_files }}
anyExtensionFileChanged: ${{ steps.find-extension-change.outputs.any_changed }}
changedLibraryFiles: ${{ steps.find-lib-change.outputs.all_changed_files }}
anyLibraryFileChanged: ${{ steps.find-lib-change.outputs.any_changed }}
runs-on: [self-hosted, ubuntu-latest]
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-detect-change
cancel-in-progress: true
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
ref: ${{ github.event.pull_request.head.sha }}
- name: Locate base commit
id: locate-base-sha
run: |
curBranch=$(git rev-parse --abbrev-ref HEAD)
commonCommit=$(git merge-base origin/main $curBranch)
echo $commonCommit
echo "baseSHA=$commonCommit" >> $GITHUB_OUTPUT
- name: Find the changed extension-related files
id: find-extension-change
uses: tj-actions/changed-files@v35
with:
base_sha: ${{ steps.locate-base-sha.outputs.baseSHA }}
files: |
op_builder/**
colossalai/kernel/**
setup.py
- name: Find the changed library-related files
id: find-lib-change
uses: tj-actions/changed-files@v35
with:
base_sha: ${{ steps.locate-base-sha.outputs.baseSHA }}
files: |
**/*.py
**/*.h
**/*.cpp
**/*.cu
**/*.txt
- name: List changed files
run: |
for file in ${{ steps.find-extension-change.outputs.all_changed_files }}; do
echo "$file was changed"
done
for file in ${{ steps.find-lib-change.outputs.all_changed_files }}; do
echo "$file was changed"
done
build:
name: Build and Test Colossal-AI
needs: detect
if: needs.detect.outputs.anyLibraryFileChanged == 'true'
runs-on: [self-hosted, ubuntu-latest]
container:
image: image-cloud.luchentech.com/hpcaitech/pytorch-cuda:2.2.2-12.1.0
options: --gpus all --shm-size=2g --rm -v /dev/shm -v /data/scratch:/data/scratch
timeout-minutes: 90
defaults:
run:
shell: bash
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-run-test
cancel-in-progress: true
steps:
- name: Checkout TensorNVMe
uses: actions/checkout@v2
with:
repository: hpcaitech/TensorNVMe
ssh-key: ${{ secrets.SSH_KEY_FOR_CI }}
path: TensorNVMe
- name: Restore TensorNVMe Cache
run: |
if [ -d /github/home/tensornvme_cache ] && [ ! -z "$(ls -A /github/home/tensornvme_cache/)" ]; then
cp -p -r /github/home/tensornvme_cache/* /__w/ColossalAI/ColossalAI/TensorNVMe
fi
- name: Install TensorNVMe
run: |
cd TensorNVMe
conda install cmake
pip install -r requirements.txt
DISABLE_URING=1 pip install -v --no-cache-dir .
- name: Store TensorNVMe Cache
run: |
cd TensorNVMe
cp -p -r ./build /github/home/tensornvme_cache/
cp -p -r ./cmake-build /github/home/tensornvme_cache/
- name: Checkout Colossal-AI
uses: actions/checkout@v2
with:
ssh-key: ${{ secrets.SSH_KEY_FOR_CI }}
- name: Restore Colossal-AI Cache
if: needs.detect.outputs.anyExtensionFileChanged != 'true'
run: |
# -p flag is required to preserve the file timestamp to avoid ninja rebuild
if [ -d /github/home/cuda_ext_cache ] && [ ! -z "$(ls -A /github/home/cuda_ext_cache/)" ]; then
cp -p -r /github/home/cuda_ext_cache/* /__w/ColossalAI/ColossalAI/
fi
- name: Install flash-attention
run: |
pip install flash-attn==2.7.4.post1 --no-build-isolation
- name: Install Colossal-AI
run: |
BUILD_EXT=1 pip install -v -e .
pip install --no-cache-dir -r requirements/requirements-test.txt
- name: Store Colossal-AI Cache
run: |
# -p flag is required to preserve the file timestamp to avoid ninja rebuild
cp -p -r /__w/ColossalAI/ColossalAI/build /github/home/cuda_ext_cache/
- name: Execute Unit Testing
run: |
CURL_CA_BUNDLE="" PYTHONPATH=$PWD FAST_TEST=1 pytest \
-m "not largedist" \
--durations=0 \
--ignore tests/test_analyzer \
--ignore tests/test_auto_parallel \
--ignore tests/test_fx \
--ignore tests/test_autochunk \
--ignore tests/test_gptq \
--ignore tests/test_infer_ops \
--ignore tests/test_legacy \
--ignore tests/test_smoothquant \
tests/
env:
LD_LIBRARY_PATH: /github/home/.tensornvme/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
LLAMA_PATH: /data/scratch/llama-tiny
MOE_TENSOR_PATH: /data/scratch/moe_tensors
HF_ENDPOINT: https://hf-mirror.com
- name: Collate artifact
env:
PR_NUMBER: ${{ github.event.number }}
changedLibraryFiles: ${{ needs.detect.outputs.changedLibraryFiles }}
anyLibraryFileChanged: ${{ needs.detect.outputs.anyLibraryFileChanged }}
changedExtenisonFiles: ${{ needs.detect.outputs.changedExtenisonFiles }}
run: |
mkdir report
echo $PR_NUMBER > ./report/pr_number
# generate coverage.xml if any
if [ "$anyLibraryFileChanged" == "true" ] && [ -e .coverage ]; then
allFiles=""
for file in $changedLibraryFiles; do
if [ "$allFiles" == "" ]; then
allFiles=$file
else
allFiles=$allFiles,$file
fi
done
coverage report --data-file .coverage --include $allFiles > ./coverage.txt
covPercentage=$(tail -n 1 coverage.txt | grep -o '[1-9]*%$')
covNum=${covPercentage::-1}
mv coverage.txt ./report
echo $covNum > ./report/cov_number
else
echo "No coverage report is generated"
fi
- name: Upload test coverage artifact
uses: actions/upload-artifact@v4
with:
name: report
path: report/
================================================
FILE: .github/workflows/build_on_schedule.yml
================================================
name: Build on Schedule
on:
schedule:
# run at 00:00 of every Sunday
- cron: "0 0 * * 0"
workflow_dispatch:
jobs:
build:
name: Build and Test Colossal-AI
if: github.repository == 'hpcaitech/ColossalAI'
runs-on: [self-hosted, ubuntu-latest]
container:
image: image-cloud.luchentech.com/hpcaitech/pytorch-cuda:2.2.2-12.1.0
options: --gpus all --rm -v /dev/shm -v /data/scratch/:/data/scratch/
timeout-minutes: 90
steps:
- name: Check GPU Availability # ensure all GPUs have enough memory
id: check-avai
run: |
avai=true
ngpu=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
endIndex=$(($ngpu-1))
for i in $(seq 0 $endIndex);
do
gpu_used=$(nvidia-smi -i $i --query-gpu=memory.used --format=csv,noheader,nounits)
[ "$gpu_used" -gt "2000" ] && avai=false
done
echo "GPU is available: $avai"
echo "avai=$avai" >> $GITHUB_OUTPUT
- uses: actions/checkout@v2
if: steps.check-avai.outputs.avai == 'true'
with:
repository: hpcaitech/TensorNVMe
ssh-key: ${{ secrets.SSH_KEY_FOR_CI }}
path: TensorNVMe
- name: Install tensornvme
if: steps.check-avai.outputs.avai == 'true'
run: |
cd TensorNVMe
conda install cmake
pip install -r requirements.txt
DISABLE_URING=1 pip install -v .
- uses: actions/checkout@v2
if: steps.check-avai.outputs.avai == 'true'
with:
ssh-key: ${{ secrets.SSH_KEY_FOR_CI }}
- name: Install flash-attention
run: |
pip install flash-attn==2.7.4.post1 --no-build-isolation
- name: Install Colossal-AI
if: steps.check-avai.outputs.avai == 'true'
run: |
[ ! -z "$(ls -A /github/home/cuda_ext_cache/)" ] && cp -r /github/home/cuda_ext_cache/* /__w/ColossalAI/ColossalAI/
BUILD_EXT=1 pip install -v -e .
cp -r /__w/ColossalAI/ColossalAI/build /github/home/cuda_ext_cache/
pip install --no-cache-dir -r requirements/requirements-test.txt
- name: Unit Testing
if: steps.check-avai.outputs.avai == 'true'
run: |
PYTHONPATH=$PWD pytest \
-m "not largedist" \
--durations=0 \
tests/
env:
LD_LIBRARY_PATH: /github/home/.tensornvme/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
LLAMA_PATH: /data/scratch/llama-tiny
MOE_TENSOR_PATH: /data/scratch/moe_tensors
HF_ENDPOINT: https://hf-mirror.com
- name: Notify Lark
id: message-preparation
if: ${{ failure() }}
run: |
url=$SERVER_URL/$REPO/actions/runs/$RUN_ID
msg="Scheduled Build and Test failed, please visit $url for details"
echo $msg
python .github/workflows/scripts/send_message_to_lark.py -m "$msg" -u $WEBHOOK_URL
env:
SERVER_URL: ${{github.server_url }}
REPO: ${{ github.repository }}
RUN_ID: ${{ github.run_id }}
WEBHOOK_URL: ${{ secrets.LARK_NOTIFICATION_WEBHOOK_URL }}
================================================
FILE: .github/workflows/close_inactive.yml
================================================
name: Close inactive issues
on:
schedule:
- cron: "0 0 * * *"
jobs:
close-issues:
if: github.repository == 'hpcaitech/ColossalAI'
runs-on: [self-hosted, ubuntu-latest]
permissions:
issues: write
pull-requests: write
steps:
- uses: actions/stale@v3
with:
days-before-issue-stale: 14
days-before-issue-close: -1
stale-issue-label: "stale"
stale-issue-message: "This issue is stale because it has been open for 14 days with no activity."
# close-issue-message: "This issue was closed because it has been inactive for 14 days since being marked as stale."
days-before-pr-stale: 14
days-before-pr-close: -1
stale-pr-message: "This PR is stale because it has been open for 14 days with no activity."
# close-pr-message: "This PR was closed because it has been inactive for 14 days since being marked as stale."
repo-token: ${{ secrets.GITHUB_TOKEN }}
================================================
FILE: .github/workflows/compatiblity_test_on_dispatch.yml
================================================
name: Compatibility Test on Dispatch
on:
workflow_dispatch:
inputs:
torch_version:
type: string
description: torch version, separated by comma
required: true
cuda_version:
type: string
description: cuda version, separated by comma
required: true
jobs:
matrix_preparation:
name: Prepare Container List
runs-on: [self-hosted, ubuntu-latest]
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- id: set-matrix
env:
TORCH_VERSIONS: ${{ inputs.torch_version }}
CUDA_VERSIONS: ${{ inputs.cuda_version }}
run: |
IFS=','
DOCKER_IMAGE=()
for tv in $TORCH_VERSIONS
do
for cv in $CUDA_VERSIONS
do
DOCKER_IMAGE+=("\"image-cloud.luchentech.com/hpcaitech/pytorch-cuda:${tv}-${cv}\"")
done
done
container=$( IFS=',' ; echo "${DOCKER_IMAGE[*]}" )
container="[${container}]"
echo "$container"
echo "matrix={\"container\":$(echo "$container")}" >> $GITHUB_OUTPUT
build:
name: Test for PyTorch Compatibility
needs: matrix_preparation
if: github.repository == 'hpcaitech/ColossalAI'
runs-on: [self-hosted, ubuntu-latest]
strategy:
fail-fast: false
matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
container:
image: ${{ matrix.container }}
options: --gpus all --rm -v /dev/shm -v /data/scratch/:/data/scratch/
timeout-minutes: 200
steps:
- name: Install dependencies
run: |
apt update && apt install -y cmake
pip install -U pip setuptools==68.2.2 wheel --user
- uses: actions/checkout@v2
with:
ssh-key: ${{ secrets.SSH_KEY_FOR_CI }}
- name: Install Colossal-AI
run: |
BUILD_EXT=1 pip install -v -e .
pip install --no-cache-dir -r requirements/requirements-test.txt
- name: Install tensornvme
run: |
DISABLE_URING=1 pip install -v git+https://github.com/hpcaitech/TensorNVMe.git
- name: Unit Testing
run: |
PYTHONPATH=$PWD pytest \
-m "not largedist" \
--durations=0 \
--ignore tests/test_analyzer \
--ignore tests/test_auto_parallel \
--ignore tests/test_fx \
--ignore tests/test_autochunk \
--ignore tests/test_gptq \
--ignore tests/test_infer_ops \
--ignore tests/test_legacy \
--ignore tests/test_smoothquant \
tests/
env:
DATA: /data/scratch/cifar-10
LD_LIBRARY_PATH: /github/home/.tensornvme/lib
LLAMA_PATH: /data/scratch/llama-tiny
MOE_TENSOR_PATH: /data/scratch/moe_tensors
HF_ENDPOINT: https://hf-mirror.com
================================================
FILE: .github/workflows/compatiblity_test_on_pr.yml
================================================
name: Compatibility Test on PR
on:
pull_request:
paths:
- "version.txt"
- ".compatibility"
jobs:
matrix_preparation:
name: Prepare Container List
runs-on: [self-hosted, ubuntu-latest]
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-prepare-matrix
cancel-in-progress: true
steps:
- uses: actions/checkout@v3
- id: set-matrix
run: |
IFS=','
DOCKER_IMAGE=()
while read tag; do
DOCKER_IMAGE+=("\"image-cloud.luchentech.com/hpcaitech/pytorch-cuda:${tag}\"")
done <.compatibility
container=$( IFS=',' ; echo "${DOCKER_IMAGE[*]}" )
container="[${container}]"
echo "$container"
echo "matrix={\"container\":$(echo "$container")}" >> $GITHUB_OUTPUT
build:
name: Test for PyTorch Compatibility
needs: matrix_preparation
if: github.repository == 'hpcaitech/ColossalAI'
runs-on: [self-hosted, ubuntu-latest]
strategy:
fail-fast: false
matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
container:
image: ${{ matrix.container }}
options: --gpus all --rm -v /dev/shm -v /data/scratch/:/data/scratch/
timeout-minutes: 200
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-run-test-${{ matrix.container }}
cancel-in-progress: true
steps:
- name: Install dependencies
run: |
apt update && apt install -y cmake
pip install -U pip setuptools==68.2.2 wheel --user
- uses: actions/checkout@v2
with:
ssh-key: ${{ secrets.SSH_KEY_FOR_CI }}
- name: Install Colossal-AI
run: |
BUILD_EXT=1 pip install -v -e .
pip install --no-cache-dir -r requirements/requirements-test.txt
- name: Install tensornvme
run: |
DISABLE_URING=1 pip install -v git+https://github.com/hpcaitech/TensorNVMe.git
- name: Unit Testing
run: |
PYTHONPATH=$PWD pytest \
-m "not largedist" \
--durations=0 \
--ignore tests/test_analyzer \
--ignore tests/test_auto_parallel \
--ignore tests/test_fx \
--ignore tests/test_autochunk \
--ignore tests/test_gptq \
--ignore tests/test_infer_ops \
--ignore tests/test_legacy \
--ignore tests/test_smoothquant \
tests/
env:
DATA: /data/scratch/cifar-10
LD_LIBRARY_PATH: /github/home/.tensornvme/lib
LLAMA_PATH: /data/scratch/llama-tiny
MOE_TENSOR_PATH: /data/scratch/moe_tensors
HF_ENDPOINT: https://hf-mirror.com
================================================
FILE: .github/workflows/compatiblity_test_on_schedule.yml
================================================
name: Compatibility Test on Schedule
on:
# run at 03:00 of every Sunday (Singapore time, UTC+8), i.e. 19:00 UTC on Saturday
schedule:
- cron: '0 19 * * 6'
workflow_dispatch:
jobs:
matrix_preparation:
name: Prepare Container List
runs-on: [self-hosted, ubuntu-latest]
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- uses: actions/checkout@v3
- id: set-matrix
run: |
IFS=','
DOCKER_IMAGE=()
while read tag; do
DOCKER_IMAGE+=("\"image-cloud.luchentech.com/hpcaitech/pytorch-cuda:${tag}\"")
done <.compatibility
container=$( IFS=',' ; echo "${DOCKER_IMAGE[*]}" )
container="[${container}]"
echo "$container"
echo "matrix={\"container\":$(echo "$container")}" >> $GITHUB_OUTPUT
build:
name: Test for PyTorch Compatibility
needs: matrix_preparation
if: github.repository == 'hpcaitech/ColossalAI'
runs-on: [self-hosted, ubuntu-latest]
strategy:
fail-fast: false
matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
container:
image: ${{ matrix.container }}
options: --gpus all --rm -v /dev/shm -v /data/scratch/:/data/scratch/
timeout-minutes: 200
steps:
- name: Install dependencies
run: |
apt update && apt install -y cmake
pip install -U pip setuptools==68.2.2 wheel --user
- uses: actions/checkout@v2
with:
ssh-key: ${{ secrets.SSH_KEY_FOR_CI }}
- name: Install Colossal-AI
run: |
BUILD_EXT=1 pip install -v -e .
pip install --no-cache-dir -r requirements/requirements-test.txt
- name: Install tensornvme
run: |
DISABLE_URING=1 pip install -v git+https://github.com/hpcaitech/TensorNVMe.git
- name: Unit Testing
run: |
PYTHONPATH=$PWD pytest \
-m "not largedist" \
--durations=0 \
--ignore tests/test_analyzer \
--ignore tests/test_auto_parallel \
--ignore tests/test_fx \
--ignore tests/test_autochunk \
--ignore tests/test_gptq \
--ignore tests/test_infer_ops \
--ignore tests/test_legacy \
--ignore tests/test_smoothquant \
tests/
env:
DATA: /data/scratch/cifar-10
LD_LIBRARY_PATH: /github/home/.tensornvme/lib
LLAMA_PATH: /data/scratch/llama-tiny
MOE_TENSOR_PATH: /data/scratch/moe_tensors
HF_ENDPOINT: https://hf-mirror.com
- name: Notify Lark
id: message-preparation
if: ${{ failure() }}
run: |
url=$SERVER_URL/$REPO/actions/runs/$RUN_ID
msg="Compatibility test failed with $container, please visit $url for details"
echo $msg
python .github/workflows/scripts/send_message_to_lark.py -m "$msg" -u $WEBHOOK_URL
env:
SERVER_URL: ${{github.server_url }}
REPO: ${{ github.repository }}
RUN_ID: ${{ github.run_id }}
WEBHOOK_URL: ${{ secrets.LARK_NOTIFICATION_WEBHOOK_URL }}
container: ${{ matrix.container }}
================================================
FILE: .github/workflows/cuda_ext_check_before_merge.yml
================================================
name: Check CUDA Extension Build Before Merge
on:
workflow_dispatch:
pull_request:
paths:
- 'version.txt'
jobs:
matrix_preparation:
name: Prepare Container List
if: github.repository == 'hpcaitech/ColossalAI'
runs-on: [self-hosted, ubuntu-latest]
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- uses: actions/checkout@v3
- id: set-matrix
run: |
cuda_ext=$(cat .cuda_ext.json | tr '\n' ' ')
echo "matrix=${cuda_ext}" >> $GITHUB_OUTPUT
build:
name: Release bdist wheels
needs: matrix_preparation
runs-on: [self-hosted, ubuntu-latest]
strategy:
fail-fast: false
matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
container:
image: ${{ matrix.build.cuda_image }}
options: --gpus all --rm
steps:
- uses: actions/checkout@v2
- name: Install PyTorch
run: eval ${{ matrix.build.torch_command }}
- name: Download cub for CUDA 10.2
run: |
CUDA_VERSION=$(nvcc -V | awk -F ',| ' '/release/{print $6}')
# check if it is CUDA 10.2
# download cub
if [ "$CUDA_VERSION" = "10.2" ]; then
wget https://github.com/NVIDIA/cub/archive/refs/tags/1.8.0.zip
unzip 1.8.0.zip
cp -r cub-1.8.0/cub/ colossalai/kernel/cuda_native/csrc/kernels/include/
fi
- name: Build
run: |
BUILD_EXT=1 pip install -v -e .
================================================
FILE: .github/workflows/doc_build_on_schedule_after_release.yml
================================================
name: Build Documentation On Schedule & After Release
on:
workflow_dispatch:
schedule:
- cron: "0 12 * * *" # build doc every day at 8pm Singapore time (12pm UTC time)
release:
types: [published]
jobs:
build-doc:
name: Trigger Documentation Build Workflow
if: github.repository == 'hpcaitech/ColossalAI'
runs-on: [self-hosted, ubuntu-latest]
steps:
- name: trigger workflow in ColossalAI-Documentation
run: |
curl \
-X POST \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer ${GH_TOKEN}"\
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/repos/hpcaitech/ColossalAI-Documentation/actions/workflows/deploy.yml/dispatches \
-d '{"ref":"main"}'
env:
GH_TOKEN: ${{secrets.DOC_REPO_TOKEN}}
================================================
FILE: .github/workflows/doc_check_on_pr.yml
================================================
name: Check Documentation on PR
on:
pull_request:
branches:
- "main"
- "develop"
- "feature/**"
paths:
- "docs/**"
jobs:
check-i18n:
name: Check docs in diff languages
if: |
github.event.pull_request.draft == false &&
github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI'
runs-on: [self-hosted, ubuntu-latest]
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-check-i18n
cancel-in-progress: true
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: "3.9"
- run: python .github/workflows/scripts/check_doc_i18n.py -d docs/source
check-doc-build:
name: Test if the docs can be built
if: |
github.event.pull_request.draft == false &&
github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI'
runs-on: [self-hosted, ubuntu-latest]
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-check-doc
cancel-in-progress: true
steps:
- uses: actions/checkout@v2
with:
path: "./ColossalAI"
fetch-depth: 0
- uses: actions/checkout@v2
with:
path: "./ColossalAI-Documentation"
repository: "hpcaitech/ColossalAI-Documentation"
- uses: actions/setup-python@v2
with:
python-version: "3.9"
# we use the versions in the main branch as the guide for versions to display
# checkout will give your merged branch
# therefore, we need to make the merged branch as the main branch
# there is no main branch, so it's safe to checkout the main branch from the merged branch
# docer will rebase the remote main branch to the merged branch, so we have to config user
- name: Make the merged branch main
run: |
cd ColossalAI
git checkout -b main
git branch -u origin/main
git config user.name 'github-actions'
git config user.email 'github-actions@github.com'
- name: Build docs
run: |
cache_dir=ColossalAI-Documentation/doc-build/.cache
mkdir -p $cache_dir
mv ColossalAI $cache_dir
cd ColossalAI-Documentation
pip install -v ./doc-build/third_party/hf-doc-builder
pip install -v ./doc-build
bash ./scripts/build.sh
================================================
FILE: .github/workflows/doc_test_on_pr.yml
================================================
name: Test Documentation on PR
on:
pull_request:
branches:
- "main"
- "develop"
- "feature/**"
# any change in the examples folder will trigger check for the corresponding example.
paths:
- "docs/source/**.md"
jobs:
# This is for changed example files detect and output a matrix containing all the corresponding directory name.
detect-changed-doc:
if: |
github.event.pull_request.draft == false &&
github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI' && github.event_name == 'pull_request'
runs-on: [self-hosted, ubuntu-latest]
outputs:
any_changed: ${{ steps.changed-files.outputs.any_changed }}
changed_files: ${{ steps.changed-files.outputs.all_changed_files }}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-detect-change
cancel-in-progress: true
name: Detect changed example files
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0
ref: ${{ github.event.pull_request.head.sha }}
- name: Locate base commit
id: locate-base-sha
run: |
curBranch=$(git rev-parse --abbrev-ref HEAD)
commonCommit=$(git merge-base origin/main $curBranch)
echo $commonCommit
echo "baseSHA=$commonCommit" >> $GITHUB_OUTPUT
- name: Get all changed example files
id: changed-files
uses: tj-actions/changed-files@v35
with:
base_sha: ${{ steps.locate-base-sha.outputs.baseSHA }}
files: |
./docs/source/**/*.md
# If no file is changed, it will prompt an error and shows the matrix do not have value.
check-changed-doc:
# Add this condition to avoid executing this job if the trigger event is workflow_dispatch.
if: |
github.event.pull_request.draft == false &&
github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI' && github.event_name == 'pull_request' &&
needs.detect-changed-doc.outputs.any_changed == 'true'
name: Test the changed Doc
needs: detect-changed-doc
runs-on: [self-hosted, ubuntu-latest]
container:
image: image-cloud.luchentech.com/hpcaitech/pytorch-cuda:2.2.2-12.1.0
options: --gpus all --rm
timeout-minutes: 30
defaults:
run:
shell: bash
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-run-doctest
cancel-in-progress: true
steps:
- name: Checkout ColossalAI-Documentation
uses: actions/checkout@v2
with:
path: "./ColossalAI-Documentation"
repository: "hpcaitech/ColossalAI-Documentation"
- name: Install Docer
run: |
pip install -v ./ColossalAI-Documentation/doc-build/third_party/hf-doc-builder
pip install -v ./ColossalAI-Documentation/doc-build
- name: Checkout ColossalAI
uses: actions/checkout@v3
- name: Install Doc Test Requirements
run: |
source activate pytorch
conda env update --file docs/conda-doc-test-deps.yml --prune
pip install -r docs/requirements-doc-test.txt
- name: Install ColossalAI
run: |
source activate pytorch
BUILD_EXT=1 pip install -v -e .
- name: Test the Doc
run: |
source activate pytorch
for file in ${{ needs.detect-changed-doc.outputs.changed_files }}; do
echo "Testing $file now..."
docer test -p $file
done
env:
NCCL_SHM_DISABLE: 1
================================================
FILE: .github/workflows/doc_test_on_schedule.yml
================================================
name: Test Documentation on Schedule
on:
# run at 07:00 of every Sunday(singapore time) so here is UTC time Saturday 23:00
schedule:
- cron: '0 23 * * 6'
workflow_dispatch:
jobs:
check-changed-doc:
# Add this condition to avoid executing this job if the trigger event is workflow_dispatch.
if: github.repository == 'hpcaitech/ColossalAI'
name: Test the changed Doc
runs-on: [self-hosted, ubuntu-latest]
container:
image: image-cloud.luchentech.com/hpcaitech/pytorch-cuda:2.2.2-12.1.0
options: --gpus all --rm
timeout-minutes: 60
steps:
- name: Checkout ColossalAI-Documentation
uses: actions/checkout@v2
with:
path: './ColossalAI-Documentation'
repository: 'hpcaitech/ColossalAI-Documentation'
- name: Install Docer
run: |
pip install -v ./ColossalAI-Documentation/doc-build/third_party/hf-doc-builder
pip install -v ./ColossalAI-Documentation/doc-build
- name: Checkout ColossalAI
uses: actions/checkout@v3
- name: Install ColossalAI
run: |
BUILD_EXT=1 pip install -v -e .
- name: Install Doc Test Requirements
run: |
pip install -r docs/requirements-doc-test.txt
- name: Test the Doc
run: |
for file in $(find ./docs/source -name "*.md"); do
docer test -p $file
done
env:
NCCL_SHM_DISABLE: 1
================================================
FILE: .github/workflows/draft_github_release_post_after_merge.yml
================================================
name: Draft GitHub Release Post
on:
workflow_dispatch:
pull_request:
paths:
- 'version.txt'
types:
- closed
jobs:
release:
name: Draft Release Post
if: ( github.event_name == 'workflow_dispatch' || github.event.pull_request.merged == true ) && github.repository == 'hpcaitech/ColossalAI'
runs-on: [self-hosted, ubuntu-latest]
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- uses: actions/setup-python@v2
with:
python-version: '3.9'
- name: generate draft
id: generate_draft
run: |
version=v$(cat version.txt)
pip install requests
python ./.github/workflows/scripts/generate_release_draft.py --out $PWD/release_draft.md --version $version
echo "version=$version" >> $GITHUB_OUTPUT
echo "path=$PWD/release_draft.md" >> $GITHUB_OUTPUT
env:
GITHUB_API_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Create Release
id: create_release
uses: actions/create-release@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
tag_name: ${{ steps.generate_draft.outputs.version }}
release_name: Version ${{ steps.generate_draft.outputs.version }} Release Today!
body_path: ${{ steps.generate_draft.outputs.path }}
draft: True
prerelease: false
================================================
FILE: .github/workflows/example_check_on_dispatch.yml
================================================
name: Test Example on Dispatch
on:
workflow_dispatch:
inputs:
example_directory:
type: string
description: example directory, separated by space. For example, language/gpt, images/vit. Simply input language or simply gpt does not work.
required: true
jobs:
matrix_preparation:
if: github.repository == 'hpcaitech/ColossalAI'
name: Check the examples user want
runs-on: [self-hosted, ubuntu-latest]
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- name: 📚 Checkout
uses: actions/checkout@v3
- name: Set up matrix
id: set-matrix
env:
check_dir: ${{ inputs.example_directory }}
run: |
res=`python .github/workflows/scripts/example_checks/check_dispatch_inputs.py --fileNameList $check_dir`
if [ "$res" == "failure" ]; then
exit 1
fi
dirs="[${check_dir}]"
echo "Testing examples in $dirs"
echo "matrix={\"directory\":$(echo "$dirs")}" >> $GITHUB_OUTPUT
test_example:
if: github.repository == 'hpcaitech/ColossalAI'
name: Manually check example files
needs: matrix_preparation
runs-on: [self-hosted, ubuntu-latest]
strategy:
fail-fast: false
matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
container:
image: image-cloud.luchentech.com/hpcaitech/pytorch-cuda:2.2.2-12.1.0
options: --gpus all --rm -v /data/scratch/examples-data:/data/ -v /dev/shm
timeout-minutes: 15
steps:
- name: 📚 Checkout
uses: actions/checkout@v3
- name: Install Colossal-AI
run: |
BUILD_EXT=1 pip install -v -e .
- name: Test the example
run: |
dir=${{ matrix.directory }}
echo "Testing ${dir} now"
cd "${PWD}/examples/${dir}"
bash test_ci.sh
================================================
FILE: .github/workflows/example_check_on_pr.yml
================================================
name: Test Example on PR
on:
pull_request:
branches:
- "main"
- "develop"
- "feature/**"
# any change in the examples folder will trigger check for the corresponding example.
paths:
- "examples/**"
- "!examples/**.md"
- ".github/workflows/example_check_on_pr.yml"
jobs:
# This is for changed example files detect and output a matrix containing all the corresponding directory name.
detect-changed-example:
if: |
github.event.pull_request.draft == false &&
github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI' && github.event_name == 'pull_request'
runs-on: [self-hosted, ubuntu-latest]
outputs:
matrix: ${{ steps.setup-matrix.outputs.matrix }}
anyChanged: ${{ steps.setup-matrix.outputs.anyChanged }}
anyExtensionFileChanged: ${{ steps.find-extension-change.outputs.any_changed }}
name: Detect changed example files
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-detect-change
cancel-in-progress: true
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0
ref: ${{ github.event.pull_request.head.sha }}
- name: Locate base commit
id: locate-base-sha
run: |
curBranch=$(git rev-parse --abbrev-ref HEAD)
commonCommit=$(git merge-base origin/main $curBranch)
echo $commonCommit
echo "baseSHA=$commonCommit" >> $GITHUB_OUTPUT
- name: Find the changed extension-related files
id: find-extension-change
uses: tj-actions/changed-files@v35
with:
base_sha: ${{ steps.locate-base-sha.outputs.baseSHA }}
files: |
op_builder/**
colossalai/kernel/**
setup.py
- name: Get all changed example files
id: changed-files
uses: tj-actions/changed-files@v35
with:
base_sha: ${{ steps.locate-base-sha.outputs.baseSHA }}
- name: setup matrix
id: setup-matrix
run: |
changedFileName=""
for file in ${{ steps.changed-files.outputs.all_changed_files }}; do
changedFileName="${file}:${changedFileName}"
done
echo "$changedFileName was changed"
res=`python3 .github/workflows/scripts/example_checks/detect_changed_example.py --fileNameList $changedFileName`
echo "All changed examples are $res"
if [ "$res" == "[]" ]; then
echo "anyChanged=false" >> $GITHUB_OUTPUT
echo "matrix=null" >> $GITHUB_OUTPUT
else
dirs=$( IFS=',' ; echo "${res[*]}" )
echo "anyChanged=true" >> $GITHUB_OUTPUT
echo "matrix={\"directory\":$(echo "$dirs")}" >> $GITHUB_OUTPUT
fi
# If no file is changed, it will prompt an error and shows the matrix do not have value.
check-changed-example:
# Add this condition to avoid executing this job if the trigger event is workflow_dispatch.
if: |
github.event.pull_request.draft == false &&
github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI' && github.event_name == 'pull_request' &&
needs.detect-changed-example.outputs.anyChanged == 'true'
name: Test the changed example
needs: detect-changed-example
runs-on: [self-hosted, ubuntu-latest]
strategy:
fail-fast: false
matrix: ${{fromJson(needs.detect-changed-example.outputs.matrix)}}
container:
image: image-cloud.luchentech.com/hpcaitech/pytorch-cuda:2.2.2-12.1.0
options: --gpus all --rm -v /data/scratch/examples-data:/data/ -v /dev/shm
timeout-minutes: 30
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-run-example-${{ matrix.directory }}
cancel-in-progress: true
steps:
- uses: actions/checkout@v3
- name: Restore Colossal-AI Cache
if: needs.detect-changed-example.outputs.anyExtensionFileChanged != 'true'
run: |
if [ -d /github/home/cuda_ext_cache ] && [ ! -z "$(ls -A /github/home/cuda_ext_cache/)" ]; then
cp -p -r /github/home/cuda_ext_cache/* /__w/ColossalAI/ColossalAI/
fi
- name: Install Colossal-AI
run: |
BUILD_EXT=1 pip install -v -e .
- name: Store Colossal-AI Cache
run: |
cp -p -r /__w/ColossalAI/ColossalAI/build /github/home/cuda_ext_cache/
- name: Test the example
run: |
example_dir=${{ matrix.directory }}
cd "${PWD}/examples/${example_dir}"
bash test_ci.sh
================================================
FILE: .github/workflows/example_check_on_schedule.yml
================================================
name: Test Example on Schedule
on:
# run at 00:00 of every Sunday(singapore time) so here is UTC time Saturday 16:00
schedule:
- cron: '0 16 * * 6'
workflow_dispatch:
jobs:
# This is for all files' weekly check. Specifically, this job is to find all the directories.
matrix_preparation:
if: github.repository == 'hpcaitech/ColossalAI'
name: Prepare matrix for weekly check
runs-on: [self-hosted, ubuntu-latest]
outputs:
matrix: ${{ steps.setup-matrix.outputs.matrix }}
steps:
- name: 📚 Checkout
uses: actions/checkout@v3
- name: setup matrix
id: setup-matrix
run: |
res=`python .github/workflows/scripts/example_checks/check_example_weekly.py`
all_loc=$( IFS=',' ; echo "${res[*]}" )
echo "Found the examples: $all_loc"
echo "matrix={\"directory\":$(echo "$all_loc")}" >> $GITHUB_OUTPUT
weekly_check:
if: github.repository == 'hpcaitech/ColossalAI'
name: Weekly check all examples
needs: matrix_preparation
runs-on: [self-hosted, ubuntu-latest]
strategy:
fail-fast: false
matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
container:
image: image-cloud.luchentech.com/hpcaitech/pytorch-cuda:2.2.2-12.1.0
options: --gpus all --rm -v /data/scratch/examples-data:/data/ -v /dev/shm
timeout-minutes: 30
steps:
- name: 📚 Checkout
uses: actions/checkout@v3
- name: Install Colossal-AI
run: |
BUILD_EXT=1 pip install -v -e .
- name: Traverse all files
run: |
example_dir=${{ matrix.directory }}
echo "Testing ${example_dir} now"
cd "${PWD}/examples/${example_dir}"
bash test_ci.sh
- name: Notify Lark
id: message-preparation
if: ${{ failure() }}
run: |
url=$SERVER_URL/$REPO/actions/runs/$RUN_ID
msg="Example tests failed for $EXAMPLE_DIR, please visit $url for details"
echo $msg
python .github/workflows/scripts/send_message_to_lark.py -m "$msg" -u $WEBHOOK_URL
env:
SERVER_URL: ${{github.server_url }}
REPO: ${{ github.repository }}
RUN_ID: ${{ github.run_id }}
WEBHOOK_URL: ${{ secrets.LARK_NOTIFICATION_WEBHOOK_URL }}
EXAMPLE_DIR: ${{ matrix.directory }}
================================================
FILE: .github/workflows/release_docker_after_publish.yml
================================================
name: Publish Docker Image to DockerHub after Publish
on:
workflow_dispatch:
release:
types: [published]
jobs:
release:
name: Publish Docker Image to DockerHub
if: github.repository == 'hpcaitech/ColossalAI'
runs-on: [self-hosted, ubuntu-latest]
container:
image: "hpcaitech/docker-in-docker:latest"
options: --gpus all --rm -v /var/run/docker.sock:/var/run/docker.sock
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Build Docker
id: build
run: |
version=$(cat version.txt)
tag=hpcaitech/colossalai:$version
latest=hpcaitech/colossalai:latest
docker build --build-arg VERSION=v${version} -t $tag ./docker
docker tag $tag $latest
echo "tag=${tag}" >> $GITHUB_OUTPUT
echo "latest=${latest}" >> $GITHUB_OUTPUT
env:
DOCKER_BUILDKIT: 0
- name: Log in to Docker Hub
uses: docker/login-action@f054a8b539a109f9f41c372932f1ae047eff08c9
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Push Docker image
id: docker-push
run: |
docker push ${{ steps.build.outputs.tag }}
docker push ${{ steps.build.outputs.latest }}
notify:
name: Notify Lark via webhook
needs: release
runs-on: [self-hosted, ubuntu-latest]
if: ${{ always() }}
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: "3.9"
- name: Install requests
run: pip install requests
- name: Notify Lark
id: message-preparation
run: |
url=$SERVER_URL/$REPO/actions/runs/$RUN_ID
if [ "$STATUS" == 'success' ]
then
msg="The Docker image for the latest release has been successfully built and pushed to DockerHub."
else
msg="Failed to build and push the Docker image for the latest release, please visit $url for details."
fi
echo $msg
python .github/workflows/scripts/send_message_to_lark.py -m "$msg" -u $WEBHOOK_URL
env:
SERVER_URL: ${{github.server_url }}
REPO: ${{ github.repository }}
RUN_ID: ${{ github.run_id }}
WEBHOOK_URL: ${{ secrets.LARK_NOTIFICATION_WEBHOOK_URL }}
STATUS: ${{ needs.release.result }}
================================================
FILE: .github/workflows/release_nightly_on_schedule.yml
================================================
name: Publish Nightly Version to PyPI
on:
workflow_dispatch:
schedule:
- cron: '0 0 * * 6' # release on every Sunday 00:00 UTC time
jobs:
publish:
if: github.repository == 'hpcaitech/ColossalAI'
name: Build and publish Python 🐍 distributions 📦 to PyPI
runs-on: [self-hosted, ubuntu-latest]
timeout-minutes: 20
outputs:
status: ${{ steps.publish.outcome }}
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: '3.9'
- run: |
python .github/workflows/scripts/update_setup_for_nightly.py
python setup.py sdist build
# publish to PyPI if executed on the main branch
- name: Publish package to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
id: publish
with:
user: __token__
password: ${{ secrets.PYPI_API_TOKEN }}
verbose: true
notify:
name: Notify Lark via webhook
needs: publish
runs-on: [self-hosted, ubuntu-latest]
if: always() && github.repository == 'hpcaitech/ColossalAI'
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: '3.9'
- name: Install requests
run: pip install requests
- name: Notify Lark
id: message-preparation
run: |
url=$SERVER_URL/$REPO/actions/runs/$RUN_ID
if [ "$STATUS" == 'success' ]
then
msg="The Colossal-AI nightly version has been successfully released to PyPI."
else
msg="Failed to release Colossal-AI nightly version to PyPI, please visit $url for details."
fi
echo $msg
python .github/workflows/scripts/send_message_to_lark.py -m "$msg" -u $WEBHOOK_URL
env:
SERVER_URL: ${{github.server_url }}
REPO: ${{ github.repository }}
RUN_ID: ${{ github.run_id }}
WEBHOOK_URL: ${{ secrets.LARK_NOTIFICATION_WEBHOOK_URL }}
STATUS: ${{ needs.publish.outputs.status }}
================================================
FILE: .github/workflows/release_pypi_after_merge.yml
================================================
name: Publish to PyPI

on:
  workflow_dispatch:
  pull_request:
    # only PRs that touch version.txt (i.e. release PRs) trigger this workflow
    paths:
      - 'version.txt'
    types:
      - closed

jobs:
  build-n-publish:
    # run on manual dispatch, or when a release PR has been merged into main
    # of the upstream repository (&& binds tighter than ||)
    if: github.event_name == 'workflow_dispatch' || github.repository == 'hpcaitech/ColossalAI' && github.event.pull_request.merged == true && github.base_ref == 'main'
    name: Build and publish Python 🐍 distributions 📦 to PyPI
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - run: python setup.py sdist build
      # publish to PyPI if executed on the main branch
      - name: Publish package to PyPI
        id: publish
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          user: __token__
          password: ${{ secrets.PYPI_API_TOKEN }}
          verbose: true

  notify:
    name: Notify Lark via webhook
    needs: build-n-publish
    # always() so a failed publish is still reported to the chat channel
    if: ${{ always() }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: Install requests
        run: pip install requests
      - name: Notify Lark
        id: message-preparation
        run: |
          url=$SERVER_URL/$REPO/actions/runs/$RUN_ID
          if [ "$STATUS" == 'success' ]
          then
            msg="The Colossal-AI latest version has been successfully released to PyPI."
          else
            msg="Failed to release Colossal-AI to PyPI, please visit $url for details."
          fi
          echo $msg
          python .github/workflows/scripts/send_message_to_lark.py -m "$msg" -u $WEBHOOK_URL
        env:
          SERVER_URL: ${{github.server_url }}
          REPO: ${{ github.repository }}
          RUN_ID: ${{ github.run_id }}
          WEBHOOK_URL: ${{ secrets.LARK_NOTIFICATION_WEBHOOK_URL }}
          STATUS: ${{ needs.build-n-publish.result }}
================================================
FILE: .github/workflows/release_test_pypi_before_merge.yml
================================================
name: Publish to Test-PyPI Before Merge

on:
  pull_request:
    # any PR that touches version.txt is treated as a release candidate
    paths:
      - 'version.txt'

jobs:
  build-n-publish:
    # NOTE(review): the workflow_dispatch clause is dead — `on:` only lists
    # pull_request, so event_name can never be 'workflow_dispatch' here
    if: github.event_name == 'workflow_dispatch' || github.repository == 'hpcaitech/ColossalAI'
    name: Build and publish Python 🐍 distributions 📦 to Test PyPI
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: add timestamp to the version
        id: prep-version
        run: |
          # suffix the version with .post<unix-timestamp> so every push
          # produces a unique, installable Test-PyPI release
          version=$(cat version.txt)
          timestamp=$(date +%s)
          new_version="${version}.post${timestamp}"
          echo $new_version > ./version.txt
          echo "version=$new_version" >> $GITHUB_OUTPUT
      - run: |
          pip install --upgrade pip
          python setup.py sdist build
      # publish to PyPI if executed on the main branch
      - name: Publish package to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          user: __token__
          password: ${{ secrets.TEST_PYPI_API_TOKEN }}
          repository_url: https://test.pypi.org/legacy/
          verbose: true
      - name: Wait for Test-PyPI refresh
        # give the Test-PyPI index time to expose the freshly uploaded release
        run: sleep 300s
        shell: bash
      - name: Try installation
        run: |
          # we need to install the requirements.txt first
          # as test-pypi may not contain the distributions for libs listed in the txt file
          pip install -r requirements/requirements.txt
          pip install -U setuptools==68.2.2 wheel
          pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.python.org/pypi colossalai==$VERSION
        env:
          VERSION: ${{ steps.prep-version.outputs.version }}
================================================
FILE: .github/workflows/report_leaderboard_to_lark.yml
================================================
name: Generate Community Report and Send to Lark

on:
  workflow_dispatch:
  schedule:
    # release on every Friday 09:00 UTC time, 17:00 Beijing/Singapore time
    - cron: '0 9 * * 5'

jobs:
  generate-and-publish:
    # never run on forks — the Lark app secrets only exist upstream
    if: github.repository == 'hpcaitech/ColossalAI'
    name: Generate leaderboard report and publish to Lark
    runs-on: [self-hosted, ubuntu-latest]
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      # plotting + Lark upload dependencies used by the leaderboard script
      - run: pip install requests matplotlib seaborn requests_toolbelt pytz
      - run: python .github/workflows/scripts/generate_leaderboard_and_send_to_lark.py
        env:
          LARK_APP_ID: ${{ secrets.LARK_LEADERBOARD_APP_ID }}
          LARK_APP_SECRET: ${{ secrets.LARK_LEADERBOARD_APP_SECRET }}
          LARK_WEBHOOK_URL: ${{ secrets.LARK_NOTIFICATION_WEBHOOK_URL }}
          GITHUB_TOKEN: ${{ github.token }}
================================================
FILE: .github/workflows/report_test_coverage.yml
================================================
name: Report Test Coverage

on:
  workflow_run:
    # runs after "Build on PR" completes so its artifacts can be read
    workflows: [Build on PR]
    types:
      - completed

jobs:
  report-test-coverage:
    runs-on: [self-hosted, ubuntu-latest]
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - name: "Download artifact"
        uses: actions/github-script@v6
        with:
          script: |
            // locate the "report" artifact uploaded by the triggering run
            let allArtifacts = await github.rest.actions.listWorkflowRunArtifacts({
              owner: context.repo.owner,
              repo: context.repo.repo,
              run_id: context.payload.workflow_run.id,
            });
            let matchArtifact = allArtifacts.data.artifacts.filter((artifact) => {
              return artifact.name == "report"
            })[0];
            // download the artifact archive into the workspace
            let download = await github.rest.actions.downloadArtifact({
              owner: context.repo.owner,
              repo: context.repo.repo,
              artifact_id: matchArtifact.id,
              archive_format: 'zip',
            });
            let fs = require('fs');
            fs.writeFileSync(`${process.env.GITHUB_WORKSPACE}/report.zip`, Buffer.from(download.data));
      - name: "Unzip artifact"
        id: unzip
        run: |
          unzip report.zip
          # coverage.txt only exists when the PR changed covered files
          if [ -f "coverage.txt" ]; then
            echo "hasReport=true" >> $GITHUB_OUTPUT
          else
            echo "hasReport=false" >> $GITHUB_OUTPUT
          fi
      - name: Make Coverage Report Collapsable
        if: steps.unzip.outputs.hasReport == 'true'
        run: |
          # wrap the full report in a <details> block so the PR comment stays compact
          covNum=$(cat cov_number)
          title="The code coverage for the changed files is ${covNum}%."
          touch coverage_report.txt
          echo $title >> coverage_report.txt
          echo " " >> coverage_report.txt
          echo "<details>" >> coverage_report.txt
          echo "<summary>Click me to view the complete report</summary>" >> coverage_report.txt
          echo " " >> coverage_report.txt
          echo "\`\`\`" >> coverage_report.txt
          cat coverage.txt >> coverage_report.txt
          echo "\`\`\`" >> coverage_report.txt
          echo "</details>" >> coverage_report.txt
          mv coverage_report.txt coverage.txt
      - name: "Comment on PR"
        if: steps.unzip.outputs.hasReport == 'true'
        uses: actions/github-script@v6
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            // the PR number is shipped inside the artifact because workflow_run
            // events do not carry the originating pull-request context
            let fs = require('fs');
            let issue_number = Number(fs.readFileSync('./pr_number'));
            let owner = context.repo.owner;
            let repo = context.repo.repo;
            let run_id = context.payload.workflow_run.id;
            // NOTE(review): run_url is computed but never used below
            let run_url = `https://github.com/${owner}/${repo}/actions/runs/${run_id}`
            let body = fs.readFileSync('./coverage.txt', {encoding:'utf8', flag:'r'})
            await github.rest.issues.createComment({
              owner: owner,
              repo: repo,
              issue_number: issue_number,
              body: body
            });
================================================
FILE: .github/workflows/run_chatgpt_examples.yml
================================================
name: Run ChatGPT examples

on:
  pull_request:
    types: [synchronize, opened, reopened]
    paths:
      - "applications/ColossalChat/coati/**"
      - "applications/ColossalChat/requirements.txt"
      - "applications/ColossalChat/setup.py"
      - "applications/ColossalChat/examples/**"
      - "applications/ColossalChat/tests/**"

jobs:
  tests:
    name: Run ChatGPT examples
    # skip drafts; only run for PRs targeting main of the upstream repository
    if: |
      github.event.pull_request.draft == false &&
      github.base_ref == 'main' &&
      github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI'
    runs-on: [self-hosted, ubuntu-latest]
    container:
      image: image-cloud.luchentech.com/hpcaitech/pytorch-cuda:2.5.1-12.4.1
      options: --gpus all --rm -v /data/scratch/examples-data:/data/scratch/examples-data --shm-size=10.24gb
    timeout-minutes: 180
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout ColossalAI
        uses: actions/checkout@v2

      - name: Install torch
        run: |
          # fix: pass -y so pip uninstall does not block on its interactive
          # confirmation prompt in the non-interactive CI shell
          pip uninstall -y flash-attn
          pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124

      - name: Install flash-attn
        run: |
          pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

      - name: Install Colossal-AI
        run: |
          BUILD_EXT=1 pip install --no-cache-dir -v -e .

      - name: Install ChatGPT
        env:
          # limit optimization and build parallelism to keep memory usage low
          CFLAGS: "-O1"
          CXXFLAGS: "-O1"
          MAX_JOBS: 4
        run: |
          cd applications/ColossalChat
          pip install --no-cache-dir -v -e .
          pip install --no-cache-dir -r examples/requirements.txt

      # - name: Install Transformers
      #   run: |
      #     pip install --no-cache-dir transformers==4.36.2

      - name: Execute Examples
        run: |
          cd applications/ColossalChat
          rm -rf ~/.cache/colossalai
          # fix: mkdir -p so a re-run on a dirty workspace does not fail
          mkdir -p models sft_data prompt_data preference_data kto_data
          ./tests/test_data_preparation.sh
          ./tests/test_train.sh
        env:
          NCCL_SHM_DISABLE: 1
          MAX_JOBS: 8
          PRETRAINED_MODEL_PATH: ./models
          SFT_DATASET: ./sft_data
          PROMPT_DATASET: ./prompt_data
          PROMPT_RLVR_DATASET: ./prompt_data
          PREFERENCE_DATASET: ./preference_data
          KTO_DATASET: ./kto_data
================================================
FILE: .github/workflows/run_chatgpt_unit_tests.yml
================================================
name: Run ChatGPT unit tests

on:
  pull_request:
    types: [synchronize, opened, reopened]
    paths:
      - 'applications/ColossalChat/coati/**'
      - 'applications/ColossalChat/requirements.txt'
      - 'applications/ColossalChat/setup.py'
      - 'applications/ColossalChat/tests/**'
      - 'applications/ColossalChat/pytest.ini'

jobs:
  tests:
    name: Run ChatGPT unit tests
    # skip drafts; only run for PRs targeting main of the upstream repository
    if: |
      github.event.pull_request.draft == false &&
      github.base_ref == 'main' &&
      github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI'
    runs-on: [self-hosted, ubuntu-latest]
    container:
      image: image-cloud.luchentech.com/hpcaitech/pytorch-cuda:2.2.2-12.1.0
      options: --gpus all --rm -v /data/scratch/examples-data:/data/scratch/examples-data
    timeout-minutes: 180
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout ColossalAI
        uses: actions/checkout@v2
      - name: Install ChatGPT
        env:
          # limit optimization and build parallelism to keep memory usage low
          CFLAGS: "-O1"
          CXXFLAGS: "-O1"
          MAX_JOBS: 4
        run: |
          pip install flash-attn --no-build-isolation
          cd applications/ColossalChat
          pip install -v .
          pip install pytest
      - name: Execute Unit Testing
        run: |
          cd applications/ColossalChat
          rm -rf ~/.cache/colossalai
          pytest tests/
          cd ./tests
          ./test_templating.sh
        env:
          NCCL_SHM_DISABLE: 1
          MAX_JOBS: 8
================================================
FILE: .github/workflows/run_colossalqa_unit_tests.yml
================================================
name: Run colossalqa unit tests

on:
  pull_request:
    types: [synchronize, opened, reopened]
    paths:
      - 'applications/ColossalQA/colossalqa/**'
      - 'applications/ColossalQA/requirements.txt'
      - 'applications/ColossalQA/setup.py'
      - 'applications/ColossalQA/tests/**'
      - 'applications/ColossalQA/pytest.ini'

jobs:
  tests:
    name: Run colossalqa unit tests
    # skip drafts; only run for PRs targeting main of the upstream repository
    if: |
      github.event.pull_request.draft == false &&
      github.base_ref == 'main' &&
      github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI'
    runs-on: [self-hosted, ubuntu-latest]
    container:
      image: image-cloud.luchentech.com/hpcaitech/pytorch-cuda:2.2.2-12.1.0
      # mount the test corpora and the tiny llama checkpoint used by the tests
      volumes:
        - /data/scratch/test_data_colossalqa:/data/scratch/test_data_colossalqa
        - /data/scratch/llama-tiny:/data/scratch/llama-tiny
      options: --gpus all --rm
    timeout-minutes: 30
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout ColossalAI
        uses: actions/checkout@v2
      - name: Install colossalqa
        run: |
          cd applications/ColossalQA
          pip install -e .
      - name: Execute Unit Testing
        run: |
          cd applications/ColossalQA
          pytest tests/
        env:
          NCCL_SHM_DISABLE: 1
          MAX_JOBS: 8
          ZH_MODEL_PATH: bigscience/bloom-560m
          ZH_MODEL_NAME: bloom
          EN_MODEL_PATH: bigscience/bloom-560m
          EN_MODEL_NAME: bloom
          TEST_DATA_PATH_EN: /data/scratch/test_data_colossalqa/companies.txt
          TEST_DATA_PATH_ZH: /data/scratch/test_data_colossalqa/companies_zh.txt
          TEST_DOCUMENT_LOADER_DATA_PATH: /data/scratch/test_data_colossalqa/tests/*
          SQL_FILE_PATH: /data/scratch/test_data_colossalqa/sql_file_path
================================================
FILE: .github/workflows/scripts/check_doc_i18n.py
================================================
import argparse
import os
def compare_dirs(dir1, dir2):
    """Return True when the two directory trees have identical structure.

    Only names and entry types are compared — file contents are not inspected.
    """
    # Both directories must exist for a meaningful comparison.
    if not (os.path.exists(dir1) and os.path.exists(dir2)):
        return False

    entries1 = os.listdir(dir1)
    entries2 = os.listdir(dir2)

    # Differing entry counts mean the trees cannot match.
    if len(entries1) != len(entries2):
        return False

    for entry in entries1:
        path1 = os.path.join(dir1, entry)
        path2 = os.path.join(dir2, entry)

        # Every entry in dir1 must have a same-named counterpart in dir2.
        if not os.path.exists(path2):
            print(f"Found mismatch: {path1}, {path2}")
            return False

        if os.path.isdir(path1) and os.path.isdir(path2):
            # Matching sub-directories are compared recursively.
            if not compare_dirs(path1, path2):
                print(f"Found mismatch: {path1}, {path2}")
                return False
        elif os.path.isfile(path1) and os.path.isfile(path2):
            # Same-named files count as equal without content comparison.
            continue
        else:
            # Type mismatch (file vs directory) or an unsupported entry type.
            print(f"Found mismatch: {path1}, {path2}")
            return False

    # All entries matched.
    return True
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-d", "--directory", help="The directory where the multi-language source files are kept.")
    args = parser.parse_args()

    # Each entry under the given directory is one language version of the docs.
    i18n_folders = [os.path.join(args.directory, name) for name in os.listdir(args.directory)]

    # Compare every other language folder against the first one found.
    if len(i18n_folders) > 1:
        reference = i18n_folders[0]
        for candidate in i18n_folders[1:]:
            print(f"comparing {reference} vs {candidate}")
            if compare_dirs(reference, candidate):
                print(f"{reference} and {candidate} match")
            else:
                print(
                    f"{reference} and {candidate} don't match, please ensure that your documentation is available in different languages"
                )
================================================
FILE: .github/workflows/scripts/example_checks/check_dispatch_inputs.py
================================================
import argparse
import os
def check_inputs(input_list):
    """Return True only if every name maps to an existing path under ./examples."""
    return all(os.path.exists(os.path.join("examples", name)) for name in input_list)
def main():
    """Parse the comma-separated file-name list and report whether all paths exist."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-f", "--fileNameList", type=str, help="List of file names")
    args = parser.parse_args()
    names = args.fileNameList.split(",")
    # The calling workflow checks for these exact tokens on stdout.
    print("success" if check_inputs(names) else "failure")


if __name__ == "__main__":
    main()
================================================
FILE: .github/workflows/scripts/example_checks/check_example_weekly.py
================================================
import os
def show_files(path, all_files):
    """Recursively collect every file path below `path` into `all_files`.

    The same list object is accumulated into and returned.
    """
    for entry in os.listdir(path):
        full_path = os.path.join(path, entry)
        if os.path.isdir(full_path):
            # Descend into sub-directories, appending to the shared list.
            show_files(full_path, all_files)
        else:
            all_files.append(full_path)
    return all_files
def join(input_list, sep=None):
    """Join strings with `sep`, falling back to a single space when sep is falsy."""
    delimiter = sep if sep else " "
    return delimiter.join(input_list)
def main():
    """Print the unique area/application folder pairs found under examples/."""
    contents = show_files("examples/", [])
    unique_dirs = []
    for file_loc in contents:
        parts = file_loc.split("/")
        # Only paths nested at least two folders below examples/ qualify:
        # examples/images/vit/... is acceptable, examples/README.md is not.
        if len(parts) >= 4:
            sub_dir = "/".join(parts[1:3])
            if sub_dir not in unique_dirs:
                unique_dirs.append(sub_dir)
    print(unique_dirs)


if __name__ == "__main__":
    main()
================================================
FILE: .github/workflows/scripts/example_checks/detect_changed_example.py
================================================
import argparse
def main():
    """Print the set of example sub-folders touched by the changed-file list."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-f", "--fileNameList", type=str, help="The list of changed files")
    args = parser.parse_args()
    changed_files = args.fileNameList.split(":")

    folder_need_check = set()
    # The examples folder is laid out as examples/<area>/<application>/<file>,
    # so only paths at least four segments deep identify a concrete application.
    for loc in changed_files:
        segments = loc.split("/")
        if segments[0] == "examples" and len(segments) >= 4:
            folder_need_check.add("/".join(segments[1:3]))

    # Print so the calling shell step can capture the value.
    print(list(folder_need_check))


if __name__ == "__main__":
    main()
================================================
FILE: .github/workflows/scripts/generate_leaderboard_and_send_to_lark.py
================================================
import os
from datetime import datetime, timedelta
from typing import Any, Dict, List
import matplotlib.pyplot as plt
import pytz
import requests
import seaborn
from requests_toolbelt import MultipartEncoder
class Counter(dict):
    """A dict-based tally mapping an item name to its occurrence count."""

    def record(self, item: str):
        # Increment the tally for `item`, starting from zero if unseen.
        self[item] = self.get(item, 0) + 1

    def to_sorted_list(self):
        # Return (item, count) pairs ordered by count, highest first.
        pairs = list(self.items())
        pairs.sort(key=lambda pair: pair[1], reverse=True)
        return pairs
def get_utc_time_one_week_ago():
"""
Get the UTC time one week ago.
"""
now = datetime.utcnow()
start_datetime = now - timedelta(days=7)
return start_datetime
def datetime2str(dt):
    """Format a datetime as YYYY-MM-DDTHH:MM:SSZ (GitHub API timestamp style)."""
    return f"{dt:%Y-%m-%dT%H:%M:%SZ}"
def str2datetime(string):
    """Parse a YYYY-MM-DDTHH:MM:SSZ string into a naive datetime."""
    timestamp_format = "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(string, timestamp_format)
def plot_bar_chart(x: List[Any], y: List[Any], xlabel: str, ylabel: str, title: str, output_path: str) -> None:
    """
    This function is a utility to plot the bar charts.

    Args:
        x (List[Any]): values along the horizontal axis (e.g. counts)
        y (List[Any]): labels along the vertical axis (e.g. member names)
        xlabel (str): caption for the x axis
        ylabel (str): caption for the y axis
        title (str): chart title
        output_path (str): file path the rendered figure is written to
    """
    # Reset any figure state left over on the shared pyplot canvas.
    plt.clf()
    seaborn.color_palette()
    fig = seaborn.barplot(x=x, y=y)
    fig.set(xlabel=xlabel, ylabel=ylabel, title=title)
    # Drop the top/right spines for a cleaner look.
    seaborn.despine()
    plt.tight_layout()
    # High DPI keeps the image legible when posted to chat.
    plt.savefig(output_path, dpi=1200)
def get_organization_repositories(github_token, organization_name) -> List[str]:
    """Return the names of the organization's public repositories via the GitHub REST API."""
    url = f"https://api.github.com/orgs/{organization_name}/repos?type=public"

    # prepare header
    # Standard GitHub REST v3 headers with bearer-token authentication.
    headers = {
        "Authorization": f"Bearer {github_token}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    payload = requests.get(url, headers=headers).json()
    # NOTE(review): only the first page is fetched — presumably the org has
    # fewer public repos than one page holds; confirm if the org grows.
    return [entry["name"] for entry in payload]
def get_issue_pull_request_comments(github_token: str, org_name: str, repo_name: str, since: str) -> Dict[str, int]:
    """
    Retrieve the issue/PR comments made by our members in the last 7 days.

    Only comments by MEMBER accounts on issues/PRs that were NOT created by a
    member are counted, so internal discussion does not inflate the score.

    Args:
        github_token (str): GitHub access token for API calls
        org_name (str): the GitHub organization that owns the repository
        repo_name (str): the repository whose comments are scanned
        since (str): the path parameter required by GitHub Restful APIs, in the format of YYYY-MM-DDTHH:MM:SSZ

    Returns:
        Dict[str, int]: maps member login -> number of qualifying comments
    """
    # prepare header
    headers = {
        "Authorization": f"Bearer {github_token}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    user_engagement_count = {}

    # do pagination to the API
    page = 1
    while True:
        comment_api = f"https://api.github.com/repos/{org_name}/{repo_name}/issues/comments?since={since}&page={page}"
        comment_response = requests.get(comment_api, headers=headers).json()

        # an empty page marks the end of pagination
        if len(comment_response) == 0:
            break
        else:
            for item in comment_response:
                comment_author_relationship = item["author_association"]
                if comment_author_relationship != "MEMBER":
                    # if the comment is not made by our member
                    # we don't count this comment towards user engagement
                    continue

                # look up the issue/PR the comment belongs to; note this makes
                # one extra API call per member comment
                issue_id = item["issue_url"].split("/")[-1]
                issue_api = f"https://api.github.com/repos/{org_name}/{repo_name}/issues/{issue_id}"
                issue_response = requests.get(issue_api, headers=headers).json()
                issue_author_relationship = issue_response["author_association"]

                if issue_author_relationship != "MEMBER":
                    # this means that the issue/PR is not created by our own people
                    # any comments in this issue/PR by our member will be counted towards the leaderboard
                    member_name = item["user"]["login"]

                    if member_name in user_engagement_count:
                        user_engagement_count[member_name] += 1
                    else:
                        user_engagement_count[member_name] = 1
        page += 1
    return user_engagement_count
def get_discussion_comments(github_token: str, org_name: str, repo_name: str, since: datetime) -> Dict[str, int]:
    """
    Retrieve the discussion comments made by our members in the last 7 days.
    This is only available via the GitHub GraphQL API.

    Only comments/replies by MEMBER accounts in discussions NOT started by a
    member are counted.

    Args:
        github_token (str): GitHub access token for API calls
        org_name (str): the GitHub organization that owns the repository
        repo_name (str): the repository whose discussions are scanned
        since (datetime): comments updated after this moment are counted
            (annotation corrected from `str` — the comparisons below and the
            caller both use a datetime object)

    Returns:
        Dict[str, int]: maps member login -> number of qualifying comments/replies
    """

    # use graphql to get the discussions updated in the last 7 days
    def _generate_discussion_query(num, cursor: str = None):
        # paginate forward through discussions using the cursor of the last edge
        if cursor is None:
            offset_str = ""
        else:
            offset_str = f', after: "{cursor}"'
        query = f"""
        {{
            repository(owner: "{org_name}", name: "{repo_name}"){{
                discussions(first: {num} {offset_str}){{
                    edges {{
                        cursor
                        node{{
                            title
                            author{{
                                login
                            }}
                            number
                            authorAssociation
                            updatedAt
                        }}
                    }}
                }}
            }}
        }}
        """
        return query

    def _generate_comment_reply_count_for_discussion(discussion_number, num, cursor: str = None):
        # here we assume that each comment will not have more than 100 replies for simplicity
        # otherwise, we have to go through pagination for both comment and reply
        if cursor is None:
            offset_str = ""
        else:
            offset_str = f', before: "{cursor}"'
        query = f"""
        {{
            repository(owner: "{org_name}", name: "{repo_name}"){{
                discussion(number: {discussion_number}){{
                    title
                    comments(last: {num} {offset_str}){{
                        edges{{
                            cursor
                            node {{
                                author{{
                                    login
                                }}
                                updatedAt
                                authorAssociation
                                replies (last: 100) {{
                                    edges {{
                                        node {{
                                            author {{
                                                login
                                            }}
                                            updatedAt
                                            authorAssociation
                                        }}
                                    }}
                                }}
                            }}
                        }}
                    }}
                }}
            }}
        }}
        """
        return query

    # a utility function to make call to Github GraphQL API
    def _call_graphql_api(query):
        headers = {"Authorization": f"Bearer {github_token}"}
        json_data = {"query": query}
        response = requests.post("https://api.github.com/graphql", json=json_data, headers=headers)
        data = response.json()
        return data

    # get the discussion numbers updated in the last 7 days
    discussion_numbers = []
    num_per_request = 10
    cursor = None

    while True:
        query = _generate_discussion_query(num_per_request, cursor)
        data = _call_graphql_api(query)
        found_discussion_out_of_time_range = False

        edges = data["data"]["repository"]["discussions"]["edges"]
        if len(edges) == 0:
            # no more discussions — pagination exhausted
            break
        else:
            # keep the discussion whose author is not a member
            for edge in edges:
                discussion = edge["node"]
                discussion_updated_at = str2datetime(discussion["updatedAt"])

                # check if the updatedAt is within the last 7 days
                # if yes, add it to discussion_numbers
                if discussion_updated_at > since:
                    if discussion["authorAssociation"] != "MEMBER":
                        discussion_numbers.append(discussion["number"])
                else:
                    # discussions are paged newest-first, so an older one
                    # means the remaining pages are all out of range
                    found_discussion_out_of_time_range = True

            if found_discussion_out_of_time_range:
                break
            else:
                # update cursor
                cursor = edges[-1]["cursor"]

    # get the discussion comments and replies made by our member
    user_engagement_count = {}
    for discussion_number in discussion_numbers:
        cursor = None
        num_per_request = 10

        while True:
            query = _generate_comment_reply_count_for_discussion(discussion_number, num_per_request, cursor)
            data = _call_graphql_api(query)

            # get the comments
            edges = data["data"]["repository"]["discussion"]["comments"]["edges"]

            if len(edges) == 0:
                break
            else:
                # update cursor for pagination
                cursor = edges[-1]["cursor"]

                for edge in edges:
                    comment = edge["node"]
                    if comment["authorAssociation"] == "MEMBER":
                        # check if the updatedAt is within the last 7 days
                        # if yes, add it to user_engagement_count
                        comment_updated_at = datetime.strptime(comment["updatedAt"], "%Y-%m-%dT%H:%M:%SZ")
                        if comment_updated_at > since:
                            member_name = comment["author"]["login"]
                            if member_name in user_engagement_count:
                                user_engagement_count[member_name] += 1
                            else:
                                user_engagement_count[member_name] = 1

                    # get the replies; at most the last 100 are returned per
                    # comment (see query above)
                    reply_edges = comment["replies"]["edges"]
                    if len(reply_edges) == 0:
                        continue
                    else:
                        for reply_edge in reply_edges:
                            reply = reply_edge["node"]
                            if reply["authorAssociation"] == "MEMBER":
                                # count member replies updated within the window
                                reply_updated_at = datetime.strptime(reply["updatedAt"], "%Y-%m-%dT%H:%M:%SZ")
                                if reply_updated_at > since:
                                    member_name = reply["author"]["login"]
                                    if member_name in user_engagement_count:
                                        user_engagement_count[member_name] += 1
                                    else:
                                        user_engagement_count[member_name] = 1
    return user_engagement_count
def generate_user_engagement_leaderboard_image(
    github_token: str, org_name: str, repo_list: List[str], output_path: str
) -> bool:
    """
    Generate the user engagement leaderboard image for stats within the last 7 days

    Args:
        github_token (str): GitHub access token for API calls
        org_name (str): the GitHub organization to scan
        repo_list (List[str]): repositories whose comments are aggregated
        output_path (str): the path to save the image

    Returns:
        bool: True when the image was generated, False when there was no engagement to plot
    """
    # request to the Github API to get the users who have replied the most in the last 7 days
    start_datetime = get_utc_time_one_week_ago()
    start_datetime_str = datetime2str(start_datetime)

    # get the issue/PR comments and discussion comment count
    total_engagement_count = {}

    def _update_count(counter):
        # merge a per-repo counter into the overall tally
        for name, count in counter.items():
            if name in total_engagement_count:
                total_engagement_count[name] += count
            else:
                total_engagement_count[name] = count

    for repo_name in repo_list:
        print(f"Fetching user engagement count for {repo_name}/{repo_name}")
        issue_pr_engagement_count = get_issue_pull_request_comments(
            github_token=github_token, org_name=org_name, repo_name=repo_name, since=start_datetime_str
        )
        discussion_engagement_count = get_discussion_comments(
            github_token=github_token, org_name=org_name, repo_name=repo_name, since=start_datetime
        )

        # update the total engagement count
        _update_count(issue_pr_engagement_count)
        _update_count(discussion_engagement_count)

    # prepare the data for plotting
    x = []
    y = []

    if len(total_engagement_count) > 0:
        ranking = []
        for name, count in total_engagement_count.items():
            ranking.append((name, count))
        # most active members first
        ranking.sort(key=lambda x: x[1], reverse=True)

        for name, count in ranking:
            x.append(count)
            y.append(name)

        # plot the leaderboard
        xlabel = f"Number of Comments made (since {start_datetime_str})"
        ylabel = "Member"
        title = "Active User Engagement Leaderboard"
        plot_bar_chart(x, y, xlabel=xlabel, ylabel=ylabel, title=title, output_path=output_path)
        return True
    else:
        # nothing to plot this week
        return False
def generate_contributor_leaderboard_image(github_token, org_name, repo_list, output_path) -> bool:
    """
    Generate the contributor leaderboard image for stats within the last 7 days

    Args:
        github_token (str): GitHub access token for API calls
        org_name (str): the GitHub organization to scan
        repo_list (List[str]): repositories whose merged PRs are counted
        output_path (str): the path to save the image

    Returns:
        bool: True when the image was generated, False when no PRs were merged in the window
    """
    # request to the Github API to get the users who have contributed in the last 7 days
    headers = {
        "Authorization": f"Bearer {github_token}",
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    counter = Counter()
    start_datetime = get_utc_time_one_week_ago()

    def _get_url(org_name, repo_name, page):
        # closed PRs, 50 per page
        return f"https://api.github.com/repos/{org_name}/{repo_name}/pulls?per_page=50&page={page}&state=closed"

    def _iterate_by_page(org_name, repo_name):
        page = 1
        stop = False

        while not stop:
            print(f"Fetching pull request data for {org_name}/{repo_name} - page{page}")
            url = _get_url(org_name, repo_name, page)

            while True:
                response = requests.get(url, headers=headers).json()

                if isinstance(response, list):
                    # sometimes the Github API returns nothing
                    # request again if the response is not a list
                    break
                print("Empty response, request again...")

            if len(response) == 0:
                # if the response is empty, stop
                stop = True
                break

            # count the pull request and author from response
            for pr_data in response:
                merged_at = pr_data["merged_at"]
                author = pr_data["user"]["login"]

                if merged_at is None:
                    # closed but never merged — ignore
                    continue

                merge_datetime = str2datetime(merged_at)

                if merge_datetime < start_datetime:
                    # if we found a pull request that is merged before the start_datetime
                    # we stop
                    stop = True
                    break
                else:
                    # record the author
                    counter.record(author)

            # next page
            page += 1

    for repo_name in repo_list:
        _iterate_by_page(org_name, repo_name)

    # convert unix timestamp to Beijing datetime
    bj_start_datetime = datetime.fromtimestamp(start_datetime.timestamp(), tz=pytz.timezone("Asia/Shanghai"))
    bj_start_datetime_str = datetime2str(bj_start_datetime)

    contribution_list = counter.to_sorted_list()

    # NOTE(review): the counter only ever records authors with >= 1 merged PR,
    # so no explicit zero-commit filtering is needed here
    author_list = [x[0] for x in contribution_list]
    num_commit_list = [x[1] for x in contribution_list]

    # plot
    if len(author_list) > 0:
        xlabel = f"Number of Pull Requests (since {bj_start_datetime_str})"
        ylabel = "Contributor"
        title = "Active Contributor Leaderboard"
        plot_bar_chart(num_commit_list, author_list, xlabel=xlabel, ylabel=ylabel, title=title, output_path=output_path)
        return True
    else:
        # nothing merged this week
        return False
def upload_image_to_lark(lark_tenant_token: str, image_path: str) -> str:
    """
    Upload image to Lark and return the image key.

    Args:
        lark_tenant_token (str): Lark tenant access token (obtain via the auth API first)
        image_path (str): the path to the image to be uploaded

    Returns:
        str: the image key Lark assigns to the uploaded image
    """
    url = "https://open.feishu.cn/open-apis/im/v1/images"
    # Fix: open the image inside a context manager so the file handle is always
    # closed — the previous code leaked the handle.
    with open(image_path, "rb") as image_file:
        form = {"image_type": "message", "image": image_file}
        multi_form = MultipartEncoder(form)
        headers = {
            "Authorization": f"Bearer {lark_tenant_token}",
            "Content-Type": multi_form.content_type,
        }
        response = requests.request("POST", url, headers=headers, data=multi_form).json()
    return response["data"]["image_key"]
def generate_lark_tenant_access_token(app_id: str, app_secret: str) -> str:
    """
    Generate Lark tenant access token.

    Args:
        app_id (str): Lark app id
        app_secret (str): Lark app secret
    """
    url = "https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal"
    payload = {"app_id": app_id, "app_secret": app_secret}
    result = requests.post(url, json=payload).json()
    return result["tenant_access_token"]
def send_image_to_lark(image_key: str, webhook_url: str) -> None:
    """
    Send image to Lark.

    Args:
        image_key (str): the image key returned by Lark
        webhook_url (str): the webhook url to send the image
    """
    payload = {"msg_type": "image", "content": {"image_key": image_key}}
    requests.post(webhook_url, json=payload)
def send_message_to_lark(message: str, webhook_url: str):
    """
    Send message to Lark.

    Args:
        message (str): the message to be sent
        webhook_url (str): the webhook url to send the message
    """
    payload = {"msg_type": "text", "content": {"text": message}}
    requests.post(webhook_url, json=payload)
if __name__ == "__main__":
    GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
    CONTRIBUTOR_IMAGE_PATH = "contributor_leaderboard.png"
    USER_ENGAGEMENT_IMAGE_PATH = "engagement_leaderboard.png"
    ORG_NAME = "hpcaitech"

    # get all open source repositories
    REPO_LIST = get_organization_repositories(GITHUB_TOKEN, ORG_NAME)

    # generate images; each call returns False when there was no activity to plot
    contrib_success = generate_contributor_leaderboard_image(GITHUB_TOKEN, ORG_NAME, REPO_LIST, CONTRIBUTOR_IMAGE_PATH)
    engagement_success = generate_user_engagement_leaderboard_image(
        GITHUB_TOKEN, ORG_NAME, REPO_LIST, USER_ENGAGEMENT_IMAGE_PATH
    )

    # upload images
    APP_ID = os.environ["LARK_APP_ID"]
    APP_SECRET = os.environ["LARK_APP_SECRET"]
    LARK_TENANT_TOKEN = generate_lark_tenant_access_token(app_id=APP_ID, app_secret=APP_SECRET)

    # Fix: only upload an image when it was actually generated — the previous
    # code unconditionally opened both image files and crashed whenever a
    # leaderboard had no data (its file was never written).
    contributor_image_key = None
    if contrib_success:
        contributor_image_key = upload_image_to_lark(LARK_TENANT_TOKEN, CONTRIBUTOR_IMAGE_PATH)
    user_engagement_image_key = None
    if engagement_success:
        user_engagement_image_key = upload_image_to_lark(LARK_TENANT_TOKEN, USER_ENGAGEMENT_IMAGE_PATH)

    # send message to lark
    LARK_WEBHOOK_URL = os.environ["LARK_WEBHOOK_URL"]
    message = """本周的社区榜单出炉啦!
1. 开发贡献者榜单
2. 用户互动榜单
注:
- 开发贡献者测评标准为:本周由公司成员与社区在所有开源仓库提交的Pull Request次数
- 用户互动榜单测评标准为:本周由公司成员在非成员在所有开源仓库创建的issue/PR/discussion中回复的次数
"""
    send_message_to_lark(message, LARK_WEBHOOK_URL)

    # send contributor image to lark
    if contrib_success:
        send_image_to_lark(contributor_image_key, LARK_WEBHOOK_URL)
    else:
        send_message_to_lark("本周没有成员贡献PR,无榜单图片生成。", LARK_WEBHOOK_URL)

    # send user engagement image to lark
    if engagement_success:
        send_image_to_lark(user_engagement_image_key, LARK_WEBHOOK_URL)
    else:
        send_message_to_lark("本周没有成员互动,无榜单图片生成。", LARK_WEBHOOK_URL)
================================================
FILE: .github/workflows/scripts/generate_release_draft.py
================================================
#!/usr/bin/env python
# coding: utf-8
import argparse
import os
import re
import requests
COMMIT_API = "https://api.github.com/repos/hpcaitech/ColossalAI/commits"
TAGS_API = "https://api.github.com/repos/hpcaitech/ColossalAI/tags"
def parse_args():
    """Parse CLI arguments: the draft output path and the version being released."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--out", required=True, type=str, help="output path for the release draft")
    parser.add_argument("--version", required=True, type=str, help="current version to release")
    return parser.parse_args()
def get_latest_tag_commit(headers=None):
    """Return ``(commit_sha, tag_name)`` for the most recent tag of the repo.

    Relies on the GitHub tags API returning the newest tag first.
    """
    payload = requests.get(url=TAGS_API, headers=headers).json()
    newest = payload[0]
    return newest["commit"]["sha"], newest["name"]
def get_commit_info(commit_hash, headers=None):
    """Fetch and return the full commit object for ``commit_hash`` from the GitHub API."""
    response = requests.get(url=f"{COMMIT_API}/{commit_hash}", headers=headers)
    return response.json()
def get_all_commit_info(since, headers=None):
    """Collect every commit made after ``since`` by walking the paginated commits API.

    Args:
        since (str): ISO-8601 timestamp; only commits after this date are fetched.
        headers (dict, optional): extra HTTP headers (e.g. authorization).

    Returns:
        list: all commit objects, 100 per API page, in API order.
    """
    all_commits = []
    page = 1
    while True:
        url = f"{COMMIT_API}?since={since}&per_page=100&page={page}"
        batch = requests.get(url=url, headers=headers).json()
        # an empty page means pagination is exhausted
        if not batch:
            return all_commits
        all_commits += batch
        page += 1
def collate_release_info(commit_info_list):
    """Group commits by the ``[tag]`` prefix found in their messages.

    Args:
        commit_info_list (list): commit objects as returned by the GitHub
            commits API.

    Returns:
        dict: capitalized tag -> list of ``(message, author_name, author_url)``
            tuples. Commits without a ``[tag]`` in the message are skipped.
    """
    results = dict()
    # non-greedy so a message containing several bracket pairs (e.g. an issue
    # reference later in the line) only yields the first pair as the tag
    pattern = r"\[(.*?)\]"
    for commit_info in commit_info_list:
        author = commit_info["commit"]["author"]["name"]
        # the top-level "author" field is None when the commit email is not
        # linked to a GitHub account, so fall back to no URL explicitly
        # instead of a bare except
        author_info = commit_info.get("author") or {}
        author_url = author_info.get("url")
        msg = commit_info["commit"]["message"]
        match = re.search(pattern, msg)
        if match:
            tag = match.group(1).capitalize()
            results.setdefault(tag, []).append((msg, author, author_url))
    return results
def generate_release_post_markdown(current_version, last_version, release_info):
    """Render the release draft as a list of markdown text fragments.

    Args:
        current_version (str): the tag being released now.
        last_version (str): the previous release tag.
        release_info (dict): tag -> list of ``(message, author, author_url)``
            tuples, as produced by ``collate_release_info``.

    Returns:
        list[str]: markdown fragments to be written out in order.
    """
    text = []

    # add highlights
    text.append("## What's Changed \n\n")

    # one section per commit tag, one bullet per commit
    for tag, commits in release_info.items():
        text.append(f"### {tag} \n")
        for msg, author, author_url in commits:
            # only keep the first line of the commit message
            msg = msg.split("\n")[0]
            if author_url:
                item = f"{msg} by [{author}]({author_url})\n"
            else:
                item = f"{msg} by {author}\n"
            text.append(f"- {item}")
        text.append("\n")

    # full change log link: GitHub compare URLs are base...head, so the
    # previous tag must come first for the diff to show this release's changes
    text.append(
        f"**Full Changelog**: https://github.com/hpcaitech/ColossalAI/compare/{last_version}...{current_version}"
    )
    return text
if __name__ == "__main__":
    args = parse_args()
    auth_headers = {"Authorization": os.environ["GITHUB_API_TOKEN"]}

    # locate the previous release tag and the date of its commit
    prev_commit_sha, prev_version = get_latest_tag_commit(auth_headers)
    prev_commit = get_commit_info(prev_commit_sha, headers=auth_headers)
    prev_release_date = prev_commit["commit"]["author"]["date"]

    # list commits made since the previous release, dropping the release
    # commit itself (it is included because "since" is inclusive)
    commits = get_all_commit_info(since=prev_release_date, headers=auth_headers)[:-1]

    # collate into markdown and write the draft to the requested path
    release_info = collate_release_info(commits)
    with open(args.out, "w") as f:
        f.writelines(generate_release_post_markdown(args.version, prev_version, release_info))
================================================
FILE: .github/workflows/scripts/send_message_to_lark.py
================================================
import argparse
import requests
def parse_args():
    """Parse the message text and the Lark webhook URL from the command line."""
    cli = argparse.ArgumentParser()
    for short_flag, long_flag in (("-m", "--message"), ("-u", "--url")):
        cli.add_argument(short_flag, long_flag, type=str)
    return cli.parse_args()
def send_message_to_lark(message, webhook_url):
    """Post ``message`` as a plain-text Lark notification to ``webhook_url``."""
    payload = {"msg_type": "text", "content": {"text": message}}
    requests.post(webhook_url, json=payload)
if __name__ == "__main__":
    # CLI entry point: forward the parsed message to the given webhook
    cli_args = parse_args()
    send_message_to_lark(cli_args.message, cli_args.url)
================================================
FILE: .github/workflows/scripts/update_setup_for_nightly.py
================================================
from datetime import datetime
def open_setup_file():
    """Read ``setup.py`` from the current directory and return its lines."""
    with open("setup.py") as setup_file:
        return setup_file.readlines()
def replace_nightly_package_info(file_lines):
    """Rewrite setup.py lines so the build publishes a dated nightly package.

    Args:
        file_lines (list[str]): the lines of setup.py.

    Returns:
        list[str]: the same list, with the version line replaced by today's
        date (``YYYY.MM.DD``) and the package name switched to
        ``colossalai-nightly``.
    """
    nightly_version = datetime.today().strftime("%Y.%m.%d")
    # substring to look for -> full replacement line
    replacements = {
        "version = get_version()": f'version = "{nightly_version}"\n',
        'package_name = "colossalai"': 'package_name = "colossalai-nightly"\n',
    }
    for idx, line in enumerate(file_lines):
        for needle, replacement in replacements.items():
            if needle in line:
                file_lines[idx] = replacement
    return file_lines
def write_setup_file(file_lines):
    """Overwrite ``setup.py`` in the current directory with the given lines."""
    with open("setup.py", "w") as setup_file:
        setup_file.writelines(file_lines)
def main():
    """Rewrite setup.py in place with nightly package name and dated version."""
    write_setup_file(replace_nightly_package_info(open_setup_file()))


if __name__ == "__main__":
    main()
================================================
FILE: .github/workflows/submodule.yml
================================================
name: Synchronize Submodule

on:
  workflow_dispatch:
  schedule:
    # run daily at midnight UTC
    - cron: "0 0 * * *"

jobs:
  sync-submodule:
    runs-on: [self-hosted, ubuntu-latest]
    # only run in the upstream repository, not in forks
    if: github.repository == 'hpcaitech/ColossalAI'
    steps:
      - name: Checkout
        uses: actions/checkout@v2
        with:
          ref: 'main'
          submodules: true

      - name: echo
        run: |
          echo ${{github}}

      # pull the latest commits referenced by each registered submodule
      - name: Git Submodule Update
        run: |
          git pull --recurse-submodules
          git submodule update --remote --recursive

      # NOTE(review): `git commit -am` exits non-zero when there is nothing to
      # commit, which fails the job on days with no submodule changes — confirm
      # this is intended
      - name: Commit update
        run: |
          git config --global user.name 'github-actions'
          git config --global user.email 'github-actions@github.com'
          git remote set-url origin https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/${{ github.repository }}
          git commit -am "Automated submodule synchronization"

      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v3
        with:
          title: '[Bot] Synchronize Submodule References'
          body: |
            Automated PR to update submodule commits
          committer: GitHub <noreply@github.com>
          author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
          assignees: ${{ github.actor }}
          delete-branch: true
          branch: create-pull-request/patch-sync-submodule
================================================
FILE: .github/workflows/translate_comment.yml
================================================
name: 'issue-translator'

# Translate non-English issue bodies and comments to English automatically.
on:
  issue_comment:
    types: [created]
  issues:
    types: [opened]

jobs:
  build:
    runs-on: [self-hosted, ubuntu-latest]
    steps:
      - uses: usthe/issues-translate-action@v2.7
        with:
          IS_MODIFY_TITLE: false
          # Not required; defaults to false. Decides whether to modify the issue title.
          # If true, the robot account @Issues-translate-bot must have modification permissions; invite @Issues-translate-bot to your project or use your custom bot.
          CUSTOM_BOT_NOTE: Bot detected the issue body's language is not English, translate it automatically. 👯👭🏻🧑🤝🧑👫🧑🏿🤝🧑🏻👩🏾🤝👨🏿👬🏿
          # Not required. Customizes the translation robot's prefix message.
================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
docs/.build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# IDE
.idea/
.vscode/
# macos
*.DS_Store
#data/
docs/.build
# pytorch checkpoint
*.pt
# ignore version.py generated by setup.py
colossalai/version.py
# ignore any kernel build files
*.o
*.so
# ignore python interface definition file
*.pyi
# ignore coverage test file
coverage.lcov
coverage.xml
# ignore testmon and coverage files
.coverage
.testmondata*
# log, test files - ColossalChat
applications/ColossalChat/logs
applications/ColossalChat/tests/logs
applications/ColossalChat/wandb
applications/ColossalChat/model
applications/ColossalChat/eval
applications/ColossalChat/rollouts
applications/ColossalChat/*.txt
applications/ColossalChat/*.db
applications/ColossalChat/stdin
applications/ColossalChat/*.zip
applications/ColossalChat/*.prof
applications/ColossalChat/*.png
================================================
FILE: .gitmodules
================================================
[submodule "examples/tutorial/fastfold/FastFold"]
path = examples/tutorial/fastfold/FastFold
url = https://github.com/hpcaitech/FastFold
================================================
FILE: .isort.cfg
================================================
[settings]
line_length = 120
multi_line_output=3
include_trailing_comma = true
ignore_comments = true
profile = black
honor_noqa = true
================================================
FILE: .pre-commit-config.yaml
================================================
repos:
- repo: https://github.com/PyCQA/autoflake
rev: v2.3.1
hooks:
- id: autoflake
name: autoflake (python)
args: ['--in-place', '--remove-unused-variables', '--remove-all-unused-imports', '--ignore-init-module-imports']
- repo: https://github.com/pycqa/isort
rev: 5.13.2
hooks:
- id: isort
name: sort all imports (python)
args: ["--profile", "black"] # avoid conflict with black
- repo: https://github.com/psf/black-pre-commit-mirror
rev: 24.10.0
hooks:
- id: black
name: black formatter
args: ['--line-length=120', '--target-version=py37', '--target-version=py38', '--target-version=py39','--target-version=py310']
- repo: https://github.com/pre-commit/mirrors-clang-format
rev: v19.1.5
hooks:
- id: clang-format
name: clang formatter
types_or: [c++, c]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: check-yaml
- id: check-merge-conflict
- id: check-case-conflict
- id: trailing-whitespace
- id: end-of-file-fixer
- id: mixed-line-ending
args: ['--fix=lf']
================================================
FILE: CHANGE_LOG.md
================================================
# Change Log
All notable changes to this project will be documented in this file.
🚩 **We have moved the change log to the GitHub [release page](https://github.com/hpcaitech/ColossalAI/releases)**
## v0.0.2 | 2022-02
### Added
- Unified distributed layers
- MoE support
- DevOps tools such as github action, code review automation, etc.
- New project official website
### Changes
- refactored the APIs for usability, flexibility and modularity
- adapted PyTorch AMP for tensor parallel
- refactored utilities for tensor parallel and pipeline parallel
- Separated benchmarks and examples as independent repositories
- Updated pipeline parallelism to support non-interleaved and interleaved versions
- refactored installation scripts for convenience
### Fixed
- zero level 3 runtime error
- incorrect calculation in gradient clipping
## v0.0.1 beta | 2021-10
The first beta version of Colossal-AI. Thanks to all contributors for the effort to implement the system.
### Added
- Initial architecture of the system
- Features such as tensor parallelism, gradient clipping, gradient accumulation
================================================
FILE: CONTRIBUTING.md
================================================
# Contributing
Colossal-AI welcomes any constructive contribution from the community and the team is more than willing to work on problems you have encountered to make it a better project.
## Environment Setup
To contribute to Colossal-AI, we would like to first guide you to set up a proper development environment so that you can better implement your code. It is good to install this system from source with the `editable` flag (`-e`, for development mode) so that your change to the source code will be reflected in runtime without repeated installation and uninstallation. Here are the steps to set up the development environment.
1. Uninstall any existing Colossal-AI distribution.
```shell
pip uninstall colossalai
```
2. Clone the repository to local workspace
```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
```
3. The *Get Started* section of [official documentation](https://colossalai.org) has provided instructions to build from source. Follow the instructions to build from source, **but replace the last `pip install` statement with the command below by adding the `-e` flag.**
```shell
pip install <options> -e .
```
## Coding Standards
### Unit Tests
We use [PyTest](https://docs.pytest.org/en/latest/) to execute tests. You can install pytest by `pip install pytest`. As some of the tests require initialization of the distributed backend, GPUs are needed to execute these tests.
To set up the environment for unit testing, first change your current directory to the root directory of your local ColossalAI repository, then run
```bash
pip install -r requirements/requirements-test.txt
```
If you encounter an error telling "Could not find a version that satisfies the requirement fbgemm-gpu==0.2.0", please downgrade your python version to 3.8 or 3.9 and try again.
If you only want to run CPU tests, you can run
```bash
pytest -m cpu tests/
```
If you have 8 GPUs on your machine, you can run the full test
```bash
pytest tests/
```
If you do not have 8 GPUs on your machine, do not worry. Unit testing will be automatically conducted when you put up a pull request to the main branch.
### Code Style
We have some static checks when you commit your code change, please make sure you can pass all the tests and make sure the coding style meets our requirements. We use pre-commit hook to make sure the code is aligned with the writing standard. To set up the code style checking, you need to follow the steps below.
```shell
# these commands are executed under the Colossal-AI directory
pip install pre-commit
pre-commit install
```
Code format checking will be automatically executed when you commit your changes.
## Contribution Guide
You need to follow these steps below to make contribution to the main repository via pull request. You can learn about the details of pull request [here](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests).
### 1. Fork the Official Repository
Firstly, you need to visit the [Colossal-AI repository](https://github.com/hpcaitech/ColossalAI) and fork into your own account. The `fork` button is at the right top corner of the web page alongside with buttons such as `watch` and `star`.
Now, you can clone your own forked repository into your local environment.
```shell
git clone https://github.com/<YOUR-USERNAME>/ColossalAI.git
```
### 2. Configure Git
You need to set the official repository as your upstream so that you can synchronize with the latest update in the official repository. You can learn about upstream [here](https://www.atlassian.com/git/tutorials/git-forks-and-upstreams).
Then add the original repository as upstream
```shell
cd ColossalAI
git remote add upstream https://github.com/hpcaitech/ColossalAI.git
```
you can use the following command to verify that the remote is set. You should see both `origin` and `upstream` in the output.
```shell
git remote -v
```
### 3. Synchronize with Official Repository
Before you make changes to the codebase, it is always good to fetch the latest updates in the official repository. In order to do so, you can use the commands below.
```shell
git fetch upstream
git checkout main
git merge upstream/main
git push origin main
```
Otherwise, you can click the `fetch upstream` button on the github webpage of the main branch of your forked repository. Then, use these commands to sync.
```
git checkout main
git pull origin main
```
### 4. Choose/Create an Issue for Your Pull Request
Generally, your code change should target only one problem. Stacking multiple commits for different problems into one pull request makes the code review painful and makes the system prone to new bugs, as the reviewer may not understand the code logic correctly. Thus, you should choose an existing issue or [create your own issue](https://github.com/hpcaitech/ColossalAI/issues) as your pull request target. If you wish to create a new issue, use an appropriate title and description and add related labels.
### 5. Create a New Branch
You should not make changes to the `main` branch of your forked repository as this might make upstream synchronization difficult. You can create a new branch with an appropriate name. Branch names should generally start with `hotfix/` or `feature/`. `hotfix` is for bug fixes and `feature` is for the addition of a new feature.
```shell
git checkout -b <NEW-BRANCH-NAME>
```
### 6. Implementation and Code Commit
Now you can implement your code change in the source code. Remember that you installed the system in development mode, thus you do not need to uninstall and reinstall to make the code take effect. The code change will be reflected in every new Python execution.
You can commit and push the changes to your local repository. The changes should be kept logical, modular and atomic.
```shell
git add -A
git commit -m "<COMMIT-MESSAGE>"
git push -u origin <NEW-BRANCH-NAME>
```
### 7. Open a Pull Request
You can now create a pull request on the GitHub webpage of your repository. The source branch is `<NEW-BRANCH-NAME>` of your repository and the target branch should be `main` of `hpcaitech/ColossalAI`. After creating this pull request, you should be able to see it [here](https://github.com/hpcaitech/ColossalAI/pulls).
Do write clearly the description of your pull request and [link the pull request to your target issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue). This will automatically close the issue when the pull request is approved.
In case of code conflict, you should rebase your branch and resolve the conflicts manually.
================================================
FILE: LICENSE
================================================
Copyright 2021- HPC-AI Technology Inc. All rights reserved.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2021- HPC-AI Technology Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
## Some of colossal-ai's code is derived from other projects, which are subject to the following copyright notices:
Copyright 2021 The Alpa team.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://github.com/alpa-projects/alpa/blob/979a45a3e6187df941ef4a4c4c6eea664527d68d/LICENSE
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-------------------------------------------------
Copyright 2018-2020 Philippe Tillet
Copyright 2020-2022 OpenAI
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files
(the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
---------------- LICENSE FOR Microsoft Deepspeed ----------------
MIT License
Copyright (c) Microsoft Corporation.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE
---------------- LICENSE FOR NVIDIA Megatron-LM ----------------
Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of NVIDIA CORPORATION nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
---------------- LICENSE FOR NVIDIA Apex ----------------
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
---------------- LICENSE FOR Facebook Fairscale ----------------
Copyright (c) Facebook, Inc. and its affiliates
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the names of Facebook, Deepmind Technologies, NYU, NEC Laboratories America
and IDIAP Research Institute nor the names of its contributors may be
used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
---------------- LICENSE FOR Flash Attention ----------------
BSD 3-Clause License
Copyright (c) 2022, the respective contributors, as shown by the AUTHORS file.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
---------------- LICENSE FOR Facebook xFormers ----------------
From xFormers:
Copyright (c) Facebook, Inc. and its affiliates
===
BSD 3-Clause License
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the names of Facebook, Deepmind Technologies, NYU, NEC Laboratories America
and IDIAP Research Institute nor the names of its contributors may be
used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
---------------- LICENSE FOR VLLM TEAM ----------------
from VLLM TEAM:
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://github.com/vllm-project/vllm/blob/main/LICENSE
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
---------------- LICENSE FOR LIGHTLLM TEAM ----------------
from LIGHTLLM TEAM:
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://github.com/ModelTC/lightllm/blob/main/LICENSE
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
---------------- LICENSE FOR AutoGPTQ ----------------
From AutoGPTQ:
MIT License
Copyright (c) 2023 潘其威(William)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---------------- LICENSE FOR exllama ----------------
From exllama:
MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---------------- LICENSE FOR torch-int ----------------
MIT License
Copyright (c) 2022 Guangxuan Xiao
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---------------- LICENSE FOR smoothquant ----------------
MIT License
Copyright (c) 2022 MIT HAN Lab
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---------------- LICENSE FOR LangChain TEAM ----------------
The MIT License
Copyright (c) Harrison Chase
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
---------------- LICENSE FOR Hugging Face accelerate ----------------
Copyright 2021 The HuggingFace Team
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: MANIFEST.in
================================================
include *.txt README.md
recursive-include requirements *.txt
recursive-include colossalai *.cpp *.h *.cu *.tr *.cuh *.cc *.pyi
recursive-include extensions *.py *.cpp *.h *.cu *.tr *.cuh *.cc *.pyi
================================================
FILE: README.md
================================================
# Colossal-AI
<div id="top" align="center">
[](https://www.colossalai.org/)
Colossal-AI: Making large AI models cheaper, faster, and more accessible
<h3> <a href="https://arxiv.org/abs/2110.14883"> Paper </a> |
<a href="https://www.colossalai.org/"> Documentation </a> |
<a href="https://github.com/hpcaitech/ColossalAI/tree/main/examples"> Examples </a> |
<a href="https://github.com/hpcaitech/ColossalAI/discussions"> Forum </a> |
<a href="https://colossalai.org/zh-Hans/docs/get_started/bonus/">GPU Cloud Playground </a> |
<a href="https://hpc-ai.com/blog"> Blog </a></h3>
[](https://github.com/hpcaitech/ColossalAI/stargazers)
[](https://github.com/hpcaitech/ColossalAI/actions/workflows/build_on_schedule.yml)
[](https://colossalai.readthedocs.io/en/latest/?badge=latest)
[](https://www.codefactor.io/repository/github/hpcaitech/colossalai)
[](https://huggingface.co/hpcai-tech)
[](https://github.com/hpcaitech/public_assets/tree/main/colossalai/contact/slack)
[](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png)
| [English](README.md) | [中文](docs/README-zh-Hans.md) |
</div>
## Instantly Run Colossal-AI on Enterprise-Grade GPUs
Skip the setup. Access a powerful, pre-configured Colossal-AI environment on [**HPC-AI Cloud**](https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai).
Train your models and scale your AI workload in one click!
* **NVIDIA Blackwell B200s**: Experience the next generation of AI performance ([See Benchmarks](https://hpc-ai.com/blog/b200)). Now available on cloud from **$2.47/hr**.
* **Cost-Effective H200 Cluster**: Get premier performance with on-demand rental from just **$1.99/hr**.
[**Get Started Now & Claim Your Free Credits →**](https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai)
<div align="center">
<a href="https://hpc-ai.com/?utm_source=github&utm_medium=social&utm_campaign=promotion-colossalai">
<img src="https://github.com/hpcaitech/public_assets/blob/main/colossalai/img/2-3.png" width="850" />
</a>
</div>
### Colossal-AI Benchmark
To see how these performance gains translate to real-world applications, we conducted a large language model training benchmark using Colossal-AI on Llama-like models. The tests were run on both 8-card and 16-card configurations for 7B and 70B models, respectively.
| GPU | GPUs | Model Size | Parallelism | Batch Size per DP | Seqlen | Throughput | TFLOPS/GPU | Peak Mem(MiB) |
| :-----------------------------: | :--------: | :-------------: | :------------------: | :-----------: | :--------------: | :-------------: | :-------------: | :-------------: |
| H200 | 8 | 7B | zero2(dp8) | 36 | 4096 | 17.13 samp/s | 534.18 | 119040.02 |
| H200 | 16 | 70B | zero2 | 48 | 4096 | 3.27 samp/s | 469.1 | 150032.23 |
| B200 | 8 | 7B | zero1(dp2)+tp2+pp4 | 128 | 4096 | 25.83 samp/s | 805.69 | 100119.77 |
| B200                            | 16         | 70B             | zero1(dp2)+tp2+pp4   | 128           | 4096             | 5.66 samp/s     | 811.79          | 100072.02       |
The results from the Colossal-AI benchmark provide the most practical insight. For the 7B model on 8 cards, the **B200 achieved a 50% higher throughput** and a significant increase in TFLOPS per GPU. For the 70B model on 16 cards, the B200 again demonstrated a clear advantage, with **over 70% higher throughput and TFLOPS per GPU**. These numbers show that the B200's performance gains translate directly to faster training times for large-scale models.
## Latest News
* [2025/02] [DeepSeek 671B Fine-Tuning Guide Revealed—Unlock the Upgraded DeepSeek Suite with One Click, AI Players Ecstatic!](https://company.hpc-ai.com/blog/shocking-release-deepseek-671b-fine-tuning-guide-revealed-unlock-the-upgraded-deepseek-suite-with-one-click-ai-players-ecstatic)
* [2024/12] [The development cost of video generation models has saved by 50%! Open-source solutions are now available with H200 GPU vouchers](https://company.hpc-ai.com/blog/the-development-cost-of-video-generation-models-has-saved-by-50-open-source-solutions-are-now-available-with-h200-gpu-vouchers) [[code]](https://github.com/hpcaitech/Open-Sora/blob/main/scripts/train.py) [[vouchers]](https://colossalai.org/zh-Hans/docs/get_started/bonus/)
* [2024/10] [How to build a low-cost Sora-like app? Solutions for you](https://company.hpc-ai.com/blog/how-to-build-a-low-cost-sora-like-app-solutions-for-you)
* [2024/09] [Singapore Startup HPC-AI Tech Secures 50 Million USD in Series A Funding to Build the Video Generation AI Model and GPU Platform](https://company.hpc-ai.com/blog/singapore-startup-hpc-ai-tech-secures-50-million-usd-in-series-a-funding-to-build-the-video-generation-ai-model-and-gpu-platform)
* [2024/09] [Reducing AI Large Model Training Costs by 30% Requires Just a Single Line of Code From FP8 Mixed Precision Training Upgrades](https://company.hpc-ai.com/blog/reducing-ai-large-model-training-costs-by-30-requires-just-a-single-line-of-code-from-fp8-mixed-precision-training-upgrades)
* [2024/06] [Open-Sora Continues Open Source: Generate Any 16-Second 720p HD Video with One Click, Model Weights Ready to Use](https://hpc-ai.com/blog/open-sora-from-hpc-ai-tech-team-continues-open-source-generate-any-16-second-720p-hd-video-with-one-click-model-weights-ready-to-use)
* [2024/05] [Large AI Models Inference Speed Doubled, Colossal-Inference Open Source Release](https://hpc-ai.com/blog/colossal-inference)
* [2024/04] [Open-Sora Unveils Major Upgrade: Embracing Open Source with Single-Shot 16-Second Video Generation and 720p Resolution](https://hpc-ai.com/blog/open-soras-comprehensive-upgrade-unveiled-embracing-16-second-video-generation-and-720p-resolution-in-open-source)
* [2024/04] [Most cost-effective solutions for inference, fine-tuning and pretraining, tailored to LLaMA3 series](https://hpc-ai.com/blog/most-cost-effective-solutions-for-inference-fine-tuning-and-pretraining-tailored-to-llama3-series)
## Table of Contents
<ul>
<li><a href="#Why-Colossal-AI">Why Colossal-AI</a> </li>
<li><a href="#Features">Features</a> </li>
<li>
<a href="#Colossal-AI-in-the-Real-World">Colossal-AI for Real World Applications</a>
<ul>
<li><a href="#Open-Sora">Open-Sora: Revealing Complete Model Parameters, Training Details, and Everything for Sora-like Video Generation Models</a></li>
<li><a href="#Colossal-LLaMA-2">Colossal-LLaMA-2: One Half-Day of Training Using a Few Hundred Dollars Yields Similar Results to Mainstream Large Models, Open-Source and Commercial-Free Domain-Specific Llm Solution</a></li>
<li><a href="#ColossalChat">ColossalChat: An Open-Source Solution for Cloning ChatGPT With a Complete RLHF Pipeline</a></li>
<li><a href="#AIGC">AIGC: Acceleration of Stable Diffusion</a></li>
<li><a href="#Biomedicine">Biomedicine: Acceleration of AlphaFold Protein Structure</a></li>
</ul>
</li>
<li>
<a href="#Parallel-Training-Demo">Parallel Training Demo</a>
<ul>
<li><a href="#LLaMA3">LLaMA 1/2/3 </a></li>
<li><a href="#MoE">MoE</a></li>
<li><a href="#GPT-3">GPT-3</a></li>
<li><a href="#GPT-2">GPT-2</a></li>
<li><a href="#BERT">BERT</a></li>
<li><a href="#PaLM">PaLM</a></li>
<li><a href="#OPT">OPT</a></li>
<li><a href="#ViT">ViT</a></li>
<li><a href="#Recommendation-System-Models">Recommendation System Models</a></li>
</ul>
</li>
<li>
<a href="#Single-GPU-Training-Demo">Single GPU Training Demo</a>
<ul>
<li><a href="#GPT-2-Single">GPT-2</a></li>
<li><a href="#PaLM-Single">PaLM</a></li>
</ul>
</li>
<li>
<a href="#Inference">Inference</a>
<ul>
<li><a href="#Colossal-Inference">Colossal-Inference: Large AI Models Inference Speed Doubled</a></li>
<li><a href="#Grok-1">Grok-1: 314B model of PyTorch + HuggingFace Inference</a></li>
<li><a href="#SwiftInfer">SwiftInfer: Breaks the Length Limit of LLM for Multi-Round Conversations with 46% Acceleration</a></li>
</ul>
</li>
<li>
<a href="#Installation">Installation</a>
<ul>
<li><a href="#PyPI">PyPI</a></li>
<li><a href="#Install-From-Source">Install From Source</a></li>
</ul>
</li>
<li><a href="#Use-Docker">Use Docker</a></li>
<li><a href="#Community">Community</a></li>
<li><a href="#Contributing">Contributing</a></li>
<li><a href="#Cite-Us">Cite Us</a></li>
</ul>
## Why Colossal-AI
<div align="center">
<a href="https://youtu.be/KnXSfjqkKN0">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/JamesDemmel_Colossal-AI.png" width="600" />
</a>
Prof. James Demmel (UC Berkeley): Colossal-AI makes training AI models efficient, easy, and scalable.
</div>
<p align="right">(<a href="#top">back to top</a>)</p>
## Features
Colossal-AI provides a collection of parallel components for you. We aim to support you to write your
distributed deep learning models just like how you write your model on your laptop. We provide user-friendly tools to kickstart
distributed training and inference in a few lines.
- Parallelism strategies
- Data Parallelism
- Pipeline Parallelism
- 1D, [2D](https://arxiv.org/abs/2104.05343), [2.5D](https://arxiv.org/abs/2105.14500), [3D](https://arxiv.org/abs/2105.14450) Tensor Parallelism
- [Sequence Parallelism](https://arxiv.org/abs/2105.13120)
- [Zero Redundancy Optimizer (ZeRO)](https://arxiv.org/abs/1910.02054)
- [Auto-Parallelism](https://arxiv.org/abs/2302.02599)
- Heterogeneous Memory Management
- [PatrickStar](https://arxiv.org/abs/2108.05818)
- Friendly Usage
- Parallelism based on the configuration file
<p align="right">(<a href="#top">back to top</a>)</p>
## Colossal-AI in the Real World
### Open-Sora
[Open-Sora](https://github.com/hpcaitech/Open-Sora): Revealing Complete Model Parameters, Training Details, and Everything for Sora-like Video Generation Models
[[code]](https://github.com/hpcaitech/Open-Sora)
[[blog]](https://hpc-ai.com/blog/open-sora-from-hpc-ai-tech-team-continues-open-source-generate-any-16-second-720p-hd-video-with-one-click-model-weights-ready-to-use)
[[Model weights]](https://github.com/hpcaitech/Open-Sora?tab=readme-ov-file#model-weights)
[[Demo]](https://github.com/hpcaitech/Open-Sora?tab=readme-ov-file#-latest-demo)
[[GPU Cloud Playground]](https://cloud.luchentech.com/)
[[OpenSora Image]](https://cloud.luchentech.com/doc/docs/image/open-sora/)
<div align="center">
<a href="https://youtu.be/ilMQpU71ddI?si=J4JSPzZ03ycYmlki">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/sora/opensora-v1.2.png" width="700" />
</a>
</div>
<p align="right">(<a href="#top">back to top</a>)</p>
### Colossal-LLaMA-2
[[GPU Cloud Playground]](https://cloud.luchentech.com/)
[[LLaMA3 Image]](https://cloud.luchentech.com/doc/docs/image/llama)
- 7B: One half-day of training using a few hundred dollars yields similar results to mainstream large models, open-source and commercial-free domain-specific LLM solution.
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Colossal-LLaMA-2)
[[blog]](https://www.hpc-ai.tech/blog/one-half-day-of-training-using-a-few-hundred-dollars-yields-similar-results-to-mainstream-large-models-open-source-and-commercial-free-domain-specific-llm-solution)
[[HuggingFace model weights]](https://huggingface.co/hpcai-tech/Colossal-LLaMA-2-7b-base)
[[Modelscope model weights]](https://www.modelscope.cn/models/colossalai/Colossal-LLaMA-2-7b-base/summary)
- 13B: Construct refined 13B private model with just $5000 USD.
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Colossal-LLaMA-2)
[[blog]](https://hpc-ai.com/blog/colossal-llama-2-13b)
[[HuggingFace model weights]](https://huggingface.co/hpcai-tech/Colossal-LLaMA-2-13b-base)
[[Modelscope model weights]](https://www.modelscope.cn/models/colossalai/Colossal-LLaMA-2-13b-base/summary)
| Model | Backbone | Tokens Consumed | MMLU (5-shot) | CMMLU (5-shot)| AGIEval (5-shot) | GAOKAO (0-shot) | CEval (5-shot) |
| :-----------------------------: | :--------: | :-------------: | :------------------: | :-----------: | :--------------: | :-------------: | :-------------: |
| Baichuan-7B | - | 1.2T | 42.32 (42.30) | 44.53 (44.02) | 38.72 | 36.74 | 42.80 |
| Baichuan-13B-Base | - | 1.4T | 50.51 (51.60) | 55.73 (55.30) | 47.20 | 51.41 | 53.60 |
| Baichuan2-7B-Base | - | 2.6T | 46.97 (54.16) | 57.67 (57.07) | 45.76 | 52.60 | 54.00 |
| Baichuan2-13B-Base | - | 2.6T | 54.84 (59.17) | 62.62 (61.97) | 52.08 | 58.25 | 58.10 |
| ChatGLM-6B | - | 1.0T | 39.67 (40.63) | 41.17 (-) | 40.10 | 36.53 | 38.90 |
| ChatGLM2-6B | - | 1.4T | 44.74 (45.46) | 49.40 (-) | 46.36 | 45.49 | 51.70 |
| InternLM-7B | - | 1.6T | 46.70 (51.00) | 52.00 (-) | 44.77 | 61.64 | 52.80 |
| Qwen-7B | - | 2.2T | 54.29 (56.70) | 56.03 (58.80) | 52.47 | 56.42 | 59.60 |
| Llama-2-7B | - | 2.0T | 44.47 (45.30) | 32.97 (-) | 32.60 | 25.46 | - |
| Linly-AI/Chinese-LLaMA-2-7B-hf | Llama-2-7B | 1.0T | 37.43 | 29.92 | 32.00 | 27.57 | - |
| wenge-research/yayi-7b-llama2 | Llama-2-7B | - | 38.56 | 31.52 | 30.99 | 25.95 | - |
| ziqingyang/chinese-llama-2-7b | Llama-2-7B | - | 33.86 | 34.69 | 34.52 | 25.18 | 34.2 |
| TigerResearch/tigerbot-7b-base | Llama-2-7B | 0.3T | 43.73 | 42.04 | 37.64 | 30.61 | - |
| LinkSoul/Chinese-Llama-2-7b | Llama-2-7B | - | 48.41 | 38.31 | 38.45 | 27.72 | - |
| FlagAlpha/Atom-7B | Llama-2-7B | 0.1T | 49.96 | 41.10 | 39.83 | 33.00 | - |
| IDEA-CCNL/Ziya-LLaMA-13B-v1.1 | Llama-13B | 0.11T | 50.25 | 40.99 | 40.04 | 30.54 | - |
| **Colossal-LLaMA-2-7b-base** | Llama-2-7B | **0.0085T** | 53.06 | 49.89 | 51.48 | 58.82 | 50.2 |
| **Colossal-LLaMA-2-13b-base** | Llama-2-13B | **0.025T** | 56.42 | 61.80 | 54.69 | 69.53 | 60.3 |
### ColossalChat
<div align="center">
<a href="https://www.youtube.com/watch?v=HcTiHzApHm0">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20YouTube.png" width="700" />
</a>
</div>
[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): An open-source solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline.
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat)
[[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
[[demo]](https://www.youtube.com/watch?v=HcTiHzApHm0)
[[tutorial]](https://www.youtube.com/watch?v=-qFBZFmOJfg)
<p id="ColossalChat-Speed" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20Speed.jpg" width=450/>
</p>
- Up to 10 times faster for RLHF PPO Stage3 Training
<p id="ColossalChat_scaling" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT%20scaling.png" width=800/>
</p>
- Up to 7.73 times faster for single server training and 1.42 times faster for single-GPU inference
<p id="ColossalChat-1GPU" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT-1GPU.jpg" width=450/>
</p>
- Up to 10.3x growth in model capacity on one GPU
- A mini demo training process requires only 1.62GB of GPU memory (any consumer-grade GPU)
<p id="ColossalChat-LoRA" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/LoRA%20data.jpg" width=600/>
</p>
- Increase the capacity of the fine-tuning model by up to 3.7 times on a single GPU
- Keep at a sufficiently high running speed
<p align="right">(<a href="#top">back to top</a>)</p>
### AIGC
Acceleration of AIGC (AI-Generated Content) models such as [Stable Diffusion v1](https://github.com/CompVis/stable-diffusion) and [Stable Diffusion v2](https://github.com/Stability-AI/stablediffusion).
<p id="diffusion_train" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Stable%20Diffusion%20v2.png" width=800/>
</p>
- [Training](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce Stable Diffusion memory consumption by up to 5.6x and hardware cost by up to 46x (from A100 to RTX3060).
<p id="diffusion_demo" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/DreamBooth.png" width=800/>
</p>
- [DreamBooth Fine-tuning](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/dreambooth): Personalize your model using just 3-5 images of the desired subject.
<p id="inference-sd" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Stable%20Diffusion%20Inference.jpg" width=800/>
</p>
- [Inference](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): Reduce inference GPU memory consumption by 2.5x.
<p align="right">(<a href="#top">back to top</a>)</p>
### Biomedicine
Acceleration of [AlphaFold Protein Structure](https://alphafold.ebi.ac.uk/)
<p id="FastFold" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/FastFold.jpg" width=800/>
</p>
- [FastFold](https://github.com/hpcaitech/FastFold): Accelerating training and inference on GPU clusters, with faster data processing and inference of sequences containing more than 10,000 residues.
<p id="FastFold-Intel" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/data%20preprocessing%20with%20Intel.jpg" width=600/>
</p>
- [FastFold with Intel](https://github.com/hpcaitech/FastFold): 3x inference acceleration and 39% cost reduction.
<p id="xTrimoMultimer" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/xTrimoMultimer_Table.jpg" width=800/>
</p>
- [xTrimoMultimer](https://github.com/biomap-research/xTrimoMultimer): accelerating structure prediction of protein monomers and multimers by 11x.
<p align="right">(<a href="#top">back to top</a>)</p>
## Parallel Training Demo
### LLaMA3
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/images/LLaMA3-70B-H100.png" width=600/>
</p>
- 70 billion parameter LLaMA3 model training accelerated by 18%
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/llama)
[[GPU Cloud Playground]](https://cloud.luchentech.com/)
[[LLaMA3 Image]](https://cloud.luchentech.com/doc/docs/image/llama)
### LLaMA2
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/llama2_pretraining.png" width=600/>
</p>
- 70 billion parameter LLaMA2 model training accelerated by 195%
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/llama)
[[blog]](https://www.hpc-ai.tech/blog/70b-llama2-training)
### LLaMA1
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/images/LLaMA_pretraining.png" width=600/>
</p>
- 65-billion-parameter large model pretraining accelerated by 38%
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/llama)
[[blog]](https://www.hpc-ai.tech/blog/large-model-pretraining)
### MoE
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/images/MOE_training.png" width=800/>
</p>
- Enhanced MoE parallelism: open-source MoE model training can be 9 times more efficient
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/openmoe)
[[blog]](https://www.hpc-ai.tech/blog/enhanced-moe-parallelism-open-source-moe-model-training-can-be-9-times-more-efficient)
### GPT-3
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT3-v5.png" width=700/>
</p>
- Save 50% GPU resources with 10.7% acceleration
### GPT-2
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2.png" width=800/>
- 11x lower GPU memory consumption, and superlinear scaling efficiency with Tensor Parallelism
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/(updated)GPT-2.png" width=800>
- 24x larger model size on the same hardware
- over 3x acceleration
### BERT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BERT.png" width=800/>
- 2x faster training, or 50% longer sequence length
### PaLM
- [PaLM-colossalai](https://github.com/hpcaitech/PaLM-colossalai): Scalable implementation of Google's Pathways Language Model ([PaLM](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)).
### OPT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/OPT_update.png" width=800/>
- [Open Pretrained Transformer (OPT)](https://github.com/facebookresearch/metaseq), a 175-Billion parameter AI language model released by Meta, which stimulates AI programmers to perform various downstream tasks and application deployments because of public pre-trained model weights.
- 45% speedup fine-tuning OPT at low cost with just a few lines of code. [[Example]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/opt) [[Online Serving]](https://colossalai.org/docs/advanced_tutorials/opt_service)
Please visit our [documentation](https://www.colossalai.org/) and [examples](https://github.com/hpcaitech/ColossalAI/tree/main/examples) for more details.
### ViT
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/ViT.png" width="450" />
</p>
- 14x larger batch size, and 5x faster training for Tensor Parallelism = 64
### Recommendation System Models
- [Cached Embedding](https://github.com/hpcaitech/CachedEmbedding), utilize software cache to train larger embedding tables with a smaller GPU memory budget.
<p align="right">(<a href="#top">back to top</a>)</p>
## Single GPU Training Demo
### GPT-2
<p id="GPT-2-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-GPU1.png" width=450/>
</p>
- 20x larger model size on the same hardware
<p id="GPT-2-NVME" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-NVME.png" width=800/>
</p>
- 120x larger model size on the same hardware (RTX 3080)
### PaLM
<p id="PaLM-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/PaLM-GPU1.png" width=450/>
</p>
- 34x larger model size on the same hardware
<p align="right">(<a href="#top">back to top</a>)</p>
## Inference
### Colossal-Inference
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference/colossal-inference-v1-1.png" width=1000/>
</p>
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference/colossal-inference-v1-2.png" width=1000/>
</p>
- Large AI model inference speed doubled, compared to the offline inference performance of vLLM in some cases.
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/colossalai/inference)
[[blog]](https://hpc-ai.com/blog/colossal-inference)
[[GPU Cloud Playground]](https://cloud.luchentech.com/)
[[LLaMA3 Image]](https://cloud.luchentech.com/doc/docs/image/llama)
### Grok-1
<p id="Grok-1" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/images/grok-1-inference.jpg" width=600/>
</p>
- 314 Billion Parameter Grok-1 Inference Accelerated by 3.8x, an easy-to-use Python + PyTorch + HuggingFace version for Inference.
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/grok-1)
[[blog]](https://hpc-ai.com/blog/314-billion-parameter-grok-1-inference-accelerated-by-3.8x-efficient-and-easy-to-use-pytorchhuggingface-version-is-here)
[[HuggingFace Grok-1 PyTorch model weights]](https://huggingface.co/hpcai-tech/grok-1)
[[ModelScope Grok-1 PyTorch model weights]](https://www.modelscope.cn/models/colossalai/grok-1-pytorch/summary)
### SwiftInfer
<p id="SwiftInfer" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/SwiftInfer.jpg" width=800/>
</p>
- [SwiftInfer](https://github.com/hpcaitech/SwiftInfer): Inference performance improved by 46%, open source solution breaks the length limit of LLM for multi-round conversations
<p align="right">(<a href="#top">back to top</a>)</p>
## Installation
Requirements:
- PyTorch >= 2.2
- Python >= 3.7
- CUDA >= 11.0
- [NVIDIA GPU Compute Capability](https://developer.nvidia.com/cuda-gpus) >= 7.0 (V100/RTX20 and higher)
- Linux OS
If you encounter any problem with installation, you may want to raise an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) in this repository.
### Install from PyPI
You can easily install Colossal-AI with the following command. **By default, we do not build PyTorch extensions during installation.**
```bash
pip install colossalai
```
**Note: only Linux is supported for now.**
However, if you want to build the PyTorch extensions during installation, you can set `BUILD_EXT=1`.
```bash
BUILD_EXT=1 pip install colossalai
```
**Otherwise, CUDA kernels will be built during runtime when you actually need them.**
We also keep releasing the nightly version to PyPI every week. This allows you to access the unreleased features and bug fixes in the main branch.
Installation can be made via
```bash
pip install colossalai-nightly
```
### Download From Source
> The version of Colossal-AI will be in line with the main branch of the repository. Feel free to raise an issue if you encounter any problems. :)
```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
# install colossalai
pip install .
```
By default, we do not compile CUDA/C++ kernels. ColossalAI will build them during runtime.
If you want to install and enable CUDA kernel fusion (compulsory installation when using fused optimizer):
```shell
BUILD_EXT=1 pip install .
```
For users with CUDA 10.2, you can still build ColossalAI from source. However, you need to manually download the cub library and copy it to the corresponding directory.
```bash
# clone the repository
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
# download the cub library
wget https://github.com/NVIDIA/cub/archive/refs/tags/1.8.0.zip
unzip 1.8.0.zip
cp -r cub-1.8.0/cub/ colossalai/kernel/cuda_native/csrc/kernels/include/
# install
BUILD_EXT=1 pip install .
```
<p align="right">(<a href="#top">back to top</a>)</p>
## Use Docker
### Pull from DockerHub
You can directly pull the docker image from our [DockerHub page](https://hub.docker.com/r/hpcaitech/colossalai). The image is automatically uploaded upon release.
### Build On Your Own
Run the following command to build a docker image from Dockerfile provided.
> Building Colossal-AI from scratch requires GPU support, you need to use Nvidia Docker Runtime as the default when doing `docker build`. More details can be found [here](https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime).
> We recommend you install Colossal-AI from our [project page](https://www.colossalai.org) directly.
```bash
cd ColossalAI
docker build -t colossalai ./docker
```
Run the following command to start the docker container in interactive mode.
```bash
docker run -ti --gpus all --rm --ipc=host colossalai bash
```
<p align="right">(<a href="#top">back to top</a>)</p>
## Community
Join the Colossal-AI community on [Forum](https://github.com/hpcaitech/ColossalAI/discussions),
[Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w),
and [WeChat(微信)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your suggestions, feedback, and questions with our engineering team.
## Contributing
Referring to the successful attempts of [BLOOM](https://bigscience.huggingface.co/) and [Stable Diffusion](https://en.wikipedia.org/wiki/Stable_Diffusion), any and all developers and partners with computing powers, datasets, models are welcome to join and build the Colossal-AI community, making efforts towards the era of big AI models!
You may contact us or participate in the following ways:
1. [Leaving a Star ⭐](https://github.com/hpcaitech/ColossalAI/stargazers) to show your like and support. Thanks!
2. Posting an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose), or submitting a PR on GitHub following the guidelines in [Contributing](https://github.com/hpcaitech/ColossalAI/blob/main/CONTRIBUTING.md)
3. Send your official proposal to email contact@hpcaitech.com
Thanks so much to all of our amazing contributors!
<a href="https://github.com/hpcaitech/ColossalAI/graphs/contributors">
<img src="https://contrib.rocks/image?repo=hpcaitech/ColossalAI" width="800px"/>
</a>
<p align="right">(<a href="#top">back to top</a>)</p>
## CI/CD
We leverage the power of [GitHub Actions](https://github.com/features/actions) to automate our development, release and deployment workflows. Please check out this [documentation](.github/workflows/README.md) on how the automated workflows are operated.
## Cite Us
This project is inspired by some related projects (some by our team and some by other organizations). We would like to credit these amazing projects as listed in the [Reference List](./docs/REFERENCE.md).
To cite this project, you can use the following BibTeX citation.
```
@inproceedings{10.1145/3605573.3605613,
author = {Li, Shenggui and Liu, Hongxin and Bian, Zhengda and Fang, Jiarui and Huang, Haichen and Liu, Yuliang and Wang, Boxiang and You, Yang},
title = {Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
year = {2023},
isbn = {9798400708435},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3605573.3605613},
doi = {10.1145/3605573.3605613},
abstract = {The success of Transformer models has pushed the deep learning model scale to billions of parameters, but the memory limitation of a single GPU has led to an urgent need for training on multi-GPU clusters. However, the best practice for choosing the optimal parallel strategy is still lacking, as it requires domain expertise in both deep learning and parallel computing. The Colossal-AI system addressed the above challenge by introducing a unified interface to scale your sequential code of model training to distributed environments. It supports parallel training methods such as data, pipeline, tensor, and sequence parallelism and is integrated with heterogeneous training and zero redundancy optimizer. Compared to the baseline system, Colossal-AI can achieve up to 2.76 times training speedup on large-scale models.},
booktitle = {Proceedings of the 52nd International Conference on Parallel Processing},
pages = {766–775},
numpages = {10},
keywords = {datasets, gaze detection, text tagging, neural networks},
location = {Salt Lake City, UT, USA},
series = {ICPP '23}
}
```
Colossal-AI has been accepted as official tutorial by top conferences [NeurIPS](https://nips.cc/), [SC](https://sc22.supercomputing.org/), [AAAI](https://aaai.org/Conferences/AAAI-23/),
[PPoPP](https://ppopp23.sigplan.org/), [CVPR](https://cvpr2023.thecvf.com/), [ISC](https://www.isc-hpc.com/), [NVIDIA GTC](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-S51482/), etc.
<p align="right">(<a href="#top">back to top</a>)</p>
================================================
FILE: applications/Colossal-LLaMA/README.md
================================================
<div align="center">
<h1>
Colossal-LLaMA
</h1>
<h3>
<a href="https://cloud.luchentech.com/">GPU Cloud Playground </a> </a> |
<a href="https://cloud.luchentech.com/doc/docs/image/llama"> LLaMA3 Image </a>
</h3>
</div>
## Table of Contents
- [Table of Contents](#table-of-contents)
- [News](#news)
- [Colossal-LLaMA-2-7B](#colossal-llama-2-7b)
- [Colossal-LLaMA-2-13B](#colossal-llama-2-13b)
- [Performance Evaluation](#performance-evaluation)
- [Model with ~7 Billion Parameters](#model-with-7-billion-parameters)
- [Model with ~13 Billion Parameters](#model-with-13-billion-parameters)
- [Examples](#examples)
- [Training Logs](#training-logs)
- [Colossal-LLaMA-2-7b-base](#colossal-llama-2-7b-base)
- [Colossal-LLaMA-2-13b-base](#colossal-llama-2-13b-base)
- [Inference](#inference)
- [Import from HuggingFace](#import-from-huggingface)
- [Import from Modelscope](#import-from-modelscope)
- [Quick Start](#quick-start)
- [Usage](#usage)
- [Install](#install)
- [0. Pre-requisite](#0-pre-requisite)
- [1. Install required packages](#1-install-required-packages)
- [2. Install Apex](#2-install-apex)
- [How to run](#how-to-run)
- [1. Init Tokenizer Preparation](#1-init-tokenizer-preparation)
- [2. Init Model Preparation](#2-init-model-preparation)
- [3. Data Preparation](#3-data-preparation)
- [3.1 Data for Pretraining](#31-data-for-pretraining)
- [3.2 Data for Supervised Fine-tuning](#32-data-for-supervised-fine-tuning)
- [4. Command Line Arguments for Training](#4-command-line-arguments-for-training)
- [4.1 Arguments for Pretraining](#41-arguments-for-pretraining)
- [4.2 Arguments for Supervised Fine-tuning](#42-arguments-for-supervised-fine-tuning)
- [5. Running Command](#5-running-command)
- [5.1 Command for Pretraining](#51-command-for-pretraining)
- [5.2 Command for Supervised Fine-tuning](#52-command-for-supervised-fine-tuning)
- [Technical Insights](#technical-insights)
- [Data](#data)
- [Tokenizer](#tokenizer)
- [Training Strategy](#training-strategy)
- [Multi-stage Training](#multi-stage-training)
- [Bucket-based Training](#bucket-based-training)
- [Bridging Any Domain-specific Large Models](#bridging-any-domain-specific-large-models)
- [Citations](#citations)
## News
* [2024/4] Support continual pre-training and supervised fine-tuning of LLaMA-3.
* [2024/01] [Construct Refined 13B Private Model With Just $5000 USD, Upgraded Colossal-AI Llama-2 Open Source](https://hpc-ai.com/blog/colossal-llama-2-13b).
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Colossal-LLaMA-2)
[[blog]](https://hpc-ai.com/blog/colossal-llama-2-13b)
[[HuggingFace model weights]](https://huggingface.co/hpcai-tech/Colossal-LLaMA-2-13b-base)
[[Modelscope model weights]](https://www.modelscope.cn/models/colossalai/Colossal-LLaMA-2-13b-base/summary)
* [2023/09] [One Half-Day of Training Using a Few Hundred Dollars Yields Similar Results to Mainstream Large Models, Open-Source and Commercial-Free Domain-Specific Llm Solution](https://www.hpc-ai.tech/blog/one-half-day-of-training-using-a-few-hundred-dollars-yields-similar-results-to-mainstream-large-models-open-source-and-commercial-free-domain-specific-llm-solution).
[[code]](https://github.com/hpcaitech
gitextract_ft07ahjp/ ├── .clang-format ├── .compatibility ├── .coveragerc ├── .cuda_ext.json ├── .github/ │ ├── CODEOWNERS │ ├── ISSUE_TEMPLATE/ │ │ ├── bug-report.yml │ │ ├── config.yml │ │ ├── documentation.yml │ │ ├── feature_request.yml │ │ └── proposal.yml │ ├── pull_request_template.md │ └── workflows/ │ ├── README.md │ ├── build_on_pr.yml │ ├── build_on_schedule.yml │ ├── close_inactive.yml │ ├── compatiblity_test_on_dispatch.yml │ ├── compatiblity_test_on_pr.yml │ ├── compatiblity_test_on_schedule.yml │ ├── cuda_ext_check_before_merge.yml │ ├── doc_build_on_schedule_after_release.yml │ ├── doc_check_on_pr.yml │ ├── doc_test_on_pr.yml │ ├── doc_test_on_schedule.yml │ ├── draft_github_release_post_after_merge.yml │ ├── example_check_on_dispatch.yml │ ├── example_check_on_pr.yml │ ├── example_check_on_schedule.yml │ ├── release_docker_after_publish.yml │ ├── release_nightly_on_schedule.yml │ ├── release_pypi_after_merge.yml │ ├── release_test_pypi_before_merge.yml │ ├── report_leaderboard_to_lark.yml │ ├── report_test_coverage.yml │ ├── run_chatgpt_examples.yml │ ├── run_chatgpt_unit_tests.yml │ ├── run_colossalqa_unit_tests.yml │ ├── scripts/ │ │ ├── check_doc_i18n.py │ │ ├── example_checks/ │ │ │ ├── check_dispatch_inputs.py │ │ │ ├── check_example_weekly.py │ │ │ └── detect_changed_example.py │ │ ├── generate_leaderboard_and_send_to_lark.py │ │ ├── generate_release_draft.py │ │ ├── send_message_to_lark.py │ │ └── update_setup_for_nightly.py │ ├── submodule.yml │ └── translate_comment.yml ├── .gitignore ├── .gitmodules ├── .isort.cfg ├── .pre-commit-config.yaml ├── CHANGE_LOG.md ├── CONTRIBUTING.md ├── LICENSE ├── MANIFEST.in ├── README.md ├── applications/ │ ├── Colossal-LLaMA/ │ │ ├── README.md │ │ ├── colossal_llama/ │ │ │ ├── __init__.py │ │ │ ├── dataset/ │ │ │ │ ├── __init__.py │ │ │ │ ├── conversation.py │ │ │ │ ├── dummy_dataset.py │ │ │ │ ├── loader.py │ │ │ │ └── spliced_and_tokenized_dataset.py │ │ │ ├── model/ │ │ │ │ └── init_model.py │ │ │ ├── 
tokenizer/ │ │ │ │ └── init_tokenizer.py │ │ │ └── utils/ │ │ │ ├── __init__.py │ │ │ ├── ckpt_io.py │ │ │ ├── froze.py │ │ │ ├── neftune_patch.py │ │ │ ├── stream_chat_patch.py │ │ │ └── utils.py │ │ ├── dataset/ │ │ │ ├── prepare_pretrain_dataset.py │ │ │ └── prepare_sft_dataset.py │ │ ├── docs/ │ │ │ ├── example_13b.md │ │ │ └── example_7b.md │ │ ├── hostfile.example │ │ ├── inference/ │ │ │ ├── inference_example.py │ │ │ └── stream_chat_example.py │ │ ├── requirements.txt │ │ ├── setup.py │ │ ├── train.example.sh │ │ ├── train.py │ │ ├── train_sft.example.sh │ │ └── version.txt │ ├── ColossalChat/ │ │ ├── .gitignore │ │ ├── LICENSE │ │ ├── README.md │ │ ├── benchmarks/ │ │ │ ├── Opt.json │ │ │ ├── README.md │ │ │ ├── benchmark_dpo.sh │ │ │ ├── benchmark_kto.sh │ │ │ ├── benchmark_memory_consumption.txt │ │ │ ├── benchmark_orpo.sh │ │ │ ├── benchmark_performance_summarization.txt │ │ │ ├── benchmark_ppo.py │ │ │ ├── benchmark_ppo.sh │ │ │ ├── benchmark_sft.sh │ │ │ ├── benchmark_simpo.sh │ │ │ ├── data_preparation.sh │ │ │ ├── dummy_dataset.py │ │ │ ├── prepare_dummy_test_dataset.py │ │ │ └── ray/ │ │ │ ├── 1mmt_dummy.py │ │ │ └── mmmt_dummy.py │ │ ├── coati/ │ │ │ ├── __init__.py │ │ │ ├── dataset/ │ │ │ │ ├── __init__.py │ │ │ │ ├── conversation.py │ │ │ │ ├── loader.py │ │ │ │ ├── tokenization_utils.py │ │ │ │ └── utils.py │ │ │ ├── distributed/ │ │ │ │ ├── README.md │ │ │ │ ├── __init__.py │ │ │ │ ├── comm.py │ │ │ │ ├── consumer.py │ │ │ │ ├── grpo_consumer.py │ │ │ │ ├── inference_backend.py │ │ │ │ ├── launch.py │ │ │ │ ├── launch_zero_bubble.py │ │ │ │ ├── loss.py │ │ │ │ ├── producer.py │ │ │ │ ├── profiling_utils.py │ │ │ │ ├── reward/ │ │ │ │ │ ├── code_reward/ │ │ │ │ │ │ ├── testing_util.py │ │ │ │ │ │ └── utils.py │ │ │ │ │ ├── reward_fn.py │ │ │ │ │ ├── reward_utils.py │ │ │ │ │ └── verifiable_reward.py │ │ │ │ ├── utils.py │ │ │ │ └── zero_bubble/ │ │ │ │ ├── README.md │ │ │ │ ├── __init__.py │ │ │ │ ├── consumer.py │ │ │ │ ├── distributor.py │ │ 
│ │ ├── grpo_consumer.py │ │ │ │ ├── producer.py │ │ │ │ └── requirements.txt │ │ │ ├── experience_buffer/ │ │ │ │ ├── __init__.py │ │ │ │ ├── base.py │ │ │ │ ├── naive.py │ │ │ │ └── utils.py │ │ │ ├── experience_maker/ │ │ │ │ ├── __init__.py │ │ │ │ ├── base.py │ │ │ │ └── naive.py │ │ │ ├── models/ │ │ │ │ ├── __init__.py │ │ │ │ ├── base.py │ │ │ │ ├── critic.py │ │ │ │ ├── generation.py │ │ │ │ ├── lora.py │ │ │ │ ├── loss.py │ │ │ │ ├── reward_model.py │ │ │ │ ├── rlvr_reward_model.py │ │ │ │ └── utils.py │ │ │ ├── quant/ │ │ │ │ ├── __init__.py │ │ │ │ ├── llama_gptq/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── loader.py │ │ │ │ │ ├── model_utils.py │ │ │ │ │ └── quant.py │ │ │ │ └── utils.py │ │ │ ├── ray/ │ │ │ │ ├── README.md │ │ │ │ ├── __init__.py │ │ │ │ ├── callbacks/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── base.py │ │ │ │ │ └── performance_evaluator.py │ │ │ │ ├── detached_replay_buffer.py │ │ │ │ ├── detached_trainer_base.py │ │ │ │ ├── detached_trainer_ppo.py │ │ │ │ ├── experience_maker_holder.py │ │ │ │ ├── lora_constructor.py │ │ │ │ └── utils.py │ │ │ ├── trainer/ │ │ │ │ ├── __init__.py │ │ │ │ ├── base.py │ │ │ │ ├── callbacks/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── base.py │ │ │ │ │ └── performance_evaluator.py │ │ │ │ ├── dpo.py │ │ │ │ ├── grpo.py │ │ │ │ ├── kto.py │ │ │ │ ├── orpo.py │ │ │ │ ├── ppo.py │ │ │ │ ├── rm.py │ │ │ │ ├── sft.py │ │ │ │ └── utils.py │ │ │ └── utils/ │ │ │ ├── __init__.py │ │ │ ├── accumulative_meter.py │ │ │ ├── ckpt_io.py │ │ │ └── reward_score/ │ │ │ ├── __init__.py │ │ │ ├── competition.py │ │ │ ├── gsm8k.py │ │ │ └── utils.py │ │ ├── conversation_template/ │ │ │ ├── 01-ai_Yi-1.5-9B-Chat.json │ │ │ ├── MiniCPM-2b.json │ │ │ ├── Qwen_Qwen1.5-110B-Chat.json │ │ │ ├── Qwen_Qwen1.5-32B-Chat.json │ │ │ ├── Qwen_Qwen2.5-3B.json │ │ │ ├── THUDM_chatglm2-6b.json │ │ │ ├── THUDM_chatglm3-6b.json │ │ │ ├── baichuan-inc_Baichuan2-13B-Chat.json │ │ │ ├── colossal-llama2.json │ │ │ ├── deepseek-ai_DeepSeek-V2-Lite.json │ │ 
│ ├── llama2.json │ │ │ ├── microsoft_phi-2.json │ │ │ ├── mistralai_Mixtral-8x7B-Instruct-v0.1.json │ │ │ └── tiny-llama.json │ │ ├── examples/ │ │ │ ├── README.md │ │ │ ├── community/ │ │ │ │ ├── README.md │ │ │ │ ├── peft/ │ │ │ │ │ ├── README.md │ │ │ │ │ ├── easy_dataset.py │ │ │ │ │ ├── easy_models.py │ │ │ │ │ ├── train_peft_prompts.py │ │ │ │ │ └── train_peft_sft.py │ │ │ │ └── ray/ │ │ │ │ ├── README.md │ │ │ │ ├── ray_job_script.py │ │ │ │ └── train_prompts_on_ray.py │ │ │ ├── data_preparation_scripts/ │ │ │ │ ├── prepare_dataset.py │ │ │ │ ├── prepare_kto_dataset.sh │ │ │ │ ├── prepare_preference_dataset.sh │ │ │ │ ├── prepare_prompt_dataset.sh │ │ │ │ └── prepare_sft_dataset.sh │ │ │ ├── inference/ │ │ │ │ ├── chatio.py │ │ │ │ ├── inference.py │ │ │ │ └── web_chatbot/ │ │ │ │ ├── README.md │ │ │ │ ├── locustfile.py │ │ │ │ ├── requirements.txt │ │ │ │ ├── server.py │ │ │ │ └── utils.py │ │ │ ├── requirements.txt │ │ │ └── training_scripts/ │ │ │ ├── hostfile │ │ │ ├── lora_config.json │ │ │ ├── lora_finetune.py │ │ │ ├── lora_sft_data.jsonl │ │ │ ├── train_dpo.py │ │ │ ├── train_dpo.sh │ │ │ ├── train_grpo.py │ │ │ ├── train_grpo.sh │ │ │ ├── train_kto.py │ │ │ ├── train_kto.sh │ │ │ ├── train_orpo.py │ │ │ ├── train_orpo.sh │ │ │ ├── train_ppo.py │ │ │ ├── train_ppo.sh │ │ │ ├── train_rm.py │ │ │ ├── train_rm.sh │ │ │ ├── train_sft.py │ │ │ └── train_sft.sh │ │ ├── profiling.sh │ │ ├── pytest.ini │ │ ├── rl_example.py │ │ ├── rl_example_zero_bubble.py │ │ ├── setup.py │ │ ├── start_code_verifier.py │ │ ├── tests/ │ │ │ ├── __init__.py │ │ │ ├── generate_dummy_datasets_for_testing.py │ │ │ ├── llama.json │ │ │ ├── opt.json │ │ │ ├── prepare_test_env.sh │ │ │ ├── test_data/ │ │ │ │ ├── dpo/ │ │ │ │ │ └── test_dpo_data.jsonl │ │ │ │ ├── kto/ │ │ │ │ │ └── test_kto_data.jsonl │ │ │ │ └── sft/ │ │ │ │ └── test_sft_data.jsonl │ │ │ ├── test_data_preparation.sh │ │ │ ├── test_lora.py │ │ │ ├── test_templating.sh │ │ │ ├── test_train.sh │ │ │ └── 
verify_chat_data.py │ │ └── visualization.py │ ├── ColossalEval/ │ │ ├── README.md │ │ ├── colossal_eval/ │ │ │ ├── __init__.py │ │ │ ├── dataset/ │ │ │ │ ├── __init__.py │ │ │ │ ├── agieval.py │ │ │ │ ├── base.py │ │ │ │ ├── ceval.py │ │ │ │ ├── cmmlu.py │ │ │ │ ├── colossalai.py │ │ │ │ ├── cvalues.py │ │ │ │ ├── gaokaobench.py │ │ │ │ ├── gsm.py │ │ │ │ ├── longbench.py │ │ │ │ ├── mmlu.py │ │ │ │ ├── mtbench.py │ │ │ │ ├── safetybench_en.py │ │ │ │ └── safetybench_zh.py │ │ │ ├── evaluate/ │ │ │ │ ├── GPT Evaluation.md │ │ │ │ ├── __init__.py │ │ │ │ ├── dataset_evaluator/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── dataset_evaluator.py │ │ │ │ │ ├── gpt_judge.py │ │ │ │ │ └── metrics.py │ │ │ │ ├── evaluator.py │ │ │ │ ├── gpt_evaluate.py │ │ │ │ └── utils.py │ │ │ ├── models/ │ │ │ │ ├── __init__.py │ │ │ │ ├── base.py │ │ │ │ ├── chatglm.py │ │ │ │ ├── huggingface.py │ │ │ │ └── vllm.py │ │ │ └── utils/ │ │ │ ├── __init__.py │ │ │ ├── conversation.py │ │ │ └── utilities.py │ │ ├── configs/ │ │ │ └── gpt_evaluation/ │ │ │ ├── config/ │ │ │ │ ├── config_cn.json │ │ │ │ └── config_en.json │ │ │ ├── data/ │ │ │ │ ├── eval_cn_examples.json │ │ │ │ └── eval_en_examples.json │ │ │ └── prompt/ │ │ │ ├── battle_prompt/ │ │ │ │ ├── battle_prompt_cn.json │ │ │ │ └── battle_prompt_en.json │ │ │ └── evaluation_prompt/ │ │ │ ├── evaluation_prompt_cn.json │ │ │ └── evaluation_prompt_en.json │ │ ├── examples/ │ │ │ ├── dataset_evaluation/ │ │ │ │ ├── config/ │ │ │ │ │ ├── evaluation/ │ │ │ │ │ │ └── config.json │ │ │ │ │ └── inference/ │ │ │ │ │ └── config.json │ │ │ │ ├── eval_dataset.py │ │ │ │ ├── eval_dataset.sh │ │ │ │ ├── inference.py │ │ │ │ └── inference.sh │ │ │ └── gpt_evaluation/ │ │ │ ├── config/ │ │ │ │ ├── evaluation/ │ │ │ │ │ └── config.json │ │ │ │ └── inference/ │ │ │ │ └── config.json │ │ │ ├── eval.py │ │ │ ├── eval.sh │ │ │ ├── inference.py │ │ │ └── inference.sh │ │ ├── requirements.txt │ │ └── setup.py │ ├── ColossalMoE/ │ │ ├── README.md │ │ ├── infer.py 
│ │ ├── infer.sh │ │ ├── requirements.txt │ │ ├── setup.py │ │ ├── tests/ │ │ │ └── __init__.py │ │ ├── train.py │ │ ├── train.sh │ │ ├── utils.py │ │ └── version.txt │ ├── ColossalQA/ │ │ ├── .gitignore │ │ ├── README.md │ │ ├── colossalqa/ │ │ │ ├── __init__.py │ │ │ ├── chain/ │ │ │ │ ├── __init__.py │ │ │ │ ├── memory/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ └── summary.py │ │ │ │ └── retrieval_qa/ │ │ │ │ ├── __init__.py │ │ │ │ ├── base.py │ │ │ │ ├── load_chain.py │ │ │ │ └── stuff.py │ │ │ ├── data_loader/ │ │ │ │ ├── __init__.py │ │ │ │ ├── document_loader.py │ │ │ │ └── table_dataloader.py │ │ │ ├── local/ │ │ │ │ ├── __init__.py │ │ │ │ ├── colossalcloud_llm.py │ │ │ │ ├── llm.py │ │ │ │ ├── pangu_llm.py │ │ │ │ └── utils.py │ │ │ ├── memory.py │ │ │ ├── mylogging.py │ │ │ ├── prompt/ │ │ │ │ ├── README.md │ │ │ │ └── prompt.py │ │ │ ├── retrieval_conversation_en.py │ │ │ ├── retrieval_conversation_universal.py │ │ │ ├── retrieval_conversation_zh.py │ │ │ ├── retriever.py │ │ │ ├── text_splitter/ │ │ │ │ ├── __init__.py │ │ │ │ ├── chinese_text_splitter.py │ │ │ │ └── utils.py │ │ │ └── utils.py │ │ ├── data/ │ │ │ ├── data_sample/ │ │ │ │ ├── companies.txt │ │ │ │ ├── companies_zh.txt │ │ │ │ ├── csv_organization_100.csv │ │ │ │ ├── custom_service.json │ │ │ │ ├── custom_service_classification.json │ │ │ │ ├── custom_service_preprocessed.json │ │ │ │ └── luchen_zh.txt │ │ │ └── tests/ │ │ │ ├── 64KB.json │ │ │ ├── companies.csv │ │ │ ├── test.html │ │ │ ├── test.md │ │ │ └── test.txt │ │ ├── examples/ │ │ │ ├── conversation_agent_chatgpt.py │ │ │ ├── retrieval_conversation_chatgpt.py │ │ │ ├── retrieval_conversation_en.py │ │ │ ├── retrieval_conversation_en_customer_service.py │ │ │ ├── retrieval_conversation_universal.py │ │ │ ├── retrieval_conversation_zh.py │ │ │ ├── retrieval_intent_classification_zh_customer_service.py │ │ │ └── webui_demo/ │ │ │ ├── RAG_ChatBot.py │ │ │ ├── README.md │ │ │ ├── config.py │ │ │ ├── requirements.txt │ │ │ ├── server.py │ 
│ │ ├── utils.py │ │ │ └── webui.py │ │ ├── pytest.ini │ │ ├── requirements.txt │ │ ├── setup.py │ │ ├── tests/ │ │ │ ├── __init__.py │ │ │ ├── test_document_loader.py │ │ │ ├── test_memory.py │ │ │ ├── test_retrieval_qa.py │ │ │ └── test_text_splitter.py │ │ └── version.txt │ └── README.md ├── colossalai/ │ ├── _C/ │ │ └── __init__.py │ ├── __init__.py │ ├── _analyzer/ │ │ ├── README.md │ │ ├── __init__.py │ │ ├── _subclasses/ │ │ │ ├── __init__.py │ │ │ ├── _meta_registration.py │ │ │ ├── _monkey_patch.py │ │ │ ├── flop_tensor.py │ │ │ └── meta_tensor.py │ │ ├── envs.py │ │ └── fx/ │ │ ├── __init__.py │ │ ├── codegen.py │ │ ├── graph_module.py │ │ ├── node_util.py │ │ ├── passes/ │ │ │ ├── __init__.py │ │ │ ├── graph_profile.py │ │ │ └── shape_prop.py │ │ ├── symbolic_profile.py │ │ └── tracer/ │ │ ├── __init__.py │ │ ├── bias_addition.py │ │ ├── custom_leaf_module.py │ │ ├── proxy.py │ │ ├── symbolic_trace.py │ │ └── tracer.py │ ├── accelerator/ │ │ ├── README.md │ │ ├── __init__.py │ │ ├── api.py │ │ ├── base_accelerator.py │ │ ├── cpu_accelerator.py │ │ ├── cuda_accelerator.py │ │ └── npu_accelerator.py │ ├── amp/ │ │ ├── __init__.py │ │ └── naive_amp/ │ │ ├── __init__.py │ │ ├── grad_scaler/ │ │ │ ├── __init__.py │ │ │ ├── base_grad_scaler.py │ │ │ ├── constant_grad_scaler.py │ │ │ └── dynamic_grad_scaler.py │ │ ├── mixed_precision_mixin/ │ │ │ ├── __init__.py │ │ │ ├── base.py │ │ │ ├── bf16.py │ │ │ └── fp16.py │ │ └── mixed_precision_optimizer.py │ ├── auto_parallel/ │ │ ├── README.md │ │ ├── __init__.py │ │ ├── checkpoint/ │ │ │ ├── __init__.py │ │ │ ├── build_c_ext.py │ │ │ ├── ckpt_solver_base.py │ │ │ ├── ckpt_solver_chen.py │ │ │ ├── ckpt_solver_rotor.c │ │ │ ├── ckpt_solver_rotor.py │ │ │ └── operation.py │ │ ├── meta_profiler/ │ │ │ ├── __init__.py │ │ │ ├── constants.py │ │ │ ├── meta_registry/ │ │ │ │ ├── __init__.py │ │ │ │ ├── activation.py │ │ │ │ ├── binary_elementwise_ops.py │ │ │ │ ├── conv.py │ │ │ │ ├── embedding.py │ │ │ │ ├── linear.py │ 
│ │ │ ├── non_spmd.py │ │ │ │ ├── norm.py │ │ │ │ ├── pooling.py │ │ │ │ ├── tensor.py │ │ │ │ └── where.py │ │ │ ├── registry.py │ │ │ └── shard_metainfo.py │ │ ├── offload/ │ │ │ ├── __init__.py │ │ │ ├── amp_optimizer.py │ │ │ ├── base_offload_module.py │ │ │ ├── mem_optimize.py │ │ │ ├── region.py │ │ │ ├── region_manager.py │ │ │ ├── runtime.py │ │ │ ├── solver.py │ │ │ ├── training_simulator.py │ │ │ └── util.py │ │ ├── passes/ │ │ │ ├── __init__.py │ │ │ ├── comm_metainfo_pass.py │ │ │ ├── constants.py │ │ │ ├── meta_info_prop.py │ │ │ ├── runtime_apply_pass.py │ │ │ └── runtime_preparation_pass.py │ │ ├── pipeline_shard/ │ │ │ └── __init__.py │ │ └── tensor_shard/ │ │ ├── __init__.py │ │ ├── constants.py │ │ ├── initialize.py │ │ ├── node_handler/ │ │ │ ├── __init__.py │ │ │ ├── addmm_handler.py │ │ │ ├── batch_norm_handler.py │ │ │ ├── binary_elementwise_handler.py │ │ │ ├── bmm_handler.py │ │ │ ├── conv_handler.py │ │ │ ├── default_reshape_handler.py │ │ │ ├── embedding_handler.py │ │ │ ├── getattr_handler.py │ │ │ ├── getitem_handler.py │ │ │ ├── layer_norm_handler.py │ │ │ ├── linear_handler.py │ │ │ ├── matmul_handler.py │ │ │ ├── node_handler.py │ │ │ ├── normal_pooling_handler.py │ │ │ ├── output_handler.py │ │ │ ├── permute_handler.py │ │ │ ├── placeholder_handler.py │ │ │ ├── registry.py │ │ │ ├── softmax_handler.py │ │ │ ├── split_handler.py │ │ │ ├── strategy/ │ │ │ │ ├── __init__.py │ │ │ │ ├── batch_norm_generator.py │ │ │ │ ├── binary_elementwise_generator.py │ │ │ │ ├── conv_strategy_generator.py │ │ │ │ ├── embedding_generator.py │ │ │ │ ├── getattr_generator.py │ │ │ │ ├── getitem_generator.py │ │ │ │ ├── layer_norm_generator.py │ │ │ │ ├── matmul_strategy_generator.py │ │ │ │ ├── normal_pooling_generator.py │ │ │ │ ├── output_generator.py │ │ │ │ ├── placeholder_generator.py │ │ │ │ ├── reshape_generator.py │ │ │ │ ├── softmax_generator.py │ │ │ │ ├── strategy_generator.py │ │ │ │ ├── sum_generator.py │ │ │ │ ├── 
tensor_constructor_generator.py │ │ │ │ ├── unary_elementwise_generator.py │ │ │ │ └── where_generator.py │ │ │ ├── sum_handler.py │ │ │ ├── tensor_constructor_handler.py │ │ │ ├── transpose_handler.py │ │ │ ├── unary_elementwise_handler.py │ │ │ ├── view_handler.py │ │ │ └── where_handler.py │ │ ├── options.py │ │ ├── sharding_strategy.py │ │ ├── solver/ │ │ │ ├── __init__.py │ │ │ ├── cost_graph.py │ │ │ ├── graph_analysis.py │ │ │ ├── solver.py │ │ │ └── strategies_constructor.py │ │ └── utils/ │ │ ├── __init__.py │ │ ├── broadcast.py │ │ ├── factory.py │ │ ├── misc.py │ │ ├── reshape.py │ │ └── sharding.py │ ├── autochunk/ │ │ ├── autochunk_codegen.py │ │ ├── estimate_memory.py │ │ ├── reorder_graph.py │ │ ├── search_chunk.py │ │ ├── select_chunk.py │ │ ├── trace_flow.py │ │ ├── trace_indice.py │ │ └── utils.py │ ├── booster/ │ │ ├── __init__.py │ │ ├── accelerator.py │ │ ├── booster.py │ │ ├── mixed_precision/ │ │ │ ├── __init__.py │ │ │ ├── bf16.py │ │ │ ├── fp16_apex.py │ │ │ ├── fp16_naive.py │ │ │ ├── fp16_torch.py │ │ │ ├── fp8.py │ │ │ └── mixed_precision_base.py │ │ └── plugin/ │ │ ├── __init__.py │ │ ├── dp_plugin_base.py │ │ ├── gemini_plugin.py │ │ ├── hybrid_parallel_plugin.py │ │ ├── low_level_zero_plugin.py │ │ ├── moe_hybrid_parallel_plugin.py │ │ ├── plugin_base.py │ │ ├── pp_plugin_base.py │ │ ├── torch_ddp_plugin.py │ │ └── torch_fsdp_plugin.py │ ├── checkpoint_io/ │ │ ├── __init__.py │ │ ├── checkpoint_io_base.py │ │ ├── general_checkpoint_io.py │ │ ├── hybrid_parallel_checkpoint_io.py │ │ ├── index_file.py │ │ ├── moe_checkpoint.py │ │ └── utils.py │ ├── cli/ │ │ ├── __init__.py │ │ ├── check/ │ │ │ ├── __init__.py │ │ │ └── check_installation.py │ │ ├── cli.py │ │ └── launcher/ │ │ ├── __init__.py │ │ ├── hostinfo.py │ │ ├── multinode_runner.py │ │ └── run.py │ ├── cluster/ │ │ ├── __init__.py │ │ ├── device_mesh_manager.py │ │ ├── dist_coordinator.py │ │ ├── process_group_manager.py │ │ └── process_group_mesh.py │ ├── context/ │ │ ├── 
__init__.py │ │ ├── config.py │ │ └── singleton_meta.py │ ├── device/ │ │ ├── __init__.py │ │ ├── alpha_beta_profiler.py │ │ ├── calc_pipeline_strategy.py │ │ └── device_mesh.py │ ├── fx/ │ │ ├── __init__.py │ │ ├── _compatibility.py │ │ ├── _meta_regist_12.py │ │ ├── _meta_regist_13.py │ │ ├── codegen/ │ │ │ ├── __init__.py │ │ │ └── activation_checkpoint_codegen.py │ │ ├── graph_module.py │ │ ├── passes/ │ │ │ ├── __init__.py │ │ │ ├── adding_split_node_pass.py │ │ │ ├── concrete_info_prop.py │ │ │ ├── experimental/ │ │ │ │ └── adding_shape_consistency_pass.py │ │ │ ├── meta_info_prop.py │ │ │ ├── passes_for_gpt2_test.py │ │ │ ├── shard_1d_pass.py │ │ │ ├── split_module.py │ │ │ └── utils.py │ │ ├── profiler/ │ │ │ ├── __init__.py │ │ │ ├── constants.py │ │ │ ├── dataflow.py │ │ │ ├── experimental/ │ │ │ │ ├── __init__.py │ │ │ │ ├── constants.py │ │ │ │ ├── profiler.py │ │ │ │ ├── profiler_function/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── activation_function.py │ │ │ │ │ ├── arithmetic.py │ │ │ │ │ ├── embedding.py │ │ │ │ │ ├── linear.py │ │ │ │ │ ├── normalization.py │ │ │ │ │ ├── pooling.py │ │ │ │ │ ├── python_ops.py │ │ │ │ │ └── torch_ops.py │ │ │ │ ├── profiler_module/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── activation_function.py │ │ │ │ │ ├── attention.py │ │ │ │ │ ├── convolution.py │ │ │ │ │ ├── dropout.py │ │ │ │ │ ├── embedding.py │ │ │ │ │ ├── linear.py │ │ │ │ │ ├── normalization.py │ │ │ │ │ ├── pooling.py │ │ │ │ │ ├── rnn.py │ │ │ │ │ └── torch_op.py │ │ │ │ ├── registry.py │ │ │ │ └── shard_utils.py │ │ │ ├── memory_utils.py │ │ │ ├── opcount.py │ │ │ ├── profiler.py │ │ │ ├── shard_utils.py │ │ │ └── tensor.py │ │ ├── proxy.py │ │ └── tracer/ │ │ ├── __init__.py │ │ ├── _meta_trace.py │ │ ├── _symbolic_trace.py │ │ ├── _tracer_utils.py │ │ ├── bias_addition_patch/ │ │ │ ├── __init__.py │ │ │ ├── patched_bias_addition_function/ │ │ │ │ ├── __init__.py │ │ │ │ ├── addbmm.py │ │ │ │ ├── addmm.py │ │ │ │ ├── bias_addition_function.py │ │ │ │ └── 
linear.py │ │ │ └── patched_bias_addition_module/ │ │ │ ├── __init__.py │ │ │ ├── bias_addition_module.py │ │ │ ├── conv.py │ │ │ └── linear.py │ │ ├── experimental.py │ │ ├── meta_patch/ │ │ │ ├── __init__.py │ │ │ ├── patched_function/ │ │ │ │ ├── __init__.py │ │ │ │ ├── activation_function.py │ │ │ │ ├── arithmetic.py │ │ │ │ ├── convolution.py │ │ │ │ ├── embedding.py │ │ │ │ ├── normalization.py │ │ │ │ ├── python_ops.py │ │ │ │ └── torch_ops.py │ │ │ └── patched_module/ │ │ │ ├── __init__.py │ │ │ ├── activation_function.py │ │ │ ├── convolution.py │ │ │ ├── embedding.py │ │ │ ├── linear.py │ │ │ ├── normalization.py │ │ │ ├── pooling.py │ │ │ └── rnn.py │ │ ├── registry.py │ │ └── tracer.py │ ├── inference/ │ │ ├── README.md │ │ ├── __init__.py │ │ ├── batch_bucket.py │ │ ├── config.py │ │ ├── core/ │ │ │ ├── __init__.py │ │ │ ├── async_engine.py │ │ │ ├── base_engine.py │ │ │ ├── diffusion_engine.py │ │ │ ├── engine.py │ │ │ ├── llm_engine.py │ │ │ ├── plugin.py │ │ │ ├── request_handler.py │ │ │ └── rpc_engine.py │ │ ├── executor/ │ │ │ ├── __init__.py │ │ │ └── rpc_worker.py │ │ ├── flash_decoding_utils.py │ │ ├── graph_runner.py │ │ ├── kv_cache/ │ │ │ ├── __init__.py │ │ │ ├── block_cache.py │ │ │ └── kvcache_manager.py │ │ ├── logit_processors.py │ │ ├── modeling/ │ │ │ ├── __init__.py │ │ │ ├── backends/ │ │ │ │ ├── __init__.py │ │ │ │ ├── attention_backend.py │ │ │ │ └── pre_attention_backend.py │ │ │ ├── layers/ │ │ │ │ ├── __init__.py │ │ │ │ ├── attention.py │ │ │ │ ├── baichuan_tp_linear.py │ │ │ │ ├── diffusion.py │ │ │ │ └── distrifusion.py │ │ │ ├── models/ │ │ │ │ ├── __init__.py │ │ │ │ ├── glide_llama.py │ │ │ │ ├── nopadding_baichuan.py │ │ │ │ ├── nopadding_llama.py │ │ │ │ ├── pixart_alpha.py │ │ │ │ └── stablediffusion3.py │ │ │ └── policy/ │ │ │ ├── __init__.py │ │ │ ├── glide_llama.py │ │ │ ├── nopadding_baichuan.py │ │ │ ├── nopadding_llama.py │ │ │ ├── pixart_alpha.py │ │ │ └── stablediffusion3.py │ │ ├── sampler.py │ │ ├── server/ 
│ │ │ ├── __init__.py │ │ │ ├── api_server.py │ │ │ ├── chat_service.py │ │ │ ├── completion_service.py │ │ │ └── utils.py │ │ ├── spec/ │ │ │ ├── __init__.py │ │ │ ├── drafter.py │ │ │ └── struct.py │ │ ├── struct.py │ │ └── utils.py │ ├── initialize.py │ ├── interface/ │ │ ├── __init__.py │ │ ├── model.py │ │ ├── optimizer.py │ │ └── pretrained.py │ ├── kernel/ │ │ ├── __init__.py │ │ ├── jit/ │ │ │ ├── __init__.py │ │ │ ├── bias_dropout_add.py │ │ │ ├── bias_gelu.py │ │ │ └── option.py │ │ ├── kernel_loader.py │ │ └── triton/ │ │ ├── __init__.py │ │ ├── context_attn_unpad.py │ │ ├── flash_decoding.py │ │ ├── fused_rotary_embedding.py │ │ ├── kvcache_copy.py │ │ ├── llama_act_combine_kernel.py │ │ ├── no_pad_rotary_embedding.py │ │ ├── qkv_matmul_kernel.py │ │ ├── rms_layernorm.py │ │ ├── rotary_cache_copy.py │ │ └── softmax.py │ ├── lazy/ │ │ ├── __init__.py │ │ ├── construction.py │ │ ├── lazy_init.py │ │ └── pretrained.py │ ├── legacy/ │ │ ├── __init__.py │ │ ├── amp/ │ │ │ ├── __init__.py │ │ │ ├── amp_type.py │ │ │ ├── apex_amp/ │ │ │ │ ├── __init__.py │ │ │ │ └── apex_amp.py │ │ │ ├── naive_amp/ │ │ │ │ ├── __init__.py │ │ │ │ ├── _fp16_optimizer.py │ │ │ │ ├── _utils.py │ │ │ │ └── naive_amp.py │ │ │ └── torch_amp/ │ │ │ ├── __init__.py │ │ │ ├── _grad_scaler.py │ │ │ └── torch_amp.py │ │ ├── builder/ │ │ │ ├── __init__.py │ │ │ └── builder.py │ │ ├── communication/ │ │ │ ├── __init__.py │ │ │ ├── collective.py │ │ │ ├── p2p.py │ │ │ ├── p2p_v2.py │ │ │ ├── ring.py │ │ │ └── utils.py │ │ ├── constants.py │ │ ├── context/ │ │ │ ├── __init__.py │ │ │ ├── parallel_context.py │ │ │ ├── parallel_mode.py │ │ │ ├── process_group_initializer/ │ │ │ │ ├── __init__.py │ │ │ │ ├── initializer_1d.py │ │ │ │ ├── initializer_2d.py │ │ │ │ ├── initializer_2p5d.py │ │ │ │ ├── initializer_3d.py │ │ │ │ ├── initializer_data.py │ │ │ │ ├── initializer_model.py │ │ │ │ ├── initializer_pipeline.py │ │ │ │ ├── initializer_sequence.py │ │ │ │ ├── initializer_tensor.py │ │ │ │ 
└── process_group_initializer.py │ │ │ └── random/ │ │ │ ├── __init__.py │ │ │ ├── _helper.py │ │ │ └── seed_manager.py │ │ ├── core.py │ │ ├── engine/ │ │ │ ├── __init__.py │ │ │ ├── _base_engine.py │ │ │ ├── gradient_accumulation/ │ │ │ │ ├── __init__.py │ │ │ │ └── _gradient_accumulation.py │ │ │ ├── gradient_handler/ │ │ │ │ ├── __init__.py │ │ │ │ ├── _base_gradient_handler.py │ │ │ │ ├── _data_parallel_gradient_handler.py │ │ │ │ ├── _moe_gradient_handler.py │ │ │ │ ├── _pipeline_parallel_gradient_handler.py │ │ │ │ ├── _sequence_parallel_gradient_handler.py │ │ │ │ ├── _zero_gradient_handler.py │ │ │ │ └── utils.py │ │ │ └── schedule/ │ │ │ ├── __init__.py │ │ │ ├── _base_schedule.py │ │ │ ├── _non_pipeline_schedule.py │ │ │ ├── _pipeline_schedule.py │ │ │ └── _pipeline_schedule_v2.py │ │ ├── global_variables.py │ │ ├── inference/ │ │ │ ├── README.md │ │ │ ├── __init__.py │ │ │ ├── async_engine.py │ │ │ ├── async_manager.py │ │ │ ├── dynamic_batching/ │ │ │ │ ├── __init__.py │ │ │ │ ├── get_tokenizer.py │ │ │ │ ├── infer_batch.py │ │ │ │ ├── io_struct.py │ │ │ │ ├── ray_dist_init.py │ │ │ │ ├── ray_init_config.py │ │ │ │ ├── req_queue.py │ │ │ │ ├── sampling_params.py │ │ │ │ └── stats.py │ │ │ ├── hybridengine/ │ │ │ │ ├── __init__.py │ │ │ │ ├── engine.py │ │ │ │ ├── modeling/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── _utils.py │ │ │ │ │ └── llama.py │ │ │ │ └── polices/ │ │ │ │ ├── __init__.py │ │ │ │ └── llama.py │ │ │ ├── manager.py │ │ │ ├── pipeline/ │ │ │ │ ├── README.md │ │ │ │ ├── __init__.py │ │ │ │ ├── benchmark/ │ │ │ │ │ ├── benchmark.py │ │ │ │ │ └── run.sh │ │ │ │ └── microbatch_manager.py │ │ │ ├── quant/ │ │ │ │ ├── gptq/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ └── cai_gptq/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── cai_quant_linear.py │ │ │ │ │ └── gptq_op.py │ │ │ │ └── smoothquant/ │ │ │ │ ├── __init__.py │ │ │ │ └── models/ │ │ │ │ ├── __init__.py │ │ │ │ ├── base_model.py │ │ │ │ ├── linear.py │ │ │ │ └── llama.py │ │ │ ├── serving/ │ │ │ │ 
├── ray_serve/ │ │ │ │ │ ├── Colossal_Inference_rayserve.py │ │ │ │ │ ├── README.md │ │ │ │ │ ├── send_request.py │ │ │ │ │ └── send_requests.py │ │ │ │ ├── test_ci.sh │ │ │ │ └── torch_serve/ │ │ │ │ ├── Colossal_Inference_Handler.py │ │ │ │ ├── README.md │ │ │ │ ├── config.properties │ │ │ │ ├── docker/ │ │ │ │ │ └── Dockerfile │ │ │ │ ├── model-config.yaml │ │ │ │ └── sample_text.txt │ │ │ └── tensor_parallel/ │ │ │ ├── __init__.py │ │ │ ├── batch_infer_state.py │ │ │ ├── engine.py │ │ │ ├── kvcache_manager.py │ │ │ ├── modeling/ │ │ │ │ ├── __init__.py │ │ │ │ ├── _utils.py │ │ │ │ ├── bloom.py │ │ │ │ ├── chatglm2.py │ │ │ │ └── llama.py │ │ │ └── policies/ │ │ │ ├── __init__.py │ │ │ ├── bloom.py │ │ │ ├── chatglm2.py │ │ │ └── llama.py │ │ ├── initialize.py │ │ ├── moe/ │ │ │ ├── layer/ │ │ │ │ ├── __init__.py │ │ │ │ ├── experts.py │ │ │ │ ├── layers.py │ │ │ │ └── routers.py │ │ │ ├── load_balance.py │ │ │ ├── manager.py │ │ │ ├── openmoe/ │ │ │ │ ├── README.md │ │ │ │ ├── benchmark/ │ │ │ │ │ ├── benchmark_cai.py │ │ │ │ │ ├── benchmark_cai.sh │ │ │ │ │ ├── benchmark_cai_dist.sh │ │ │ │ │ ├── benchmark_fsdp.py │ │ │ │ │ ├── benchmark_fsdp.sh │ │ │ │ │ ├── hostfile.txt │ │ │ │ │ └── utils.py │ │ │ │ ├── infer.py │ │ │ │ ├── infer.sh │ │ │ │ ├── model/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── convert_openmoe_ckpt.py │ │ │ │ │ ├── convert_openmoe_ckpt.sh │ │ │ │ │ ├── modeling_openmoe.py │ │ │ │ │ ├── openmoe_8b_config.json │ │ │ │ │ ├── openmoe_base_config.json │ │ │ │ │ └── openmoe_policy.py │ │ │ │ ├── requirements.txt │ │ │ │ ├── test_ci.sh │ │ │ │ ├── train.py │ │ │ │ └── train.sh │ │ │ └── utils.py │ │ ├── nn/ │ │ │ ├── __init__.py │ │ │ ├── _ops/ │ │ │ │ ├── __init__.py │ │ │ │ └── _utils.py │ │ │ ├── layer/ │ │ │ │ ├── __init__.py │ │ │ │ ├── base_layer.py │ │ │ │ ├── colossalai_layer/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── _utils.py │ │ │ │ │ ├── dropout.py │ │ │ │ │ ├── embedding.py │ │ │ │ │ ├── linear.py │ │ │ │ │ └── normalization.py │ │ │ │ ├── 
parallel_1d/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── _operation.py │ │ │ │ │ ├── _utils.py │ │ │ │ │ └── layers.py │ │ │ │ ├── parallel_2d/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── _operation.py │ │ │ │ │ ├── _utils.py │ │ │ │ │ └── layers.py │ │ │ │ ├── parallel_2p5d/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── _operation.py │ │ │ │ │ ├── _utils.py │ │ │ │ │ └── layers.py │ │ │ │ ├── parallel_3d/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── _operation.py │ │ │ │ │ ├── _utils.py │ │ │ │ │ └── layers.py │ │ │ │ ├── parallel_sequence/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── _operation.py │ │ │ │ │ ├── _utils.py │ │ │ │ │ └── layers.py │ │ │ │ ├── utils/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ └── common.py │ │ │ │ ├── vanilla/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ └── layers.py │ │ │ │ └── wrapper/ │ │ │ │ ├── __init__.py │ │ │ │ └── pipeline_wrapper.py │ │ │ ├── loss/ │ │ │ │ ├── __init__.py │ │ │ │ ├── loss_1d.py │ │ │ │ ├── loss_2d.py │ │ │ │ ├── loss_2p5d.py │ │ │ │ └── loss_3d.py │ │ │ ├── metric/ │ │ │ │ ├── __init__.py │ │ │ │ ├── _utils.py │ │ │ │ ├── accuracy_2d.py │ │ │ │ ├── accuracy_2p5d.py │ │ │ │ └── accuracy_3d.py │ │ │ └── parallel/ │ │ │ ├── __init__.py │ │ │ ├── data_parallel.py │ │ │ ├── layers/ │ │ │ │ ├── __init__.py │ │ │ │ ├── cache_embedding/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── base_embedding.py │ │ │ │ │ ├── cache_mgr.py │ │ │ │ │ ├── cached_embedding.py │ │ │ │ │ ├── copyer.py │ │ │ │ │ ├── embedding_config.py │ │ │ │ │ ├── parallel_cached_embedding.py │ │ │ │ │ ├── parallel_cached_embedding_tablewise.py │ │ │ │ │ └── parallel_cached_embedding_tablewise_split_cache.py │ │ │ │ ├── colo_module.py │ │ │ │ ├── embedding.py │ │ │ │ ├── linear.py │ │ │ │ └── module_utils.py │ │ │ └── reducer.py │ │ ├── pipeline/ │ │ │ ├── __init__.py │ │ │ ├── layer_spec.py │ │ │ ├── middleware/ │ │ │ │ ├── __init__.py │ │ │ │ ├── adaptor/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ └── fx.py │ │ │ │ └── topo.py │ │ │ ├── pipelinable.py │ │ │ ├── pipeline_process_group.py │ │ │ ├── 
rpc/ │ │ │ │ ├── __init__.py │ │ │ │ ├── _pipeline_base.py │ │ │ │ ├── _pipeline_schedule.py │ │ │ │ └── utils.py │ │ │ └── utils.py │ │ ├── registry/ │ │ │ ├── __init__.py │ │ │ └── registry.py │ │ ├── tensor/ │ │ │ ├── __init__.py │ │ │ ├── compute_spec.py │ │ │ ├── const.py │ │ │ ├── dist_spec_mgr.py │ │ │ ├── distspec.py │ │ │ ├── op_wrapper.py │ │ │ ├── process_group.py │ │ │ └── tensor_spec.py │ │ ├── trainer/ │ │ │ ├── __init__.py │ │ │ ├── _trainer.py │ │ │ └── hooks/ │ │ │ ├── __init__.py │ │ │ ├── _base_hook.py │ │ │ ├── _checkpoint_hook.py │ │ │ ├── _commons_.py │ │ │ ├── _log_hook.py │ │ │ ├── _lr_scheduler_hook.py │ │ │ └── _metric_hook.py │ │ ├── utils/ │ │ │ ├── __init__.py │ │ │ ├── activation_checkpoint.py │ │ │ ├── checkpoint/ │ │ │ │ ├── __init__.py │ │ │ │ ├── module_checkpoint.py │ │ │ │ └── utils.py │ │ │ ├── checkpointing.py │ │ │ ├── common.py │ │ │ ├── data_sampler/ │ │ │ │ ├── __init__.py │ │ │ │ ├── base_sampler.py │ │ │ │ └── data_parallel_sampler.py │ │ │ ├── memory.py │ │ │ └── profiler/ │ │ │ ├── __init__.py │ │ │ ├── extention.py │ │ │ ├── legacy/ │ │ │ │ ├── __init__.py │ │ │ │ ├── comm_profiler.py │ │ │ │ ├── pcie_profiler.py │ │ │ │ └── prof_utils.py │ │ │ ├── profiler.py │ │ │ └── stateful_tensor_mem_extention.py │ │ └── zero/ │ │ ├── __init__.py │ │ ├── gemini/ │ │ │ ├── __init__.py │ │ │ ├── colo_init_context.py │ │ │ ├── gemini_context.py │ │ │ ├── ophooks/ │ │ │ │ ├── __init__.py │ │ │ │ ├── _shard_grad_ophook.py │ │ │ │ ├── _shard_param_ophook.py │ │ │ │ ├── runtime_mem_tracer_hook.py │ │ │ │ └── utils.py │ │ │ ├── paramhooks/ │ │ │ │ ├── __init__.py │ │ │ │ └── _param_hookmgr.py │ │ │ ├── stateful_tensor.py │ │ │ ├── stateful_tensor_mgr.py │ │ │ ├── tensor_placement_policy.py │ │ │ └── tensor_utils.py │ │ ├── init_ctx/ │ │ │ ├── __init__.py │ │ │ └── init_context.py │ │ ├── shard_utils/ │ │ │ ├── __init__.py │ │ │ ├── base_shard_strategy.py │ │ │ ├── bucket_tensor_shard_strategy.py │ │ │ ├── commons.py │ │ │ └── 
tensor_shard_strategy.py │ │ ├── sharded_model/ │ │ │ ├── __init__.py │ │ │ ├── _utils.py │ │ │ ├── reduce_scatter.py │ │ │ ├── sharded_model_v2.py │ │ │ ├── utils.py │ │ │ └── zero_hook.py │ │ ├── sharded_optim/ │ │ │ ├── __init__.py │ │ │ └── sharded_optim_v2.py │ │ └── sharded_param/ │ │ ├── __init__.py │ │ ├── sharded_param.py │ │ └── sharded_tensor.py │ ├── logging/ │ │ ├── __init__.py │ │ └── logger.py │ ├── moe/ │ │ ├── __init__.py │ │ └── _operation.py │ ├── nn/ │ │ ├── __init__.py │ │ ├── init.py │ │ ├── layer/ │ │ │ ├── __init__.py │ │ │ ├── layernorm.py │ │ │ ├── scaled_softmax.py │ │ │ └── utils.py │ │ ├── loss/ │ │ │ └── __init__.py │ │ ├── lr_scheduler/ │ │ │ ├── __init__.py │ │ │ ├── cosine.py │ │ │ ├── delayed.py │ │ │ ├── linear.py │ │ │ ├── multistep.py │ │ │ ├── onecycle.py │ │ │ ├── poly.py │ │ │ └── torch.py │ │ └── optimizer/ │ │ ├── README.md │ │ ├── __init__.py │ │ ├── adafactor.py │ │ ├── came.py │ │ ├── cpu_adam.py │ │ ├── distributed_adafactor.py │ │ ├── distributed_came.py │ │ ├── distributed_galore.py │ │ ├── distributed_lamb.py │ │ ├── fused_adam.py │ │ ├── fused_lamb.py │ │ ├── fused_sgd.py │ │ ├── galore.py │ │ ├── hybrid_adam.py │ │ ├── lamb.py │ │ ├── lars.py │ │ └── nvme_optimizer.py │ ├── pipeline/ │ │ ├── __init__.py │ │ ├── p2p.py │ │ ├── schedule/ │ │ │ ├── __init__.py │ │ │ ├── _utils.py │ │ │ ├── base.py │ │ │ ├── generate.py │ │ │ ├── interleaved_pp.py │ │ │ ├── one_f_one_b.py │ │ │ ├── v_schedule.py │ │ │ └── zero_bubble_pp.py │ │ ├── stage_manager.py │ │ └── weight_grad_store.py │ ├── quantization/ │ │ ├── __init__.py │ │ ├── bnb.py │ │ ├── bnb_config.py │ │ ├── fp8.py │ │ ├── fp8_config.py │ │ ├── fp8_hook.py │ │ └── utils.py │ ├── shardformer/ │ │ ├── README.md │ │ ├── __init__.py │ │ ├── _utils.py │ │ ├── examples/ │ │ │ ├── convergence_benchmark.py │ │ │ ├── convergence_benchmark.sh │ │ │ ├── data.py │ │ │ └── performance_benchmark.py │ │ ├── layer/ │ │ │ ├── __init__.py │ │ │ ├── _operation.py │ │ │ ├── attn.py │ │ │ 
├── dropout.py │ │ │ ├── embedding.py │ │ │ ├── linear.py │ │ │ ├── loss.py │ │ │ ├── normalization.py │ │ │ ├── parallel_module.py │ │ │ ├── qkv_fused_linear.py │ │ │ └── utils.py │ │ ├── modeling/ │ │ │ ├── __init__.py │ │ │ ├── bert.py │ │ │ ├── blip2.py │ │ │ ├── bloom.py │ │ │ ├── chatglm2.py │ │ │ ├── chatglm2_6b/ │ │ │ │ ├── __init__.py │ │ │ │ ├── configuration_chatglm.py │ │ │ │ └── modeling_chatglm.py │ │ │ ├── command.py │ │ │ ├── deepseek.py │ │ │ ├── deepseek_v3.py │ │ │ ├── falcon.py │ │ │ ├── gpt2.py │ │ │ ├── gptj.py │ │ │ ├── jit.py │ │ │ ├── llama.py │ │ │ ├── mistral.py │ │ │ ├── mixtral.py │ │ │ ├── opt.py │ │ │ ├── qwen2.py │ │ │ ├── qwen3.py │ │ │ ├── sam.py │ │ │ ├── t5.py │ │ │ ├── vit.py │ │ │ └── whisper.py │ │ ├── policies/ │ │ │ ├── __init__.py │ │ │ ├── auto_policy.py │ │ │ ├── base_policy.py │ │ │ ├── bert.py │ │ │ ├── blip2.py │ │ │ ├── bloom.py │ │ │ ├── chatglm2.py │ │ │ ├── command.py │ │ │ ├── deepseek.py │ │ │ ├── deepseek_v3.py │ │ │ ├── falcon.py │ │ │ ├── gpt2.py │ │ │ ├── gptj.py │ │ │ ├── llama.py │ │ │ ├── mistral.py │ │ │ ├── mixtral.py │ │ │ ├── opt.py │ │ │ ├── qwen2.py │ │ │ ├── qwen3.py │ │ │ ├── sam.py │ │ │ ├── t5.py │ │ │ ├── vit.py │ │ │ └── whisper.py │ │ └── shard/ │ │ ├── __init__.py │ │ ├── grad_ckpt_config.py │ │ ├── shard_config.py │ │ ├── sharder.py │ │ ├── shardformer.py │ │ └── utils.py │ ├── tensor/ │ │ ├── __init__.py │ │ ├── colo_parameter.py │ │ ├── colo_tensor.py │ │ ├── comm_spec.py │ │ ├── d_tensor/ │ │ │ ├── README.md │ │ │ ├── __init__.py │ │ │ ├── api.py │ │ │ ├── comm_spec.py │ │ │ ├── layout.py │ │ │ ├── layout_converter.py │ │ │ ├── misc.py │ │ │ ├── sharding_spec.py │ │ │ └── utils.py │ │ ├── moe_tensor/ │ │ │ ├── __init__.py │ │ │ ├── api.py │ │ │ └── moe_info.py │ │ ├── padded_tensor/ │ │ │ ├── __init__.py │ │ │ └── api.py │ │ ├── param_op_hook.py │ │ ├── shape_consistency.py │ │ ├── sharding_spec.py │ │ └── utils.py │ ├── testing/ │ │ ├── __init__.py │ │ ├── comparison.py │ │ ├── 
pytest_wrapper.py │ │ ├── random.py │ │ └── utils.py │ ├── utils/ │ │ ├── __init__.py │ │ ├── common.py │ │ ├── memory.py │ │ ├── model/ │ │ │ ├── __init__.py │ │ │ └── utils.py │ │ ├── multi_tensor_apply/ │ │ │ ├── __init__.py │ │ │ └── multi_tensor_apply.py │ │ ├── rank_recorder/ │ │ │ ├── README.md │ │ │ ├── __init__.py │ │ │ └── rank_recorder.py │ │ ├── safetensors.py │ │ ├── tensor_detector/ │ │ │ ├── __init__.py │ │ │ ├── readme.md │ │ │ └── tensor_detector.py │ │ └── timer.py │ └── zero/ │ ├── __init__.py │ ├── gemini/ │ │ ├── __init__.py │ │ ├── chunk/ │ │ │ ├── __init__.py │ │ │ ├── chunk.py │ │ │ ├── manager.py │ │ │ ├── search_utils.py │ │ │ └── utils.py │ │ ├── gemini_ddp.py │ │ ├── gemini_hook.py │ │ ├── gemini_mgr.py │ │ ├── gemini_optimizer.py │ │ ├── memory_tracer/ │ │ │ ├── __init__.py │ │ │ ├── chunk_memstats_collector.py │ │ │ ├── memory_monitor.py │ │ │ ├── memory_stats.py │ │ │ ├── memstats_collector.py │ │ │ ├── param_runtime_order.py │ │ │ ├── runtime_mem_tracer.py │ │ │ ├── static_memstats_collector.py │ │ │ └── utils.py │ │ ├── placement_policy.py │ │ └── utils.py │ ├── low_level/ │ │ ├── __init__.py │ │ ├── _utils.py │ │ ├── bookkeeping/ │ │ │ ├── __init__.py │ │ │ ├── base_store.py │ │ │ ├── bucket_store.py │ │ │ ├── gradient_store.py │ │ │ └── tensor_bucket.py │ │ ├── low_level_optim.py │ │ ├── readme.md │ │ └── zero_hook.py │ └── wrapper.py ├── docker/ │ └── Dockerfile ├── docs/ │ ├── README-zh-Hans.md │ ├── README.md │ ├── REFERENCE.md │ ├── conda-doc-test-deps.yml │ ├── requirements-doc-test.txt │ ├── sidebars.json │ ├── source/ │ │ ├── en/ │ │ │ ├── Colossal-Auto/ │ │ │ │ ├── feature/ │ │ │ │ │ ├── auto_checkpoint.md │ │ │ │ │ ├── device_mesh.md │ │ │ │ │ ├── layout_converting_management.md │ │ │ │ │ └── tracer.md │ │ │ │ └── get_started/ │ │ │ │ ├── installation.md │ │ │ │ ├── introduction.md │ │ │ │ └── run_demo.md │ │ │ ├── advanced_tutorials/ │ │ │ │ ├── integrate_mixture_of_experts_into_your_model.md │ │ │ │ ├── meet_gemini.md │ 
│ │ │ ├── opt_service.md │ │ │ │ ├── train_gpt_using_hybrid_parallelism.md │ │ │ │ └── train_vit_with_hybrid_parallelism.md │ │ │ ├── basics/ │ │ │ │ ├── booster_api.md │ │ │ │ ├── booster_checkpoint.md │ │ │ │ ├── booster_plugins.md │ │ │ │ ├── command_line_tool.md │ │ │ │ └── launch_colossalai.md │ │ │ ├── concepts/ │ │ │ │ ├── colossalai_overview.md │ │ │ │ ├── distributed_training.md │ │ │ │ └── paradigms_of_parallelism.md │ │ │ ├── features/ │ │ │ │ ├── 1D_tensor_parallel.md │ │ │ │ ├── 2D_tensor_parallel.md │ │ │ │ ├── 2p5D_tensor_parallel.md │ │ │ │ ├── 3D_tensor_parallel.md │ │ │ │ ├── cluster_utils.md │ │ │ │ ├── distributed_optimizers.md │ │ │ │ ├── gradient_accumulation_with_booster.md │ │ │ │ ├── gradient_clipping_with_booster.md │ │ │ │ ├── lazy_init.md │ │ │ │ ├── mixed_precision_training_with_booster.md │ │ │ │ ├── nvme_offload.md │ │ │ │ ├── pipeline_parallel.md │ │ │ │ ├── sequence_parallelism.md │ │ │ │ ├── shardformer.md │ │ │ │ ├── zero_with_chunk.md │ │ │ │ └── zerobubble_pipeline_parallelism.md │ │ │ ├── get_started/ │ │ │ │ ├── bonus.md │ │ │ │ ├── installation.md │ │ │ │ ├── reading_roadmap.md │ │ │ │ └── run_demo.md │ │ │ └── sidebar_category_translation.json │ │ └── zh-Hans/ │ │ ├── Colossal-Auto/ │ │ │ ├── feature/ │ │ │ │ ├── auto_checkpoint.md │ │ │ │ ├── device_mesh.md │ │ │ │ ├── layout_converting_management.md │ │ │ │ └── tracer.md │ │ │ └── get_started/ │ │ │ ├── installation.md │ │ │ ├── introduction.md │ │ │ └── run_demo.md │ │ ├── advanced_tutorials/ │ │ │ ├── integrate_mixture_of_experts_into_your_model.md │ │ │ ├── meet_gemini.md │ │ │ ├── opt_service.md │ │ │ ├── train_gpt_using_hybrid_parallelism.md │ │ │ └── train_vit_with_hybrid_parallelism.md │ │ ├── basics/ │ │ │ ├── booster_api.md │ │ │ ├── booster_checkpoint.md │ │ │ ├── booster_plugins.md │ │ │ ├── command_line_tool.md │ │ │ └── launch_colossalai.md │ │ ├── concepts/ │ │ │ ├── colossalai_overview.md │ │ │ ├── distributed_training.md │ │ │ └── 
paradigms_of_parallelism.md │ │ ├── features/ │ │ │ ├── 1D_tensor_parallel.md │ │ │ ├── 2D_tensor_parallel.md │ │ │ ├── 2p5D_tensor_parallel.md │ │ │ ├── 3D_tensor_parallel.md │ │ │ ├── cluster_utils.md │ │ │ ├── distributed_optimizers.md │ │ │ ├── gradient_accumulation_with_booster.md │ │ │ ├── gradient_clipping_with_booster.md │ │ │ ├── lazy_init.md │ │ │ ├── mixed_precision_training_with_booster.md │ │ │ ├── nvme_offload.md │ │ │ ├── pipeline_parallel.md │ │ │ ├── sequence_parallelism.md │ │ │ ├── shardformer.md │ │ │ ├── zero_with_chunk.md │ │ │ └── zerobubble_pipeline_parallelism.md │ │ ├── get_started/ │ │ │ ├── bonus.md │ │ │ ├── installation.md │ │ │ ├── reading_roadmap.md │ │ │ └── run_demo.md │ │ └── sidebar_category_translation.json │ └── versions.json ├── examples/ │ ├── README.md │ ├── __init__.py │ ├── community/ │ │ ├── README.md │ │ ├── fp8/ │ │ │ └── mnist/ │ │ │ ├── README.md │ │ │ └── main.py │ │ └── roberta/ │ │ ├── README.md │ │ ├── preprocessing/ │ │ │ ├── Makefile │ │ │ ├── README.md │ │ │ ├── get_mask.py │ │ │ ├── mask.cpp │ │ │ ├── sentence_split.py │ │ │ └── tokenize_mask.py │ │ ├── pretraining/ │ │ │ ├── README.md │ │ │ ├── arguments.py │ │ │ ├── bert_dataset_provider.py │ │ │ ├── evaluation.py │ │ │ ├── hostfile │ │ │ ├── loss.py │ │ │ ├── model/ │ │ │ │ ├── bert.py │ │ │ │ └── deberta_v2.py │ │ │ ├── nvidia_bert_dataset_provider.py │ │ │ ├── pretrain_utils.py │ │ │ ├── run_pretrain.sh │ │ │ ├── run_pretrain_resume.sh │ │ │ ├── run_pretraining.py │ │ │ └── utils/ │ │ │ ├── WandbLog.py │ │ │ ├── exp_util.py │ │ │ ├── global_vars.py │ │ │ └── logger.py │ │ ├── requirements.txt │ │ └── test_ci.sh │ ├── images/ │ │ ├── diffusion/ │ │ │ ├── LICENSE │ │ │ ├── README.md │ │ │ ├── configs/ │ │ │ │ ├── Inference/ │ │ │ │ │ ├── v2-inference-v.yaml │ │ │ │ │ ├── v2-inference.yaml │ │ │ │ │ ├── v2-inpainting-inference.yaml │ │ │ │ │ ├── v2-midas-inference.yaml │ │ │ │ │ └── x4-upscaling.yaml │ │ │ │ ├── Teyvat/ │ │ │ │ │ ├── README.md │ │ │ │ │ └── 
train_colossalai_teyvat.yaml │ │ │ │ ├── train_colossalai.yaml │ │ │ │ ├── train_colossalai_cifar10.yaml │ │ │ │ └── train_ddp.yaml │ │ │ ├── docker/ │ │ │ │ └── Dockerfile │ │ │ ├── environment.yaml │ │ │ ├── ldm/ │ │ │ │ ├── data/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── base.py │ │ │ │ │ ├── cifar10.py │ │ │ │ │ ├── imagenet.py │ │ │ │ │ ├── lsun.py │ │ │ │ │ └── teyvat.py │ │ │ │ ├── lr_scheduler.py │ │ │ │ ├── models/ │ │ │ │ │ ├── autoencoder.py │ │ │ │ │ └── diffusion/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── classifier.py │ │ │ │ │ ├── ddim.py │ │ │ │ │ ├── ddpm.py │ │ │ │ │ ├── dpm_solver/ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ ├── dpm_solver.py │ │ │ │ │ │ └── sampler.py │ │ │ │ │ ├── plms.py │ │ │ │ │ └── sampling_util.py │ │ │ │ ├── modules/ │ │ │ │ │ ├── attention.py │ │ │ │ │ ├── diffusionmodules/ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ ├── model.py │ │ │ │ │ │ ├── openaimodel.py │ │ │ │ │ │ ├── upscaling.py │ │ │ │ │ │ └── util.py │ │ │ │ │ ├── distributions/ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ └── distributions.py │ │ │ │ │ ├── ema.py │ │ │ │ │ ├── encoders/ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ └── modules.py │ │ │ │ │ ├── image_degradation/ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ ├── bsrgan.py │ │ │ │ │ │ ├── bsrgan_light.py │ │ │ │ │ │ └── utils_image.py │ │ │ │ │ └── midas/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── api.py │ │ │ │ │ ├── midas/ │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ ├── base_model.py │ │ │ │ │ │ ├── blocks.py │ │ │ │ │ │ ├── dpt_depth.py │ │ │ │ │ │ ├── midas_net.py │ │ │ │ │ │ ├── midas_net_custom.py │ │ │ │ │ │ ├── transforms.py │ │ │ │ │ │ └── vit.py │ │ │ │ │ └── utils.py │ │ │ │ └── util.py │ │ │ ├── main.py │ │ │ ├── requirements.txt │ │ │ ├── scripts/ │ │ │ │ ├── download_first_stages.sh │ │ │ │ ├── download_models.sh │ │ │ │ ├── img2img.py │ │ │ │ ├── inpaint.py │ │ │ │ ├── knn2img.py │ │ │ │ ├── sample_diffusion.py │ │ │ │ ├── tests/ │ │ │ │ │ ├── test_checkpoint.py │ │ │ │ │ └── test_watermark.py │ │ │ │ ├── 
train_searcher.py │ │ │ │ ├── txt2img.py │ │ │ │ ├── txt2img.sh │ │ │ │ └── utils.py │ │ │ ├── setup.py │ │ │ ├── test_ci.sh │ │ │ ├── train_colossalai.sh │ │ │ └── train_ddp.sh │ │ ├── dreambooth/ │ │ │ ├── README.md │ │ │ ├── colossalai.sh │ │ │ ├── debug.py │ │ │ ├── dreambooth.sh │ │ │ ├── inference.py │ │ │ ├── requirements.txt │ │ │ ├── test_ci.sh │ │ │ ├── train_dreambooth.py │ │ │ ├── train_dreambooth_colossalai.py │ │ │ ├── train_dreambooth_colossalai_lora.py │ │ │ └── train_dreambooth_inpaint.py │ │ ├── resnet/ │ │ │ ├── .gitignore │ │ │ ├── README.md │ │ │ ├── eval.py │ │ │ ├── requirements.txt │ │ │ ├── test_ci.sh │ │ │ └── train.py │ │ └── vit/ │ │ ├── README.md │ │ ├── args.py │ │ ├── data.py │ │ ├── requirements.txt │ │ ├── run_benchmark.sh │ │ ├── run_demo.sh │ │ ├── test_ci.sh │ │ ├── vit_benchmark.py │ │ └── vit_train_demo.py │ ├── inference/ │ │ ├── benchmark_ops/ │ │ │ ├── benchmark_context_attn_unpad.py │ │ │ ├── benchmark_decoding_attn.py │ │ │ ├── benchmark_flash_decoding_attention.py │ │ │ ├── benchmark_fused_rotary_embdding_unpad.py │ │ │ ├── benchmark_kv_cache_memcopy.py │ │ │ ├── benchmark_rmsnorm.py │ │ │ ├── benchmark_rotary_embedding.py │ │ │ ├── benchmark_xine_copy.py │ │ │ └── test_ci.sh │ │ ├── client/ │ │ │ ├── locustfile.py │ │ │ ├── run_locust.sh │ │ │ └── test_ci.sh │ │ ├── llama/ │ │ │ ├── README.md │ │ │ ├── benchmark_llama.py │ │ │ ├── benchmark_llama3.py │ │ │ ├── llama_generation.py │ │ │ ├── run_benchmark.sh │ │ │ └── test_ci.sh │ │ └── stable_diffusion/ │ │ ├── README.md │ │ ├── benchmark_sd3.py │ │ ├── compute_metric.py │ │ ├── requirements.txt │ │ ├── run_benchmark.sh │ │ ├── sd3_generation.py │ │ └── test_ci.sh │ ├── language/ │ │ ├── __init__.py │ │ ├── bert/ │ │ │ ├── README.md │ │ │ ├── benchmark.py │ │ │ ├── benchmark.sh │ │ │ ├── benchmark_utils.py │ │ │ ├── data.py │ │ │ ├── finetune.py │ │ │ ├── requirements.txt │ │ │ └── test_ci.sh │ │ ├── commons/ │ │ │ └── utils.py │ │ ├── data_utils.py │ │ ├── deepseek/ │ │ 
│ ├── benchmark.py │ │ │ └── test_ci.sh │ │ ├── gpt/ │ │ │ ├── README.md │ │ │ ├── experiments/ │ │ │ │ ├── auto_offload/ │ │ │ │ │ ├── README.md │ │ │ │ │ ├── model_zoo.py │ │ │ │ │ ├── requirements.txt │ │ │ │ │ ├── run.sh │ │ │ │ │ └── train_gpt_offload.py │ │ │ │ ├── auto_parallel/ │ │ │ │ │ ├── README.md │ │ │ │ │ ├── auto_parallel_with_gpt.py │ │ │ │ │ ├── gpt_modules.py │ │ │ │ │ └── requirements.txt │ │ │ │ └── pipeline_parallel/ │ │ │ │ ├── README.md │ │ │ │ ├── model_zoo.py │ │ │ │ ├── requirements.txt │ │ │ │ ├── run.sh │ │ │ │ └── train_gpt_pp.py │ │ │ ├── gemini/ │ │ │ │ ├── benchmark_gemini.sh │ │ │ │ ├── commons/ │ │ │ │ │ ├── model_zoo.py │ │ │ │ │ └── utils.py │ │ │ │ ├── requirements.txt │ │ │ │ ├── run_gemini.sh │ │ │ │ ├── test_ci.sh │ │ │ │ └── train_gpt_demo.py │ │ │ ├── hybridparallelism/ │ │ │ │ ├── benchmark.py │ │ │ │ ├── data.py │ │ │ │ ├── finetune.py │ │ │ │ └── run.sh │ │ │ ├── requirements.txt │ │ │ ├── test_ci.sh │ │ │ └── titans/ │ │ │ ├── LICENSE │ │ │ ├── README.md │ │ │ ├── configs/ │ │ │ │ ├── gpt2_small_zero3_pp1d.py │ │ │ │ └── gpt3_zero3_pp1d.py │ │ │ ├── dataset/ │ │ │ │ └── webtext.py │ │ │ ├── model/ │ │ │ │ ├── __init__.py │ │ │ │ ├── embed.py │ │ │ │ ├── gpt1d.py │ │ │ │ └── pipeline_gpt1d.py │ │ │ ├── requirements.txt │ │ │ ├── run.sh │ │ │ ├── test_ci.sh │ │ │ └── train_gpt.py │ │ ├── grok-1/ │ │ │ ├── README.md │ │ │ ├── grok1_policy.py │ │ │ ├── inference.py │ │ │ ├── inference_tp.py │ │ │ ├── requirements.txt │ │ │ ├── run_inference_fast.sh │ │ │ ├── run_inference_slow.sh │ │ │ ├── test_ci.sh │ │ │ └── utils.py │ │ ├── llama/ │ │ │ ├── README.md │ │ │ ├── benchmark.py │ │ │ ├── requirements.txt │ │ │ ├── scripts/ │ │ │ │ ├── benchmark_70B/ │ │ │ │ │ ├── 3d.sh │ │ │ │ │ ├── gemini.sh │ │ │ │ │ └── gemini_auto.sh │ │ │ │ └── benchmark_7B/ │ │ │ │ ├── gemini.sh │ │ │ │ └── gemini_auto.sh │ │ │ └── test_ci.sh │ │ ├── mixtral/ │ │ │ ├── benchmark.py │ │ │ └── test_ci.sh │ │ ├── model_utils.py │ │ ├── opt/ │ │ │ ├── 
README.md │ │ │ ├── args.py │ │ │ ├── data.py │ │ │ ├── opt_benchmark.py │ │ │ ├── opt_train_demo.py │ │ │ ├── requirements.txt │ │ │ ├── run_benchmark.sh │ │ │ ├── run_demo.sh │ │ │ └── test_ci.sh │ │ ├── palm/ │ │ │ ├── README.md │ │ │ ├── data/ │ │ │ │ └── README.md │ │ │ ├── palm_pytorch/ │ │ │ │ ├── __init__.py │ │ │ │ ├── autoregressive_wrapper.py │ │ │ │ └── palm_pytorch.py │ │ │ ├── requirements.txt │ │ │ ├── run.sh │ │ │ ├── test_ci.sh │ │ │ └── train.py │ │ └── performance_evaluator.py │ └── tutorial/ │ ├── .gitignore │ ├── README.md │ ├── auto_parallel/ │ │ ├── README.md │ │ ├── auto_ckpt_batchsize_test.py │ │ ├── auto_ckpt_solver_test.py │ │ ├── auto_parallel_with_resnet.py │ │ ├── bench_utils.py │ │ ├── config.py │ │ ├── requirements.txt │ │ ├── setup.py │ │ └── test_ci.sh │ ├── download_cifar10.py │ ├── fastfold/ │ │ └── README.md │ ├── hybrid_parallel/ │ │ ├── README.md │ │ ├── config.py │ │ ├── requirements.txt │ │ ├── test_ci.sh │ │ └── train.py │ ├── large_batch_optimizer/ │ │ ├── README.md │ │ ├── config.py │ │ ├── requirements.txt │ │ ├── test_ci.sh │ │ └── train.py │ ├── new_api/ │ │ ├── README.md │ │ ├── cifar_resnet/ │ │ │ ├── .gitignore │ │ │ ├── README.md │ │ │ ├── eval.py │ │ │ ├── requirements.txt │ │ │ ├── test_ci.sh │ │ │ └── train.py │ │ ├── cifar_vit/ │ │ │ ├── README.md │ │ │ ├── requirements.txt │ │ │ ├── test_ci.sh │ │ │ └── train.py │ │ ├── glue_bert/ │ │ │ ├── README.md │ │ │ ├── data.py │ │ │ ├── finetune.py │ │ │ ├── requirements.txt │ │ │ └── test_ci.sh │ │ └── test_ci.sh │ ├── opt/ │ │ ├── inference/ │ │ │ ├── README.md │ │ │ ├── batch.py │ │ │ ├── benchmark/ │ │ │ │ └── locustfile.py │ │ │ ├── cache.py │ │ │ ├── opt_fastapi.py │ │ │ ├── opt_server.py │ │ │ ├── requirements.txt │ │ │ └── script/ │ │ │ ├── process-opt-175b/ │ │ │ │ ├── README.md │ │ │ │ ├── convert_ckpt.py │ │ │ │ ├── flat-meta.json │ │ │ │ └── unflat.sh │ │ │ └── processing_ckpt_66b.py │ │ ├── opt/ │ │ │ ├── README.md │ │ │ ├── benchmark.sh │ │ │ ├── 
colossalai_zero.py │ │ │ ├── context.py │ │ │ ├── requirements.txt │ │ │ ├── run_clm.py │ │ │ ├── run_clm.sh │ │ │ ├── run_clm_synthetic.sh │ │ │ └── test_ci.sh │ │ └── test_ci.sh │ ├── requirements.txt │ └── sequence_parallel/ │ ├── README.md │ ├── config.py │ ├── data/ │ │ ├── __init__.py │ │ ├── bert_helper.py │ │ ├── datasets/ │ │ │ ├── Makefile │ │ │ ├── __init__.py │ │ │ ├── bert_dataset.py │ │ │ ├── blendable_dataset.py │ │ │ ├── builder.py │ │ │ ├── data_samplers.py │ │ │ ├── dataset_utils.py │ │ │ ├── helpers.cpp │ │ │ ├── ict_dataset.py │ │ │ ├── indexed_dataset.py │ │ │ └── test/ │ │ │ ├── test_indexed_dataset.py │ │ │ └── test_preprocess_data.sh │ │ ├── dummy_dataloader.py │ │ └── tokenizer/ │ │ ├── __init__.py │ │ ├── bert_tokenization.py │ │ └── tokenizer.py │ ├── loss_func/ │ │ ├── __init__.py │ │ ├── bert_loss.py │ │ ├── cross_entropy.py │ │ └── utils.py │ ├── lr_scheduler/ │ │ ├── __init__.py │ │ └── annealing_lr.py │ ├── model/ │ │ ├── __init__.py │ │ ├── bert.py │ │ └── layers/ │ │ ├── __init__.py │ │ ├── bert_layer.py │ │ ├── dropout.py │ │ ├── embedding.py │ │ ├── head.py │ │ ├── init_method.py │ │ ├── linear.py │ │ ├── mlp.py │ │ ├── pooler.py │ │ └── preprocess.py │ ├── requirements.txt │ ├── test_ci.sh │ └── train.py ├── extensions/ │ ├── README.md │ ├── __init__.py │ ├── base_extension.py │ ├── cpp_extension.py │ ├── csrc/ │ │ ├── __init__.py │ │ ├── common/ │ │ │ ├── data_type.h │ │ │ ├── micros.h │ │ │ ├── mp_type_traits.h │ │ │ ├── target.h │ │ │ └── vec_type_traits.h │ │ ├── funcs/ │ │ │ ├── binary_functor.h │ │ │ ├── cast_functor.h │ │ │ ├── reduce_function.h │ │ │ ├── ternary_functor.h │ │ │ └── unary_functor.h │ │ └── kernel/ │ │ ├── arm/ │ │ │ ├── cpu_adam_arm.cpp │ │ │ └── cpu_adam_arm.h │ │ ├── cuda/ │ │ │ ├── activation_kernel.cu │ │ │ ├── attention/ │ │ │ │ └── attention_utils.h │ │ │ ├── context_kv_cache_memcpy_kernel.cu │ │ │ ├── convert_fp8_kernel.cu │ │ │ ├── decode_kv_cache_memcpy_kernel.cu │ │ │ ├── 
flash_decoding_attention_kernel.cu │ │ │ ├── fused_rotary_emb_and_cache_kernel.cu │ │ │ ├── get_cos_and_sin_kernel.cu │ │ │ ├── layer_norm_kernel.cu │ │ │ ├── moe_kernel.cu │ │ │ ├── multi_tensor_adam_kernel.cu │ │ │ ├── multi_tensor_apply.cuh │ │ │ ├── multi_tensor_l2norm_kernel.cu │ │ │ ├── multi_tensor_lamb_kernel.cu │ │ │ ├── multi_tensor_scale_kernel.cu │ │ │ ├── multi_tensor_sgd_kernel.cu │ │ │ ├── rms_layernorm_kernel.cu │ │ │ ├── scaled_masked_softmax_kernel.cu │ │ │ ├── scaled_upper_triang_masked_softmax_kernel.cu │ │ │ └── utils/ │ │ │ ├── gpu_launch_config.h │ │ │ ├── micros.h │ │ │ ├── nvgpu_dev_info.h │ │ │ └── vec_copy.h │ │ └── x86/ │ │ ├── cpu_adam.cpp │ │ └── cpu_adam.h │ ├── cuda_extension.py │ ├── pybind/ │ │ ├── __init__.py │ │ ├── cpu_adam/ │ │ │ ├── __init__.py │ │ │ ├── cpu_adam_arm.py │ │ │ └── cpu_adam_x86.py │ │ ├── flash_attention/ │ │ │ ├── __init__.py │ │ │ ├── flash_attention_dao_cuda.py │ │ │ ├── flash_attention_npu.py │ │ │ └── flash_attention_sdpa_cuda.py │ │ ├── inference/ │ │ │ ├── __init__.py │ │ │ ├── inference.cpp │ │ │ └── inference_ops_cuda.py │ │ ├── layernorm/ │ │ │ ├── __init__.py │ │ │ ├── layer_norm.cpp │ │ │ └── layernorm_cuda.py │ │ ├── moe/ │ │ │ ├── __init__.py │ │ │ ├── moe.cpp │ │ │ └── moe_cuda.py │ │ ├── optimizer/ │ │ │ ├── __init__.py │ │ │ ├── fused_optimizer_cuda.py │ │ │ └── optimizer.cpp │ │ └── softmax/ │ │ ├── __init__.py │ │ ├── scaled_masked_softmax.cpp │ │ ├── scaled_masked_softmax_cuda.py │ │ ├── scaled_upper_triang_masked_softmax.cpp │ │ └── scaled_upper_triangle_masked_softmax_cuda.py │ ├── triton_extension.py │ └── utils.py ├── pytest.ini ├── requirements/ │ ├── requirements-test.txt │ └── requirements.txt ├── setup.py ├── tests/ │ ├── __init__.py │ ├── conftest.py │ ├── kit/ │ │ ├── __init__.py │ │ └── model_zoo/ │ │ ├── __init__.py │ │ ├── custom/ │ │ │ ├── __init__.py │ │ │ ├── base.py │ │ │ ├── hanging_param_model.py │ │ │ ├── nested_model.py │ │ │ ├── repeated_computed_layers.py │ │ │ ├── 
simple_mlp.py │ │ │ └── simple_net.py │ │ ├── diffusers/ │ │ │ ├── __init__.py │ │ │ └── diffusers.py │ │ ├── executor.py │ │ ├── registry.py │ │ ├── timm/ │ │ │ ├── __init__.py │ │ │ └── timm.py │ │ ├── torchaudio/ │ │ │ ├── __init__.py │ │ │ └── torchaudio.py │ │ ├── torchrec/ │ │ │ ├── __init__.py │ │ │ └── torchrec.py │ │ ├── torchvision/ │ │ │ ├── __init__.py │ │ │ └── torchvision.py │ │ └── transformers/ │ │ ├── __init__.py │ │ ├── albert.py │ │ ├── bert.py │ │ ├── blip2.py │ │ ├── bloom.py │ │ ├── chatglm2.py │ │ ├── command.py │ │ ├── deepseek.py │ │ ├── deepseek_v3.py │ │ ├── falcon.py │ │ ├── gpt.py │ │ ├── gptj.py │ │ ├── llama.py │ │ ├── mistral.py │ │ ├── mixtral.py │ │ ├── opt.py │ │ ├── qwen2.py │ │ ├── qwen3.py │ │ ├── sam.py │ │ ├── t5.py │ │ ├── vit.py │ │ └── whisper.py │ ├── test_analyzer/ │ │ ├── __init__.py │ │ ├── test_fx/ │ │ │ ├── __init__.py │ │ │ ├── test_bias_addition.py │ │ │ ├── test_mod_dir.py │ │ │ ├── test_nested_ckpt.py │ │ │ ├── test_shape_prop.py │ │ │ ├── test_symbolic_profile.py │ │ │ └── zoo.py │ │ └── test_subclasses/ │ │ ├── __init__.py │ │ ├── test_aten.py │ │ ├── test_flop_tensor.py │ │ └── test_meta_mode.py │ ├── test_auto_parallel/ │ │ ├── __init__.py │ │ ├── test_ckpt_solvers/ │ │ │ ├── test_C_solver_consistency.py │ │ │ ├── test_ckpt_torchvision.py │ │ │ └── test_linearize.py │ │ ├── test_offload/ │ │ │ ├── model_utils.py │ │ │ ├── test_perf.py │ │ │ └── test_solver.py │ │ ├── test_pass/ │ │ │ ├── __init__.py │ │ │ ├── test_node_converting_pass.py │ │ │ └── test_size_value_converting_pass.py │ │ └── test_tensor_shard/ │ │ ├── __init__.py │ │ ├── test_bias_addition_forward.py │ │ ├── test_broadcast.py │ │ ├── test_checkpoint.py │ │ ├── test_compatibility_with_ddp.py │ │ ├── test_compatibility_with_gemini.py │ │ ├── test_find_repeat_block.py │ │ ├── test_gpt/ │ │ │ ├── __init__.py │ │ │ ├── gpt_modules.py │ │ │ ├── test_runtime_with_gpt_modules.py │ │ │ └── test_solver_with_gpt_module.py │ │ ├── test_liveness_analysis.py 
│ │ ├── test_metainfo/ │ │ │ ├── test_activation_metainfo.py │ │ │ ├── test_binary_elementwise_metainfo.py │ │ │ ├── test_conv_metainfo.py │ │ │ ├── test_embedding_metainfo.py │ │ │ ├── test_linear_metainfo.py │ │ │ ├── test_matmul_metainfo.py │ │ │ ├── test_norm_metainfo.py │ │ │ ├── test_pooling_metainfo.py │ │ │ ├── test_tensor_metainfo.py │ │ │ ├── test_where_metainfo.py │ │ │ └── utils.py │ │ ├── test_node_handler/ │ │ │ ├── __init__.py │ │ │ ├── test_addbmm_handler.py │ │ │ ├── test_addmm_handler.py │ │ │ ├── test_batch_norm_handler.py │ │ │ ├── test_bias_linear_function_node.py │ │ │ ├── test_bias_linear_module_node.py │ │ │ ├── test_binary_elementwise_handler.py │ │ │ ├── test_bmm_handler.py │ │ │ ├── test_conv_handler.py │ │ │ ├── test_default_reshape_handler.py │ │ │ ├── test_embedding_handler.py │ │ │ ├── test_getattr_handler.py │ │ │ ├── test_getitem_handler.py │ │ │ ├── test_layer_norm_handler.py │ │ │ ├── test_linear_handler.py │ │ │ ├── test_matmul_handler.py │ │ │ ├── test_norm_pooling_handler.py │ │ │ ├── test_output_handler.py │ │ │ ├── test_permute_and_transpose_handler.py │ │ │ ├── test_placeholder_handler.py │ │ │ ├── test_shard_option.py │ │ │ ├── test_softmax_handler.py │ │ │ ├── test_split_handler.py │ │ │ ├── test_sum_handler.py │ │ │ ├── test_tensor_constructor.py │ │ │ ├── test_unary_element_wise_handler.py │ │ │ ├── test_view_handler.py │ │ │ ├── test_where_handler.py │ │ │ └── utils.py │ │ └── test_solver_with_resnet_v2.py │ ├── test_autochunk/ │ │ ├── test_autochunk_alphafold/ │ │ │ ├── benchmark_autochunk_alphafold.py │ │ │ ├── test_autochunk_alphafold_utils.py │ │ │ ├── test_autochunk_evoformer_block.py │ │ │ ├── test_autochunk_evoformer_stack.py │ │ │ └── test_autochunk_extramsa_block.py │ │ ├── test_autochunk_diffuser/ │ │ │ ├── benchmark_autochunk_diffuser.py │ │ │ ├── test_autochunk_diffuser_utils.py │ │ │ └── test_autochunk_unet.py │ │ ├── test_autochunk_transformer/ │ │ │ ├── benchmark_autochunk_transformer.py │ │ │ ├── 
test_autochunk_gpt.py │ │ │ └── test_autochunk_transformer_utils.py │ │ └── test_autochunk_vit/ │ │ ├── test_autochunk_vit.py │ │ └── test_autochunk_vit_utils.py │ ├── test_booster/ │ │ ├── test_accelerator.py │ │ ├── test_mixed_precision/ │ │ │ └── test_fp16_torch.py │ │ └── test_plugin/ │ │ ├── test_3d_plugin.py │ │ ├── test_dp_plugin_base.py │ │ ├── test_gemini_plugin.py │ │ ├── test_low_level_zero_plugin.py │ │ ├── test_torch_ddp_plugin.py │ │ └── test_torch_fsdp_plugin.py │ ├── test_checkpoint_io/ │ │ ├── test_gemini_checkpoint_io.py │ │ ├── test_gemini_torch_compability.py │ │ ├── test_general_checkpoint_io.py │ │ ├── test_hybrid_parallel_plugin_checkpoint_io.py │ │ ├── test_low_level_zero_checkpoint_io.py │ │ ├── test_plugins_huggingface_compatibility.py │ │ ├── test_safetensors_async_io.py │ │ ├── test_torch_ddp_checkpoint_io.py │ │ ├── test_torch_fsdp_checkpoint_io.py │ │ └── utils.py │ ├── test_cluster/ │ │ ├── test_device_mesh_manager.py │ │ └── test_process_group_mesh.py │ ├── test_config/ │ │ ├── sample_config.py │ │ └── test_load_config.py │ ├── test_device/ │ │ ├── test_alpha_beta.py │ │ ├── test_device_mesh.py │ │ ├── test_extract_alpha_beta.py │ │ ├── test_init_logical_pg.py │ │ └── test_search_logical_device_mesh.py │ ├── test_fp8/ │ │ ├── test_all_to_all_single.py │ │ ├── test_fp8_all_to_all.py │ │ ├── test_fp8_all_to_all_single.py │ │ ├── test_fp8_allgather.py │ │ ├── test_fp8_allreduce.py │ │ ├── test_fp8_cast.py │ │ ├── test_fp8_ddp_comm_hook.py │ │ ├── test_fp8_fsdp_comm_hook.py │ │ ├── test_fp8_hook.py │ │ ├── test_fp8_linear.py │ │ └── test_fp8_reduce_scatter.py │ ├── test_fx/ │ │ ├── test_codegen/ │ │ │ ├── test_activation_checkpoint_codegen.py │ │ │ ├── test_nested_activation_checkpoint_codegen.py │ │ │ └── test_offload_codegen.py │ │ ├── test_coloproxy.py │ │ ├── test_comm_size_compute.py │ │ ├── test_graph_manipulation.py │ │ ├── test_meta/ │ │ │ ├── test_aten.py │ │ │ ├── test_backward.py │ │ │ └── test_meta_trace.py │ │ ├── 
test_meta_info_prop.py │ │ ├── test_parallel_1d.py │ │ ├── test_pipeline/ │ │ │ ├── test_hf_model/ │ │ │ │ ├── hf_utils.py │ │ │ │ ├── test_albert.py │ │ │ │ ├── test_bert.py │ │ │ │ ├── test_gpt.py │ │ │ │ ├── test_opt.py │ │ │ │ └── test_t5.py │ │ │ ├── test_timm_model/ │ │ │ │ ├── test_timm.py │ │ │ │ └── timm_utils.py │ │ │ ├── test_topo/ │ │ │ │ ├── test_topo.py │ │ │ │ └── topo_utils.py │ │ │ └── test_torchvision/ │ │ │ └── test_torchvision.py │ │ ├── test_pipeline_passes.py │ │ ├── test_profiler/ │ │ │ ├── gpt_utils.py │ │ │ └── test_profiler_meta_info_prop.py │ │ └── test_tracer/ │ │ ├── test_activation_checkpoint_annotation.py │ │ ├── test_bias_addition_module.py │ │ ├── test_control_flow.py │ │ ├── test_functional_conv.py │ │ ├── test_hf_model/ │ │ │ ├── hf_tracer_utils.py │ │ │ ├── test_hf_albert.py │ │ │ ├── test_hf_bert.py │ │ │ ├── test_hf_diffuser.py │ │ │ ├── test_hf_gpt.py │ │ │ ├── test_hf_opt.py │ │ │ └── test_hf_t5.py │ │ ├── test_patched_module.py │ │ ├── test_patched_op.py │ │ ├── test_timm_model/ │ │ │ └── test_timm_model.py │ │ ├── test_torchaudio_model/ │ │ │ ├── test_torchaudio_model.py │ │ │ └── torchaudio_utils.py │ │ ├── test_torchrec_model/ │ │ │ ├── test_deepfm_model.py │ │ │ └── test_dlrm_model.py │ │ └── test_torchvision_model/ │ │ └── test_torchvision_model.py │ ├── test_infer/ │ │ ├── __init__.py │ │ ├── _utils.py │ │ ├── test_async_engine/ │ │ │ ├── test_async_engine.py │ │ │ └── test_request_tracer.py │ │ ├── test_batch_bucket.py │ │ ├── test_config_and_struct.py │ │ ├── test_continuous_batching.py │ │ ├── test_cuda_graph.py │ │ ├── test_drafter.py │ │ ├── test_inference_engine.py │ │ ├── test_kernels/ │ │ │ ├── __init__.py │ │ │ ├── cuda/ │ │ │ │ ├── __init__.py │ │ │ │ ├── test_convert_fp8.py │ │ │ │ ├── test_flash_decoding_attention.py │ │ │ │ ├── test_get_cos_and_sin.py │ │ │ │ ├── test_kv_cache_memcpy.py │ │ │ │ ├── test_rms_layernorm.py │ │ │ │ ├── test_rotary_embdding_unpad.py │ │ │ │ └── test_silu_and_mul.py │ │ │ └── 
triton/ │ │ │ ├── __init__.py │ │ │ ├── kernel_utils.py │ │ │ ├── test_context_attn_unpad.py │ │ │ ├── test_decoding_attn.py │ │ │ ├── test_fused_rotary_embedding.py │ │ │ ├── test_kvcache_copy.py │ │ │ ├── test_rmsnorm_triton.py │ │ │ ├── test_rotary_embdding_unpad.py │ │ │ └── test_xine_copy.py │ │ ├── test_kvcache_manager.py │ │ ├── test_models/ │ │ │ ├── test_attention.py │ │ │ ├── test_baichuan.py │ │ │ └── test_custom_model.py │ │ ├── test_request_handler.py │ │ ├── test_rpc_engine.py │ │ └── test_streamingllm.py │ ├── test_lazy/ │ │ ├── lazy_init_utils.py │ │ ├── test_from_pretrained.py │ │ ├── test_models.py │ │ └── test_ops.py │ ├── test_legacy/ │ │ ├── test_amp/ │ │ │ ├── test_naive_fp16.py │ │ │ └── test_torch_fp16.py │ │ ├── test_comm/ │ │ │ ├── test_boardcast_send_recv_v2.py │ │ │ ├── test_comm.py │ │ │ ├── test_object_list_p2p.py │ │ │ └── test_object_list_p2p_v2.py │ │ ├── test_context/ │ │ │ ├── configs/ │ │ │ │ ├── parallel_2d_init.py │ │ │ │ ├── parallel_2p5d_init.py │ │ │ │ └── parallel_3d_init.py │ │ │ └── test_hybrid_parallel.py │ │ ├── test_data/ │ │ │ ├── test_cifar10_dataset.py │ │ │ ├── test_data_parallel_sampler.py │ │ │ └── test_deterministic_dataloader.py │ │ ├── test_engine/ │ │ │ ├── test_engine.py │ │ │ └── test_gradient_accumluation.py │ │ ├── test_layers/ │ │ │ ├── test_1d/ │ │ │ │ ├── checks_1d/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── check_layer_1d.py │ │ │ │ │ └── common.py │ │ │ │ └── test_1d.py │ │ │ ├── test_2d/ │ │ │ │ ├── checks_2d/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── check_layer_2d.py │ │ │ │ │ ├── check_operation_2d.py │ │ │ │ │ └── common.py │ │ │ │ └── test_2d.py │ │ │ ├── test_2p5d/ │ │ │ │ ├── checks_2p5d/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── check_layer_2p5d.py │ │ │ │ │ ├── check_operation_2p5d.py │ │ │ │ │ └── common.py │ │ │ │ └── test_2p5d.py │ │ │ ├── test_3d/ │ │ │ │ ├── checks_3d/ │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── check_layer_3d.py │ │ │ │ │ └── common.py │ │ │ │ └── test_3d.py │ │ │ ├── 
test_cache_embedding.py │ │ │ └── test_sequence/ │ │ │ ├── checks_seq/ │ │ │ │ ├── __init__.py │ │ │ │ └── check_layer_seq.py │ │ │ └── test_sequence.py │ │ ├── test_moe/ │ │ │ ├── moe_utils.py │ │ │ ├── test_grad_handler.py │ │ │ ├── test_moe_group.py │ │ │ ├── test_moe_hybrid_zero.py │ │ │ └── test_moe_load_balance.py │ │ ├── test_pipeline/ │ │ │ ├── rpc_test_utils.py │ │ │ ├── test_cuda_rpc_chimera.py │ │ │ ├── test_cuda_rpc_optimizer.py │ │ │ ├── test_cuda_rpc_pipeline.py │ │ │ ├── test_cuda_rpc_value_correctness.py │ │ │ ├── test_middleware_1f1b.py │ │ │ ├── test_pipelinable.py │ │ │ └── test_pipeline_process_group.py │ │ ├── test_tensor/ │ │ │ ├── common_utils/ │ │ │ │ ├── __init__.py │ │ │ │ └── _utils.py │ │ │ ├── core/ │ │ │ │ └── test_dist_spec_mgr.py │ │ │ └── test_parameter.py │ │ ├── test_trainer/ │ │ │ ├── test_pipeline/ │ │ │ │ ├── test_p2p.py │ │ │ │ └── test_pipeline_schedule.py │ │ │ ├── test_trainer_with_non_pipe_schedule.py │ │ │ └── test_trainer_with_pipe_schedule.py │ │ ├── test_utils/ │ │ │ ├── test_activation_checkpointing.py │ │ │ ├── test_checkpoint/ │ │ │ │ ├── test_checkpoint_1d.py │ │ │ │ ├── test_checkpoint_2d.py │ │ │ │ ├── test_checkpoint_2p5d.py │ │ │ │ └── test_checkpoint_3d.py │ │ │ ├── test_memory.py │ │ │ └── test_norm_gradient_clipping.py │ │ └── test_zero/ │ │ └── test_commons.py │ ├── test_lora/ │ │ └── test_lora.py │ ├── test_moe/ │ │ ├── moe_utils.py │ │ ├── test_deepseek_layer.py │ │ ├── test_kernel.py │ │ ├── test_mixtral_layer.py │ │ ├── test_moe_checkpoint.py │ │ ├── test_moe_ep_tp.py │ │ └── test_moe_ep_zero.py │ ├── test_optimizer/ │ │ ├── _utils.py │ │ ├── test_adam_kernel.py │ │ ├── test_adam_optim.py │ │ ├── test_dist_adafactor.py │ │ ├── test_dist_came.py │ │ ├── test_dist_galore.py │ │ ├── test_dist_lamb.py │ │ ├── test_lr_scheduler.py │ │ └── test_nvme.py │ ├── test_pipeline/ │ │ ├── test_p2p_communication.py │ │ ├── test_pipeline_utils/ │ │ │ ├── test_t5_pipeline_utils.py │ │ │ └── 
test_whisper_pipeline_utils.py │ │ ├── test_schedule/ │ │ │ ├── test_interleaved.py │ │ │ ├── test_oneF_oneB.py │ │ │ ├── test_pipeline_schedule_utils.py │ │ │ └── test_zerobubble_pp.py │ │ └── test_stage_manager.py │ ├── test_shardformer/ │ │ ├── __init__.py │ │ ├── test_flash_attention.py │ │ ├── test_hybrid_parallel_grad_clip_norm/ │ │ │ ├── test_amp_optimizer.py │ │ │ ├── test_naive_optimizer.py │ │ │ └── test_zero_optimizer.py │ │ ├── test_layer/ │ │ │ ├── test_dist_crossentropy.py │ │ │ ├── test_dist_log_prob.py │ │ │ ├── test_dropout.py │ │ │ ├── test_embedding.py │ │ │ ├── test_gpt2_qkv_fused_linear_1d.py │ │ │ ├── test_layernorm.py │ │ │ ├── test_linear_1d.py │ │ │ ├── test_qkv_fused_linear_1d.py │ │ │ ├── test_ring_attn.py │ │ │ ├── test_sequence_parallel.py │ │ │ └── test_vocab_parallel_embedding_1d.py │ │ ├── test_model/ │ │ │ ├── __init__.py │ │ │ ├── _utils.py │ │ │ ├── test_shard_bert.py │ │ │ ├── test_shard_blip2.py │ │ │ ├── test_shard_bloom.py │ │ │ ├── test_shard_chatglm2.py │ │ │ ├── test_shard_command.py │ │ │ ├── test_shard_deepseek.py │ │ │ ├── test_shard_deepseek_v3.py │ │ │ ├── test_shard_falcon.py │ │ │ ├── test_shard_gpt2.py │ │ │ ├── test_shard_gptj.py │ │ │ ├── test_shard_llama.py │ │ │ ├── test_shard_mistral.py │ │ │ ├── test_shard_mixtral.py │ │ │ ├── test_shard_opt.py │ │ │ ├── test_shard_qwen2.py │ │ │ ├── test_shard_qwen3.py │ │ │ ├── test_shard_sam.py │ │ │ ├── test_shard_t5.py │ │ │ ├── test_shard_vit.py │ │ │ └── test_shard_whisper.py │ │ ├── test_shard_utils.py │ │ └── test_with_torch_ddp.py │ ├── test_smoothquant/ │ │ ├── test_llama_attention.py │ │ ├── test_llama_mlp.py │ │ ├── test_smoothquant_linear.py │ │ └── test_sq_rotary_embedding.py │ ├── test_tensor/ │ │ ├── test_comm_spec_apply.py │ │ ├── test_dtensor/ │ │ │ ├── test_comm_spec.py │ │ │ ├── test_dtensor.py │ │ │ ├── test_dtensor_sharding_spec.py │ │ │ └── test_layout_converter.py │ │ ├── test_mix_gather.py │ │ ├── test_padded_tensor.py │ │ ├── 
test_shape_consistency.py │ │ ├── test_shape_consistency_apply.py │ │ └── test_sharding_spec.py │ └── test_zero/ │ ├── test_gemini/ │ │ ├── test_chunk_mgrv2.py │ │ ├── test_chunkv2.py │ │ ├── test_gemini_use_rmt.py │ │ ├── test_grad_accum.py │ │ ├── test_grad_clip.py │ │ ├── test_inference.py │ │ ├── test_optim.py │ │ ├── test_runtime_mem_tracer.py │ │ ├── test_search.py │ │ ├── test_zeroddp_state_dict.py │ │ └── test_zerooptim_state_dict.py │ └── test_low_level/ │ ├── test_coll_nd.py │ ├── test_grad_acc.py │ ├── test_mem_leak.py │ ├── test_zero1_2.py │ └── test_zero_ckpt.py └── version.txt
Showing preview only (1,055K chars total). Download the full file or copy to clipboard to get everything.
SYMBOL INDEX (12172 symbols across 1438 files)
FILE: .github/workflows/scripts/check_doc_i18n.py
function compare_dirs (line 5) | def compare_dirs(dir1, dir2):
FILE: .github/workflows/scripts/example_checks/check_dispatch_inputs.py
function check_inputs (line 5) | def check_inputs(input_list):
function main (line 13) | def main():
FILE: .github/workflows/scripts/example_checks/check_example_weekly.py
function show_files (line 4) | def show_files(path, all_files):
function join (line 19) | def join(input_list, sep=None):
function main (line 23) | def main():
FILE: .github/workflows/scripts/example_checks/detect_changed_example.py
function main (line 4) | def main():
FILE: .github/workflows/scripts/generate_leaderboard_and_send_to_lark.py
class Counter (line 12) | class Counter(dict):
method record (line 21) | def record(self, item: str):
method to_sorted_list (line 27) | def to_sorted_list(self):
function get_utc_time_one_week_ago (line 33) | def get_utc_time_one_week_ago():
function datetime2str (line 42) | def datetime2str(dt):
function str2datetime (line 49) | def str2datetime(string):
function plot_bar_chart (line 56) | def plot_bar_chart(x: List[Any], y: List[Any], xlabel: str, ylabel: str,...
function get_organization_repositories (line 69) | def get_organization_repositories(github_token, organization_name) -> Li...
function get_issue_pull_request_comments (line 90) | def get_issue_pull_request_comments(github_token: str, org_name: str, re...
function get_discussion_comments (line 141) | def get_discussion_comments(github_token: str, org_name: str, repo_name:...
function generate_user_engagement_leaderboard_image (line 315) | def generate_user_engagement_leaderboard_image(
function generate_contributor_leaderboard_image (line 378) | def generate_contributor_leaderboard_image(github_token, org_name, repo_...
function upload_image_to_lark (line 467) | def upload_image_to_lark(lark_tenant_token: str, image_path: str) -> str:
function generate_lark_tenant_access_token (line 486) | def generate_lark_tenant_access_token(app_id: str, app_secret: str) -> str:
function send_image_to_lark (line 500) | def send_image_to_lark(image_key: str, webhook_url: str) -> None:
function send_message_to_lark (line 512) | def send_message_to_lark(message: str, webhook_url: str):
FILE: .github/workflows/scripts/generate_release_draft.py
function parse_args (line 14) | def parse_args():
function get_latest_tag_commit (line 21) | def get_latest_tag_commit(headers=None):
function get_commit_info (line 29) | def get_commit_info(commit_hash, headers=None):
function get_all_commit_info (line 35) | def get_all_commit_info(since, headers=None):
function collate_release_info (line 54) | def collate_release_info(commit_info_list):
function generate_release_post_markdown (line 78) | def generate_release_post_markdown(current_version, last_version, releas...
FILE: .github/workflows/scripts/send_message_to_lark.py
function parse_args (line 6) | def parse_args():
function send_message_to_lark (line 13) | def send_message_to_lark(message, webhook_url):
FILE: .github/workflows/scripts/update_setup_for_nightly.py
function open_setup_file (line 4) | def open_setup_file():
function replace_nightly_package_info (line 10) | def replace_nightly_package_info(file_lines):
function write_setup_file (line 22) | def write_setup_file(file_lines):
function main (line 27) | def main():
FILE: applications/Colossal-LLaMA/colossal_llama/dataset/conversation.py
class SeparatorStyle (line 20) | class SeparatorStyle(Enum):
class Conversation (line 25) | class Conversation:
method clear (line 33) | def clear(self):
method get_prompt (line 36) | def get_prompt(self, length: int = None):
method save_prompt (line 51) | def save_prompt(self):
method append_message (line 63) | def append_message(self, role, message):
method copy (line 66) | def copy(self):
method dict (line 76) | def dict(self):
FILE: applications/Colossal-LLaMA/colossal_llama/dataset/dummy_dataset.py
class RandomDataset (line 7) | class RandomDataset(Dataset):
method __init__ (line 8) | def __init__(self, num_samples: int = 1000, max_length: int = 2048, vo...
method __len__ (line 16) | def __len__(self):
method __getitem__ (line 19) | def __getitem__(self, idx):
FILE: applications/Colossal-LLaMA/colossal_llama/dataset/loader.py
function load_tokenized_dataset (line 19) | def load_tokenized_dataset(
class DataCollatorForSupervisedDataset (line 51) | class DataCollatorForSupervisedDataset(object):
method __call__ (line 63) | def __call__(self, instances: Sequence[Dict[str, List[int]]]) -> Dict[...
class StatefulDistributedSampler (line 141) | class StatefulDistributedSampler(DistributedSampler):
method __init__ (line 146) | def __init__(
method __iter__ (line 165) | def __iter__(self) -> Iterator:
method __len__ (line 171) | def __len__(self) -> int:
method set_start_index (line 174) | def set_start_index(self, start_index: int) -> None:
FILE: applications/Colossal-LLaMA/colossal_llama/dataset/spliced_and_tokenized_dataset.py
function supervised_tokenize_pretrain (line 30) | def supervised_tokenize_pretrain(
function supervised_tokenize_sft (line 73) | def supervised_tokenize_sft(
class ClosedToConstantLengthSplicedDataset (line 188) | class ClosedToConstantLengthSplicedDataset(IterableDataset):
method __init__ (line 194) | def __init__(
method __len__ (line 226) | def __len__(self) -> int:
method __iter__ (line 229) | def __iter__(self) -> Iterable[Dict[str, List[int]]]:
FILE: applications/Colossal-LLaMA/colossal_llama/model/init_model.py
function main (line 18) | def main():
FILE: applications/Colossal-LLaMA/colossal_llama/tokenizer/init_tokenizer.py
function expand_vocab_tokenizer (line 23) | def expand_vocab_tokenizer(
function main (line 62) | def main():
FILE: applications/Colossal-LLaMA/colossal_llama/utils/ckpt_io.py
function load_json (line 20) | def load_json(file_path: Union[str, os.PathLike]) -> Dict[str, Any]:
function save_json (line 28) | def save_json(data: Dict[str, Any], file_path: Union[str, os.PathLike]) ...
function save_checkpoint (line 36) | def save_checkpoint(
function load_checkpoint (line 71) | def load_checkpoint(
FILE: applications/Colossal-LLaMA/colossal_llama/utils/froze.py
function freeze_non_embeds_parameters (line 7) | def freeze_non_embeds_parameters(model: LlamaForCausalLM) -> None:
function unfreeze_parameters (line 16) | def unfreeze_parameters(model: LlamaForCausalLM) -> None:
FILE: applications/Colossal-LLaMA/colossal_llama/utils/neftune_patch.py
function unwrap (line 18) | def unwrap(model):
function neftune_post_forward_hook (line 25) | def neftune_post_forward_hook(module, input, output):
function activate_neftune (line 51) | def activate_neftune(model, neftune_noise_alpha=0.1):
function deactivate_neftune (line 65) | def deactivate_neftune(model, neftune_hook_handle):
FILE: applications/Colossal-LLaMA/colossal_llama/utils/stream_chat_patch.py
function get_prompt_template (line 13) | def get_prompt_template(
function streaming_chat (line 52) | def streaming_chat(
function stream_generate (line 141) | def stream_generate(
FILE: applications/Colossal-LLaMA/colossal_llama/utils/utils.py
function all_reduce_mean (line 11) | def all_reduce_mean(tensor: torch.Tensor, plugin: Plugin = None) -> torc...
function get_model_numel (line 21) | def get_model_numel(model: torch.nn.Module) -> int:
function format_numel_str (line 25) | def format_numel_str(numel: int) -> str:
FILE: applications/Colossal-LLaMA/dataset/prepare_pretrain_dataset.py
function main (line 26) | def main():
FILE: applications/Colossal-LLaMA/dataset/prepare_sft_dataset.py
function main (line 23) | def main():
FILE: applications/Colossal-LLaMA/inference/inference_example.py
function load_model (line 12) | def load_model(model_path, device="cuda", **kwargs):
function generate (line 26) | def generate(args):
FILE: applications/Colossal-LLaMA/inference/stream_chat_example.py
function main (line 9) | def main(args):
FILE: applications/Colossal-LLaMA/setup.py
function fetch_requirements (line 4) | def fetch_requirements(path):
function fetch_readme (line 9) | def fetch_readme():
function fetch_version (line 14) | def fetch_version():
FILE: applications/Colossal-LLaMA/train.py
function train (line 40) | def train(args) -> None:
FILE: applications/ColossalChat/benchmarks/benchmark_ppo.py
function get_model_numel (line 39) | def get_model_numel(model: torch.nn.Module, plugin: str, tp: int) -> int:
function get_gpt_config (line 46) | def get_gpt_config(model_name: str) -> OPTConfig:
function benchmark_train (line 65) | def benchmark_train(args):
FILE: applications/ColossalChat/benchmarks/dummy_dataset.py
class DummyLLMDataset (line 6) | class DummyLLMDataset(Dataset):
method __init__ (line 7) | def __init__(self, keys, seq_len, size=500, gen_fn={}):
method _generate_data (line 14) | def _generate_data(self):
method __len__ (line 23) | def __len__(self):
method __getitem__ (line 26) | def __getitem__(self, idx):
FILE: applications/ColossalChat/benchmarks/ray/1mmt_dummy.py
function get_free_port (line 23) | def get_free_port():
function get_local_ip (line 29) | def get_local_ip():
function main (line 35) | def main(args):
FILE: applications/ColossalChat/benchmarks/ray/mmmt_dummy.py
function get_free_port (line 23) | def get_free_port():
function get_local_ip (line 29) | def get_local_ip():
function main (line 35) | def main(args):
FILE: applications/ColossalChat/coati/dataset/conversation.py
class Conversation (line 15) | class Conversation:
method from_config (line 24) | def from_config(cls, tokenizer: PreTrainedTokenizer, config: Dict):
method clear (line 35) | def clear(self):
method get_conversation_template_keys (line 39) | def get_conversation_template_keys(cls):
method __str__ (line 42) | def __str__(self):
method get_prompt (line 49) | def get_prompt(self, length: int = None, add_generation_prompt=False) ...
method save_prompt (line 75) | def save_prompt(self):
method append_message (line 78) | def append_message(self, role: str, message: str):
method copy (line 92) | def copy(self):
function setup_conversation_template (line 96) | def setup_conversation_template(
FILE: applications/ColossalChat/coati/dataset/loader.py
function load_tokenized_dataset (line 24) | def load_tokenized_dataset(
class DataCollatorForSupervisedDataset (line 58) | class DataCollatorForSupervisedDataset(object):
method __call__ (line 69) | def __call__(self, instances: Sequence[Dict[str, List[int]]]) -> Dict[...
class DataCollatorForPromptDataset (line 146) | class DataCollatorForPromptDataset(DataCollatorForSupervisedDataset):
method __call__ (line 147) | def __call__(self, instances: Sequence[Dict[str, List[int]]]) -> Dict[...
class DataCollatorForPreferenceDataset (line 170) | class DataCollatorForPreferenceDataset(object):
method __call__ (line 180) | def __call__(self, instances: Sequence[Dict[str, List[int]]]) -> Dict[...
class DataCollatorForKTODataset (line 241) | class DataCollatorForKTODataset(object):
method __call__ (line 255) | def __call__(self, instances: Sequence[Dict[str, List[int]]]) -> Dict[...
class StatefulDistributedSampler (line 325) | class StatefulDistributedSampler(DistributedSampler):
method __init__ (line 326) | def __init__(
method __iter__ (line 338) | def __iter__(self) -> Iterator:
method __len__ (line 344) | def __len__(self) -> int:
method set_start_index (line 347) | def set_start_index(self, start_index: int) -> None:
function apply_chat_template_and_mask (line 351) | def apply_chat_template_and_mask(
class RawConversationDataset (line 420) | class RawConversationDataset(Dataset):
method __init__ (line 426) | def __init__(self, tokenizer: PreTrainedTokenizer, input_file: str, ma...
method __len__ (line 436) | def __len__(self) -> int:
method __getitem__ (line 439) | def __getitem__(self, index: int):
function collate_fn_grpo (line 447) | def collate_fn_grpo(batch):
FILE: applications/ColossalChat/coati/dataset/tokenization_utils.py
function tokenize_sft (line 26) | def tokenize_sft(
function tokenize_prompt (line 133) | def tokenize_prompt(
function apply_rlhf_data_format (line 203) | def apply_rlhf_data_format(template: Conversation, tokenizer: Any):
function tokenize_rlhf (line 226) | def tokenize_rlhf(
function tokenize_kto (line 342) | def tokenize_kto(
FILE: applications/ColossalChat/coati/dataset/utils.py
function is_rank_0 (line 11) | def is_rank_0() -> bool:
function _make_r_io_base (line 15) | def _make_r_io_base(f, mode: str):
function jload (line 21) | def jload(f, mode="r"):
function read_string_by_schema (line 29) | def read_string_by_schema(data: Dict[str, Any], schema: str) -> str:
function pad_to_max_len (line 46) | def pad_to_max_len(
function chuncate_sequence (line 71) | def chuncate_sequence(sequence: List[torch.Tensor], max_length: int, dty...
function find_first_occurrence_subsequence (line 82) | def find_first_occurrence_subsequence(seq: torch.Tensor, subseq: torch.T...
function tokenize_and_concatenate (line 91) | def tokenize_and_concatenate(
function split_templated_prompt_into_chunks (line 137) | def split_templated_prompt_into_chunks(messages: List[Dict[str, str]], p...
FILE: applications/ColossalChat/coati/distributed/comm.py
function ray_broadcast_object (line 11) | def ray_broadcast_object(obj: Any, src: int = 0, device=None, group_name...
function ray_broadcast_tensor_dict (line 36) | def ray_broadcast_tensor_dict(
class SharedVariableActor (line 79) | class SharedVariableActor:
method __init__ (line 80) | def __init__(self, number_of_readers: int = 0, buffer_size_limit: int ...
method pickup_rollout_task (line 90) | def pickup_rollout_task(self, num_tasks: int):
method append_data (line 108) | def append_data(self, data):
method get_data (line 113) | def get_data(self, data_uid: int):
method acquire_process_lock (line 134) | def acquire_process_lock(self, key: str):
method release_process_lock (line 145) | def release_process_lock(self, key: str):
method set_signal (line 150) | def set_signal(self, key: str, signal: str):
method get_signal (line 153) | def get_signal(self):
FILE: applications/ColossalChat/coati/distributed/consumer.py
class BaseConsumer (line 24) | class BaseConsumer:
method __init__ (line 25) | def __init__(
method setup (line 69) | def setup(self) -> None:
method state_dict (line 108) | def state_dict(self) -> Dict[str, torch.Tensor]:
method step (line 111) | def step(self, step_idx: int, **kwargs) -> Optional[float]:
method prepare_mini_batch (line 114) | def prepare_mini_batch(self, effective_group_to_raw_group_mapping: Dic...
method calculate_effective_group_to_raw_group_mapping (line 138) | def calculate_effective_group_to_raw_group_mapping(self, step):
method loop (line 149) | def loop(self) -> None:
method __del__ (line 358) | def __del__(self):
class SimpleConsumer (line 364) | class SimpleConsumer(BaseConsumer):
method __init__ (line 365) | def __init__(
method setup (line 405) | def setup(self):
method step (line 409) | def step(self, step_idx: int, pbar: Any, **kwargs) -> Optional[float]:
method state_dict (line 430) | def state_dict(self):
FILE: applications/ColossalChat/coati/distributed/grpo_consumer.py
class GRPOConsumer (line 19) | class GRPOConsumer(BaseConsumer):
method __init__ (line 20) | def __init__(
method setup (line 143) | def setup(self):
method step (line 174) | def step(self, step_idx: int, pbar: Any, **kwargs) -> Optional[float]:
method state_dict (line 607) | def state_dict(self):
FILE: applications/ColossalChat/coati/distributed/inference_backend.py
class BaseInferenceBackend (line 22) | class BaseInferenceBackend:
method __init__ (line 23) | def __init__(self, model_config: Dict[str, Any], generate_config: Dict...
method generate (line 26) | def generate(self, input_ids: torch.Tensor, attention_mask: torch.Tens...
method load_state_dict (line 42) | def load_state_dict(self, state_dict: Dict[str, torch.Tensor]) -> None:
class TransformersInferenceBackend (line 46) | class TransformersInferenceBackend(BaseInferenceBackend):
method __init__ (line 56) | def __init__(
method generate (line 74) | def generate(self, input_ids: torch.Tensor, attention_mask: torch.Tens...
method load_state_dict (line 125) | def load_state_dict(self, state_dict: Dict[str, torch.Tensor]) -> None:
class SGLangInferenceBackend (line 129) | class SGLangInferenceBackend(BaseInferenceBackend):
method __init__ (line 130) | def __init__(
method generate (line 152) | def generate(self, input_ids: torch.Tensor, attention_mask: torch.Tens...
method load_state_dict (line 179) | def load_state_dict(self, state_dict: Dict[str, torch.Tensor]) -> None:
class VLLMInferenceBackend (line 186) | class VLLMInferenceBackend(BaseInferenceBackend):
method __init__ (line 195) | def __init__(
method generate (line 219) | def generate(self, input_ids: torch.Tensor, attention_mask: torch.Tens...
method load_state_dict (line 283) | def load_state_dict(self, state_dict: Dict[str, torch.Tensor]) -> None:
FILE: applications/ColossalChat/coati/distributed/launch.py
function get_jsonl_size_fast (line 21) | def get_jsonl_size_fast(path: str) -> int:
function get_dp_size_fast (line 28) | def get_dp_size_fast(n_procs: int, plugin_config: Dict[str, Any]) -> int:
function launch_distributed (line 36) | def launch_distributed(
FILE: applications/ColossalChat/coati/distributed/launch_zero_bubble.py
function get_jsonl_size_fast (line 16) | def get_jsonl_size_fast(path: str) -> int:
function get_dp_size_fast (line 23) | def get_dp_size_fast(n_procs: int, plugin_config: Dict[str, Any]) -> int:
function launch_distributed (line 31) | def launch_distributed(
FILE: applications/ColossalChat/coati/distributed/loss.py
class PolicyLoss (line 8) | class PolicyLoss(nn.Module):
method __init__ (line 13) | def __init__(
method forward (line 29) | def forward(
FILE: applications/ColossalChat/coati/distributed/producer.py
class BaseProducer (line 34) | class BaseProducer:
method __init__ (line 35) | def __init__(
method setup (line 198) | def setup(self) -> None:
method rollout (line 212) | def rollout(self, input_ids: torch.Tensor, attention_mask: torch.Tenso...
method load_state_dict (line 215) | def load_state_dict(self, state_dict: Dict[str, torch.Tensor]) -> None:
method loop (line 218) | def loop(self) -> None:
method __del__ (line 417) | def __del__(self):
class SimpleProducer (line 422) | class SimpleProducer(BaseProducer):
method __init__ (line 423) | def __init__(
method rollout (line 483) | def rollout(self, input_ids, attention_mask, **kwargs):
method __del__ (line 506) | def __del__(self):
method load_state_dict (line 512) | def load_state_dict(self, state_dict):
FILE: applications/ColossalChat/coati/distributed/profiling_utils.py
class CustomProfiler (line 5) | class CustomProfiler:
method __init__ (line 6) | def __init__(self, name, disabled=True):
method _log (line 13) | def _log(self, message):
method log (line 20) | def log(self, message):
method enter (line 27) | def enter(self, event_name):
method exit (line 30) | def exit(self, event_name):
method close (line 33) | def close(self):
FILE: applications/ColossalChat/coati/distributed/reward/code_reward/testing_util.py
function truncatefn (line 43) | def truncatefn(s, length=300):
class CODE_TYPE (line 51) | class CODE_TYPE(Enum):
class Capturing (line 59) | class Capturing(list):
method __enter__ (line 60) | def __enter__(self):
method __exit__ (line 67) | def __exit__(self, *args):
function only_int_check (line 73) | def only_int_check(val):
function string_int_check (line 77) | def string_int_check(val):
function combined_int_check (line 81) | def combined_int_check(val):
function clean_traceback (line 85) | def clean_traceback(error_traceback):
function run_test (line 92) | def run_test(in_outs, test=None, debug=False, timeout=15, run_all_tests=...
function custom_compare_ (line 551) | def custom_compare_(output, ground_truth):
function stripped_string_compare (line 566) | def stripped_string_compare(s1, s2):
function call_method (line 572) | def call_method(method, inputs):
function reliability_guard (line 598) | def reliability_guard(maximum_memory_bytes=None):
FILE: applications/ColossalChat/coati/distributed/reward/code_reward/utils.py
function _temp_run (line 27) | def _temp_run(sample, generation, debug, result, metadata_list, timeout):
function check_correctness (line 39) | def check_correctness(in_outs: Optional[dict], generation, timeout=10, d...
function check_correctness_code_api (line 61) | def check_correctness_code_api(
FILE: applications/ColossalChat/coati/distributed/reward/reward_fn.py
function verify_math_representation (line 36) | def verify_math_representation(completion, gt_answer):
function verify_model_answer (line 76) | def verify_model_answer(decoded_final_answer, gt_answer, ans_acc, acc_sc...
function math_reward_fn (line 99) | def math_reward_fn(input_ids, gt_answer, response_idx, **kwargs):
function boxed_math_reward_fn (line 160) | def boxed_math_reward_fn(input_ids, gt_answer, response_idx, **kwargs):
function code_reward_fn (line 225) | def code_reward_fn(input_ids, test_cases, response_idx, **kwargs):
FILE: applications/ColossalChat/coati/distributed/reward/reward_utils.py
function validate_response_structure (line 20) | def validate_response_structure(processed_str: str, tags: Dict = None) -...
function extract_solution (line 58) | def extract_solution(solution_str: str) -> Tuple[Optional[str], str]:
function extract_boxed_solution (line 79) | def extract_boxed_solution(text: str) -> Optional[str]:
FILE: applications/ColossalChat/coati/distributed/reward/verifiable_reward.py
class VerifiableReward (line 11) | class VerifiableReward:
method __init__ (line 12) | def __init__(self, reward_fns: List[callable], **kwargs: List[Dict[str...
method __call__ (line 16) | def __call__(
FILE: applications/ColossalChat/coati/distributed/utils.py
function unbind_batch (line 11) | def unbind_batch(batch: Dict[str, torch.Tensor]) -> List[Dict[str, torch...
function bind_batch (line 25) | def bind_batch(batches: List[Dict[str, torch.Tensor]]) -> Dict[str, torc...
function pre_send (line 32) | def pre_send(batch: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
function post_recv (line 41) | def post_recv(batch: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
function update_by_default (line 50) | def update_by_default(data: Dict[str, Any], default: Dict[str, Any]) -> ...
function log_probs_from_logits (line 58) | def log_probs_from_logits(logits: torch.Tensor, labels: torch.Tensor) ->...
function memory_efficient_logprob (line 74) | def memory_efficient_logprob(
function entropy_from_logits (line 113) | def entropy_from_logits(logits: torch.Tensor) -> torch.Tensor:
function masked_mean (line 123) | def masked_mean(tensor: torch.Tensor, mask: torch.Tensor, dim: int = 1) ...
function masked_sum (line 143) | def masked_sum(tensor: torch.Tensor, mask: torch.Tensor, dim: int = 1) -...
function safe_append_to_jsonl_file (line 160) | def safe_append_to_jsonl_file(file_path, data):
FILE: applications/ColossalChat/coati/distributed/zero_bubble/consumer.py
class BaseConsumer (line 21) | class BaseConsumer:
method __init__ (line 22) | def __init__(
method setup (line 69) | def setup(self) -> None:
method get_ddp_config (line 94) | def get_ddp_config(self) -> Dict[str, Any]:
method init_collective_group (line 110) | def init_collective_group(
method state_dict (line 123) | def state_dict(self) -> Dict[str, torch.Tensor]:
method step (line 126) | def step(self, **kwargs) -> Optional[float]:
method prepare_mini_batch (line 129) | def prepare_mini_batch(self, effective_group_to_raw_group_mapping: Dic...
method calculate_effective_group_to_raw_group_mapping (line 153) | def calculate_effective_group_to_raw_group_mapping(self):
method loop (line 160) | def loop(self) -> None:
method __del__ (line 345) | def __del__(self):
FILE: applications/ColossalChat/coati/distributed/zero_bubble/distributor.py
class Distributor (line 13) | class Distributor:
method __init__ (line 14) | def __init__(
method init_collective_group (line 31) | def init_collective_group(
method loop (line 44) | def loop(self):
method get_weight_version (line 123) | def get_weight_version(self):
FILE: applications/ColossalChat/coati/distributed/zero_bubble/grpo_consumer.py
class GRPOConsumer (line 19) | class GRPOConsumer(BaseConsumer):
method __init__ (line 20) | def __init__(
method setup (line 134) | def setup(self):
method step (line 164) | def step(self, pbar: Any, **kwargs) -> Optional[float]:
method state_dict (line 531) | def state_dict(self):
FILE: applications/ColossalChat/coati/distributed/zero_bubble/producer.py
class BaseProducer (line 33) | class BaseProducer:
method __init__ (line 34) | def __init__(
method init_collective_group (line 193) | def init_collective_group(
method rollout (line 206) | def rollout(self, input_ids: torch.Tensor, attention_mask: torch.Tenso...
method load_state_dict (line 209) | def load_state_dict(self, state_dict: Dict[str, torch.Tensor]) -> None:
method loop (line 212) | def loop(self) -> None:
method __del__ (line 441) | def __del__(self):
class SimpleProducer (line 446) | class SimpleProducer(BaseProducer):
method __init__ (line 447) | def __init__(
method rollout (line 510) | def rollout(self, input_ids, attention_mask, **kwargs):
method __del__ (line 533) | def __del__(self):
method load_state_dict (line 539) | def load_state_dict(self, state_dict):
FILE: applications/ColossalChat/coati/experience_buffer/base.py
class ExperienceBuffer (line 7) | class ExperienceBuffer(ABC):
method __init__ (line 15) | def __init__(self, sample_batch_size: int, limit: int = 0) -> None:
method append (line 22) | def append(self, experience: Experience) -> None:
method clear (line 26) | def clear(self) -> None:
method sample (line 30) | def sample(self) -> Experience:
method __len__ (line 34) | def __len__(self) -> int:
method __getitem__ (line 38) | def __getitem__(self, idx: int) -> Any:
method collate_fn (line 42) | def collate_fn(self, batch: Any) -> Experience:
FILE: applications/ColossalChat/coati/experience_buffer/naive.py
class NaiveExperienceBuffer (line 15) | class NaiveExperienceBuffer(ExperienceBuffer):
method __init__ (line 24) | def __init__(self, sample_batch_size: int, limit: int = 0, cpu_offload...
method append (line 34) | def append(self, experience: Experience) -> None:
method clear (line 49) | def clear(self) -> None:
method sample (line 53) | def sample(self) -> Experience:
method __len__ (line 69) | def __len__(self) -> int:
method __getitem__ (line 72) | def __getitem__(self, idx: int) -> BufferItem:
method collate_fn (line 75) | def collate_fn(self, batch) -> Experience:
FILE: applications/ColossalChat/coati/experience_buffer/utils.py
class BufferItem (line 10) | class BufferItem:
function split_experience_batch (line 35) | def split_experience_batch(experience: Experience) -> List[BufferItem]:
function _zero_pad_sequences (line 53) | def _zero_pad_sequences(sequences: List[torch.Tensor], side: str = "left...
function make_experience_batch (line 64) | def make_experience_batch(items: List[BufferItem]) -> Experience:
FILE: applications/ColossalChat/coati/experience_maker/base.py
class Experience (line 11) | class Experience:
method to_device (line 38) | def to_device(self, device: torch.device) -> None:
method pin_memory (line 50) | def pin_memory(self):
class ExperienceMaker (line 64) | class ExperienceMaker(ABC):
method __init__ (line 69) | def __init__(
method make_experience (line 79) | def make_experience(self, input_ids: torch.Tensor, attention_mask: tor...
FILE: applications/ColossalChat/coati/experience_maker/naive.py
function is_rank_0 (line 24) | def is_rank_0() -> bool:
class NaiveExperienceMaker (line 28) | class NaiveExperienceMaker(ExperienceMaker):
method __init__ (line 33) | def __init__(
method calculate_advantage (line 64) | def calculate_advantage(self, value: torch.Tensor, reward: torch.Tenso...
method make_experience (line 87) | def make_experience(
FILE: applications/ColossalChat/coati/models/base.py
class BaseModel (line 12) | class BaseModel(nn.Module):
method __init__ (line 22) | def __init__(self, pretrained: str = None, config: Optional[Pretrained...
method resize_token_embeddings (line 46) | def resize_token_embeddings(self, *args, **kwargs):
FILE: applications/ColossalChat/coati/models/critic.py
class Critic (line 13) | class Critic(BaseModel):
method __init__ (line 22) | def __init__(self, pretrained: str = None, config: Optional[Pretrained...
method forward (line 27) | def forward(self, input_ids: torch.LongTensor, attention_mask: Optiona...
method get_input_embeddings (line 36) | def get_input_embeddings(self):
method get_output_embeddings (line 39) | def get_output_embeddings(self):
FILE: applications/ColossalChat/coati/models/generation.py
function _prepare_logits_processor (line 19) | def _prepare_logits_processor(
function _is_sequence_finished (line 44) | def _is_sequence_finished(unfinished_sequences: torch.Tensor) -> bool:
function update_model_kwargs_fn (line 61) | def update_model_kwargs_fn(outputs: dict, new_mask, **model_kwargs) -> d...
function prepare_inputs_fn (line 92) | def prepare_inputs_fn(input_ids: torch.Tensor, **model_kwargs) -> dict:
function _sample (line 97) | def _sample(
function generate (line 200) | def generate(
function _sample_streaming (line 262) | def _sample_streaming(
function generate_streaming (line 378) | def generate_streaming(
FILE: applications/ColossalChat/coati/models/lora.py
class LoraManager (line 22) | class LoraManager:
class LoraConfig (line 30) | class LoraConfig:
method from_file (line 40) | def from_file(cls, config_file: str):
class LoraBase (line 48) | class LoraBase(lora.LoRALayer, nn.Module):
method __init__ (line 49) | def __init__(
method reset_parameters (line 68) | def reset_parameters(self):
method train (line 103) | def train(self, mode: bool = True):
class LoraLinear (line 124) | class LoraLinear(LoraBase):
method __init__ (line 127) | def __init__(
method forward (line 160) | def forward(self, x: torch.Tensor):
class LoraEmbedding (line 169) | class LoraEmbedding(LoraBase):
method __init__ (line 172) | def __init__(
method _embed (line 218) | def _embed(self, x: torch.Tensor, weight) -> torch.Tensor:
method forward (line 229) | def forward(self, x: torch.Tensor):
method train (line 239) | def train(self, mode: bool = True):
function _lora_linear_wrapper (line 260) | def _lora_linear_wrapper(linear: nn.Linear, lora_config: LoraConfig) -> ...
function _convert_to_lora_recursively (line 287) | def _convert_to_lora_recursively(module: nn.Module, parent_name: str, lo...
function convert_to_lora_module (line 337) | def convert_to_lora_module(module: nn.Module, lora_config: LoraConfig) -...
FILE: applications/ColossalChat/coati/models/loss.py
class GPTLMLoss (line 14) | class GPTLMLoss(nn.Module):
method __init__ (line 19) | def __init__(self):
method forward (line 24) | def forward(self, logits: torch.Tensor, labels: torch.Tensor) -> torch...
class PolicyLoss (line 31) | class PolicyLoss(nn.Module):
method __init__ (line 36) | def __init__(self, clip_eps: float = 0.2, skip_threshold: float = 20.0...
method forward (line 41) | def forward(
class ValueLoss (line 70) | class ValueLoss(nn.Module):
method __init__ (line 75) | def __init__(self, clip_eps: float = 0.2) -> None:
method forward (line 79) | def forward(
class DpoLoss (line 97) | class DpoLoss(nn.Module):
method __init__ (line 106) | def __init__(self, beta: float = 0.1, gamma: float = 0.0):
method forward (line 118) | def forward(
class LogSigLoss (line 174) | class LogSigLoss(nn.Module):
method forward (line 180) | def forward(self, chosen_reward: torch.Tensor, reject_reward: torch.Te...
class LogExpLoss (line 184) | class LogExpLoss(nn.Module):
method forward (line 190) | def forward(self, chosen_reward: torch.Tensor, reject_reward: torch.Te...
class OddsRatioLoss (line 195) | class OddsRatioLoss(nn.Module):
method forward (line 201) | def forward(
class KTOLoss (line 219) | class KTOLoss(nn.Module):
method __init__ (line 220) | def __init__(self, beta: float = 0.1, desirable_weight: float = 1.0, u...
method forward (line 232) | def forward(
FILE: applications/ColossalChat/coati/models/reward_model.py
class RewardModel (line 13) | class RewardModel(BaseModel):
method __init__ (line 23) | def __init__(self, pretrained: str = None, config: Optional[Pretrained...
method forward (line 28) | def forward(
method get_input_embeddings (line 43) | def get_input_embeddings(self):
method get_output_embeddings (line 46) | def get_output_embeddings(self):
FILE: applications/ColossalChat/coati/models/rlvr_reward_model.py
class RLVRRewardModel (line 10) | class RLVRRewardModel:
method __init__ (line 19) | def __init__(self, reward_fn_list: List[Callable], **kwargs) -> None:
method __call__ (line 23) | def __call__(
method to (line 46) | def to(self, device):
method eval (line 49) | def eval(self):
FILE: applications/ColossalChat/coati/models/utils.py
function get_model_numel (line 9) | def get_model_numel(model: torch.nn.Module) -> int:
function compute_reward (line 13) | def compute_reward(
function _log_probs_from_logits (line 41) | def _log_probs_from_logits(logits: torch.Tensor, labels: torch.Tensor) -...
function calc_action_log_probs (line 57) | def calc_action_log_probs(logits: torch.Tensor, sequences: torch.LongTen...
function masked_mean (line 72) | def masked_mean(tensor: torch.Tensor, mask: torch.Tensor, dim: int = 1) ...
function calc_masked_log_probs (line 92) | def calc_masked_log_probs(
function load_json (line 115) | def load_json(file_path: Union[str, os.PathLike]) -> Dict[str, Any]:
function save_json (line 123) | def save_json(data: Dict[str, Any], file_path: Union[str, os.PathLike]) ...
function disable_dropout (line 131) | def disable_dropout(model: torch.nn.Module):
function repad_to_left (line 147) | def repad_to_left(tensor, tokenizer):
FILE: applications/ColossalChat/coati/quant/llama_gptq/loader.py
function load_quant (line 8) | def load_quant(model: nn.Module, checkpoint: str, wbits: int, groupsize:...
FILE: applications/ColossalChat/coati/quant/llama_gptq/model_utils.py
function find_layers (line 6) | def find_layers(module, layers=[nn.Conv2d, nn.Linear], name=""):
FILE: applications/ColossalChat/coati/quant/llama_gptq/quant.py
function quantize (line 10) | def quantize(x, scale, zero, maxq):
class Quantizer (line 15) | class Quantizer(nn.Module):
method __init__ (line 16) | def __init__(self, shape=1):
method configure (line 22) | def configure(self, bits, perchannel=False, sym=True, mse=False, norm=...
method find_params (line 31) | def find_params(self, x, weight=False):
method quantize (line 110) | def quantize(self, x):
method enabled (line 115) | def enabled(self):
method ready (line 118) | def ready(self):
class QuantLinear (line 130) | class QuantLinear(nn.Module):
method __init__ (line 131) | def __init__(self, bits, groupsize, infeatures, outfeatures):
method pack (line 150) | def pack(self, linear, scales, zeros):
method forward (line 239) | def forward(self, x):
function make_quant (line 274) | def make_quant(module, names, bits, groupsize, name=""):
FILE: applications/ColossalChat/coati/quant/utils.py
function _noop (line 6) | def _noop(*args, **kwargs):
function low_resource_init (line 11) | def low_resource_init():
FILE: applications/ColossalChat/coati/ray/callbacks/base.py
class TrainerCallback (line 6) | class TrainerCallback(ABC):
method on_fit_start (line 11) | def on_fit_start(self) -> None:
method on_fit_end (line 14) | def on_fit_end(self) -> None:
method on_episode_start (line 17) | def on_episode_start(self, episode: int) -> None:
method on_episode_end (line 20) | def on_episode_end(self, episode: int) -> None:
method on_epoch_start (line 23) | def on_epoch_start(self, epoch: int) -> None:
method on_epoch_end (line 26) | def on_epoch_end(self, epoch: int) -> None:
method on_batch_start (line 29) | def on_batch_start(self) -> None:
method on_batch_end (line 32) | def on_batch_end(self, metrics: dict, experience: Experience) -> None:
method on_update_start (line 35) | def on_update_start(self) -> None:
method on_update_end (line 38) | def on_update_end(self) -> None:
class MakerCallback (line 42) | class MakerCallback(ABC):
method on_loop_start (line 43) | def on_loop_start(self) -> None:
method on_loop_end (line 46) | def on_loop_end(self) -> None:
method on_make_experience_start (line 49) | def on_make_experience_start(self) -> None:
method on_make_experience_end (line 52) | def on_make_experience_end(self, experience: Experience) -> None:
method on_send_start (line 55) | def on_send_start(self) -> None:
method on_send_end (line 58) | def on_send_end(self) -> None:
method on_batch_start (line 61) | def on_batch_start(self) -> None:
method on_batch_end (line 64) | def on_batch_end(self) -> None:
FILE: applications/ColossalChat/coati/ray/callbacks/performance_evaluator.py
function get_world_size (line 11) | def get_world_size() -> int:
function print_rank_0 (line 17) | def print_rank_0(*args, **kwargs) -> None:
function all_reduce_mean (line 23) | def all_reduce_mean(x: float, world_size: int) -> float:
class Timer (line 32) | class Timer:
method __init__ (line 33) | def __init__(self) -> None:
method start (line 37) | def start(self) -> None:
method end (line 40) | def end(self) -> None:
method reset (line 43) | def reset(self) -> None:
class ExperienceMakerPerformanceEvaluator (line 47) | class ExperienceMakerPerformanceEvaluator(MakerCallback):
method __init__ (line 48) | def __init__(
method on_make_experience_start (line 68) | def on_make_experience_start(self) -> None:
method on_make_experience_end (line 71) | def on_make_experience_end(self, experience: Experience) -> None:
method on_send_start (line 92) | def on_send_start(self) -> None:
method on_send_end (line 95) | def on_send_end(self) -> None:
method on_batch_start (line 98) | def on_batch_start(self) -> None:
method on_batch_end (line 101) | def on_batch_end(self) -> None:
method on_loop_end (line 104) | def on_loop_end(self) -> None:
class TrainerPerformanceEvaluator (line 127) | class TrainerPerformanceEvaluator(TrainerCallback):
method __init__ (line 128) | def __init__(
method on_episode_start (line 153) | def on_episode_start(self, episodes: int) -> None:
method on_episode_end (line 159) | def on_episode_end(self, episodes: int) -> None:
method on_batch_start (line 164) | def on_batch_start(self) -> None:
method on_batch_end (line 169) | def on_batch_end(self, metrics: dict, experience: Experience) -> None:
method on_update_start (line 183) | def on_update_start(self) -> None:
method on_update_end (line 188) | def on_update_end(self) -> None:
method on_fit_end (line 193) | def on_fit_end(self) -> None:
FILE: applications/ColossalChat/coati/ray/detached_replay_buffer.py
class DetachedReplayBuffer (line 11) | class DetachedReplayBuffer:
method __init__ (line 24) | def __init__(self, sample_batch_size: int, limit: int = 0) -> None:
method append (line 31) | def append(self, experience: Experience) -> None:
method extend (line 39) | def extend(self, items: List[BufferItem]) -> None:
method clear (line 50) | def clear(self) -> None:
method sample (line 58) | def sample(self, worker_rank=0, to_device="cpu") -> Experience:
method _sample_and_erase (line 64) | def _sample_and_erase(self) -> Experience:
method get_length (line 68) | def get_length(self) -> int:
FILE: applications/ColossalChat/coati/ray/detached_trainer_base.py
class DetachedTrainer (line 17) | class DetachedTrainer(ABC):
method __init__ (line 33) | def __init__(
method update_target_holder_list (line 51) | def update_target_holder_list(self):
method _update_remote_makers (line 59) | def _update_remote_makers(self, fully_update: bool = False, **kwargs):
method sync_models_to_remote_makers (line 62) | def sync_models_to_remote_makers(self, **kwargs):
method training_step (line 66) | def training_step(self, experience: Experience) -> Dict[str, Any]:
method _learn (line 69) | def _learn(self, update_steps: int, train_epochs: int) -> None:
method _learn_epoch (line 86) | def _learn_epoch(self, pbar: tqdm, data: List[Experience]) -> None:
method fit (line 105) | def fit(self, total_steps: int, update_steps: int, train_epochs: int =...
method buffer_get_length (line 117) | def buffer_get_length(self):
method buffer_append (line 124) | def buffer_append(self, experience: Experience):
method buffer_extend (line 131) | def buffer_extend(self, items: List[BufferItem]):
method _buffer_sample (line 138) | def _buffer_sample(self):
method _on_fit_start (line 141) | def _on_fit_start(self) -> None:
method _on_fit_end (line 145) | def _on_fit_end(self) -> None:
method _on_episode_start (line 149) | def _on_episode_start(self, episode: int) -> None:
method _on_episode_end (line 153) | def _on_episode_end(self, episode: int) -> None:
method _on_epoch_start (line 157) | def _on_epoch_start(self, epoch: int) -> None:
method _on_epoch_end (line 161) | def _on_epoch_end(self, epoch: int) -> None:
method _on_batch_start (line 165) | def _on_batch_start(self) -> None:
method _on_batch_end (line 169) | def _on_batch_end(self, metrics: dict, experience: Experience) -> None:
method _on_update_start (line 173) | def _on_update_start(self) -> None:
method _on_update_end (line 177) | def _on_update_end(self) -> None:
FILE: applications/ColossalChat/coati/ray/detached_trainer_ppo.py
class DetachedPPOTrainer (line 22) | class DetachedPPOTrainer(DetachedTrainer):
method __init__ (line 43) | def __init__(
method _update_remote_makers (line 104) | def _update_remote_makers(self, fully_update: bool = False, **config):
method training_step (line 142) | def training_step(self, experience: Experience) -> Dict[str, float]:
method strategy_save_actor (line 167) | def strategy_save_actor(self, path: str, only_rank0: bool = False) -> ...
method strategy_save_critic (line 170) | def strategy_save_critic(self, path: str, only_rank0: bool = False) ->...
method strategy_save_actor_optim (line 173) | def strategy_save_actor_optim(self, path: str, only_rank0: bool = Fals...
method strategy_save_critic_optim (line 176) | def strategy_save_critic_optim(self, path: str, only_rank0: bool = Fal...
method _get_model_state_dict_shard (line 179) | def _get_model_state_dict_shard(self, model: torch.nn.Module, fully_up...
method _get_model_lora_config_dict (line 187) | def _get_model_lora_config_dict(self, model: torch.nn.Module):
FILE: applications/ColossalChat/coati/ray/experience_maker_holder.py
class ExperienceMakerHolder (line 22) | class ExperienceMakerHolder:
method __init__ (line 31) | def __init__(
method _get_ready (line 93) | def _get_ready(self):
method _fully_initialized (line 97) | def _fully_initialized(self):
method _init_target_trainer_list (line 100) | def _init_target_trainer_list(self):
method _make_experience (line 108) | def _make_experience(self, inputs: Union[Tensor, Dict[str, Tensor]]) -...
method _send_items (line 117) | def _send_items(self, experience: Experience) -> None:
method _inference_step (line 128) | def _inference_step(self, batch) -> None:
method workingloop (line 141) | def workingloop(self, dataloader_fn: Callable[[], Iterable], num_epoch...
method update_experience_maker (line 171) | def update_experience_maker(
method _on_make_experience_start (line 231) | def _on_make_experience_start(self) -> None:
method _on_make_experience_end (line 235) | def _on_make_experience_end(self, experience: Experience) -> None:
method _on_loop_start (line 239) | def _on_loop_start(self) -> None:
method _on_loop_end (line 243) | def _on_loop_end(self) -> None:
method _on_send_start (line 247) | def _on_send_start(self) -> None:
method _on_send_end (line 251) | def _on_send_end(self) -> None:
method _on_batch_start (line 255) | def _on_batch_start(self) -> None:
method _on_batch_end (line 259) | def _on_batch_end(self) -> None:
function _set_default_generate_kwargs (line 264) | def _set_default_generate_kwargs(generate_kwargs: dict, actor: Actor) ->...
FILE: applications/ColossalChat/coati/ray/lora_constructor.py
class LoRAConfig (line 10) | class LoRAConfig:
class LoRAConstructor (line 17) | class LoRAConstructor:
method __init__ (line 39) | def __init__(self):
method register_lora_config (line 42) | def register_lora_config(self, lora_config_dict: Dict[str, Any]):
method reconstruct_increase (line 45) | def reconstruct_increase(self, state_dict_lora: Dict[str, Any], lora_c...
method _compute (line 72) | def _compute(self, lora_A, lora_B, config=LoRAConfig()):
method load_state_dict_increase (line 82) | def load_state_dict_increase(self, model: nn.Module, state_dict_increa...
method filter_state_dict_lora (line 90) | def filter_state_dict_lora(state_dict: Dict[str, Any], keep_non_lora=F...
method extract_lora_config (line 107) | def extract_lora_config(model: nn.Module) -> Dict[str, LoRAConfig]:
FILE: applications/ColossalChat/coati/ray/utils.py
function is_rank_0 (line 16) | def is_rank_0() -> bool:
function get_rank (line 20) | def get_rank() -> int:
function get_world_size (line 24) | def get_world_size() -> int:
function get_actor_from_args (line 28) | def get_actor_from_args(model: str, pretrained: str = None, config=None,...
function get_critic_from_args (line 42) | def get_critic_from_args(model: str, pretrained: str = None, config=None...
function get_reward_model_from_args (line 56) | def get_reward_model_from_args(model: str, pretrained: str = None, confi...
function get_strategy_from_args (line 70) | def get_strategy_from_args(strategy: str):
function get_tokenizer_from_args (line 88) | def get_tokenizer_from_args(model: str, **kwargs):
function set_dist_env (line 105) | def set_dist_env(env_info: Dict[str, str]):
function get_model_numel (line 113) | def get_model_numel(model: nn.Module) -> int:
function get_receivers_per_sender (line 118) | def get_receivers_per_sender(sender_idx: int, num_senders: int, num_rece...
function state_dict_to (line 133) | def state_dict_to(
FILE: applications/ColossalChat/coati/trainer/base.py
class SLTrainer (line 24) | class SLTrainer(ABC):
method __init__ (line 35) | def __init__(
method _train (line 53) | def _train(self, epoch):
method _eval (line 57) | def _eval(self, epoch):
method _before_fit (line 61) | def _before_fit(self):
method fit (line 64) | def fit(self, *args, **kwargs):
class OLTrainer (line 71) | class OLTrainer(ABC):
method __init__ (line 83) | def __init__(
method _fit_ctx (line 102) | def _fit_ctx(self) -> None:
method _episode_ctx (line 112) | def _episode_ctx(self, episode: int) -> None:
method _on_make_experience_start (line 121) | def _on_make_experience_start(self) -> None:
method _on_make_experience_end (line 125) | def _on_make_experience_end(self, experience: Experience) -> None:
method _on_learn_epoch_start (line 129) | def _on_learn_epoch_start(self, epoch: int) -> None:
method _on_learn_epoch_end (line 133) | def _on_learn_epoch_end(self, epoch: int) -> None:
method _on_learn_batch_start (line 137) | def _on_learn_batch_start(self) -> None:
method _on_learn_batch_end (line 141) | def _on_learn_batch_end(self, experience: Experience) -> None:
method _make_experience (line 146) | def _make_experience(self, collect_step: int):
method _learn (line 153) | def _learn(self, update_step: int):
method _setup_update_phrase_dataload (line 161) | def _setup_update_phrase_dataload(self):
method _save_checkpoint (line 168) | def _save_checkpoint(self, episode: int = 0):
method _collect_phase (line 174) | def _collect_phase(self, collect_step: int):
method _update_phase (line 180) | def _update_phase(self, update_step: int):
method _before_fit (line 185) | def _before_fit(self, *args, **kwargs):
method fit (line 188) | def fit(
FILE: applications/ColossalChat/coati/trainer/callbacks/base.py
class Callback (line 6) | class Callback(ABC):
method on_fit_start (line 11) | def on_fit_start(self) -> None:
method on_fit_end (line 14) | def on_fit_end(self) -> None:
method on_episode_start (line 17) | def on_episode_start(self, episode: int) -> None:
method on_episode_end (line 20) | def on_episode_end(self, episode: int) -> None:
method on_make_experience_start (line 23) | def on_make_experience_start(self) -> None:
method on_make_experience_end (line 26) | def on_make_experience_end(self, experience: Experience) -> None:
method on_learn_epoch_start (line 29) | def on_learn_epoch_start(self, epoch: int) -> None:
method on_learn_epoch_end (line 32) | def on_learn_epoch_end(self, epoch: int) -> None:
method on_learn_batch_start (line 35) | def on_learn_batch_start(self) -> None:
method on_learn_batch_end (line 38) | def on_learn_batch_end(self, experience: Experience) -> None:
FILE: applications/ColossalChat/coati/trainer/callbacks/performance_evaluator.py
function get_world_size (line 11) | def get_world_size() -> int:
function save_eval_result_rank_0 (line 17) | def save_eval_result_rank_0(s: str, save_path: str, **kwargs) -> None:
function divide (line 24) | def divide(x: float, y: float) -> float:
function all_reduce_mean (line 33) | def all_reduce_mean(x: float, world_size: int) -> float:
class Timer (line 42) | class Timer:
method __init__ (line 43) | def __init__(self) -> None:
method start (line 47) | def start(self) -> None:
method end (line 50) | def end(self) -> None:
method reset (line 55) | def reset(self) -> None:
class PerformanceEvaluator (line 59) | class PerformanceEvaluator(Callback):
method __init__ (line 71) | def __init__(
method on_episode_start (line 102) | def on_episode_start(self, episode: int) -> None:
method on_episode_end (line 108) | def on_episode_end(self, episode: int) -> None:
method on_make_experience_start (line 113) | def on_make_experience_start(self) -> None:
method on_make_experience_end (line 118) | def on_make_experience_end(self, experience: Experience) -> None:
method on_learn_batch_start (line 141) | def on_learn_batch_start(self) -> None:
method on_learn_batch_end (line 146) | def on_learn_batch_end(self, experience: Experience) -> None:
method on_fit_end (line 160) | def on_fit_end(self) -> None:
FILE: applications/ColossalChat/coati/trainer/dpo.py
class DPOTrainer (line 29) | class DPOTrainer(SLTrainer):
method __init__ (line 49) | def __init__(
method _before_fit (line 86) | def _before_fit(
method _train (line 123) | def _train(self, epoch: int):
method _eval (line 406) | def _eval(self, epoch: int):
FILE: applications/ColossalChat/coati/trainer/grpo.py
function _set_default_generate_kwargs (line 33) | def _set_default_generate_kwargs(actor: PreTrainedModel) -> Dict:
class GRPOTrainer (line 53) | class GRPOTrainer(OLTrainer):
method __init__ (line 78) | def __init__(
method _before_fit (line 164) | def _before_fit(
method _setup_update_phrase_dataload (line 195) | def _setup_update_phrase_dataload(self):
method _make_experience (line 210) | def _make_experience(self, collect_step: int) -> Experience:
method _training_step (line 228) | def _training_step(self, experience: Experience):
method _learn (line 331) | def _learn(self, update_step: int):
method _save_checkpoint (line 361) | def _save_checkpoint(self, num_train_step: int = 0):
FILE: applications/ColossalChat/coati/trainer/kto.py
class KTOTrainer (line 28) | class KTOTrainer(SLTrainer):
method __init__ (line 50) | def __init__(
method _before_fit (line 89) | def _before_fit(
method _train (line 119) | def _train(self, epoch: int):
method _eval (line 265) | def _eval(self, epoch: int):
FILE: applications/ColossalChat/coati/trainer/orpo.py
class ORPOTrainer (line 27) | class ORPOTrainer(SLTrainer):
method __init__ (line 46) | def __init__(
method _before_fit (line 79) | def _before_fit(
method _train (line 109) | def _train(self, epoch: int):
method _eval (line 240) | def _eval(self, epoch: int):
FILE: applications/ColossalChat/coati/trainer/ppo.py
function _set_default_generate_kwargs (line 33) | def _set_default_generate_kwargs(actor: PreTrainedModel) -> Dict:
class PPOTrainer (line 54) | class PPOTrainer(OLTrainer):
method __init__ (line 81) | def __init__(
method _before_fit (line 155) | def _before_fit(
method _setup_update_phrase_dataload (line 186) | def _setup_update_phrase_dataload(self):
method _make_experience (line 201) | def _make_experience(self, collect_step: int) -> Experience:
method _training_step (line 217) | def _training_step(self, experience: Experience):
method _learn (line 340) | def _learn(self, update_step: int):
method _save_checkpoint (line 371) | def _save_checkpoint(self, episode: int = 0):
FILE: applications/ColossalChat/coati/trainer/rm.py
class RewardModelTrainer (line 26) | class RewardModelTrainer(SLTrainer):
method __init__ (line 46) | def __init__(
method _before_fit (line 77) | def _before_fit(
method _train (line 107) | def _train(self, epoch):
method _eval (line 199) | def _eval(self, epoch):
FILE: applications/ColossalChat/coati/trainer/sft.py
class SFTTrainer (line 25) | class SFTTrainer(SLTrainer):
method __init__ (line 38) | def __init__(
method _before_fit (line 65) | def _before_fit(
method _train (line 98) | def _train(self, epoch: int):
method _eval (line 181) | def _eval(self, epoch: int):
FILE: applications/ColossalChat/coati/trainer/utils.py
class AnnealingScheduler (line 15) | class AnnealingScheduler:
method __init__ (line 16) | def __init__(self, start, end, warmup_steps=100, annealing_step=2000):
method get_temperature (line 23) | def get_temperature(self):
method step_forward (line 32) | def step_forward(self):
class CycledDataLoader (line 36) | class CycledDataLoader:
method __init__ (line 52) | def __init__(
method next (line 61) | def next(self):
function is_rank_0 (line 81) | def is_rank_0() -> bool:
function to_device (line 91) | def to_device(x: Any, device: torch.device) -> Any:
function all_reduce_mean (line 111) | def all_reduce_mean(tensor: torch.Tensor, plugin: Plugin = None) -> torc...
function all_reduce_sum (line 131) | def all_reduce_sum(tensor: torch.Tensor, plugin: Plugin = None) -> torch...
function all_gather_tensors (line 149) | def all_gather_tensors(local_tensor_list: torch.Tensor, plugin: Plugin =...
FILE: applications/ColossalChat/coati/utils/accumulative_meter.py
class AccumulativeMeanVariable (line 6) | class AccumulativeMeanVariable:
method __init__ (line 11) | def __init__(self):
method add (line 15) | def add(self, value, count_update=1):
method get (line 26) | def get(self):
method reset (line 35) | def reset(self):
class AccumulativeMeanMeter (line 43) | class AccumulativeMeanMeter:
method __init__ (line 56) | def __init__(self):
method add (line 59) | def add(self, name, value, count_update=1):
method get (line 64) | def get(self, name):
method reset (line 67) | def reset(self):
FILE: applications/ColossalChat/coati/utils/ckpt_io.py
function load_json (line 20) | def load_json(file_path: Union[str, os.PathLike]) -> Dict[str, Any]:
function save_json (line 28) | def save_json(data: Dict[str, Any], file_path: Union[str, os.PathLike]) ...
function save_checkpoint (line 36) | def save_checkpoint(
function load_checkpoint (line 72) | def load_checkpoint(
FILE: applications/ColossalChat/coati/utils/reward_score/competition.py
function math_competition_reward_fn (line 6) | def math_competition_reward_fn(input_ids, attention_mask, **kwargs):
FILE: applications/ColossalChat/coati/utils/reward_score/gsm8k.py
function gsm8k_reward_fn (line 6) | def gsm8k_reward_fn(input_ids, attention_mask, **kwargs):
FILE: applications/ColossalChat/coati/utils/reward_score/utils.py
function validate_response_structure (line 20) | def validate_response_structure(processed_str: str, tags: Dict = None) -...
function extract_solution (line 58) | def extract_solution(solution_str: str) -> Tuple[Optional[str], str]:
FILE: applications/ColossalChat/examples/community/peft/easy_dataset.py
function _tokenize_fn (line 13) | def _tokenize_fn(strings: Sequence[str], tokenizer: AutoTokenizer, max_l...
function preprocess (line 37) | def preprocess(sources: Sequence[str], targets: Sequence[str], tokenizer...
class EasySupervisedDataset (line 50) | class EasySupervisedDataset(Dataset):
method __init__ (line 51) | def __init__(self, data_file: str, tokenizer: AutoTokenizer, max_lengt...
method __len__ (line 71) | def __len__(self):
method __getitem__ (line 74) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
method __repr__ (line 77) | def __repr__(self):
method __str__ (line 80) | def __str__(self):
class EasyPromptsDataset (line 84) | class EasyPromptsDataset(Dataset):
method __init__ (line 85) | def __init__(self, data_file: str, tokenizer: AutoTokenizer, max_lengt...
method __len__ (line 100) | def __len__(self):
method __getitem__ (line 103) | def __getitem__(self, idx):
method __repr__ (line 106) | def __repr__(self):
method __str__ (line 109) | def __str__(self):
class EasyRewardDataset (line 113) | class EasyRewardDataset(Dataset):
method __init__ (line 114) | def __init__(self, train_file: str, tokenizer: AutoTokenizer, special_...
method __len__ (line 146) | def __len__(self):
method __getitem__ (line 150) | def __getitem__(self, idx):
method __repr__ (line 159) | def __repr__(self):
method __str__ (line 162) | def __str__(self):
class EasySFTDataset (line 172) | class EasySFTDataset(Dataset):
method __init__ (line 173) | def __init__(self, data_file: str, tokenizer: AutoTokenizer, max_lengt...
method __len__ (line 227) | def __len__(self):
method __getitem__ (line 231) | def __getitem__(self, idx):
method __repr__ (line 235) | def __repr__(self):
method __str__ (line 239) | def __str__(self):
FILE: applications/ColossalChat/examples/community/peft/easy_models.py
class Actor (line 13) | class Actor(Module):
method __init__ (line 21) | def __init__(self, model: nn.Module) -> None:
method generate (line 26) | def generate(
method forward (line 48) | def forward(
method get_base_model (line 57) | def get_base_model(self):
class BLOOMActor (line 61) | class BLOOMActor(Actor):
method __init__ (line 73) | def __init__(
method print_trainable_parameters (line 92) | def print_trainable_parameters(self):
FILE: applications/ColossalChat/examples/community/peft/train_peft_prompts.py
function main (line 22) | def main(args):
FILE: applications/ColossalChat/examples/community/peft/train_peft_sft.py
function train (line 22) | def train(args):
FILE: applications/ColossalChat/examples/community/ray/ray_job_script.py
function main (line 6) | def main(api_server_endpoint="http://127.0.0.1:8265"):
FILE: applications/ColossalChat/examples/community/ray/train_prompts_on_ray.py
class ExperienceCompositionRefs (line 28) | class ExperienceCompositionRefs:
method __init__ (line 29) | def __init__(
class ExperienceMaker (line 44) | class ExperienceMaker:
method __init__ (line 45) | def __init__(self, kl_coef) -> None:
method make_experience (line 49) | def make_experience(self, experiment_computation_refs: ExperienceCompo...
class DistributedTorchRayActor (line 65) | class DistributedTorchRayActor:
method __init__ (line 66) | def __init__(self, world_size, rank, local_rank, master_addr, master_p...
method _get_current_node_ip (line 83) | def _get_current_node_ip():
method _get_free_port (line 87) | def _get_free_port():
method get_master_addr_port (line 92) | def get_master_addr_port(self):
class BasePPORole (line 96) | class BasePPORole(DistributedTorchRayActor):
method add_experience_maker (line 97) | def add_experience_maker(self, kl_coef: float = 0.1):
method make_experience (line 100) | def make_experience(self, experience_computation_ref: ExperienceCompos...
method _init_strategy (line 103) | def _init_strategy(self, strategy: str):
method _init_optimizer (line 114) | def _init_optimizer(self):
method _prepare_model_with_strategy (line 120) | def _prepare_model_with_strategy(self, has_optimizer: bool):
method _load_model_from_pretrained (line 127) | def _load_model_from_pretrained(self, model_class: Type[LoRAModule], p...
method init_model_from_pretrained (line 130) | def init_model_from_pretrained(
method eval (line 137) | def eval(self):
class TrainablePPORole (line 141) | class TrainablePPORole(BasePPORole):
method _load_model_from_pretrained (line 142) | def _load_model_from_pretrained(self, model_class, pretrain):
method _train (line 146) | def _train(self):
method _training_step (line 149) | def _training_step(self, experience: Experience):
method learn_on_experiences (line 152) | def learn_on_experiences(self, experience_refs):
class RayPPOActor (line 163) | class RayPPOActor(TrainablePPORole):
method set_loss_function (line 164) | def set_loss_function(self, eps_clip: float):
method load_tokenizer_from_pretrained (line 167) | def load_tokenizer_from_pretrained(self, model_type: str, pretrained):
method setup_generate_kwargs (line 186) | def setup_generate_kwargs(self, generate_kwargs: dict):
method load_csv_prompt_file_from_url_to_sampler (line 193) | def load_csv_prompt_file_from_url_to_sampler(self, prompt_url):
method _generate (line 199) | def _generate(self, input_ids, **generate_kwargs):
method sample_prompts_and_make_sequence (line 202) | def sample_prompts_and_make_sequence(self, experience_batch_size):
method calculate_action_log_probs (line 211) | def calculate_action_log_probs(self, sequence_attention_action_mask):
method _training_step (line 215) | def _training_step(self, experience):
method save_checkpoint (line 226) | def save_checkpoint(self, save_path, should_save_optimizer: bool):
method generate_answer (line 238) | def generate_answer(self, prompt, max_length=30, num_return_sequences=5):
class RayPPOCritic (line 250) | class RayPPOCritic(TrainablePPORole):
method set_loss_function (line 251) | def set_loss_function(self, value_clip: float):
method _training_step (line 254) | def _training_step(self, experience):
method calculate_value (line 267) | def calculate_value(self, sequence_attention_action_mask):
class RayPPORewardModel (line 273) | class RayPPORewardModel(BasePPORole):
method _load_model_from_pretrained (line 274) | def _load_model_from_pretrained(self, model_class, pretrain):
method calculate_r (line 282) | def calculate_r(self, sequence_attention_action_mask):
class RayPPOInitialModel (line 288) | class RayPPOInitialModel(BasePPORole):
method _load_model_from_pretrained (line 289) | def _load_model_from_pretrained(self, model_class, pretrain):
method calculate_base_action_log_probs (line 294) | def calculate_base_action_log_probs(self, sequence_attention_action_ma...
class PPORayActorGroup (line 299) | class PPORayActorGroup:
method __init__ (line 305) | def __init__(self, num_nodes, num_gpus_per_node, ray_actor_type: Type[...
method _initiate_actors (line 311) | def _initiate_actors(self):
method async_init_model_from_pretrained (line 344) | def async_init_model_from_pretrained(
class TrainableModelRayActorGroup (line 353) | class TrainableModelRayActorGroup(PPORayActorGroup):
method async_learn_on_experiences (line 354) | def async_learn_on_experiences(self, experience_refs):
class PPOActorRayActorGroup (line 363) | class PPOActorRayActorGroup(TrainableModelRayActorGroup):
method __init__ (line 364) | def __init__(self, num_nodes, num_gpus_per_node) -> None:
method async_prepare_for_sequence_generation (line 367) | def async_prepare_for_sequence_generation(self, model: str, pretrain: ...
method load_csv_prompt_file_from_url_to_sampler (line 374) | def load_csv_prompt_file_from_url_to_sampler(self, csv_url):
method async_sample_prompts_and_make_sequence (line 377) | def async_sample_prompts_and_make_sequence(self, experience_batch_size):
method async_calculate_action_log_probs (line 380) | def async_calculate_action_log_probs(self, sequences_attention_mask_ac...
method set_loss_function (line 390) | def set_loss_function(self, eps_clip: float = 0.2):
method save_checkpoint (line 393) | def save_checkpoint(self, save_path, should_save_optimizer):
class PPOCriticRayActorGroup (line 397) | class PPOCriticRayActorGroup(TrainableModelRayActorGroup):
method __init__ (line 398) | def __init__(self, num_nodes, num_gpus_per_node) -> None:
method async_calculate_value (line 401) | def async_calculate_value(self, sequences_attention_mask_action_mask_r...
method set_loss_function (line 411) | def set_loss_function(self, value_clip: float = 0.4):
class PPOInitialRayActorGroup (line 415) | class PPOInitialRayActorGroup(PPORayActorGroup):
method __init__ (line 416) | def __init__(self, num_nodes, num_gpus_per_node) -> None:
method async_calculate_base_action_log_probs (line 419) | def async_calculate_base_action_log_probs(self, sequences_attention_ma...
class PPORewardRayActorGroup (line 430) | class PPORewardRayActorGroup(PPORayActorGroup):
method __init__ (line 431) | def __init__(self, num_nodes, num_gpus_per_node) -> None:
method async_calculate_r (line 434) | def async_calculate_r(self, sequences_attention_mask_action_mask_refs):
function main (line 445) | def main(args):
FILE: applications/ColossalChat/examples/data_preparation_scripts/prepare_dataset.py
function main (line 52) | def main():
FILE: applications/ColossalChat/examples/inference/chatio.py
class ChatIO (line 17) | class ChatIO(abc.ABC):
method prompt_for_input (line 19) | def prompt_for_input(self, role: str) -> str:
method prompt_for_output (line 23) | def prompt_for_output(self, role: str):
method stream_output (line 27) | def stream_output(self, output_stream):
class SimpleChatIO (line 31) | class SimpleChatIO(ChatIO):
method prompt_for_input (line 32) | def prompt_for_input(self, role) -> str:
method prompt_for_output (line 35) | def prompt_for_output(self, role: str):
method stream_output (line 38) | def stream_output(self, output_stream):
class RichChatIO (line 51) | class RichChatIO(ChatIO):
method __init__ (line 52) | def __init__(self):
method prompt_for_input (line 57) | def prompt_for_input(self, role) -> str:
method prompt_for_output (line 68) | def prompt_for_output(self, role: str) -> str:
method stream_output (line 71) | def stream_output(self, output_stream):
class DummyChatIO (line 107) | class DummyChatIO(ChatIO):
method __init__ (line 112) | def __init__(self):
method prompt_for_input (line 116) | def prompt_for_input(self, role) -> str:
method prompt_for_output (line 127) | def prompt_for_output(self, role: str) -> str:
method stream_output (line 130) | def stream_output(self, output_stream):
FILE: applications/ColossalChat/examples/inference/inference.py
function get_gpu_memory (line 17) | def get_gpu_memory(max_gpus=None):
function load_model_and_tokenizer (line 42) | def load_model_and_tokenizer(model_path, tokenizer_path, device="cuda", ...
function _set_default_generate_kwargs (line 64) | def _set_default_generate_kwargs(model: PreTrainedModel) -> Dict:
function generation_wrapper (line 85) | def generation_wrapper(*args, **kwargs):
function main (line 92) | def main(args):
FILE: applications/ColossalChat/examples/inference/web_chatbot/locustfile.py
class GenerationUser (line 17) | class GenerationUser(HttpUser):
method generate (line 19) | def generate(self):
FILE: applications/ColossalChat/examples/inference/web_chatbot/server.py
class GenerationTaskReq (line 24) | class GenerationTaskReq(BaseModel):
function generate_streamingly (line 57) | def generate_streamingly(prompt, max_length, max_new_tokens, top_k, top_...
function event_generator (line 92) | async def event_generator(request: Request, generator: Generator):
function generate (line 105) | def generate(data: GenerationTaskReq, request: Request):
function generate_no_stream (line 116) | def generate_no_stream(data: GenerationTaskReq, request: Request):
FILE: applications/ColossalChat/examples/inference/web_chatbot/utils.py
function update_model_kwargs_fn (line 12) | def update_model_kwargs_fn(outputs: dict, **model_kwargs) -> dict:
class Dialogue (line 33) | class Dialogue(BaseModel):
class ChatPromptProcessor (line 38) | class ChatPromptProcessor:
method __init__ (line 41) | def __init__(self, censored_words: List[str] = []):
method preprocess_prompt (line 45) | def preprocess_prompt(self, history: List[Dialogue]) -> str:
method postprocess_output (line 53) | def postprocess_output(self, output: str) -> str:
method has_censored_words (line 56) | def has_censored_words(self, text: str) -> bool:
class LockedIterator (line 63) | class LockedIterator:
method __init__ (line 64) | def __init__(self, it, lock: Lock) -> None:
method __iter__ (line 68) | def __iter__(self):
method __next__ (line 71) | def __next__(self):
function load_json (line 76) | def load_json(path: str):
FILE: applications/ColossalChat/examples/training_scripts/lora_finetune.py
function all_reduce_mean (line 39) | def all_reduce_mean(loss: torch.Tensor, plugin: Plugin) -> torch.Tensor:
function train (line 46) | def train(args) -> None:
FILE: applications/ColossalChat/examples/training_scripts/train_dpo.py
function train (line 25) | def train(args):
FILE: applications/ColossalChat/examples/training_scripts/train_grpo.py
function train (line 41) | def train(args):
FILE: applications/ColossalChat/examples/training_scripts/train_kto.py
function train (line 25) | def train(args):
FILE: applications/ColossalChat/examples/training_scripts/train_orpo.py
function train (line 25) | def train(args):
FILE: applications/ColossalChat/examples/training_scripts/train_ppo.py
function train (line 50) | def train(args):
FILE: applications/ColossalChat/examples/training_scripts/train_rm.py
function train (line 27) | def train(args):
FILE: applications/ColossalChat/examples/training_scripts/train_sft.py
function train (line 26) | def train(args):
FILE: applications/ColossalChat/setup.py
function fetch_requirements (line 4) | def fetch_requirements(path):
function fetch_readme (line 9) | def fetch_readme():
function fetch_version (line 14) | def fetch_version():
FILE: applications/ColossalChat/start_code_verifier.py
class CheckCorrectnessRequest (line 10) | class CheckCorrectnessRequest(BaseModel):
class CheckCorrectnessResponse (line 18) | class CheckCorrectnessResponse(BaseModel):
function check_correctness_api (line 24) | def check_correctness_api(request: CheckCorrectnessRequest):
FILE: applications/ColossalChat/tests/test_lora.py
class SimpleNN (line 9) | class SimpleNN(nn.Module):
method __init__ (line 10) | def __init__(self, input_size, hidden_size, num_classes):
method forward (line 16) | def forward(self, x):
function test_overfit (line 23) | def test_overfit():
function test_lora_linear_accuracy (line 68) | def test_lora_linear_accuracy():
function test_lora_embedding_accuracy (line 89) | def test_lora_embedding_accuracy():
FILE: applications/ColossalEval/colossal_eval/dataset/agieval.py
function get_prompt (line 55) | def get_prompt(line: Dict, dataset_name: str, logger: DistributedLogger)...
function combine_prompt (line 103) | def combine_prompt(prompt_path, dataset_name, load_explanation=True, cha...
class AGIEvalDataset (line 180) | class AGIEvalDataset(BaseDataset):
method load (line 200) | def load(path: str, logger: DistributedLogger, few_shot: bool, *args, ...
FILE: applications/ColossalEval/colossal_eval/dataset/base.py
class BaseDataset (line 9) | class BaseDataset:
method __init__ (line 18) | def __init__(self, path, logger, *args, **kwargs):
method save (line 21) | def save(self, save_path):
method load (line 26) | def load(path, logger: DistributedLogger, *args, **kwargs):
class DistributedDataset (line 30) | class DistributedDataset(Dataset):
method __init__ (line 31) | def __init__(self, data):
method __len__ (line 34) | def __len__(self):
method __getitem__ (line 37) | def __getitem__(self, idx):
FILE: applications/ColossalEval/colossal_eval/dataset/ceval.py
function get_few_shot_data (line 78) | def get_few_shot_data(data: List[Dict], subject):
class CEvalDataset (line 85) | class CEvalDataset(BaseDataset):
method load (line 93) | def load(path: str, logger: DistributedLogger, few_shot: bool, *args, ...
FILE: applications/ColossalEval/colossal_eval/dataset/cmmlu.py
function get_few_shot_data (line 89) | def get_few_shot_data(data: List[Dict], subject):
class CMMLUDataset (line 96) | class CMMLUDataset(BaseDataset):
method load (line 104) | def load(path: str, logger: DistributedLogger, few_shot: bool, *args, ...
FILE: applications/ColossalEval/colossal_eval/dataset/colossalai.py
function get_data_per_category (line 24) | def get_data_per_category(data):
class ColossalDataset (line 33) | class ColossalDataset(BaseDataset):
method load (line 40) | def load(path: str, logger: DistributedLogger, *args, **kwargs) -> Lis...
FILE: applications/ColossalEval/colossal_eval/dataset/cvalues.py
class CValuesDataset (line 23) | class CValuesDataset(BaseDataset):
method load (line 31) | def load(path: str, logger: DistributedLogger, *args, **kwargs) -> Lis...
FILE: applications/ColossalEval/colossal_eval/dataset/gaokaobench.py
function get_all_classes (line 44) | def get_all_classes(instruction: str):
class GaoKaoBenchDataset (line 58) | class GaoKaoBenchDataset(BaseDataset):
method load (line 72) | def load(path: str, logger: DistributedLogger, *args, **kwargs) -> Lis...
FILE: applications/ColossalEval/colossal_eval/dataset/gsm.py
function get_few_shot_data (line 80) | def get_few_shot_data():
class GSMDataset (line 88) | class GSMDataset(BaseDataset):
method load (line 96) | def load(
FILE: applications/ColossalEval/colossal_eval/dataset/longbench.py
class LongBenchDataset (line 68) | class LongBenchDataset(BaseDataset):
method load (line 80) | def load(path: str, logger: DistributedLogger, *args, **kwargs) -> Lis...
FILE: applications/ColossalEval/colossal_eval/dataset/mmlu.py
function get_few_shot_data (line 19) | def get_few_shot_data(data: List[Dict], subject):
class MMLUDataset (line 26) | class MMLUDataset(BaseDataset):
method load (line 34) | def load(path: str, logger: DistributedLogger, few_shot: bool, *args, ...
FILE: applications/ColossalEval/colossal_eval/dataset/mtbench.py
class MTBenchDataset (line 23) | class MTBenchDataset(BaseDataset):
method __init__ (line 30) | def __init__(self, path, logger: DistributedLogger, *args, **kwargs):
method load (line 35) | def load(path: str, logger: DistributedLogger, *args, **kwargs) -> Lis...
FILE: applications/ColossalEval/colossal_eval/dataset/safetybench_en.py
function get_query_str (line 36) | def get_query_str(question, options, choices_templates=CHOICE_TEMP, pad=...
function process_test (line 55) | def process_test(sample_list, pad_choices=False):
function process_dev (line 83) | def process_dev(sample_dict, pad_choices=False):
function get_few_shot_data (line 107) | def get_few_shot_data(data: List[Dict]):
function add_few_shot_to_test (line 114) | def add_few_shot_to_test(dataset):
class SafetyBenchENDataset (line 125) | class SafetyBenchENDataset(BaseDataset):
method load (line 133) | def load(path: str, logger: DistributedLogger, few_shot: bool, *args, ...
FILE: applications/ColossalEval/colossal_eval/dataset/safetybench_zh.py
function get_query_str (line 36) | def get_query_str(question, options, choices_templates=CHOICE_TEMP, pad=...
function process_test (line 55) | def process_test(sample_list, pad_choices=False):
function process_dev (line 83) | def process_dev(sample_dict, pad_choices=False):
function get_few_shot_data (line 107) | def get_few_shot_data(data: List[Dict]):
function add_few_shot_to_test (line 114) | def add_few_shot_to_test(dataset):
class SafetyBenchZHDataset (line 125) | class SafetyBenchZHDataset(BaseDataset):
method load (line 133) | def load(path: str, logger: DistributedLogger, few_shot: bool, *args, ...
FILE: applications/ColossalEval/colossal_eval/evaluate/dataset_evaluator/dataset_evaluator.py
class DatasetEvaluator (line 39) | class DatasetEvaluator(object):
method __init__ (line 45) | def __init__(self, config_path: str, save_path: str):
method _calculate_label_metrics (line 49) | def _calculate_label_metrics(self, metric: str, category: str):
method _calculate_combined_metrics (line 93) | def _calculate_combined_metrics(self, metric: str, category: str):
method _calculate_other_metrics (line 148) | def _calculate_other_metrics(self, metric: str, category: str):
method _calculate_gpt_metrics (line 174) | def _calculate_gpt_metrics(self, metric: str, category: str):
method _calculate_loss_metrics (line 192) | def _calculate_loss_metrics(self, metric: str, category: str):
method _evaluate (line 245) | def _evaluate(self):
method get_evaluation_results (line 282) | def get_evaluation_results(
FILE: applications/ColossalEval/colossal_eval/evaluate/dataset_evaluator/gpt_judge.py
function load_mt_prompts (line 28) | def load_mt_prompts(prompt_file: str):
function get_mt_prompt (line 37) | def get_mt_prompt(prompts: Dict[str, str], multiturn: bool, math: bool):
function chat_compeletion_openai (line 48) | def chat_compeletion_openai(messages: List[Dict], temperature: float = 0...
function get_mtbench_judgements (line 69) | def get_mtbench_judgements(question: Dict[str, Any], prompts: Dict[str, ...
function mtbench_single_judge (line 119) | def mtbench_single_judge(data: List[Dict], config_path: str):
FILE: applications/ColossalEval/colossal_eval/evaluate/dataset_evaluator/metrics.py
function _fix_fracs (line 205) | def _fix_fracs(string):
function _fix_a_slash_b (line 237) | def _fix_a_slash_b(string):
function _remove_right_units (line 252) | def _remove_right_units(string):
function _fix_sqrt (line 262) | def _fix_sqrt(string):
function _strip_string (line 277) | def _strip_string(string):
function parse_math_answer (line 347) | def parse_math_answer(raw_string):
function math_equivalence (line 418) | def math_equivalence(prediction, reference, **kwargs):
function multi_choice_accuracy (line 436) | def multi_choice_accuracy(prediction, reference, **kwargs):
function accuracy_by_options (line 460) | def accuracy_by_options(question, prediction, reference):
function combined_single_choice_accuracy (line 474) | def combined_single_choice_accuracy(prediction, reference, **kwargs):
function single_choice_accuracy (line 478) | def single_choice_accuracy(prediction, reference, **kwargs):
function normalize_answer (line 500) | def normalize_answer(s):
function normalize_zh_answer (line 519) | def normalize_zh_answer(s):
function count_score (line 536) | def count_score(prediction, reference, **kwargs):
function retrieval_score (line 546) | def retrieval_score(prediction, reference, **kwargs):
function retrieval_zh_score (line 559) | def retrieval_zh_score(prediction, reference, **kwargs):
function code_sim_score (line 572) | def code_sim_score(prediction, reference, **kwargs):
function classification_score (line 582) | def classification_score(prediction, reference, **kwargs):
function rouge_score (line 608) | def rouge_score(prediction, reference, **kwargs):
function rouge_zh_score (line 617) | def rouge_zh_score(prediction, reference, **kwargs):
function _f1_score (line 624) | def _f1_score(prediction, reference, **kwargs):
function f1_score (line 635) | def f1_score(prediction, reference, **kwargs):
function f1_zh_score (line 644) | def f1_zh_score(prediction, reference, **kwargs):
function extract_answer_hf (line 654) | def extract_answer_hf(completion):
function get_match_str (line 664) | def get_match_str(match, idx):
function extract_answer (line 676) | def extract_answer(completion):
function is_correct (line 697) | def is_correct(completion, answer):
function gsm_accuracy (line 704) | def gsm_accuracy(prediction, reference, **kwargs):
FILE: applications/ColossalEval/colossal_eval/evaluate/evaluator.py
class Evaluator (line 9) | class Evaluator(object):
method __init__ (line 15) | def __init__(
method battle (line 33) | def battle(self, answers1: List[Dict], answers2: List[Dict]) -> None:
method evaluate (line 40) | def evaluate(self, answers: List[Dict], targets: List[Dict], save_path...
method save (line 81) | def save(self, path: str, model_name_list: List[str]) -> None:
FILE: applications/ColossalEval/colossal_eval/evaluate/gpt_evaluate.py
function get_battle_result (line 32) | def get_battle_result(sys_prompt: str, user_prompt: str, id: int, max_to...
function parse_battle_score (line 70) | def parse_battle_score(evaluation: str) -> List[float]:
function battle (line 108) | def battle(answer1: List[Dict], answer2: List[Dict], prompt_dict: Dict[s...
function save_battle_results (line 164) | def save_battle_results(evaluations: List[Dict], name1: str, name2: str,...
function reference_template (line 248) | def reference_template(metric: str, language: str, reference: Dict[str, ...
function fill_in_message (line 289) | def fill_in_message(role: str, content: str) -> Dict[str, str]:
function multiturn_chat_completion (line 304) | def multiturn_chat_completion(user_messages: List[str], model: str, max_...
function get_gpt_evaluation_without_logprobs (line 355) | def get_gpt_evaluation_without_logprobs(
function get_gpt_evaluation_with_logprobs (line 432) | def get_gpt_evaluation_with_logprobs(
function evaluate (line 496) | def evaluate(
function calculate_scores_form_logprobs (line 634) | def calculate_scores_form_logprobs(logprobs: Dict[str, Any]) -> float:
function calculate_scores_form_response (line 670) | def calculate_scores_form_response(response: str, evaluation: Dict[str, ...
function save_gpt_evaluation_results (line 694) | def save_gpt_evaluation_results(
function save_gpt_evaluation_statistics (line 716) | def save_gpt_evaluation_statistics(model_name: str, evaluations: List[Di...
function analyze_gpt_evaluation_statistics (line 771) | def analyze_gpt_evaluation_statistics(statistics_path: str, save_path: s...
FILE: applications/ColossalEval/colossal_eval/evaluate/utils.py
function get_data_per_category (line 1) | def get_data_per_category(data, categories):
FILE: applications/ColossalEval/colossal_eval/models/base.py
class BaseModel (line 9) | class BaseModel:
method __init__ (line 21) | def __init__(
method inference (line 41) | def inference(self, data: List[Dict]) -> None:
method generate (line 51) | def generate(self, inputs: List[str], max_new_tokens: int) -> List[str]:
method get_loss (line 64) | def get_loss(self, batch: List[str], batch_target: List[str]) -> List[...
method to (line 77) | def to(self, device):
FILE: applications/ColossalEval/colossal_eval/models/chatglm.py
class ChatGLMModel (line 13) | class ChatGLMModel(HuggingFaceModel):
method _get_truncated_prompts (line 14) | def _get_truncated_prompts(self, inputs: List[str], max_new_tokens: in...
method get_loss (line 30) | def get_loss(
method _calculate_loss (line 114) | def _calculate_loss(self, input_ids_list: List[torch.LongTensor], labe...
class ChatGLM2Model (line 150) | class ChatGLM2Model(ChatGLMModel):
method _get_truncated_prompts (line 151) | def _get_truncated_prompts(self, inputs: List[str], max_new_tokens: in...
method generate (line 167) | def generate(self, inputs: List[str], max_new_tokens: int, **kwargs) -...
method get_loss (line 227) | def get_loss(
FILE: applications/ColossalEval/colossal_eval/models/huggingface.py
class HuggingFaceModel (line 21) | class HuggingFaceModel(BaseModel):
method __init__ (line 39) | def __init__(
method _get_choices_indices (line 63) | def _get_choices_indices(self, language: str):
method _load_tokenizer (line 84) | def _load_tokenizer(self, path: str, tokenizer_path: Optional[str], to...
method _load_model (line 115) | def _load_model(
method _calculate_loss (line 150) | def _calculate_loss(self, input_ids_list: List[torch.LongTensor], labe...
method _get_truncated_prompts (line 186) | def _get_truncated_prompts(self, inputs: List[str], max_new_tokens: in...
method _get_input_ids_and_labels_pretrain (line 212) | def _get_input_ids_and_labels_pretrain(self, batch_prompt: List[str]) ...
method _get_input_ids_and_labels (line 253) | def _get_input_ids_and_labels(
method inference (line 334) | def inference(self, data_loader: DataLoader, inference_kwargs: Dict[st...
method generate (line 447) | def generate(self, inputs: List[str], max_new_tokens: int, **kwargs) -...
method get_loss (line 505) | def get_loss(
class HuggingFaceCausalLM (line 569) | class HuggingFaceCausalLM(HuggingFaceModel):
method _load_model (line 587) | def _load_model(
FILE: applications/ColossalEval/colossal_eval/models/vllm.py
class vLLMModel (line 18) | class vLLMModel(HuggingFaceModel):
method __init__ (line 43) | def __init__(
method _load_model (line 90) | def _load_model(
method _calculate_loss (line 177) | def _calculate_loss(self, inputs: List[str], labels: List[str]) -> Tup...
method inference (line 217) | def inference(self, data_loader: DataLoader, inference_kwargs: Dict[st...
method generate (line 330) | def generate(self, inputs: List[str], max_new_tokens: int, **kwargs) -...
method get_loss (line 366) | def get_loss(
class GetTokenLogitsProcessor (line 469) | class GetTokenLogitsProcessor:
method __init__ (line 478) | def __init__(
method __call__ (line 485) | def __call__(self, input_ids: torch.Tensor, logits: torch.Tensor) -> t...
method get_target_logits (line 497) | def get_target_logits(self) -> torch.Tensor:
FILE: applications/ColossalEval/colossal_eval/utils/conversation.py
class SeparatorStyle (line 8) | class SeparatorStyle(Enum):
class Conversation (line 16) | class Conversation:
method clear (line 24) | def clear(self):
method get_prompt (line 27) | def get_prompt(self):
method get_prompt_with_target (line 63) | def get_prompt_with_target(self, target):
method save_prompt (line 90) | def save_prompt(self):
method append_message (line 102) | def append_message(self, role, message):
method copy (line 105) | def copy(self):
method dict (line 115) | def dict(self):
function get_few_shot_prefix (line 126) | def get_few_shot_prefix(few_shot_data: List[str], tokenizer: Optional[Au...
function get_batch_prompt (line 153) | def get_batch_prompt(
FILE: applications/ColossalEval/colossal_eval/utils/utilities.py
function is_rank_0 (line 8) | def is_rank_0() -> bool:
function _make_w_io_base (line 12) | def _make_w_io_base(f, mode: str):
function _make_r_io_base (line 21) | def _make_r_io_base(f, mode: str):
function jdump (line 27) | def jdump(obj, f, mode="w", indent=4, default=str):
function jload (line 49) | def jload(f, mode="r"):
function get_json_list (line 57) | def get_json_list(file_path):
FILE: applications/ColossalEval/examples/dataset_evaluation/eval_dataset.py
function main (line 9) | def main(args):
FILE: applications/ColossalEval/examples/dataset_evaluation/inference.py
function rm_and_merge (line 21) | def rm_and_merge(
function main (line 87) | def main(args):
FILE: applications/ColossalEval/examples/gpt_evaluation/eval.py
function main (line 9) | def main(args):
FILE: applications/ColossalEval/examples/gpt_evaluation/inference.py
function rm_and_merge (line 18) | def rm_and_merge(
function main (line 83) | def main(args):
FILE: applications/ColossalEval/setup.py
function fetch_requirements (line 4) | def fetch_requirements(path):
function fetch_readme (line 9) | def fetch_readme():
FILE: applications/ColossalMoE/infer.py
function parse_args (line 14) | def parse_args():
function main (line 54) | def main():
FILE: applications/ColossalMoE/setup.py
function fetch_requirements (line 4) | def fetch_requirements(path):
function fetch_readme (line 9) | def fetch_readme():
function fetch_version (line 14) | def fetch_version():
FILE: applications/ColossalMoE/train.py
function get_global_loss (line 21) | def get_global_loss(loss, booster):
class RandomDataset (line 28) | class RandomDataset(Dataset):
method __init__ (line 29) | def __init__(self, num_samples: int = 1000, max_length: int = 2048, vo...
method __len__ (line 35) | def __len__(self):
method __getitem__ (line 38) | def __getitem__(self, idx):
function parse_args (line 46) | def parse_args():
function main (line 142) | def main():
FILE: applications/ColossalMoE/utils.py
function move_to_cuda (line 13) | def move_to_cuda(batch, device):
function load_json (line 17) | def load_json(file_path: Union[str, os.PathLike]) -> Dict[str, Any]:
function save_json (line 25) | def save_json(data: Dict[str, Any], file_path: Union[str, os.PathLike]) ...
function save_checkpoint (line 33) | def save_checkpoint(
function load_checkpoint (line 63) | def load_checkpoint(
FILE: applications/ColossalQA/colossalqa/chain/memory/summary.py
class SummarizerMixin (line 24) | class SummarizerMixin(BaseModel):
method predict_new_summary (line 36) | def predict_new_summary(self, messages: List[BaseMessage], existing_su...
class ConversationSummaryMemory (line 51) | class ConversationSummaryMemory(BaseChatMemory, SummarizerMixin):
method from_messages (line 58) | def from_messages(
method memory_variables (line 71) | def memory_variables(self) -> List[str]:
method load_memory_variables (line 75) | def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, A...
method validate_prompt_input_variables (line 84) | def validate_prompt_input_variables(cls, values: Dict) -> Dict:
method save_context (line 95) | def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]...
method clear (line 100) | def clear(self) -> None:
FILE: applications/ColossalQA/colossalqa/chain/retrieval_qa/base.py
class CustomBaseRetrievalQA (line 29) | class CustomBaseRetrievalQA(BaseRetrievalQA):
method from_llm (line 33) | def from_llm(
method from_chain_type (line 61) | def from_chain_type(
method _call (line 74) | def _call(
method _acall (line 133) | async def _acall(
class RetrievalQA (line 181) | class RetrievalQA(CustomBaseRetrievalQA):
method _get_docs (line 198) | def _get_docs(
method _aget_docs (line 207) | async def _aget_docs(
method _chain_type (line 217) | def _chain_type(self) -> str:
FILE: applications/ColossalQA/colossalqa/chain/retrieval_qa/load_chain.py
class LoadingCallable (line 25) | class LoadingCallable(Protocol):
method __call__ (line 28) | def __call__(self, llm: BaseLanguageModel, **kwargs: Any) -> BaseCombi...
function _load_stuff_chain (line 32) | def _load_stuff_chain(
function load_qa_chain (line 65) | def load_qa_chain(
FILE: applications/ColossalQA/colossalqa/chain/retrieval_qa/stuff.py
class CustomStuffDocumentsChain (line 19) | class CustomStuffDocumentsChain(StuffDocumentsChain):
method _get_inputs (line 57) | def _get_inputs(self, docs: List[Document], **kwargs: Any) -> dict:
FILE: applications/ColossalQA/colossalqa/data_loader/document_loader.py
class DocumentLoader (line 23) | class DocumentLoader:
method __init__ (line 28) | def __init__(self, files: List, **kwargs) -> None:
method load_data (line 52) | def load_data(self, path: str) -> None:
method clear (line 130) | def clear(self):
FILE: applications/ColossalQA/colossalqa/data_loader/table_dataloader.py
class TableLoader (line 18) | class TableLoader:
method __init__ (line 23) | def __init__(self, files: str, sql_path: str = "sqlite:///mydatabase.d...
method load_data (line 51) | def load_data(self, path):
method to_sql (line 99) | def to_sql(self, path, table_name):
method get_sql_path (line 107) | def get_sql_path(self):
method __del__ (line 110) | def __del__(self):
FILE: applications/ColossalQA/colossalqa/local/colossalcloud_llm.py
class ColossalCloudLLM (line 32) | class ColossalCloudLLM(LLM):
method __init__ (line 43) | def __init__(self, gen_config=None, **kwargs):
method _identifying_params (line 61) | def _identifying_params(self) -> Mapping[str, Any]:
method _llm_type (line 66) | def _llm_type(self) -> str:
method set_auth_config (line 69) | def set_auth_config(self, **kwargs):
method _call (line 78) | def _call(self, prompt: str, stop=None, **kwargs: Any) -> str:
method text_completion (line 104) | def text_completion(self, prompt, gen_config, auth_config):
FILE: applications/ColossalQA/colossalqa/local/llm.py
class ColossalAPI (line 28) | class ColossalAPI:
method __init__ (line 35) | def __init__(self, model_type: str, model_path: str, ckpt_path: str = ...
method get_api (line 57) | def get_api(model_type: str, model_path: str, ckpt_path: str = None):
method generate (line 63) | def generate(self, input: str, **kwargs) -> str:
class VllmAPI (line 89) | class VllmAPI:
method __init__ (line 90) | def __init__(self, host: str = "localhost", port: int = 8077) -> None:
method generate (line 96) | def generate(self, input: str, **kwargs):
class ColossalLLM (line 101) | class ColossalLLM(LLM):
method _llm_type (line 111) | def _llm_type(self) -> str:
method _call (line 114) | def _call(
method _identifying_params (line 136) | def _identifying_params(self) -> Mapping[str, int]:
method get_token_ids (line 140) | def get_token_ids(self, text: str) -> List[int]:
class VllmLLM (line 154) | class VllmLLM(LLM):
method _llm_type (line 164) | def _llm_type(self) -> str:
method _call (line 167) | def _call(
method set_host_port (line 187) | def set_host_port(self, host: str = "localhost", port: int = 8077, **k...
method _identifying_params (line 194) | def _identifying_params(self) -> Mapping[str, int]:
FILE: applications/ColossalQA/colossalqa/local/pangu_llm.py
class Pangu (line 31) | class Pangu(LLM):
method __init__ (line 41) | def __init__(self, gen_config=None, **kwargs):
method _identifying_params (line 49) | def _identifying_params(self) -> Mapping[str, Any]:
method _llm_type (line 54) | def _llm_type(self) -> str:
method _call (line 57) | def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwarg...
method set_auth_config (line 79) | def set_auth_config(self, **kwargs):
method get_latest_auth_token (line 92) | def get_latest_auth_token(self, region, username, password, domain_name):
method text_completion (line 110) | def text_completion(self, text, gen_config, auth_config):
method chat_model (line 131) | def chat_model(self, messages, gen_config, auth_config):
FILE: applications/ColossalQA/colossalqa/local/utils.py
function post_http_request (line 11) | def post_http_request(
function get_response (line 27) | def get_response(response: requests.Response) -> List[str]:
FILE: applications/ColossalQA/colossalqa/memory.py
class ConversationBufferWithSummary (line 18) | class ConversationBufferWithSummary(ConversationSummaryMemory):
method buffer (line 39) | def buffer(self) -> Any:
method buffer_as_str (line 44) | def buffer_as_str(self) -> str:
method buffer_as_messages (line 50) | def buffer_as_messages(self) -> List[BaseMessage]:
method clear (line 54) | def clear(self):
method initiate_document_retrieval_chain (line 59) | def initiate_document_retrieval_chain(
method memory_variables (line 80) | def memory_variables(self) -> List[str]:
method format_dialogue (line 84) | def format_dialogue(self, lang: str = "en") -> str:
method get_conversation_length (line 119) | def get_conversation_length(self):
method load_memory_variables (line 125) | def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, s...
method save_context (line 165) | def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]...
FILE: applications/ColossalQA/colossalqa/mylogging.py
class ColossalQALogger (line 8) | class ColossalQALogger:
method __init__ (line 20) | def __init__(self, name):
method get_instance (line 30) | def get_instance(name: str):
method info (line 45) | def info(self, message: str, verbose: bool = False) -> None:
method warning (line 56) | def warning(self, message: str, verbose: bool = False) -> None:
method debug (line 66) | def debug(self, message: str, verbose: bool = False) -> None:
method error (line 76) | def error(self, message: str) -> None:
function get_logger (line 85) | def get_logger(name: str = None, level=logging.INFO) -> ColossalQALogger:
FILE: applications/ColossalQA/colossalqa/retrieval_conversation_en.py
class EnglishRetrievalConversation (line 18) | class EnglishRetrievalConversation:
method __init__ (line 23) | def __init__(self, retriever: CustomRetriever, model_path: str, model_...
method disambiguity (line 65) | def disambiguity(self, input: str):
method from_retriever (line 70) | def from_retriever(
method run (line 75) | def run(self, user_input: str, memory: ConversationBufferWithSummary) ...
FILE: applications/ColossalQA/colossalqa/retrieval_conversation_universal.py
class UniversalRetrievalConversation (line 20) | class UniversalRetrievalConversation:
method __init__ (line 25) | def __init__(
method load_supporting_docs (line 90) | def load_supporting_docs(self, files: List[List[str]] = None, text_spl...
method start_test_session (line 117) | def start_test_session(self):
method run (line 130) | def run(self, user_input: str, which_language=str):
FILE: applications/ColossalQA/colossalqa/retrieval_conversation_zh.py
class ChineseRetrievalConversation (line 18) | class ChineseRetrievalConversation:
method __init__ (line 23) | def __init__(self, retriever: CustomRetriever, model_path: str, model_...
method disambiguity (line 71) | def disambiguity(self, input: str):
method from_retriever (line 76) | def from_retriever(
method run (line 81) | def run(self, user_input: str, memory: ConversationBufferWithSummary) ...
FILE: applications/ColossalQA/colossalqa/retriever.py
class CustomRetriever (line 22) | class CustomRetriever(BaseRetriever):
method from_documents (line 39) | def from_documents(
method add_documents (line 52) | def add_documents(
method clear_documents (line 100) | def clear_documents(self):
method __del__ (line 108) | def __del__(self):
method set_sql_database_chain (line 113) | def set_sql_database_chain(self, db_chains) -> None:
method set_rephrase_handler (line 120) | def set_rephrase_handler(self, handler: Callable = None) -> None:
method _get_relevant_documents (line 126) | def _get_relevant_documents(
FILE: applications/ColossalQA/colossalqa/text_splitter/chinese_text_splitter.py
class ChineseTextSplitter (line 11) | class ChineseTextSplitter(RecursiveCharacterTextSplitter):
method __init__ (line 12) | def __init__(self, separators: Optional[List[str]] = None, is_separato...
method split_text (line 21) | def split_text(self, text: str) -> List[str]:
FILE: applications/ColossalQA/colossalqa/text_splitter/utils.py
function remove_format (line 4) | def remove_format(text: str) -> str:
function get_cleaned_paragraph (line 13) | def get_cleaned_paragraph(s: str) -> str:
FILE: applications/ColossalQA/colossalqa/utils.py
function drop_table (line 12) | def drop_table(engine: Engine) -> None:
function create_empty_sql_database (line 25) | def create_empty_sql_database(database_uri):
function destroy_sql_database (line 39) | def destroy_sql_database(sql_engine: Union[Engine, str]) -> None:
function detect_lang_naive (line 50) | def detect_lang_naive(s):
FILE: applications/ColossalQA/examples/retrieval_conversation_chatgpt.py
function disambiguity (line 118) | def disambiguity(input):
FILE: applications/ColossalQA/examples/retrieval_conversation_en.py
function disambiguity (line 58) | def disambiguity(input):
FILE: applications/ColossalQA/examples/retrieval_conversation_en_customer_service.py
function disambiguity (line 60) | def disambiguity(input):
function metadata_func (line 85) | def metadata_func(data_sample, additional_fields):
FILE: applications/ColossalQA/examples/retrieval_conversation_zh.py
function disambiguity (line 64) | def disambiguity(input: str):
FILE: applications/ColossalQA/examples/retrieval_intent_classification_zh_customer_service.py
function metadata_func (line 47) | def metadata_func(data_sample, additional_fields):
FILE: applications/ColossalQA/examples/webui_demo/RAG_ChatBot.py
class RAG_ChatBot (line 16) | class RAG_ChatBot:
method __init__ (line 17) | def __init__(
method set_embed_model (line 35) | def set_embed_model(self, **kwargs):
method set_text_splitter (line 42) | def set_text_splitter(self, **kwargs):
method set_memory (line 46) | def set_memory(self, **kwargs):
method set_info_retriever (line 58) | def set_info_retriever(self, **kwargs):
method set_rag_chain (line 63) | def set_rag_chain(self, **kwargs):
method set_disambig_retriv (line 74) | def set_disambig_retriv(self, **kwargs):
method load_doc_from_console (line 84) | def load_doc_from_console(self, json_parse_args: Dict = {}):
method load_doc_from_files (line 96) | def load_doc_from_files(self, files, data_name="default_kb", json_pars...
method split_docs_and_add_to_mem (line 103) | def split_docs_and_add_to_mem(self, **kwargs):
method split_docs (line 110) | def split_docs(self, documents):
method clear_docs (line 114) | def clear_docs(self, **kwargs):
method reset_config (line 120) | def reset_config(self, rag_config):
method run (line 130) | def run(self, user_input: str, memory: ConversationBufferWithSummary) ...
method start_test_session (line 142) | def start_test_session(self):
FILE: applications/ColossalQA/examples/webui_demo/server.py
function parseArgs (line 16) | def parseArgs():
class DocUpdateReq (line 26) | class DocUpdateReq(BaseModel):
class GenerationTaskReq (line 31) | class GenerationTaskReq(BaseModel):
function update_docs (line 36) | def update_docs(data: DocUpdateReq, request: Request):
function generate (line 51) | def generate(data: GenerationTaskReq, request: Request):
FILE: applications/ColossalQA/examples/webui_demo/utils.py
class DocAction (line 4) | class DocAction(str, Enum):
FILE: applications/ColossalQA/examples/webui_demo/webui.py
function parseArgs (line 10) | def parseArgs():
function get_response (line 17) | def get_response(data, url):
function add_text (line 24) | def add_text(history, text):
function add_file (line 29) | def add_file(history, files):
function bot (line 39) | def bot(history):
function restart (line 50) | def restart(chatbot, txt):
FILE: applications/ColossalQA/setup.py
function fetch_requirements (line 4) | def fetch_requirements(path):
function fetch_readme (line 9) | def fetch_readme():
function fetch_version (line 14) | def fetch_version():
FILE: applications/ColossalQA/tests/test_document_loader.py
function test_add_document (line 6) | def test_add_document():
FILE: applications/ColossalQA/tests/test_memory.py
function test_memory_long (line 12) | def test_memory_long():
function test_memory_short (line 66) | def test_memory_short():
FILE: applications/ColossalQA/tests/test_retrieval_qa.py
function test_en_retrievalQA (line 6) | def test_en_retrievalQA():
function test_zh_retrievalQA (line 27) | def test_zh_retrievalQA():
FILE: applications/ColossalQA/tests/test_text_splitter.py
function test_text_splitter (line 4) | def test_text_splitter():
FILE: colossalai/_analyzer/_subclasses/_meta_registration.py
function new (line 26) | def new(*args, **kwargs):
function new_strided (line 30) | def new_strided(*args, **kwargs):
function new_like (line 34) | def new_like(*args, **kwargs):
function register_meta (line 38) | def register_meta(op, register_dispatcher=True):
function meta_conv (line 59) | def meta_conv(
function meta__conv (line 185) | def meta__conv(
function meta_conv_backward (line 201) | def meta_conv_backward(
function meta_adaptive_avg_pool2d_backward (line 218) | def meta_adaptive_avg_pool2d_backward(
function meta_cuda_rnn (line 227) | def meta_cuda_rnn(
function meta_cudnn_rnn_backward (line 280) | def meta_cudnn_rnn_backward(
function meta_unregistered_ewise (line 313) | def meta_unregistered_ewise(input: torch.Tensor, *args):
function meta_bn (line 319) | def meta_bn(input: torch.Tensor, weight, bias, running_mean, running_var...
function meta_bn_backward (line 325) | def meta_bn_backward(
function meta_cudnn_bn (line 341) | def meta_cudnn_bn(input: torch.Tensor, weight, bias, running_mean, runni...
function meta_cudnn_bn_backward (line 355) | def meta_cudnn_bn_backward(
function meta_ln (line 370) | def meta_ln(input: torch.Tensor, normalized_shape, weight, bias, eps):
function meta_ln_backward (line 376) | def meta_ln_backward(
function meta_im2col (line 385) | def meta_im2col(input: torch.Tensor, kernel_size, dilation, padding, str...
function meta_roll (line 390) | def meta_roll(input: torch.Tensor, shifts, dims):
function meta_local_scalar_dense (line 395) | def meta_local_scalar_dense(self: torch.Tensor):
function meta_where_self (line 400) | def meta_where_self(condition: torch.Tensor, self: torch.Tensor, other: ...
function meta_embedding_dense_backward (line 408) | def meta_embedding_dense_backward(
function meta_native_dropout_default (line 416) | def meta_native_dropout_default(input: torch.Tensor, p: float, train: bo...
function meta_native_dropout_backward_default (line 422) | def meta_native_dropout_backward_default(grad: torch.Tensor, mask: torch...
function meta_eye (line 428) | def meta_eye(n: int, m: int, out: torch.Tensor):
function meta_index_Tensor (line 432) | def meta_index_Tensor(self, indices):
FILE: colossalai/_analyzer/_subclasses/flop_tensor.py
class Phase (line 22) | class Phase(Enum):
function normalize_tuple (line 27) | def normalize_tuple(x):
function _format_flops (line 33) | def _format_flops(flop):
function flop_count (line 50) | def flop_count(module: Union[torch.nn.Module, Callable] = None, *args, v...
function matmul_flop_jit (line 225) | def matmul_flop_jit(inputs: List[Any], outputs: List[Any]) -> Number:
function addmm_flop_jit (line 259) | def addmm_flop_jit(inputs: List[Any], outputs: List[Any]) -> Number:
function linear_flop_jit (line 276) | def linear_flop_jit(inputs: List[Any], outputs: List[Any]) -> Number:
function bmm_flop_jit (line 290) | def bmm_flop_jit(inputs: List[Any], outputs: List[Any]) -> Number:
function conv_flop_count (line 304) | def conv_flop_count(
function conv_flop_jit (line 329) | def conv_flop_jit(inputs: List[Any], outputs: List[Any]):
function transpose_shape (line 340) | def transpose_shape(shape):
function conv_backward_flop_jit (line 344) | def conv_backward_flop_jit(inputs: List[Any], outputs: List[Any]):
function norm_flop_counter (line 360) | def norm_flop_counter(affine_arg_index: int, input_arg_index: int) -> Ca...
function batchnorm_flop_jit (line 386) | def batchnorm_flop_jit(inputs: List[Any], outputs: List[Any], training: ...
function ewise_flop_counter (line 397) | def ewise_flop_counter(input_scale: float = 1, output_scale: float = 0) ...
function zero_flop_jit (line 419) | def zero_flop_jit(*args):
FILE: colossalai/_analyzer/_subclasses/meta_tensor.py
function register_storage (line 14) | def register_storage(r, data_ptr_fn=None):
function _normalize_tuple (line 23) | def _normalize_tuple(x):
function _assert_alias (line 30) | def _assert_alias(func):
class MetaTensor (line 34) | class MetaTensor(torch.Tensor):
method __new__ (line 50) | def __new__(cls, elem, device=None, data_ptr_fn=None):
method __repr__ (line 83) | def __repr__(self):
method __torch_dispatch__ (line 90) | def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
method to (line 126) | def to(self, *args, **kwargs) -> torch.Tensor:
method cpu (line 152) | def cpu(self, *args, **kwargs):
method cuda (line 157) | def cuda(self, device=None, non_blocking=False):
method data_ptr (line 162) | def data_ptr(self):
class MetaTensorMode (line 166) | class MetaTensorMode(object):
method __init__ (line 179) | def __init__(self):
method __enter__ (line 183) | def __enter__(self):
method __exit__ (line 200) | def __exit__(self, exc_type, exc_value, traceback):
FILE: colossalai/_analyzer/envs.py
class MeshConfig (line 5) | class MeshConfig:
FILE: colossalai/_analyzer/fx/codegen.py
function _gen_ckpt_fn_def (line 28) | def _gen_ckpt_fn_def(label, free_vars: List[str]) -> str:
function _gen_ckpt_output (line 35) | def _gen_ckpt_output(output_vars: List[str]) -> str:
function _gen_ckpt_usage (line 42) | def _gen_ckpt_usage(label, input_vars, output_vars, use_reentrant=True):
function _end_of_ckpt (line 51) | def _end_of_ckpt(node: Node, ckpt_level: int) -> bool:
function _find_input_and_output_nodes (line 60) | def _find_input_and_output_nodes(nodes: List[Node]):
function _find_nested_ckpt_regions (line 86) | def _find_nested_ckpt_regions(node_list: List[Node], ckpt_level: int = 0):
function emit_ckpt_func (line 134) | def emit_ckpt_func(
function emit_code_with_activation_checkpoint (line 210) | def emit_code_with_activation_checkpoint(body, ckpt_func, nodes, emit_no...
class ActivationCheckpointCodeGen (line 248) | class ActivationCheckpointCodeGen(CodeGen):
method _gen_python_code (line 249) | def _gen_python_code(self, nodes, root_module: str, namespace: _Namesp...
FILE: colossalai/_analyzer/fx/graph_module.py
class _WrappedCall (line 27) | class _WrappedCall:
method __init__ (line 28) | def __init__(self, cls, cls_call):
method _generate_error_message (line 42) | def _generate_error_message(frame_summary: traceback.FrameSummary) -> ...
method __call__ (line 65) | def __call__(self, obj, *args, **kwargs):
class ColoGraphModule (line 85) | class ColoGraphModule(torch.fx.GraphModule):
method __init__ (line 107) | def __init__(
method bind (line 112) | def bind(self, ckpt_def, globals):
method recompile (line 132) | def recompile(self) -> PythonCode:
method to_folder (line 176) | def to_folder(self, folder: Union[str, os.PathLike], module_name: str ...
FILE: colossalai/_analyzer/fx/node_util.py
function intersect (line 11) | def intersect(a, b):
function subtract (line 15) | def subtract(a, b):
function union (line 19) | def union(a, b):
function compute_size_in_bytes (line 23) | def compute_size_in_bytes(elem: Union[torch.Tensor, Dict, List, Tuple, i...
class MetaInfo (line 48) | class MetaInfo:
method __new__ (line 119) | def __new__(cls, node: Node, **kwargs):
method __post_init__ (line 136) | def __post_init__(self):
method fwd_time (line 140) | def fwd_time(self, tflops: float = MeshConfig.TFLOPS, bandwidth: float...
method bwd_time (line 144) | def bwd_time(self, tflops: float = MeshConfig.TFLOPS, bandwidth: float...
method param_size (line 148) | def param_size(self):
method buffer_size (line 152) | def buffer_size(self):
method output_size (line 156) | def output_size(self):
method accumulate_size (line 166) | def accumulate_size(self):
method temp_size (line 176) | def temp_size(self):
method backward_size (line 186) | def backward_size(self):
method __repr__ (line 190) | def __repr__(self):
FILE: colossalai/_analyzer/fx/passes/graph_profile.py
function _format_flops (line 13) | def _format_flops(flops: float) -> str:
function _denormalize_tuple (line 26) | def _denormalize_tuple(t: Tuple[int, ...]) -> Tuple[int, ...]:
function _normalize_tuple (line 30) | def _normalize_tuple(x):
function _current_device (line 36) | def _current_device(module):
class GraphProfiler (line 40) | class GraphProfiler(torch.fx.Interpreter):
method __init__ (line 52) | def __init__(self, module: GraphModule, garbage_collect_values: bool =...
method run (line 55) | def run(self, *args, initial_env: Optional[Dict[Node, Any]] = None, en...
method fetch_initial_env (line 91) | def fetch_initial_env(self, device=None) -> Dict[Node, Any]:
method propagate (line 107) | def propagate(self, *args, device=None):
method summary (line 123) | def summary(self) -> str:
class CommunicationProfiler (line 184) | class CommunicationProfiler(GraphProfiler):
method __init__ (line 189) | def __init__(self, module: GraphModule, garbage_collect_values: bool =...
class FlopProfiler (line 193) | class FlopProfiler(GraphProfiler):
method run_node (line 232) | def run_node(self, n: torch.fx.Node) -> Any:
method call_function (line 269) | def call_function(self, target: "Target", args: Tuple[Argument, ...], ...
method call_method (line 293) | def call_method(self, target: "Target", args: Tuple[Argument, ...], kw...
method call_module (line 311) | def call_module(self, target: "Target", args: Tuple[Argument, ...], kw...
function graph_profile_pass (line 333) | def graph_profile_pass(module: GraphModule, *args, verbose=False) -> Gra...
FILE: colossalai/_analyzer/fx/passes/shape_prop.py
class sim_env (line 17) | class sim_env(saved_tensors_hooks):
method __init__ (line 32) | def __init__(self, module: Optional[torch.nn.Module] = None):
method pack_hook (line 38) | def pack_hook(self, tensor: torch.Tensor):
method unpack_hook (line 43) | def unpack_hook(self, tensor):
function _normalize_tuple (line 47) | def _normalize_tuple(x):
function _current_device (line 53) | def _current_device(module):
class ShapeProp (line 61) | class ShapeProp(torch.fx.Interpreter):
method __init__ (line 97) | def __init__(self, module: torch.fx.GraphModule, garbage_collect_value...
method run_node (line 101) | def run_node(self, n: torch.fx.Node) -> Any:
method call_function (line 174) | def call_function(self, target: "Target", args: Tuple[Any, ...], kwarg...
method call_method (line 203) | def call_method(self, target: "Target", args: Tuple[Any, ...], kwargs:...
method propagate (line 235) | def propagate(self, *args, device=None):
function shape_prop_pass (line 256) | def shape_prop_pass(module: torch.fx.GraphModule, *args) -> torch.fx.Gra...
FILE: colossalai/_analyzer/fx/symbolic_profile.py
function register_flop_count_impl (line 7) | def register_flop_count_impl(func):
function register_shape_impl (line 15) | def register_shape_impl(func):
function symbolic_profile (line 23) | def symbolic_profile(module: GraphModule, *args, verbose=False) -> Graph...
FILE: colossalai/_analyzer/fx/tracer/bias_addition.py
function linear_impl (line 16) | def linear_impl(input, weight, bias=None):
function conv1d_impl (line 24) | def conv1d_impl(input, weight, bias=None, stride=_single(1), padding=_si...
function conv2d_impl (line 34) | def conv2d_impl(input, weight, bias=None, stride=_pair(1), padding=_pair...
function conv3d_impl (line 44) | def conv3d_impl(input, weight, bias=None, stride=_triple(1), padding=_tr...
function conv_transpose1d_impl (line 54) | def conv_transpose1d_impl(
function conv_transpose2d_impl (line 87) | def conv_transpose2d_impl(
function conv_transpose3d_impl (line 113) | def conv_transpose3d_impl(
function addmm_impl (line 147) | def addmm_impl(input, mat1, mat2, beta=1, alpha=1):
function addbmm_impl (line 160) | def addbmm_impl(input, batch1, batch2, beta=1, alpha=1):
FILE: colossalai/_analyzer/fx/tracer/custom_leaf_module.py
function torch_nn_normalize (line 17) | def torch_nn_normalize(self, input: torch.Tensor):
FILE: colossalai/_analyzer/fx/tracer/proxy.py
class ColoProxy (line 13) | class ColoProxy(Proxy):
method __init__ (line 16) | def __init__(self, *args, data=None, **kwargs):
method meta_data (line 21) | def meta_data(self):
method meta_data (line 25) | def meta_data(self, args):
method __torch_function__ (line 30) | def __torch_function__(cls, orig_method, types, args=(), kwargs=None):
method from_torch_proxy (line 45) | def from_torch_proxy(cls, proxy: Proxy):
method __repr__ (line 48) | def __repr__(self):
method __len__ (line 51) | def __len__(self):
method __int__ (line 54) | def __int__(self):
method __index__ (line 57) | def __index__(self):
method __float__ (line 63) | def __float__(self):
method __bool__ (line 66) | def __bool__(self):
method __getattr__ (line 69) | def __getattr__(self, k):
method __setitem__ (line 72) | def __setitem__(self, key, value):
method __contains__ (line 77) | def __contains__(self, key):
method __isinstancecheck__ (line 85) | def __isinstancecheck__(self, type):
class ColoAttribute (line 89) | class ColoAttribute(ColoProxy):
method __init__ (line 90) | def __init__(self, root, attr: str, data=None):
method node (line 98) | def node(self):
method __call__ (line 105) | def __call__(self, *args, **kwargs):
method __repr__ (line 108) | def __repr__(self):
FILE: colossalai/_analyzer/fx/tracer/symbolic_trace.py
function _default_device (line 19) | def _default_device():
function _current_device (line 23) | def _current_device(module: torch.nn.Module):
function symbolic_trace (line 30) | def symbolic_trace(
FILE: colossalai/_analyzer/fx/tracer/tracer.py
function _truncate_suffix (line 19) | def _truncate_suffix(s: str):
function register_tracer_impl (line 26) | def register_tracer_impl(func: Callable[..., Any], name: Optional[str] =...
function register_leaf_module_impl (line 35) | def register_leaf_module_impl(module: nn.Module):
function register_leaf_module (line 43) | def register_leaf_module(module: nn.Module):
function register_non_leaf_module (line 47) | def register_non_leaf_module(module: nn.Module):
class ColoTracer (line 51) | class ColoTracer(Tracer):
method __init__ (line 67) | def __init__(self, trace_act_ckpt: bool = False, bias_addition_split: ...
method is_leaf_module (line 82) | def is_leaf_module(self, m: nn.Module, module_qualified_name: str) -> ...
method call_module (line 92) | def call_module(
method proxy (line 101) | def proxy(self, node: Node) -> "ColoProxy":
method create_proxy (line 104) | def create_proxy(
method create_node (line 161) | def create_node(self, *args, **kwargs) -> Node:
method trace (line 166) | def trace(
method _tracer_override (line 236) | def _tracer_override(self):
method _torch_factory_override (line 269) | def _torch_factory_override(self):
method _post_check (line 306) | def _post_check(self, non_concrete_arg_names: Set[str]):
method getattr (line 336) | def getattr(self, attr, attr_val, parameter_proxy_cache):
method _module_getattr (line 339) | def _module_getattr(self, attr, attr_val, parameter_proxy_cache):
FILE: colossalai/accelerator/api.py
function set_accelerator (line 22) | def set_accelerator(accelerator: Union[str, BaseAccelerator]) -> None:
function auto_set_accelerator (line 40) | def auto_set_accelerator() -> None:
function get_accelerator (line 60) | def get_accelerator() -> BaseAccelerator:
FILE: colossalai/accelerator/base_accelerator.py
class BaseAccelerator (line 11) | class BaseAccelerator(ABC):
method __init__ (line 14) | def __init__(self, name: str, communication_backend: str, is_synchrono...
method name (line 24) | def name(self) -> str:
method communication_backend (line 31) | def communication_backend(self) -> str:
method is_synchronous (line 38) | def is_synchronous(self) -> bool:
method __repr__ (line 44) | def __repr__(self) -> str:
method get_version (line 52) | def get_version(self) -> str:
method get_current_device (line 58) | def get_current_device(self) -> torch.device:
method current_device (line 64) | def current_device(self) -> int:
method set_device (line 70) | def set_device(self, device: Optional[Union[torch.device, int]] = None...
method get_device_name (line 76) | def get_device_name(self, device: Union[torch.device, int]) -> str:
method synchronize (line 82) | def synchronize(self, device: Union[torch.device, int] = None):
method is_available (line 88) | def is_available(self):
method device_count (line 94) | def device_count(self):
method set_to_device (line 99) | def set_to_device(self, models: Any) -> Any:
method get_device_capability (line 116) | def get_device_capability(self, device=None) -> Tuple[int, int]:
method get_device_name (line 122) | def get_device_name(self, device=None) -> str:
method get_device_properties (line 128) | def get_device_properties(self, device):
method utilization (line 134) | def utilization(self, device=None) -> int:
method get_rng_state (line 143) | def get_rng_state(self, device="cuda") -> torch.Tensor:
method get_rng_state_all (line 149) | def get_rng_state_all(self) -> List[torch.Tensor]:
method set_rng_state (line 155) | def set_rng_state(self, new_state: torch.ByteTensor, device: str = "cu...
method set_rng_state_all (line 161) | def set_rng_state_all(self, new_states: List[torch.ByteTensor]) -> None:
method manual_seed (line 167) | def manual_seed(self, seed: int) -> None:
method manual_seed_all (line 173) | def manual_seed_all(self, seed: int) -> None:
method seed (line 179) | def seed(self) -> None:
method seed_all (line 185) | def seed_all(self) -> None:
method initial_seed (line 191) | def initial_seed(self) -> int:
method empty_cache (line 200) | def empty_cache(self) -> None:
method memory_stats (line 206) | def memory_stats(self, device=None) -> Dict[str, Any]:
method memory_summary (line 212) | def memory_summary(self, device=None, abbreviated=False) -> str:
method memory_snapshot (line 218) | def memory_snapshot(self):
method memory_allocated (line 224) | def memory_allocated(self, device=None) -> int:
method max_memory_allocated (line 230) | def max_memory_allocated(self, device=None) -> int:
method reset_max_memory_allocated (line 236) | def reset_max_memory_allocated(self, device=None) -> None:
method reset_max_memory_cached (line 242) | def reset_max_memory_cached(self, device=None) -> None:
method memory_reserved (line 248) | def memory_reserved(self, device=None) -> int:
method max_memory_reserved (line 254) | def max_memory_reserved(self, device=None) -> int:
method set_per_process_memory_fraction (line 260) | def set_per_process_memory_fraction(self, fraction: float, device=None...
method reset_peak_memory_stats (line 266) | def reset_peak_memory_stats(self, device=None) -> None:
method Stream (line 276) | def Stream(self, device=None, priority=0, **kwargs):
method Event (line 282) | def Event(self, enable_timing: bool = False, blocking: bool = False, i...
method current_stream (line 288) | def current_stream(self, device=None):
method default_stream (line 294) | def default_stream(self, device=None):
method set_stream (line 300) | def set_stream(self, stream_):
method stream (line 306) | def stream(self, stream_):
method autocast (line 315) | def autocast(
FILE: colossalai/accelerator/cpu_accelerator.py
class CpuAccelerator (line 15) | class CpuAccelerator(BaseAccelerator):
method __init__ (line 21) | def __init__(self):
method get_version (line 27) | def get_version(self) -> str:
method get_current_device (line 33) | def get_current_device(self) -> torch.device:
method current_device (line 39) | def current_device(self) -> int:
method set_device (line 45) | def set_device(self, device: Optional[Union[torch.device, int]] = None...
method get_device_name (line 51) | def get_device_name(self, device: Union[torch.device, int]) -> str:
method synchronize (line 57) | def synchronize(self, device: Union[torch.device, int] = None):
method is_available (line 63) | def is_available(self):
method device_count (line 69) | def device_count(self):
method get_device_capability (line 75) | def get_device_capability(self, device=None) -> Tuple[int, int]:
method get_device_name (line 81) | def get_device_name(self, device=None) -> str:
method get_device_properties (line 87) | def get_device_properties(self, device):
method utilization (line 93) | def utilization(self, device=None) -> int:
method get_rng_state (line 102) | def get_rng_state(self, device=None) -> torch.Tensor:
method get_rng_state_all (line 108) | def get_rng_state_all(self) -> List[torch.Tensor]:
method set_rng_state (line 114) | def set_rng_state(self, new_state: torch.ByteTensor, device: str = Non...
method set_rng_state_all (line 120) | def set_rng_state_all(self, new_states: List[torch.ByteTensor]) -> None:
method manual_seed (line 126) | def manual_seed(self, seed: int) -> None:
method manual_seed_all (line 132) | def manual_seed_all(self, seed: int) -> None:
method seed (line 138) | def seed(self) -> None:
method seed_all (line 144) | def seed_all(self) -> None:
method initial_seed (line 150) | def initial_seed(self) -> int:
method empty_cache (line 160) | def empty_cache(self) -> None:
method memory_stats (line 166) | def memory_stats(self, device=None) -> Dict[str, Any]:
method memory_summary (line 172) | def memory_summary(self, device=None, abbreviated=False) -> str:
method memory_snapshot (line 178) | def memory_snapshot(self):
method memory_allocated (line 184) | def memory_allocated(self, device=None) -> int:
method max_memory_allocated (line 190) | def max_memory_allocated(self, device=None) -> int:
method reset_max_memory_allocated (line 196) | def reset_max_memory_allocated(self, device=None) -> None:
method reset_max_memory_cached (line 202) | def reset_max_memory_cached(self, device=None) -> None:
method memory_reserved (line 208) | def memory_reserved(self, device=None) -> int:
method max_memory_reserved (line 214) | def max_memory_reserved(self, device=None) -> int:
method set_per_process_memory_fraction (line 220) | def set_per_process_memory_fraction(self, fraction: float, device=None...
method reset_peak_memory_stats (line 228) | def reset_peak_memory_stats(self, device=None) -> None:
method Stream (line 238) | def Stream(self, device=None, priority=0, **kwargs):
method Event (line 244) | def Event(self, enable_timing: bool = False, blocking: bool = False, i...
method current_stream (line 250) | def current_stream(self, device=None):
method default_stream (line 256) | def default_stream(self, device=None):
method set_stream (line 262) | def set_stream(self, stream_):
method stream (line 268) | def stream(self, stream_):
method autocast (line 277) | def autocast(
FILE: colossalai/accelerator/cuda_accelerator.py
class CudaAccelerator (line 13) | class CudaAccelerator(BaseAccelerator):
method __init__ (line 18) | def __init__(self):
method get_version (line 24) | def get_version(self) -> str:
method get_current_device (line 30) | def get_current_device(self) -> torch.device:
method current_device (line 36) | def current_device(self) -> int:
method set_device (line 42) | def set_device(self, device: Optional[Union[torch.device, int]] = None...
method get_device_name (line 52) | def get_device_name(self, device: Union[torch.device, int]) -> str:
method synchronize (line 58) | def synchronize(self, device: Union[torch.device, int] = None):
method is_available (line 64) | def is_available(self):
method device_count (line 70) | def device_count(self):
method get_device_capability (line 76) | def get_device_capability(self, device=None) -> Tuple[int, int]:
method get_device_name (line 82) | def get_device_name(self, device=None) -> str:
method get_device_properties (line 88) | def get_device_properties(self, device):
method utilization (line 94) | def utilization(self, device=None) -> int:
method get_rng_state (line 103) | def get_rng_state(self, device="cuda") -> torch.Tensor:
method get_rng_state_all (line 109) | def get_rng_state_all(self) -> List[torch.Tensor]:
method set_rng_state (line 115) | def set_rng_state(self, new_state: torch.ByteTensor, device: str = "cu...
method set_rng_state_all (line 121) | def set_rng_state_all(self, new_states: List[torch.ByteTensor]) -> None:
method manual_seed (line 127) | def manual_seed(self, seed: int) -> None:
method manual_seed_all (line 133) | def manual_seed_all(self, seed: int) -> None:
method seed (line 139) | def seed(self) -> None:
method seed_all (line 145) | def seed_all(self) -> None:
method initial_seed (line 151) | def initial_seed(self) -> int:
method empty_cache (line 161) | def empty_cache(self) -> None:
method memory_stats (line 167) | def memory_stats(self, device=None) -> Dict[str, Any]:
method memory_summary (line 173) | def memory_summary(self, device=None, abbreviated=False) -> str:
method memory_snapshot (line 179) | def memory_snapshot(self):
method memory_allocated (line 185) | def memory_allocated(self, device=None) -> int:
method max_memory_allocated (line 191) | def max_memory_allocated(self, device=None) -> int:
method reset_max_memory_allocated (line 197) | def reset_max_memory_allocated(self, device=None) -> None:
method reset_max_memory_cached (line 203) | def reset_max_memory_cached(self, device=None) -> None:
method memory_reserved (line 209) | def memory_reserved(self, device=None) -> int:
method max_memory_reserved (line 215) | def max_memory_reserved(self, device=None) -> int:
method set_per_process_memory_fraction (line 221) | def set_per_process_memory_fraction(self, fraction: float, device=None...
method reset_peak_memory_stats (line 227) | def reset_peak_memory_stats(self, device=None) -> None:
method Stream (line 237) | def Stream(self, device=None, priority=0, **kwargs):
method Event (line 243) | def Event(self, enable_timing: bool = False, blocking: bool = False, i...
method current_stream (line 249) | def current_stream(self, device=None):
method default_stream (line 255) | def default_stream(self, device=None):
method set_stream (line 261) | def set_stream(self, stream_):
method stream (line 267) | def stream(self, stream_):
method autocast (line 276) | def autocast(
FILE: colossalai/accelerator/npu_accelerator.py
class NpuAccelerator (line 19) | class NpuAccelerator(BaseAccelerator):
method __init__ (line 24) | def __init__(self):
method get_version (line 30) | def get_version(self) -> str:
method get_current_device (line 36) | def get_current_device(self) -> torch.device:
method current_device (line 42) | def current_device(self) -> int:
method set_device (line 48) | def set_device(self, device: Optional[Union[torch.device, int]] = None...
method get_device_name (line 58) | def get_device_name(self, device: Union[torch.device, int]) -> str:
method synchronize (line 64) | def synchronize(self, device: Union[torch.device, int] = None):
method is_available (line 70) | def is_available(self):
method device_count (line 76) | def device_count(self):
method get_device_capability (line 82) | def get_device_capability(self, device=None) -> Tuple[int, int]:
method get_device_name (line 88) | def get_device_name(self, device=None) -> str:
method get_device_properties (line 94) | def get_device_properties(self, device):
method utilization (line 100) | def utilization(self, device=None) -> int:
method get_rng_state (line 109) | def get_rng_state(self, device="npu") -> torch.Tensor:
method get_rng_state_all (line 115) | def get_rng_state_all(self) -> List[torch.Tensor]:
method set_rng_state (line 121) | def set_rng_state(self, new_state: torch.ByteTensor, device: str = "np...
method set_rng_state_all (line 127) | def set_rng_state_all(self, new_states: List[torch.ByteTensor]) -> None:
method manual_seed (line 133) | def manual_seed(self, seed: int) -> None:
method manual_seed_all (line 139) | def manual_seed_all(self, seed: int) -> None:
method seed (line 145) | def seed(self) -> None:
method seed_all (line 151) | def seed_all(self) -> None:
method initial_seed (line 157) | def initial_seed(self) -> int:
method empty_cache (line 167) | def empty_cache(self) -> None:
method memory_stats (line 173) | def memory_stats(self, device=None) -> Dict[str, Any]:
method memory_summary (line 179) | def memory_summary(self, device=None, abbreviated=False) -> str:
method memory_snapshot (line 185) | def memory_snapshot(self):
method memory_allocated (line 191) | def memory_allocated(self, device=None) -> int:
method max_memory_allocated (line 197) | def max_memory_allocated(self, device=None) -> int:
method reset_max_memory_allocated (line 203) | def reset_max_memory_allocated(self, device=None) -> None:
method reset_max_memory_cached (line 209) | def reset_max_memory_cached(self, device=None) -> None:
method memory_reserved (line 215) | def memory_reserved(self, device=None) -> int:
method max_memory_reserved (line 221) | def max_memory_reserved(self, device=None) -> int:
method set_per_process_memory_fraction (line 227) | def set_per_process_memory_fraction(self, fraction: float, device=None...
method reset_peak_memory_stats (line 233) | def reset_peak_memory_stats(self, device=None) -> None:
method Stream (line 243) | def Stream(self, device=None, priority=0, **kwargs):
method Event (line 249) | def Event(self, enable_timing: bool = False, blocking: bool = False, i...
method current_stream (line 255) | def current_stream(self, device=None):
method default_stream (line 261) | def default_stream(self, device=None):
method set_stream (line 267) | def set_stream(self, stream_):
method stream (line 273) | def stream(self, stream_):
method autocast (line 282) | def autocast(
FILE: colossalai/amp/naive_amp/grad_scaler/base_grad_scaler.py
class BaseGradScaler (line 16) | class BaseGradScaler(ABC):
method __init__ (line 24) | def __init__(self, initial_scale: float, verbose: bool):
method scale (line 33) | def scale(self) -> Tensor:
method inv_scale (line 39) | def inv_scale(self) -> Tensor:
method state_dict (line 44) | def state_dict(self) -> Dict:
method load_state_dict (line 51) | def load_state_dict(self, state_dict: Dict) -> None:
method update (line 61) | def update(self, overflow: bool) -> None:
method log (line 68) | def log(self, message, *args, **kwargs):
FILE: colossalai/amp/naive_amp/grad_scaler/constant_grad_scaler.py
class ConstantGradScaler (line 8) | class ConstantGradScaler(BaseGradScaler):
method __init__ (line 16) | def __init__(self, initial_scale: int, verbose: bool):
method update (line 20) | def update(self, overflow: bool) -> None:
FILE: colossalai/amp/naive_amp/grad_scaler/dynamic_grad_scaler.py
class DynamicGradScaler (line 15) | class DynamicGradScaler(BaseGradScaler):
method __init__ (line 29) | def __init__(
method _sanity_checks (line 65) | def _sanity_checks(self) -> None:
method update (line 78) | def update(self, overflow: bool) -> None:
method _backoff_scale (line 103) | def _backoff_scale(self) -> None:
method _grow_scale (line 110) | def _grow_scale(self) -> None:
method state_dict (line 117) | def state_dict(self):
method load_state_dict (line 125) | def load_state_dict(self, state_dict):
FILE: colossalai/amp/naive_amp/mixed_precision_mixin/base.py
class MixedPrecisionMixin (line 7) | class MixedPrecisionMixin(ABC):
method pre_backward (line 46) | def pre_backward(self, loss: Tensor, *args, **kwargs) -> Tensor:
method pre_backward_by_grad (line 57) | def pre_backward_by_grad(self, tensor: Tensor, grad: Tensor) -> Tensor:
method should_skip_step (line 69) | def should_skip_step(self) -> bool:
method pre_zero_grad (line 77) | def pre_zero_grad(self) -> None:
method get_grad_div_scale (line 81) | def get_grad_div_scale(self) -> float:
FILE: colossalai/amp/naive_amp/mixed_precision_mixin/bf16.py
class BF16MixedPrecisionMixin (line 7) | class BF16MixedPrecisionMixin(MixedPrecisionMixin):
method pre_backward (line 10) | def pre_backward(self, loss: Tensor) -> Tensor:
method pre_backward_by_grad (line 13) | def pre_backward_by_grad(self, tensor: Tensor, grad: Tensor) -> Tensor:
method should_skip_step (line 16) | def should_skip_step(self) -> bool:
method pre_zero_grad (line 19) | def pre_zero_grad(self) -> None:
method get_grad_div_scale (line 22) | def get_grad_div_scale(self) -> float:
FILE: colossalai/amp/naive_amp/mixed_precision_mixin/fp16.py
class OptimState (line 14) | class OptimState(Enum):
class FP16MixedPrecisionMixin (line 19) | class FP16MixedPrecisionMixin(MixedPrecisionMixin):
method __init__ (line 22) | def __init__(
method loss_scale (line 46) | def loss_scale(self) -> float:
method check_local_overflow (line 50) | def check_local_overflow(self) -> bool:
method check_overflow (line 57) | def check_overflow(self) -> bool:
method pre_backward (line 65) | def pre_backward(self, loss: Tensor) -> Tensor:
method pre_backward_by_grad (line 70) | def pre_backward_by_grad(self, tensor: Tensor, grad: Tensor) -> Tensor:
method should_skip_step (line 74) | def should_skip_step(self) -> bool:
method pre_zero_grad (line 81) | def pre_zero_grad(self) -> None:
method get_grad_div_scale (line 84) | def get_grad_div_scale(self) -> float:
FILE: colossalai/amp/naive_amp/mixed_precision_optimizer.py
class NaiveFP16MixedPrecisionMixin (line 13) | class NaiveFP16MixedPrecisionMixin(FP16MixedPrecisionMixin):
method __init__ (line 14) | def __init__(
method check_local_overflow (line 30) | def check_local_overflow(self) -> bool:
class MixedPrecisionOptimizer (line 37) | class MixedPrecisionOptimizer(OptimizerWrapper):
method __init__ (line 38) | def __init__(
method backward (line 89) | def backward(self, loss: Tensor, inputs=None, retain_graph=False, **kw...
method backward_by_grad (line 93) | def backward_by_grad(self, tensor: Tensor, grad: Tensor, inputs: Tenso...
method zero_grad (line 102) | def zero_grad(self, *args, **kwargs):
method _unscale_and_clip_grads (line 108) | def _unscale_and_clip_grads(self, total_norm: float) -> None:
method _compute_grad_norm (line 140) | def _compute_grad_norm(self, param_gradient_pairs: List[Tuple[Tensor]]...
method step (line 169) | def step(self, *args, **kwargs):
method update_master_params (line 208) | def update_master_params(self, model: Module):
method get_working_to_master_map (line 217) | def get_working_to_master_map(self) -> Dict[int, torch.Tensor]:
method get_master_to_working_map (line 220) | def get_master_to_working_map(self) -> Dict[int, torch.Tensor]:
method get_grad_norm (line 223) | def get_grad_norm(self, norm_type=2, **kwargs):
FILE: colossalai/auto_parallel/checkpoint/ckpt_solver_base.py
function _copy_output (line 18) | def _copy_output(src: Graph, dst: Graph):
function _get_param_size (line 25) | def _get_param_size(module: torch.nn.Module):
class CheckpointSolverBase (line 30) | class CheckpointSolverBase(ABC):
method __init__ (line 31) | def __init__(
method solve (line 82) | def solve(self):
method get_node_list (line 85) | def get_node_list(self):
method _linearize_graph (line 89) | def _linearize_graph(self) -> List[List[Node]]:
FILE: colossalai/auto_parallel/checkpoint/ckpt_solver_chen.py
class CheckpointSolverChen (line 14) | class CheckpointSolverChen(CheckpointSolverBase):
method __init__ (line 15) | def __init__(self, graph: Graph, cnode: List[str] = None, num_grids: i...
method solve (line 36) | def solve(self) -> Graph:
method run_chen_greedy (line 52) | def run_chen_greedy(self, b: int = 0) -> Tuple[Set, int]:
method grid_search (line 73) | def grid_search(self) -> Set:
FILE: colossalai/auto_parallel/checkpoint/ckpt_solver_rotor.c
function computeTable (line 50) | static PyObject* computeTable(PyObject* self, PyObject* args) {
type PyModuleDef (line 199) | struct PyModuleDef
function PyInit_rotorc (line 209) | PyMODINIT_FUNC PyInit_rotorc(void) { return PyModule_Create(&rotorModule...
FILE: colossalai/auto_parallel/checkpoint/ckpt_solver_rotor.py
class CheckpointSolverRotor (line 24) | class CheckpointSolverRotor(CheckpointSolverBase):
method __init__ (line 25) | def __init__(
method solve (line 66) | def solve(self, force_python: bool = False, verbose: bool = False) -> ...
method print_chain (line 104) | def print_chain(self):
method print_sequence (line 116) | def print_sequence(self):
method _construct_chain (line 120) | def _construct_chain(cls, graph: Graph, node_list: List[List[Node]]) -...
method _extract_node_info (line 141) | def _extract_node_info(cls, node: List[Node]) -> Tuple[int, ...]:
method _extract_input (line 168) | def _extract_input(graph: Graph) -> Tuple[Tensor, ...]:
method _extract_unused_output (line 177) | def _extract_unused_output(node: Node) -> int:
method _extract_btmp (line 182) | def _extract_btmp(node: List[Node]) -> int:
method _compute_table (line 209) | def _compute_table(chain: Chain, mmax: int) -> Tuple:
method _compute_table_c (line 276) | def _compute_table_c(chain: Chain, mmax: int) -> Tuple:
method _backtrack (line 308) | def _backtrack(
method _annotate_from_sequence (line 361) | def _annotate_from_sequence(sequence: Sequence, node_list: List[List[N...
FILE: colossalai/auto_parallel/checkpoint/operation.py
class Chain (line 8) | class Chain:
method __init__ (line 9) | def __init__(
method check_lengths (line 40) | def check_lengths(self):
method __repr__ (line 50) | def __repr__(self):
method __len__ (line 58) | def __len__(self):
method discretize_all (line 61) | def discretize_all(self, unit: int):
class Operation (line 70) | class Operation(ABC):
method __repr__ (line 73) | def __repr__(self) -> str:
method shift (line 76) | def shift(self, value):
class Forward (line 83) | class Forward(Operation):
method __init__ (line 86) | def __init__(self, index):
method cost (line 89) | def cost(self, chain: Chain):
class ForwardEnable (line 96) | class ForwardEnable(Forward):
class ForwardNograd (line 100) | class ForwardNograd(Forward):
class ForwardCheck (line 104) | class ForwardCheck(Forward):
class Forwards (line 108) | class Forwards(Operation):
method __init__ (line 109) | def __init__(self, start, end):
method __repr__ (line 112) | def __repr__(self):
method cost (line 115) | def cost(self, chain: Chain):
function isForward (line 122) | def isForward(op):
class Backward (line 126) | class Backward(Operation):
method __init__ (line 129) | def __init__(self, index):
method cost (line 132) | def cost(self, chain: Chain):
class Loss (line 139) | class Loss(Operation):
method __init__ (line 140) | def __init__(self):
method __repr__ (line 143) | def __repr__(self):
method cost (line 146) | def cost(self, chain):
class MemoryAccess (line 150) | class MemoryAccess(Operation):
method __init__ (line 153) | def __init__(self, index):
method cost (line 156) | def cost(self, chain: Chain):
class WriteMemory (line 160) | class WriteMemory(MemoryAccess):
class ReadMemory (line 164) | class ReadMemory(MemoryAccess):
class DiscardMemory (line 168) | class DiscardMemory(MemoryAccess):
class Sequence (line 172) | class Sequence(list):
method __init__ (line 173) | def __init__(self):
method __repr__ (line 176) | def __repr__(self):
method list_operations (line 179) | def list_operations(self):
FILE: colossalai/auto_parallel/meta_profiler/meta_registry/activation.py
function elementwise_meta_info (line 14) | def elementwise_meta_info(temp_mem_scale: float = 0, buffer_mem_scale: f...
FILE: colossalai/auto_parallel/meta_profiler/meta_registry/binary_elementwise_ops.py
function binary_elementwise_meta_info (line 16) | def binary_elementwise_meta_info(*args, **kwargs) -> Tuple[TrainCycleIte...
FILE: colossalai/auto_parallel/meta_profiler/meta_registry/conv.py
function convnd_meta_info (line 20) | def convnd_meta_info(*args, **kwargs) -> Tuple[TrainCycleItem, TrainCycl...
FILE: colossalai/auto_parallel/meta_profiler/meta_registry/embedding.py
function embedding_meta_info (line 15) | def embedding_meta_info(*args, **kwargs) -> Tuple[TrainCycleItem, TrainC...
FILE: colossalai/auto_parallel/meta_profiler/meta_registry/linear.py
function linear_meta_info (line 17) | def linear_meta_info(*args, **kwargs) -> Tuple[TrainCycleItem, TrainCycl...
function matmul_meta_info (line 190) | def matmul_meta_info(*args, **kwargs) -> Tuple[TrainCycleItem, TrainCycl...
FILE: colossalai/auto_parallel/meta_profiler/meta_registry/non_spmd.py
function non_spmd_meta_info (line 17) | def non_spmd_meta_info(*args, **kwargs) -> Tuple[TrainCycleItem, TrainCy...
FILE: colossalai/auto_parallel/meta_profiler/meta_registry/norm.py
function batchnormnd_meta_info (line 17) | def batchnormnd_meta_info(*args, **kwargs) -> Tuple[TrainCycleItem, Trai...
function layernorm_meta_info (line 113) | def layernorm_meta_info(*args, **kwargs) -> Tuple[TrainCycleItem, TrainC...
FILE: colossalai/auto_parallel/meta_profiler/meta_registry/pooling.py
function avgpool_meta_info (line 17) | def avgpool_meta_info(*args, **kwargs) -> Tuple[TrainCycleItem, TrainCyc...
function maxpool_meta_info (line 74) | def maxpool_meta_info(*args, **kwargs) -> Tuple[TrainCycleItem, TrainCyc...
FILE: colossalai/auto_parallel/meta_profiler/meta_registry/tensor.py
function tensor_related_metainfo (line 13) | def tensor_related_metainfo(bwd_mem_out_factor: float = 1, bwd_mem_tmp_f...
FILE: colossalai/auto_parallel/meta_profiler/meta_registry/where.py
function where_meta_info (line 15) | def where_meta_info(*args, **kwargs) -> Tuple[TrainCycleItem, TrainCycle...
FILE: colossalai/auto_parallel/meta_profiler/registry.py
class Registry (line 4) | class Registry:
method __init__ (line 5) | def __init__(self, name):
method register (line 9) | def register(self, source):
method get (line 21) | def get(self, source):
method has (line 26) | def has(self, source):
FILE: colossalai/auto_parallel/meta_profiler/shard_metainfo.py
class ShardMetaInfo (line 14) | class ShardMetaInfo:
method __init__ (line 20) | def __init__(self, strategy: ShardingStrategy = None, target: Callable...
method strategy (line 47) | def strategy(self) -> ShardingStrategy:
method target (line 51) | def target(self) -> Callable:
method strategy (line 55) | def strategy(self, strategy: ShardingStrategy) -> None:
method target (line 61) | def target(self, target: Callable) -> None:
method compute_sharded_opdata (line 66) | def compute_sharded_opdata(self, operation_data: OperationData, shardi...
method compute_shard_metainfo (line 91) | def compute_shard_metainfo(self):
FILE: colossalai/auto_parallel/offload/amp_optimizer.py
class OptimState (line 17) | class OptimState(Enum):
class AMPOptimizer (line 22) | class AMPOptimizer(OptimizerWrapper):
method __init__ (line 40) | def __init__(
method _set_grad_ptr (line 87) | def _set_grad_ptr(self):
method _update_fp16_params (line 97) | def _update_fp16_params(self):
method _check_overflow (line 105) | def _check_overflow(self):
method _get_combined_scale (line 110) | def _get_combined_scale(self):
method loss_scale (line 125) | def loss_scale(self):
method zero_grad (line 128) | def zero_grad(self, *args, **kwargs):
method step (line 132) | def step(self, *args, **kwargs):
method clip_grad_norm (line 155) | def clip_grad_norm(self, model: torch.nn.Module, max_norm: float, norm...
method backward (line 158) | def backward(self, loss: torch.Tensor):
method __init__optimizer (line 163) | def __init__optimizer(self):
FILE: colossalai/auto_parallel/offload/base_offload_module.py
class BaseOffloadModule (line 14) | class BaseOffloadModule:
method __init__ (line 24) | def __init__(self, model: nn.Module, region_manager: RegionManager, is...
method register_grad_hook (line 34) | def register_grad_hook(self):
method remove_grad_hook (line 39) | def remove_grad_hook(self):
method __call__ (line 43) | def __call__(self, *args, **kwargs):
method _pre_forward (line 46) | def _pre_forward(self):
method forward (line 51) | def forward(self, *args, **kwargs):
method backward (line 58) | def backward(self, loss):
method _post_backward (line 62) | def _post_backward(self):
method grad_handle (line 72) | def grad_handle(self, p, grad):
method _cast_buffers (line 86) | def _cast_buffers(self):
method parameters (line 90) | def parameters(self, recurse: bool = True):
method named_parameters (line 93) | def named_parameters(self, prefix: str = "", recurse: bool = True):
method named_buffers (line 96) | def named_buffers(self, prefix: str = "", recurse: bool = True):
method named_children (line 99) | def named_children(self):
method named_modules (line 102) | def named_modules(
FILE: colossalai/auto_parallel/offload/mem_optimize.py
function memory_optimize (line 17) | def memory_optimize(
FILE: colossalai/auto_parallel/offload/region.py
class Region (line 10) | class Region:
method __init__ (line 18) | def __init__(self, r_id: int = 0) -> None:
method can_release (line 41) | def can_release(self) -> bool:
method has_inf_or_nan (line 48) | def has_inf_or_nan(self) -> bool:
method init_param_data (line 54) | def init_param_data(self, pre_alloc_tensor: torch.Tensor = None):
method move_param_to_cuda (line 74) | def move_param_to_cuda(self):
method move_grad_to_cpu (line 92) | def move_grad_to_cpu(self):
method free_cuda_data (line 105) | def free_cuda_data(self):
method copy_grad_to_region_slice (line 110) | def copy_grad_to_region_slice(self, param: torch.nn.Parameter, data_sl...
method split (line 125) | def split(self, cut_node_idx: int, cut_param_idx: int):
method __update_params_ptr (line 143) | def __update_params_ptr(self) -> None:
FILE: colossalai/auto_parallel/offload/region_manager.py
class RegionManager (line 12) | class RegionManager:
method __init__ (line 23) | def __init__(self, graph: Graph, solver_name: str = "asyn", memory_bud...
method _build_regions (line 42) | def _build_regions(self):
method _pre_process (line 59) | def _pre_process(self):
method _post_process (line 99) | def _post_process(self, ts: TrainingSimulator = None):
method _early_region_placement (line 104) | def _early_region_placement(self, ts: TrainingSimulator):
method _merge_small_regions (line 144) | def _merge_small_regions(self, orig_reg_list: List[Region]) -> List[Re...
method _search_block_size (line 173) | def _search_block_size(
method _init_region_data (line 217) | def _init_region_data(self):
method _process_shared_region (line 241) | def _process_shared_region(self):
method _linearize_graph (line 271) | def _linearize_graph(self) -> List[Region]:
method _set_node_and_region_info (line 466) | def _set_node_and_region_info(self, node_id: int, cur_n: Node, cur_reg...
method get_region (line 502) | def get_region(self, param: torch.nn.Parameter) -> Region:
method __update_param_region_map (line 511) | def __update_param_region_map(self, params: List[torch.nn.Parameter], ...
FILE: colossalai/auto_parallel/offload/runtime.py
class SynPreFwdPostBwdOP (line 10) | class SynPreFwdPostBwdOP(torch.autograd.Function):
method forward (line 23) | def forward(ctx, input_, fwd_info, bwd_info):
method backward (line 40) | def backward(ctx, grad_output):
class AsynPreFwdPostBwdOP (line 50) | class AsynPreFwdPostBwdOP(torch.autograd.Function):
method forward (line 63) | def forward(ctx, input_, fwd_info, bwd_info):
method backward (line 88) | def backward(ctx, grad_output):
function convert_fwd_upload_bwd_offload_to_action (line 114) | def convert_fwd_upload_bwd_offload_to_action(tensor, fwd_info, bwd_info):
function convert_fwd_prefetch_bwd_offload_to_action (line 130) | def convert_fwd_prefetch_bwd_offload_to_action(tensor, fwd_info, bwd_info):
function replace_node_users (line 146) | def replace_node_users(orig_node: Node, inserted_node: Node, rep_user_no...
function runtime_syn_offload_apply_pass (line 166) | def runtime_syn_offload_apply_pass(gm: torch.fx.GraphModule, region_list...
function runtime_asyn_offload_apply_pass (line 200) | def runtime_asyn_offload_apply_pass(gm: torch.fx.GraphModule, region_lis...
FILE: colossalai/auto_parallel/offload/solver.py
function benchmark_func (line 21) | def benchmark_func(func, number=1, repeat=1, warmup=3):
class Solver (line 42) | class Solver(ABC):
method __init__ (line 53) | def __init__(self, region_list: List[Region], memory_budget: float = -...
method _call_solver (line 69) | def _call_solver(self):
method _try_to_offload (line 73) | def _try_to_offload(self, *args):
method _eval_one_choice (line 77) | def _eval_one_choice(self, *args):
method _compute_offload_profit (line 80) | def _compute_offload_profit(self, total_mem_saving: float, peak_mem_sa...
method _compare_profit (line 99) | def _compare_profit(self, profit_a: tuple, profit_b: tuple) -> bool:
method _update_state (line 116) | def _update_state(self, best_ts: TrainingSimulator):
method _update_node_mem_info (line 124) | def _update_node_mem_info(self, fwd_mem_info: Dict[Node, float], bwd_m...
method _extract_computing_power (line 140) | def _extract_computing_power(self):
method _profile_bandwidth (line 164) | def _profile_bandwidth(self):
class SynGreedySolver (line 203) | class SynGreedySolver(Solver):
method __init__ (line 204) | def __init__(self, region_list: List[Region], memory_budget: float = -...
method _init_state (line 210) | def _init_state(self):
method _call_solver (line 219) | def _call_solver(self):
method _call_solver_l2l (line 254) | def _call_solver_l2l(self):
method _try_to_offload (line 263) | def _try_to_offload(self, offload_region: Region):
method _eval_one_choice (line 275) | def _eval_one_choice(self, offload_region: Region):
class AsynGreedySolver (line 299) | class AsynGreedySolver(Solver):
method __init__ (line 300) | def __init__(self, region_list: List[Region], memory_budget: float = -...
method _init_state (line 310) | def _init_state(self):
method _call_solver (line 320) | def _call_solver(self):
method _try_to_offload (line 383) | def _try_to_offload(self, host_region: Region, offload_region: Region):
method _try_convert_to_syn_upload (line 408) | def _try_convert_to_syn_upload(self, host_region: Region, offload_regi...
method _repair_strategy (line 429) | def _repair_strategy(self):
method _eval_one_choice (line 472) | def _eval_one_choice(self):
class SolverFactory (line 490) | class SolverFactory:
method create (line 494) | def create(solver_name: str) -> Type[Solver]:
method get_solver_names (line 500) | def get_solver_names():
FILE: colossalai/auto_parallel/offload/training_simulator.py
class ExecutionPeriod (line 13) | class ExecutionPeriod:
class TrainingSimulator (line 18) | class TrainingSimulator(ABC):
method __init__ (line 29) | def __init__(self, region_list: List[Region], comp_power: float, link_...
method execute (line 47) | def execute(self):
method _eval_fwd_mem_per_region (line 51) | def _eval_fwd_mem_per_region(self, region: Region):
method _eval_bwd_mem_per_region (line 55) | def _eval_bwd_mem_per_region(self, region: Region):
method _get_bandwidth (line 58) | def _get_bandwidth(self, link: str, comm_volumn: float) -> float:
method _get_communication_overhead (line 79) | def _get_communication_overhead(self, link: str, comm_volumn: float) -...
method _get_computing_overhead (line 82) | def _get_computing_overhead(self, flop: float) -> float:
class SynTrainingSimulator (line 86) | class SynTrainingSimulator(TrainingSimulator):
method __init__ (line 87) | def __init__(self, region_list: List[Region], comp_power: float, link_...
method execute (line 90) | def execute(self):
method _eval_fwd_mem_per_region (line 101) | def _eval_fwd_mem_per_region(self, region: Region):
method _eval_bwd_mem_per_region (line 119) | def _eval_bwd_mem_per_region(self, region: Region):
class AsynTrainingSimulator (line 170) | class AsynTrainingSimulator(TrainingSimulator):
method __init__ (line 171) | def __init__(self, region_list: List[Region], comp_power: float, link_...
method execute (line 205) | def execute(self):
method _insert_h2d_exec (line 234) | def _insert_h2d_exec(self, region: Region, is_fwd: bool = True):
method _insert_comp_exec (line 248) | def _insert_comp_exec(self, region: Region, is_fwd: bool = True):
method _insert_d2h_exec (line 269) | def _insert_d2h_exec(self, region: Region):
method _eval_fwd_cost_per_region (line 280) | def _eval_fwd_cost_per_region(self, region: Region):
method _eval_fwd_mem_per_region (line 297) | def _eval_fwd_mem_per_region(self, region: Region):
method _eval_bwd_cost_per_region (line 330) | def _eval_bwd_cost_per_region(self, region: Region):
method _eval_bwd_mem_per_region (line 361) | def _eval_bwd_mem_per_region(self, region: Region):
FILE: colossalai/auto_parallel/offload/util.py
class NodeInfo (line 13) | class NodeInfo:
class NvDevicePower (line 19) | class NvDevicePower:
class GlobalRuntimeInfo (line 37) | class GlobalRuntimeInfo(metaclass=SingletonMeta):
method __init__ (line 38) | def __init__(self):
function compute_act_peak_mem (line 46) | def compute_act_peak_mem(region_list: List[Region]) -> float:
function compute_max_param_mem (line 76) | def compute_max_param_mem(region_list: List[Region]) -> float:
function compute_total_param_mem (line 80) | def compute_total_param_mem(region_list: List[Region]) -> float:
function requires_upload_p_in_fwd (line 84) | def requires_upload_p_in_fwd(shared_reg: Region):
function requires_release_p_in_bwd (line 90) | def requires_release_p_in_bwd(shared_reg: Region):
function requires_offload_g_in_bwd (line 96) | def requires_offload_g_in_bwd(region: Region):
FILE: colossalai/auto_parallel/passes/comm_metainfo_pass.py
function _construct_shard_meta_info (line 17) | def _construct_shard_meta_info(
function _runtime_apply_meta_info (line 61) | def _runtime_apply_meta_info(node: Node, origin_spec_dict, sharding_spec...
function _runtime_comm_spec_apply_meta_info (line 77) | def _runtime_comm_spec_apply_meta_info(node: Node, comm_actions_dict: Di...
function comm_metainfo_pass (line 111) | def comm_metainfo_pass(
FILE: colossalai/auto_parallel/passes/meta_info_prop.py
function _normalize_tuple (line 16) | def _normalize_tuple(x):
class MetaInfoProp (line 23) | class MetaInfoProp:
method __init__ (line 24) | def __init__(self, module: GraphModule) -> None:
method _set_data_ptr (line 35) | def _set_data_ptr(self, x):
method _is_inplace (line 44) | def _is_inplace(self, node: Node):
method run (line 54) | def run(self) -> GraphModule:
method placeholder_handler (line 63) | def placeholder_handler(self, node: Node) -> None:
method get_attr_handler (line 73) | def get_attr_handler(self, node: Node) -> None:
method output_handler (line 81) | def output_handler(self, node: Node) -> None:
method node_handler (line 94) | def node_handler(self, node: Node) -> None:
FILE: colossalai/auto_parallel/passes/runtime_apply_pass.py
function runtime_apply (line 15) | def runtime_apply(node: Node, origin_dict: Dict, input_dict: Dict, node_...
function runtime_apply_for_iterable_object (line 25) | def runtime_apply_for_iterable_object(
function runtime_comm_spec_apply (line 45) | def runtime_comm_spec_apply(tensor: torch.Tensor, comm_actions_dict: Dic...
function _preprocess_graph (line 59) | def _preprocess_graph(nodes: List[Node]):
function _shape_consistency_apply (line 85) | def _shape_consistency_apply(gm: torch.fx.GraphModule):
function _comm_spec_apply (line 151) | def _comm_spec_apply(gm: torch.fx.GraphModule):
function _act_annotation_pass (line 225) | def _act_annotation_pass(gm: torch.fx.GraphModule):
function runtime_apply_pass (line 252) | def runtime_apply_pass(gm: torch.fx.GraphModule):
FILE: colossalai/auto_parallel/passes/runtime_preparation_pass.py
function size_processing (line 21) | def size_processing(
function solution_annotation_pass (line 52) | def solution_annotation_pass(
function size_value_converting_pass (line 131) | def size_value_converting_pass(gm: torch.fx.GraphModule, device_mesh: De...
function node_args_converting_pass (line 280) | def node_args_converting_pass(gm: torch.fx.GraphModule, device_mesh: Dev...
function module_params_sharding_pass (line 384) | def module_params_sharding_pass(gm: torch.fx.GraphModule, device_mesh: D...
function implicit_comm_action_apply (line 496) | def implicit_comm_action_apply(gm: torch.fx.GraphModule):
function runtime_preparation_pass (line 502) | def runtime_preparation_pass(
FILE: colossalai/auto_parallel/tensor_shard/initialize.py
class ModuleWrapper (line 22) | class ModuleWrapper(nn.Module):
method __init__ (line 28) | def __init__(
method forward (line 48) | def forward(self, *args, **kwargs):
function extract_meta_args_from_dataloader (line 58) | def extract_meta_args_from_dataloader(data_loader: torch.utils.data.Data...
function extract_alpha_beta_for_device_mesh (line 65) | def extract_alpha_beta_for_device_mesh(alpha_beta_dict: Dict[Tuple[int],...
function build_strategy_constructor (line 73) | def build_strategy_constructor(
function solve_solution (line 117) | def solve_solution(gm: ColoGraphModule, strategy_constructor: Strategies...
function transform_to_sharded_model (line 135) | def transform_to_sharded_model(
function initialize_device_mesh (line 160) | def initialize_device_mesh(
function initialize_model (line 221) | def initialize_model(
function autoparallelize (line 300) | def autoparallelize(
FILE: colossalai/auto_parallel/tensor_shard/node_handler/addmm_handler.py
class ADDMMFunctionHandler (line 16) | class ADDMMFunctionHandler(NodeHandler):
method _infer_op_data_type (line 23) | def _infer_op_data_type(self, tensor: torch.Tensor) -> OperationDataType:
method get_operation_data_mapping (line 30) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
method get_strategy_generator (line 64) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method post_process (line 72) | def post_process(self, strategy: ShardingStrategy) -> Union[ShardingSt...
FILE: colossalai/auto_parallel/tensor_shard/node_handler/batch_norm_handler.py
class BatchNormModuleHandler (line 16) | class BatchNormModuleHandler(MetaInfoModuleHandler):
method get_strategy_generator (line 21) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 27) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/binary_elementwise_handler.py
class BinaryElementwiseHandler (line 18) | class BinaryElementwiseHandler(MetaInfoNodeHandler):
method get_operation_data_mapping (line 24) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
method get_strategy_generator (line 83) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method post_process (line 89) | def post_process(self, strategy: ShardingStrategy) -> Union[ShardingSt...
FILE: colossalai/auto_parallel/tensor_shard/node_handler/bmm_handler.py
function _get_data_mapping_for_bmm_op (line 14) | def _get_data_mapping_for_bmm_op(node, input_idx, other_idx, bias_idx=No...
class BMMFunctionHandler (line 48) | class BMMFunctionHandler(NodeHandler):
method get_operation_data_mapping (line 55) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
method get_strategy_generator (line 59) | def get_strategy_generator(self) -> List[StrategyGenerator]:
class AddBMMFunctionHandler (line 68) | class AddBMMFunctionHandler(NodeHandler):
method get_operation_data_mapping (line 77) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
method get_strategy_generator (line 81) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method post_process (line 90) | def post_process(self, strategy: ShardingStrategy) -> Union[ShardingSt...
FILE: colossalai/auto_parallel/tensor_shard/node_handler/conv_handler.py
class ConvModuleHandler (line 18) | class ConvModuleHandler(MetaInfoModuleHandler):
method get_strategy_generator (line 23) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 29) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
method post_process (line 57) | def post_process(self, strategy: ShardingStrategy):
class ConvFunctionHandler (line 70) | class ConvFunctionHandler(MetaInfoNodeHandler):
method get_strategy_generator (line 75) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 81) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
method post_process (line 121) | def post_process(self, strategy: ShardingStrategy):
FILE: colossalai/auto_parallel/tensor_shard/node_handler/default_reshape_handler.py
class DefaultReshapeHandler (line 16) | class DefaultReshapeHandler(MetaInfoNodeHandler):
method get_strategy_generator (line 21) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method infer_logical_shape (line 27) | def infer_logical_shape(self, data):
method get_operation_data_mapping (line 45) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/embedding_handler.py
function _convert_logical_sharding_to_physical_sharding_spec_for_embedding (line 18) | def _convert_logical_sharding_to_physical_sharding_spec_for_embedding(
class EmbeddingModuleHandler (line 116) | class EmbeddingModuleHandler(ModuleHandler):
method get_strategy_generator (line 121) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 127) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
method post_process (line 163) | def post_process(self, strategy: ShardingStrategy) -> Union[ShardingSt...
class EmbeddingFunctionHandler (line 177) | class EmbeddingFunctionHandler(NodeHandler):
method get_strategy_generator (line 182) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 188) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
method post_process (line 230) | def post_process(self, strategy: ShardingStrategy):
FILE: colossalai/auto_parallel/tensor_shard/node_handler/getattr_handler.py
class GetattrHandler (line 10) | class GetattrHandler(NodeHandler):
method get_strategy_generator (line 15) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 21) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/getitem_handler.py
class GetItemHandler (line 15) | class GetItemHandler(NodeHandler):
method get_strategy_generator (line 20) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 30) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/layer_norm_handler.py
class LayerNormModuleHandler (line 14) | class LayerNormModuleHandler(MetaInfoModuleHandler):
method get_strategy_generator (line 19) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 25) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/linear_handler.py
function _update_sharding_spec_for_transposed_weight_for_linear (line 18) | def _update_sharding_spec_for_transposed_weight_for_linear(
function _convert_logical_sharding_to_physical_sharding_spec_for_linear (line 40) | def _convert_logical_sharding_to_physical_sharding_spec_for_linear(
class LinearModuleHandler (line 152) | class LinearModuleHandler(MetaInfoModuleHandler):
method get_strategy_generator (line 157) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 170) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
method post_process (line 205) | def post_process(self, strategy: ShardingStrategy) -> Union[ShardingSt...
class LinearFunctionHandler (line 224) | class LinearFunctionHandler(MetaInfoNodeHandler):
method get_strategy_generator (line 229) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 237) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
method post_process (line 285) | def post_process(self, strategy: ShardingStrategy):
FILE: colossalai/auto_parallel/tensor_shard/node_handler/matmul_handler.py
class MatMulType (line 30) | class MatMulType(Enum):
function get_matmul_type (line 47) | def get_matmul_type(input_dim: int, other_dim: int):
class BmmTransform (line 70) | class BmmTransform(ABC):
method apply (line 77) | def apply(self, shape_mapping: Dict[str, List[int]]):
method recover (line 81) | def recover(self, op_data_mapping: Dict[str, OperationData], strategy:...
class Padder (line 85) | class Padder(BmmTransform):
method __init__ (line 90) | def __init__(self) -> None:
method apply (line 94) | def apply(self, shape_mapping: Dict[str, List[int]]):
method recover (line 113) | def recover(self, op_data_mapping: Dict[str, OperationData], strategy:...
class Broadcaster (line 159) | class Broadcaster(BmmTransform):
method __init__ (line 164) | def __init__(self) -> None:
method apply (line 167) | def apply(self, shape_mapping: Dict[str, List[int]]):
method recover (line 196) | def recover(self, op_data_mapping: Dict[str, OperationData], strategy:...
class Viewer (line 236) | class Viewer(BmmTransform):
method __init__ (line 241) | def __init__(self) -> None:
method apply (line 244) | def apply(self, shape_mapping: Dict[str, List[int]]):
method recover (line 262) | def recover(self, op_data_mapping: Dict[str, OperationData], strategy:...
function _get_bmm_logical_shape (line 305) | def _get_bmm_logical_shape(input_shape, other_shape, transforms):
class MatMulHandler (line 331) | class MatMulHandler(MetaInfoNodeHandler):
method __init__ (line 338) | def __init__(self, *args, **kwargs) -> None:
method get_strategy_generator (line 358) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 373) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
method _get_op_data_mapping (line 384) | def _get_op_data_mapping(self, input_logical_shape, other_logical_shap...
method _get_logical_shape_for_dot (line 418) | def _get_logical_shape_for_dot(self):
method _get_logical_shape_for_mm (line 424) | def _get_logical_shape_for_mm(self):
method _get_logical_shape_for_mv (line 437) | def _get_logical_shape_for_mv(self):
method _get_logical_shape_for_bmm (line 443) | def _get_logical_shape_for_bmm(self):
method post_process (line 448) | def post_process(self, strategy: ShardingStrategy) -> Union[ShardingSt...
FILE: colossalai/auto_parallel/tensor_shard/node_handler/node_handler.py
class NodeHandler (line 24) | class NodeHandler(ABC):
method __init__ (line 34) | def __init__(
method update_resharding_cost (line 50) | def update_resharding_cost(self, strategy: ShardingStrategy) -> None:
method get_target_function (line 143) | def get_target_function(self) -> callable:
method register_strategy (line 162) | def register_strategy(self, compute_resharding_cost: bool = True) -> S...
method post_process (line 221) | def post_process(self, strategy: ShardingStrategy) -> Union[ShardingSt...
method get_strategy_generator (line 227) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 233) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
class MetaInfoNodeHandler (line 255) | class MetaInfoNodeHandler(NodeHandler):
method register_strategy (line 263) | def register_strategy(self, compute_resharding_cost: bool = True) -> S...
class ModuleHandler (line 291) | class ModuleHandler(NodeHandler):
method __init__ (line 292) | def __init__(self, *args, **kwargs) -> None:
class MetaInfoModuleHandler (line 310) | class MetaInfoModuleHandler(ModuleHandler):
method register_strategy (line 318) | def register_strategy(self, compute_resharding_cost: bool = True) -> S...
FILE: colossalai/auto_parallel/tensor_shard/node_handler/normal_pooling_handler.py
class NormPoolingHandler (line 19) | class NormPoolingHandler(MetaInfoModuleHandler):
method get_strategy_generator (line 24) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 30) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/output_handler.py
class OutputHandler (line 14) | class OutputHandler(NodeHandler):
method __init__ (line 19) | def __init__(
method get_strategy_generator (line 25) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 31) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/permute_handler.py
class PermuteHandler (line 15) | class PermuteHandler(NodeHandler):
method get_strategy_generator (line 20) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 26) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/placeholder_handler.py
class PlaceholderHandler (line 14) | class PlaceholderHandler(NodeHandler):
method __init__ (line 19) | def __init__(
method get_strategy_generator (line 25) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 33) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/registry.py
class Registry (line 1) | class Registry:
method __init__ (line 2) | def __init__(self, name):
method register (line 6) | def register(self, source):
method get (line 18) | def get(self, source):
method has (line 23) | def has(self, source):
FILE: colossalai/auto_parallel/tensor_shard/node_handler/softmax_handler.py
class SoftmaxHandler (line 15) | class SoftmaxHandler(NodeHandler):
method get_strategy_generator (line 21) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 27) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/split_handler.py
class SplitHandler (line 15) | class SplitHandler(NodeHandler):
method get_strategy_generator (line 20) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 26) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/batch_norm_generator.py
class BatchNormStrategyGenerator (line 20) | class BatchNormStrategyGenerator(StrategyGenerator):
method validate (line 32) | def validate(self) -> bool:
method update_compute_cost (line 46) | def update_compute_cost(self, strategy: ShardingStrategy):
method update_memory_cost (line 73) | def update_memory_cost(self, strategy: ShardingStrategy):
method split_input_channel (line 115) | def split_input_channel(self, mesh_dim_0):
method split_input_channel_1d (line 139) | def split_input_channel_1d(self, mesh_dim_0, mesh_dim_1):
method non_split (line 163) | def non_split(self):
method split_input_batch (line 187) | def split_input_batch(self, mesh_dim_0):
method split_input_batch_1d (line 224) | def split_input_batch_1d(self, mesh_dim_0, mesh_dim_1):
method split_input_both_dim (line 261) | def split_input_both_dim(self, mesh_dim_0, mesh_dim_1):
method collate_strategies (line 311) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/binary_elementwise_generator.py
class BinaryElementwiseStrategyGenerator (line 20) | class BinaryElementwiseStrategyGenerator(StrategyGenerator):
method validate (line 28) | def validate(self) -> bool:
method update_compute_cost (line 36) | def update_compute_cost(self, strategy: ShardingStrategy) -> ShardingS...
method update_memory_cost (line 49) | def update_memory_cost(self, strategy: ShardingStrategy) -> ShardingSt...
method enumerate_all_possible_output (line 67) | def enumerate_all_possible_output(self, mesh_dim_0, mesh_dim_1):
method collate_strategies (line 111) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/conv_strategy_generator.py
class ConvStrategyGenerator (line 18) | class ConvStrategyGenerator(StrategyGenerator):
method validate (line 24) | def validate(self) -> bool:
method update_compute_cost (line 38) | def update_compute_cost(self, strategy: ShardingStrategy):
method update_memory_cost (line 78) | def update_memory_cost(self, strategy: ShardingStrategy):
method split_input_batch_weight_out_channel (line 111) | def split_input_batch_weight_out_channel(self, mesh_dim_0, mesh_dim_1):
method split_input_batch (line 178) | def split_input_batch(self, mesh_dim_0):
method split_input_both_dim_weight_in_channel (line 238) | def split_input_both_dim_weight_in_channel(self, mesh_dim_0, mesh_dim_1):
method split_input_in_channel_weight_both_channel (line 308) | def split_input_in_channel_weight_both_channel(self, mesh_dim_0, mesh_...
method split_input_in_channel_weight_in_channel (line 355) | def split_input_in_channel_weight_in_channel(self, mesh_dim_0):
method split_weight_out_channel (line 390) | def split_weight_out_channel(self, mesh_dim_0):
method non_split (line 428) | def non_split(self):
method split_1d_parallel_on_input_batch (line 447) | def split_1d_parallel_on_input_batch(self, mesh_dim_0, mesh_dim_1):
method split_1d_parallel_on_in_channel (line 509) | def split_1d_parallel_on_in_channel(self, mesh_dim_0, mesh_dim_1):
method split_1d_parallel_on_out_channel (line 543) | def split_1d_parallel_on_out_channel(self, mesh_dim_0, mesh_dim_1):
method collate_strategies (line 579) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/embedding_generator.py
class EmbeddingStrategyGenerator (line 18) | class EmbeddingStrategyGenerator(StrategyGenerator):
method validate (line 24) | def validate(self) -> bool:
method update_compute_cost (line 27) | def update_compute_cost(self, strategy: ShardingStrategy):
method update_memory_cost (line 54) | def update_memory_cost(self, strategy: ShardingStrategy):
method non_split (line 83) | def non_split(self):
method split_input (line 99) | def split_input(self, mesh_dim_0):
method split_input_and_embedding_dim (line 139) | def split_input_and_embedding_dim(self, mesh_dim_0, mesh_dim_1):
method split_1d_parallel_on_input (line 193) | def split_1d_parallel_on_input(self, mesh_dim_0, mesh_dim_1):
method split_embedding_dim (line 235) | def split_embedding_dim(self, mesh_dim_0):
method split_1d_parallel_on_embedding_dim (line 268) | def split_1d_parallel_on_embedding_dim(self, mesh_dim_0, mesh_dim_1):
method collate_strategies (line 300) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/getattr_generator.py
class GetattrGenerator (line 16) | class GetattrGenerator(StrategyGenerator):
method validate (line 21) | def validate(self) -> bool:
method update_compute_cost (line 24) | def update_compute_cost(self, strategy: ShardingStrategy):
method update_memory_cost (line 28) | def update_memory_cost(self, strategy: ShardingStrategy):
method enumerate_all_possible_output (line 47) | def enumerate_all_possible_output(self, mesh_dim_0, mesh_dim_1):
method collate_strategies (line 89) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/getitem_generator.py
class GetItemStrategyGenerator (line 13) | class GetItemStrategyGenerator(FollowingStrategyGenerator):
method validate (line 24) | def validate(self) -> bool:
method update_compute_cost (line 27) | def update_compute_cost(self, strategy: ShardingStrategy):
method update_memory_cost (line 31) | def update_memory_cost(self, strategy: ShardingStrategy):
class TensorStrategyGenerator (line 62) | class TensorStrategyGenerator(GetItemStrategyGenerator):
method collate_strategies (line 67) | def collate_strategies(self) -> List[ShardingStrategy]:
class TensorTupleStrategyGenerator (line 137) | class TensorTupleStrategyGenerator(GetItemStrategyGenerator):
method collate_strategies (line 142) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/layer_norm_generator.py
class LayerNormGenerator (line 24) | class LayerNormGenerator(StrategyGenerator):
method validate (line 30) | def validate(self) -> bool:
method update_compute_cost (line 33) | def update_compute_cost(self, strategy: ShardingStrategy):
method update_memory_cost (line 64) | def update_memory_cost(self, strategy: ShardingStrategy):
method _generate_strategy_with_dim_partition (line 100) | def _generate_strategy_with_dim_partition(self, dim_partition):
method split_input_batch_single_mesh_dim (line 145) | def split_input_batch_single_mesh_dim(self, mesh_dim_0, batch_dimensio...
method split_input_batch_both_mesh_dim (line 153) | def split_input_batch_both_mesh_dim(self, mesh_dim_0, mesh_dim_1, batc...
method non_split (line 162) | def non_split(self):
method collate_strategies (line 182) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/matmul_strategy_generator.py
class MatMulStrategyGenerator (line 18) | class MatMulStrategyGenerator(StrategyGenerator):
method update_memory_cost (line 24) | def update_memory_cost(self, strategy: ShardingStrategy) -> ShardingSt...
class DotProductStrategyGenerator (line 54) | class DotProductStrategyGenerator(MatMulStrategyGenerator):
method validate (line 55) | def validate(self) -> bool:
method update_compute_cost (line 60) | def update_compute_cost(self, strategy: ShardingStrategy) -> ShardingS...
method no_split (line 70) | def no_split(self):
method split_one_dim (line 82) | def split_one_dim(self, mesh_dim):
method collate_strategies (line 103) | def collate_strategies(self) -> List[ShardingStrategy]:
class MatVecStrategyGenerator (line 118) | class MatVecStrategyGenerator(MatMulStrategyGenerator):
method validate (line 119) | def validate(self) -> bool:
method update_compute_cost (line 124) | def update_compute_cost(self, strategy: ShardingStrategy) -> ShardingS...
method no_split (line 134) | def no_split(self):
method split_input_batch (line 146) | def split_input_batch(self, mesh_dim):
method collate_strategies (line 203) | def collate_strategies(self) -> List[ShardingStrategy]:
class LinearProjectionStrategyGenerator (line 216) | class LinearProjectionStrategyGenerator(MatMulStrategyGenerator):
method __init__ (line 217) | def __init__(
method update_compute_cost (line 228) | def update_compute_cost(self, strategy: ShardingStrategy) -> ShardingS...
method dp_strategies (line 246) | def dp_strategies(self) -> List[ShardingStrategy]:
method tp_strategies (line 254) | def tp_strategies(self) -> List[ShardingStrategy]:
method mix_strategies (line 277) | def mix_strategies(self) -> List[ShardingStrategy]:
method collate_strategies (line 293) | def collate_strategies(self) -> List[ShardingStrategy]:
method split_lhs_space_rhs_space (line 308) | def split_lhs_space_rhs_space(self, mesh_dim_0, mesh_dim_1):
method split_lhs_space_both_contract (line 384) | def split_lhs_space_both_contract(self, mesh_dim_0, mesh_dim_1):
method split_rhs_space_both_contract (line 463) | def split_rhs_space_both_contract(self, mesh_dim_0, mesh_dim_1):
method recompute_split_both_contract (line 503) | def recompute_split_both_contract(self, mesh_dim):
method split_rhs_space_only (line 534) | def split_rhs_space_only(self, mesh_dim):
method split_lhs_1st_dim_1d (line 566) | def split_lhs_1st_dim_1d(self, mesh_dim_0, mesh_dim_1):
method split_lhs_2nd_dim_1d (line 632) | def split_lhs_2nd_dim_1d(self, mesh_dim_0, mesh_dim_1):
method split_rhs_2nd_dim_1d (line 664) | def split_rhs_2nd_dim_1d(self, mesh_dim_0, mesh_dim_1):
method non_split (line 697) | def non_split(self):
method validate (line 721) | def validate(self) -> bool:
class BatchedMatMulStrategyGenerator (line 736) | class BatchedMatMulStrategyGenerator(MatMulStrategyGenerator):
method __init__ (line 751) | def __init__(self, *args, **kwargs):
method _pop_batch_dim_sharding_for_output (line 755) | def _pop_batch_dim_sharding_for_output(self, dim_partition_dict):
method validate (line 767) | def validate(self) -> bool:
method update_compute_cost (line 776) | def update_compute_cost(self, strategy: ShardingStrategy) -> ShardingS...
method split_one_batch_dim (line 787) | def split_one_batch_dim(self, mesh_dim):
method split_two_batch_dim (line 814) | def split_two_batch_dim(self, mesh_dim_0, mesh_dim_1):
method split_batch_dim_lhs_space (line 845) | def split_batch_dim_lhs_space(self, mesh_dim_0, mesh_dim_1):
method split_batch_dim_rhs_space (line 887) | def split_batch_dim_rhs_space(self, mesh_dim_0, mesh_dim_1):
method split_batch_dim_both_contract (line 928) | def split_batch_dim_both_contract(self, mesh_dim_0, mesh_dim_1):
method collate_strategies (line 968) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/normal_pooling_generator.py
class NormalPoolStrategyGenerator (line 16) | class NormalPoolStrategyGenerator(StrategyGenerator):
method validate (line 23) | def validate(self) -> bool:
method update_compute_cost (line 37) | def update_compute_cost(self, strategy: ShardingStrategy) -> TrainCycl...
method update_memory_cost (line 65) | def update_memory_cost(self, strategy: ShardingStrategy) -> ShardingSt...
method _generate_strategy_with_dim_partition (line 89) | def _generate_strategy_with_dim_partition(self, dim_partition):
method enumerate_all_possible_batch_dimensions_dim_partition (line 107) | def enumerate_all_possible_batch_dimensions_dim_partition(self, mesh_d...
method collate_strategies (line 117) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/output_generator.py
class OutputGenerator (line 18) | class OutputGenerator(OutputStrategyGenerator):
method __init__ (line 23) | def __init__(
method validate (line 33) | def validate(self) -> bool:
method update_compute_cost (line 36) | def update_compute_cost(self, strategy: ShardingStrategy):
method update_memory_cost (line 40) | def update_memory_cost(self, strategy: ShardingStrategy):
method replica_strategy (line 53) | def replica_strategy(self) -> List[ShardingStrategy]:
method distributed_strategy (line 87) | def distributed_strategy(self, mesh_list: List[List[int]] = None) -> L...
method collate_strategies (line 118) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/placeholder_generator.py
class PlaceholderGenerator (line 16) | class PlaceholderGenerator(StrategyGenerator):
method __init__ (line 21) | def __init__(
method validate (line 27) | def validate(self) -> bool:
method update_compute_cost (line 30) | def update_compute_cost(self, strategy: ShardingStrategy):
method update_memory_cost (line 34) | def update_memory_cost(self, strategy: ShardingStrategy):
method replica_placeholder (line 52) | def replica_placeholder(self) -> ShardingStrategy:
method distributed_placeholder (line 72) | def distributed_placeholder(self, mesh_list) -> ShardingStrategy:
method collate_strategies (line 92) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/reshape_generator.py
class ReshapeGenerator (line 23) | class ReshapeGenerator(FollowingStrategyGenerator):
method validate (line 28) | def validate(self) -> bool:
method update_compute_cost (line 31) | def update_compute_cost(self, strategy: ShardingStrategy):
method update_memory_cost (line 35) | def update_memory_cost(self, strategy: ShardingStrategy):
method collate_strategies (line 65) | def collate_strategies(self) -> List[ShardingStrategy]:
class ViewGenerator (line 69) | class ViewGenerator(ReshapeGenerator):
method collate_strategies (line 74) | def collate_strategies(self) -> List[ShardingStrategy]:
class PermuteGenerator (line 155) | class PermuteGenerator(ReshapeGenerator):
method collate_strategies (line 160) | def collate_strategies(self) -> List[ShardingStrategy]:
class TransposeGenerator (line 195) | class TransposeGenerator(ReshapeGenerator):
method collate_strategies (line 200) | def collate_strategies(self) -> List[ShardingStrategy]:
class SplitGenerator (line 241) | class SplitGenerator(ReshapeGenerator):
method collate_strategies (line 246) | def collate_strategies(self) -> List[ShardingStrategy]:
class DefaultReshapeGenerator (line 314) | class DefaultReshapeGenerator(ReshapeGenerator):
method collate_strategies (line 320) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/softmax_generator.py
class SoftmaxGenerator (line 12) | class SoftmaxGenerator(FollowingStrategyGenerator):
method validate (line 17) | def validate(self) -> bool:
method update_compute_cost (line 20) | def update_compute_cost(self, strategy: ShardingStrategy):
method update_memory_cost (line 35) | def update_memory_cost(self, strategy: ShardingStrategy):
method collate_strategies (line 65) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/strategy_generator.py
class StrategyGenerator (line 23) | class StrategyGenerator(ABC):
method __init__ (line 30) | def __init__(self, operation_data_mapping: Dict[str, OperationData], d...
method has_bias (line 38) | def has_bias(self):
method is_param (line 44) | def is_param(self, op_data_name):
method is_buffer (line 48) | def is_buffer(self, op_data_name):
method get_sharding_strategy (line 52) | def get_sharding_strategy(
method to_sharding_spec_mapping (line 69) | def to_sharding_spec_mapping(self, mapping: Dict[str, Dict[int, List[i...
method replace_op_name_with_op_data (line 117) | def replace_op_name_with_op_data(self, mapping: Dict[str, Any]):
method get_communication_spec (line 127) | def get_communication_spec(
method get_communication_action (line 140) | def get_communication_action(
method update_communication_cost (line 163) | def update_communication_cost(self, strategy: ShardingStrategy) -> Sha...
method update_compute_cost (line 204) | def update_compute_cost(self, strategy: ShardingStrategy) -> ShardingS...
method update_memory_cost (line 210) | def update_memory_cost(self, strategy: ShardingStrategy) -> ShardingSt...
method _compute_size_in_bytes (line 215) | def _compute_size_in_bytes(self, strategy: ShardingStrategy, key: str):
method generate (line 258) | def generate(self) -> List[ShardingStrategy]:
method collate_strategies (line 281) | def collate_strategies(self) -> List[ShardingStrategy]:
method validate (line 285) | def validate(self) -> bool:
class FollowingStrategyGenerator (line 292) | class FollowingStrategyGenerator(StrategyGenerator):
method __init__ (line 299) | def __init__(
class OutputStrategyGenerator (line 307) | class OutputStrategyGenerator(StrategyGenerator):
method __init__ (line 312) | def __init__(
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/sum_generator.py
class SumGenerator (line 12) | class SumGenerator(FollowingStrategyGenerator):
method validate (line 17) | def validate(self) -> bool:
method update_compute_cost (line 20) | def update_compute_cost(self, strategy: ShardingStrategy):
method update_memory_cost (line 32) | def update_memory_cost(self, strategy: ShardingStrategy):
method collate_strategies (line 62) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/tensor_constructor_generator.py
class TensorConstructorGenerator (line 10) | class TensorConstructorGenerator(StrategyGenerator):
method validate (line 16) | def validate(self) -> bool:
method update_compute_cost (line 19) | def update_compute_cost(self, strategy: ShardingStrategy):
method update_memory_cost (line 23) | def update_memory_cost(self, strategy: ShardingStrategy):
method collate_strategies (line 43) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/unary_elementwise_generator.py
class UnaryElementwiseGenerator (line 11) | class UnaryElementwiseGenerator(FollowingStrategyGenerator):
method validate (line 16) | def validate(self) -> bool:
method update_compute_cost (line 19) | def update_compute_cost(self, strategy: ShardingStrategy):
method update_memory_cost (line 23) | def update_memory_cost(self, strategy: ShardingStrategy):
method collate_strategies (line 53) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/strategy/where_generator.py
class WhereGenerator (line 16) | class WhereGenerator(StrategyGenerator):
method validate (line 21) | def validate(self) -> bool:
method update_compute_cost (line 24) | def update_compute_cost(self, strategy: ShardingStrategy):
method update_memory_cost (line 28) | def update_memory_cost(self, strategy: ShardingStrategy):
method _generate_strategy_with_dim_partition (line 57) | def _generate_strategy_with_dim_partition(self, dim_partition):
method enumerate_all_possible_output_spec (line 78) | def enumerate_all_possible_output_spec(self, mesh_dim_0, mesh_dim_1, d...
method collate_strategies (line 88) | def collate_strategies(self) -> List[ShardingStrategy]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/sum_handler.py
class SumHandler (line 15) | class SumHandler(NodeHandler):
method get_strategy_generator (line 20) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 26) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/tensor_constructor_handler.py
class TensorConstructorHandler (line 15) | class TensorConstructorHandler(NodeHandler):
method get_strategy_generator (line 20) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 26) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
FILE: colossalai/auto_parallel/tensor_shard/node_handler/transpose_handler.py
class TransposeHandler (line 15) | class TransposeHandler(NodeHandler):
method get_strategy_generator (line 20) | def get_strategy_generator(self) -> List[StrategyGenerator]:
method get_operation_data_mapping (line 26) | def get_operation_data_mapping(self) -> Dict[str, OperationData]:
FIL
Copy disabled (too large)
Download .json
Condensed preview — 2185 files, each showing path, character count, and a content snippet. Download the .json file for the full structured content (14,285K chars).
[
{
"path": ".clang-format",
"chars": 21,
"preview": "BasedOnStyle: Google\n"
},
{
"path": ".compatibility",
"chars": 39,
"preview": "2.3.0-12.1.0\n2.4.0-12.4.1\n2.5.1-12.4.1\n"
},
{
"path": ".coveragerc",
"chars": 67,
"preview": "[run]\nconcurrency = multiprocessing\nparallel = true\nsigterm = true\n"
},
{
"path": ".cuda_ext.json",
"chars": 476,
"preview": "{\n \"build\": [\n {\n \"torch_command\": \"pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url "
},
{
"path": ".github/CODEOWNERS",
"chars": 29,
"preview": "* @hpcaitech/colossalai-qa\n"
},
{
"path": ".github/ISSUE_TEMPLATE/bug-report.yml",
"chars": 2294,
"preview": "name: 🐛 Bug Report\ndescription: Create a report to help us reproduce and fix the bug\ntitle: \"[BUG]: \"\nlabels: [bug]\n\nbod"
},
{
"path": ".github/ISSUE_TEMPLATE/config.yml",
"chars": 747,
"preview": "blank_issues_enabled: true\ncontact_links:\n - name: ❓ Simple question - Slack Chat\n url: https://github.com/hpcaitech"
},
{
"path": ".github/ISSUE_TEMPLATE/documentation.yml",
"chars": 1084,
"preview": "name: 📚 Documentation\ndescription: Report an issue related to https://www.colossalai.org/\ntitle: \"[DOC]: \"\nlabels: [docu"
},
{
"path": ".github/ISSUE_TEMPLATE/feature_request.yml",
"chars": 1331,
"preview": "name: 🚀 Feature request\ndescription: Suggest an idea for this project\ntitle: \"[FEATURE]: \"\nlabels: [enhancement]\n\nbody:\n"
},
{
"path": ".github/ISSUE_TEMPLATE/proposal.yml",
"chars": 1999,
"preview": "name: 💥 Proposal\ndescription: Propose a non-trivial change to Colossal-AI\ntitle: \"[PROPOSAL]: \"\nlabels: [enhancement]\n\nb"
},
{
"path": ".github/pull_request_template.md",
"chars": 1296,
"preview": "## 📌 Checklist before creating the PR\n\n- [ ] I have created an issue for this PR for traceability\n- [ ] The title follow"
},
{
"path": ".github/workflows/README.md",
"chars": 11085,
"preview": "# CI/CD\n\n## Table of Contents\n\n- [CI/CD](#cicd)\n - [Table of Contents](#table-of-contents)\n - [Overview](#overview)\n "
},
{
"path": ".github/workflows/build_on_pr.yml",
"chars": 7573,
"preview": "name: Build on PR\n\non:\n pull_request:\n types: [synchronize, opened, reopened, ready_for_review, closed]\n branches"
},
{
"path": ".github/workflows/build_on_schedule.yml",
"chars": 3186,
"preview": "name: Build on Schedule\n\non:\n schedule:\n # run at 00:00 of every Sunday\n - cron: \"0 0 * * 0\"\n workflow_dispatch:"
},
{
"path": ".github/workflows/close_inactive.yml",
"chars": 1098,
"preview": "name: Close inactive issues\n\non:\n schedule:\n - cron: \"0 0 * * *\"\n\njobs:\n close-issues:\n if: github.event.pull_re"
},
{
"path": ".github/workflows/compatiblity_test_on_dispatch.yml",
"chars": 2877,
"preview": "name: Compatibility Test on Dispatch\n\non:\n workflow_dispatch:\n inputs:\n torch_version:\n type: string\n "
},
{
"path": ".github/workflows/compatiblity_test_on_pr.yml",
"chars": 2825,
"preview": "name: Compatibility Test on PR\n\non:\n pull_request:\n paths:\n - \"version.txt\"\n - \".compatibility\"\n\njobs:\n m"
},
{
"path": ".github/workflows/compatiblity_test_on_schedule.yml",
"chars": 3186,
"preview": "name: Compatibility Test on Schedule\n\non:\n # run at 03:00 of every Sunday(singapore time) so here is UTC time Saturday "
},
{
"path": ".github/workflows/cuda_ext_check_before_merge.yml",
"chars": 1491,
"preview": "name: Check CUDA Extension Build Before Merge\n\non:\n workflow_dispatch:\n pull_request:\n paths:\n - 'version.txt'"
},
{
"path": ".github/workflows/doc_build_on_schedule_after_release.yml",
"chars": 866,
"preview": "name: Build Documentation On Schedule & After Release\n\non:\n workflow_dispatch:\n schedule:\n - cron: \"0 12 * * *\" # b"
},
{
"path": ".github/workflows/doc_check_on_pr.yml",
"chars": 2482,
"preview": "name: Check Documentation on PR\n\non:\n pull_request:\n branches:\n - \"main\"\n - \"develop\"\n - \"feature/**\""
},
{
"path": ".github/workflows/doc_test_on_pr.yml",
"chars": 3625,
"preview": "name: Test Documentation on PR\non:\n pull_request:\n branches:\n - \"main\"\n - \"develop\"\n - \"feature/**\"\n "
},
{
"path": ".github/workflows/doc_test_on_schedule.yml",
"chars": 1455,
"preview": "name: Test Documentation on Schedule\non:\n # run at 07:00 of every Sunday(singapore time) so here is UTC time Saturday 2"
},
{
"path": ".github/workflows/draft_github_release_post_after_merge.yml",
"chars": 1419,
"preview": "name: Draft GitHub Release Post\n\non:\n workflow_dispatch:\n pull_request:\n paths:\n - 'version.txt'\n types:\n "
},
{
"path": ".github/workflows/example_check_on_dispatch.yml",
"chars": 2132,
"preview": "name: Test Example on Dispatch\non:\n workflow_dispatch:\n inputs:\n example_directory:\n type: string\n "
},
{
"path": ".github/workflows/example_check_on_pr.yml",
"chars": 4634,
"preview": "name: Test Example on PR\non:\n pull_request:\n branches:\n - \"main\"\n - \"develop\"\n - \"feature/**\"\n # a"
},
{
"path": ".github/workflows/example_check_on_schedule.yml",
"chars": 2348,
"preview": "name: Test Example on Schedule\non:\n # run at 00:00 of every Sunday(singapore time) so here is UTC time Saturday 16:00\n "
},
{
"path": ".github/workflows/release_docker_after_publish.yml",
"chars": 2463,
"preview": "name: Publish Docker Image to DockerHub after Publish\n\non:\n workflow_dispatch:\n release:\n types: [published]\n\njobs:"
},
{
"path": ".github/workflows/release_nightly_on_schedule.yml",
"chars": 2051,
"preview": "name: Publish Nightly Version to PyPI\n\non:\n workflow_dispatch:\n schedule:\n - cron: '0 0 * * 6' # release on every "
},
{
"path": ".github/workflows/release_pypi_after_merge.yml",
"chars": 1937,
"preview": "name: Publish to PyPI\n\non:\n workflow_dispatch:\n pull_request:\n paths:\n - 'version.txt'\n types:\n - clos"
},
{
"path": ".github/workflows/release_test_pypi_before_merge.yml",
"chars": 1703,
"preview": "name: Publish to Test-PyPI Before Merge\n\non:\n pull_request:\n paths:\n - 'version.txt'\n\njobs:\n build-n-publish:\n"
},
{
"path": ".github/workflows/report_leaderboard_to_lark.yml",
"chars": 929,
"preview": "name: Generate Community Report and Send to Lark\n\non:\n workflow_dispatch:\n schedule:\n # release on every Friday 09:"
},
{
"path": ".github/workflows/report_test_coverage.yml",
"chars": 3007,
"preview": "name: Report Test Coverage\n\non:\n workflow_run:\n workflows: [Build on PR]\n types:\n - completed\n\njobs:\n repor"
},
{
"path": ".github/workflows/run_chatgpt_examples.yml",
"chars": 2519,
"preview": "name: Run ChatGPT examples\n\non:\n pull_request:\n types: [synchronize, opened, reopened]\n paths:\n - \"applicati"
},
{
"path": ".github/workflows/run_chatgpt_unit_tests.yml",
"chars": 1481,
"preview": "name: Run ChatGPT unit tests\n\non:\n pull_request:\n types: [synchronize, opened, reopened]\n paths:\n - 'applica"
},
{
"path": ".github/workflows/run_colossalqa_unit_tests.yml",
"chars": 1793,
"preview": "name: Run colossalqa unit tests\n\non:\n pull_request:\n types: [synchronize, opened, reopened]\n paths:\n - 'appl"
},
{
"path": ".github/workflows/scripts/check_doc_i18n.py",
"chars": 2517,
"preview": "import argparse\nimport os\n\n\ndef compare_dirs(dir1, dir2):\n # First, we need to check if the two directories exist\n "
},
{
"path": ".github/workflows/scripts/example_checks/check_dispatch_inputs.py",
"chars": 595,
"preview": "import argparse\nimport os\n\n\ndef check_inputs(input_list):\n for path in input_list:\n real_path = os.path.join(\""
},
{
"path": ".github/workflows/scripts/example_checks/check_example_weekly.py",
"chars": 1186,
"preview": "import os\n\n\ndef show_files(path, all_files):\n # Traverse all the folder/file in current directory\n file_list = os."
},
{
"path": ".github/workflows/scripts/example_checks/detect_changed_example.py",
"chars": 776,
"preview": "import argparse\n\n\ndef main():\n parser = argparse.ArgumentParser()\n parser.add_argument(\"-f\", \"--fileNameList\", typ"
},
{
"path": ".github/workflows/scripts/generate_leaderboard_and_send_to_lark.py",
"chars": 20436,
"preview": "import os\nfrom datetime import datetime, timedelta\nfrom typing import Any, Dict, List\n\nimport matplotlib.pyplot as plt\ni"
},
{
"path": ".github/workflows/scripts/generate_release_draft.py",
"chars": 3669,
"preview": "#!/usr/bin/env python\n# coding: utf-8\n\nimport argparse\nimport os\nimport re\n\nimport requests\n\nCOMMIT_API = \"https://api.g"
},
{
"path": ".github/workflows/scripts/send_message_to_lark.py",
"chars": 481,
"preview": "import argparse\n\nimport requests\n\n\ndef parse_args():\n parser = argparse.ArgumentParser()\n parser.add_argument(\"-m\""
},
{
"path": ".github/workflows/scripts/update_setup_for_nightly.py",
"chars": 862,
"preview": "from datetime import datetime\n\n\ndef open_setup_file():\n with open(\"setup.py\", \"r\") as f:\n file_lines = f.readl"
},
{
"path": ".github/workflows/submodule.yml",
"chars": 1401,
"preview": "name: Synchronize Submodule\n\non:\n workflow_dispatch:\n schedule:\n - cron: \"0 0 * * *\"\n\njobs:\n sync-submodule:\n r"
},
{
"path": ".github/workflows/translate_comment.yml",
"chars": 720,
"preview": "name: 'issue-translator'\non:\n issue_comment:\n types: [created]\n issues:\n types: [opened]\n\njobs:\n build:\n run"
},
{
"path": ".gitignore",
"chars": 2594,
"preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
},
{
"path": ".gitmodules",
"chars": 139,
"preview": "[submodule \"examples/tutorial/fastfold/FastFold\"]\n\tpath = examples/tutorial/fastfold/FastFold\n\turl = https://github.com/"
},
{
"path": ".isort.cfg",
"chars": 136,
"preview": "[settings]\nline_length = 120\nmulti_line_output=3\ninclude_trailing_comma = true\nignore_comments = true\nprofile = black\nho"
},
{
"path": ".pre-commit-config.yaml",
"chars": 1170,
"preview": "repos:\n\n - repo: https://github.com/PyCQA/autoflake\n rev: v2.3.1\n hooks:\n - id: autoflake\n name: auto"
},
{
"path": "CHANGE_LOG.md",
"chars": 1103,
"preview": "# Change Log\n\nAll notable changes to this project will be documented in this file.\n\n🚩 **We have moved the change log to "
},
{
"path": "CONTRIBUTING.md",
"chars": 6729,
"preview": "# Contributing\n\nColossal-AI welcomes any constructive contribution from the community and the team is more than willing "
},
{
"path": "LICENSE",
"chars": 30128,
"preview": "Copyright 2021- HPC-AI Technology Inc. All rights reserved.\n Apache License\n "
},
{
"path": "MANIFEST.in",
"chars": 198,
"preview": "include *.txt README.md\nrecursive-include requirements *.txt\nrecursive-include colossalai *.cpp *.h *.cu *.tr *.cuh *.cc"
},
{
"path": "README.md",
"chars": 34218,
"preview": "# Colossal-AI\n<div id=\"top\" align=\"center\">\n\n [;\n# you may "
},
{
"path": "applications/Colossal-LLaMA/colossal_llama/dataset/dummy_dataset.py",
"chars": 781,
"preview": "import torch\nfrom torch.utils.data import Dataset\n\nfrom colossalai.accelerator import get_accelerator\n\n\nclass RandomData"
},
{
"path": "applications/Colossal-LLaMA/colossal_llama/dataset/loader.py",
"chars": 6647,
"preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nimport os\nfrom dataclasses import dataclass\nfrom typing import Dict, Ite"
},
{
"path": "applications/Colossal-LLaMA/colossal_llama/dataset/spliced_and_tokenized_dataset.py",
"chars": 12429,
"preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nSplicing multiple pre-tokenized sequence data points\n\"\"\"\n\nimport bise"
},
{
"path": "applications/Colossal-LLaMA/colossal_llama/model/init_model.py",
"chars": 4248,
"preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\n\"\"\"\nInitialize new model with updated tokenizer by calculating the mean "
},
{
"path": "applications/Colossal-LLaMA/colossal_llama/tokenizer/init_tokenizer.py",
"chars": 3378,
"preview": "#!/usr/bin/env python\n# -*- encoding: utf-8 -*-\n\n\"\"\"\nInitialize new tokenizer for continual pre-training\n\"\"\"\n\nimport arg"
},
{
"path": "applications/Colossal-LLaMA/colossal_llama/utils/__init__.py",
"chars": 47,
"preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n"
},
{
"path": "applications/Colossal-LLaMA/colossal_llama/utils/ckpt_io.py",
"chars": 2775,
"preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\n\"\"\"\nHelper functions for IO\n\"\"\"\n\nimport json\nimport os\nfrom typing impor"
},
{
"path": "applications/Colossal-LLaMA/colossal_llama/utils/froze.py",
"chars": 580,
"preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nfrom transformers.models.llama import LlamaForCausalLM\n\n\ndef freeze_non_"
},
{
"path": "applications/Colossal-LLaMA/colossal_llama/utils/neftune_patch.py",
"chars": 2660,
"preview": "# Copyright 2023 The Hugging Face team\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# yo"
},
{
"path": "applications/Colossal-LLaMA/colossal_llama/utils/stream_chat_patch.py",
"chars": 10801,
"preview": "from copy import deepcopy\nfrom typing import Any, Callable, Dict, List, Optional, Tuple\n\nimport torch\nfrom torch import "
},
{
"path": "applications/Colossal-LLaMA/colossal_llama/utils/utils.py",
"chars": 883,
"preview": "\"\"\"\nUtils for Colossal-LLaMA\n\"\"\"\n\nimport torch\nimport torch.distributed as dist\n\nfrom colossalai.booster import Plugin\n\n"
},
{
"path": "applications/Colossal-LLaMA/dataset/prepare_pretrain_dataset.py",
"chars": 5892,
"preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nPrepare dataset for continual pre-training\n\"\"\"\n\nimport argparse\nimpor"
},
{
"path": "applications/Colossal-LLaMA/dataset/prepare_sft_dataset.py",
"chars": 6009,
"preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nPrepare sft dataset for fine-tuning\n\"\"\"\n\nimport argparse\nimport json\n"
},
{
"path": "applications/Colossal-LLaMA/docs/example_13b.md",
"chars": 14097,
"preview": "# Colossal-LLaMA-2-13B-base Examples\nIn order to conduct a comprehensive evaluation of the performance of the Colossal-L"
},
{
"path": "applications/Colossal-LLaMA/docs/example_7b.md",
"chars": 109715,
"preview": "# Colossal-LLaMA-2-7B-base Examples\nTo comprehensively assess the performance of the Colossal-LLaMA-2-7B-base model, our"
},
{
"path": "applications/Colossal-LLaMA/hostfile.example",
"chars": 20,
"preview": "hostname1\nhostname2\n"
},
{
"path": "applications/Colossal-LLaMA/inference/inference_example.py",
"chars": 3084,
"preview": "import argparse\n\nimport torch\nfrom colossal_llama.dataset.conversation import default_conversation\nfrom transformers imp"
},
{
"path": "applications/Colossal-LLaMA/inference/stream_chat_example.py",
"chars": 2445,
"preview": "import argparse\n\nfrom colossal_llama.utils.stream_chat_patch import streaming_chat\nfrom transformers import AutoModelFor"
},
{
"path": "applications/Colossal-LLaMA/requirements.txt",
"chars": 225,
"preview": "torch==2.1.2\nhuggingface-hub\npackaging==24.0\ncolossalai>=0.4.0\nautoflake==2.2.1\nblack==23.9.1\ntransformers>=4.39.3\ntenso"
},
{
"path": "applications/Colossal-LLaMA/setup.py",
"chars": 1142,
"preview": "from setuptools import find_packages, setup\n\n\ndef fetch_requirements(path):\n with open(path, \"r\") as fd:\n retu"
},
{
"path": "applications/Colossal-LLaMA/train.example.sh",
"chars": 1505,
"preview": "#!/bin/bash\nset_n_least_used_CUDA_VISIBLE_DEVICES() {\n local n=${1:-\"9999\"}\n echo \"GPU Memory Usage:\"\n local FI"
},
{
"path": "applications/Colossal-LLaMA/train.py",
"chars": 22625,
"preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nContinual Pre-training/Supervised fine-tuning of Colossal-LLaMA-2 dev"
},
{
"path": "applications/Colossal-LLaMA/train_sft.example.sh",
"chars": 1277,
"preview": "#!/bin/bash\n\n# NCCL IB environment variables\nexport NCCL_IB_HCA=mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1\nexport NCCL_IB_DISAB"
},
{
"path": "applications/Colossal-LLaMA/version.txt",
"chars": 6,
"preview": "1.1.0\n"
},
{
"path": "applications/ColossalChat/.gitignore",
"chars": 2460,
"preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
},
{
"path": "applications/ColossalChat/LICENSE",
"chars": 11413,
"preview": "Copyright 2021- HPC-AI Technology Inc. All rights reserved.\n Apache License\n "
},
{
"path": "applications/ColossalChat/README.md",
"chars": 16638,
"preview": "<h1 align=\"center\">\n <img width=\"auto\" height=\"100px\", src=\"https://raw.githubusercontent.com/hpcaitech/public_assets/m"
},
{
"path": "applications/ColossalChat/benchmarks/Opt.json",
"chars": 775,
"preview": "{\n \"chat_template\": \"{% for message in messages %}{% if message['role'] == 'user' %}{{'Human: ' + bos_token + message"
},
{
"path": "applications/ColossalChat/benchmarks/README.md",
"chars": 970,
"preview": "# Benchmarks\n\n## Benchmark OPT with LoRA on dummy prompt data\n\nWe provide various OPT models (string in parentheses is t"
},
{
"path": "applications/ColossalChat/benchmarks/benchmark_dpo.sh",
"chars": 1677,
"preview": "#!/bin/bash\nset_n_least_used_CUDA_VISIBLE_DEVICES() {\n local n=${1:-\"9999\"}\n echo \"GPU Memory Usage:\"\n local FI"
},
{
"path": "applications/ColossalChat/benchmarks/benchmark_kto.sh",
"chars": 1669,
"preview": "#!/bin/bash\nset_n_least_used_CUDA_VISIBLE_DEVICES() {\n local n=${1:-\"9999\"}\n echo \"GPU Memory Usage:\"\n local FI"
},
{
"path": "applications/ColossalChat/benchmarks/benchmark_memory_consumption.txt",
"chars": 154,
"preview": "Model=Opt-125m; lora_rank=0; plugin=zero2\nMax CUDA memory usage: 26123.16 MB\nModel=Opt-125m; lora_rank=0; plugin=zero2\nM"
},
{
"path": "applications/ColossalChat/benchmarks/benchmark_orpo.sh",
"chars": 1675,
"preview": "#!/bin/bash\nset_n_least_used_CUDA_VISIBLE_DEVICES() {\n local n=${1:-\"9999\"}\n echo \"GPU Memory Usage:\"\n local FI"
},
{
"path": "applications/ColossalChat/benchmarks/benchmark_performance_summarization.txt",
"chars": 696,
"preview": "facebook/opt-125m; 0; zero2\nPerformance summary:\nGenerate 768 samples, throughput: 188.48 samples/s, TFLOPS per GPU: 361"
},
{
"path": "applications/ColossalChat/benchmarks/benchmark_ppo.py",
"chars": 22614,
"preview": "\"\"\"\nFor becnhmarking ppo. Mudified from examples/training_scripts/train_ppo.py\n\"\"\"\n\nimport argparse\nimport json\nimport o"
},
{
"path": "applications/ColossalChat/benchmarks/benchmark_ppo.sh",
"chars": 3966,
"preview": "#!/usr/bin/env bash\n\nset_n_least_used_CUDA_VISIBLE_DEVICES() {\n local n=${1:-\"9999\"}\n echo \"GPU Memory Usage:\"\n "
},
{
"path": "applications/ColossalChat/benchmarks/benchmark_sft.sh",
"chars": 1723,
"preview": "set_n_least_used_CUDA_VISIBLE_DEVICES() {\n local n=${1:-\"9999\"}\n echo \"GPU Memory Usage:\"\n local FIRST_N_GPU_ID"
},
{
"path": "applications/ColossalChat/benchmarks/benchmark_simpo.sh",
"chars": 1791,
"preview": "#!/bin/bash\nset_n_least_used_CUDA_VISIBLE_DEVICES() {\n local n=${1:-\"9999\"}\n echo \"GPU Memory Usage:\"\n local FI"
},
{
"path": "applications/ColossalChat/benchmarks/data_preparation.sh",
"chars": 573,
"preview": "SAVE_DIR=\"\"\n\n\nBASE_DIR=$(dirname $(dirname $(realpath $BASH_SOURCE)))\nEXAMPLES_DIR=$BASE_DIR/examples\nSAVE_DIR=$BASE_DIR"
},
{
"path": "applications/ColossalChat/benchmarks/dummy_dataset.py",
"chars": 799,
"preview": "from typing import Callable\n\nfrom torch.utils.data import Dataset\n\n\nclass DummyLLMDataset(Dataset):\n def __init__(sel"
},
{
"path": "applications/ColossalChat/benchmarks/prepare_dummy_test_dataset.py",
"chars": 3522,
"preview": "import argparse\nimport json\nimport os\nimport time\nfrom multiprocessing import cpu_count\n\nfrom datasets import load_datas"
},
{
"path": "applications/ColossalChat/benchmarks/ray/1mmt_dummy.py",
"chars": 7422,
"preview": "import argparse\nimport os\nimport socket\nfrom functools import partial\n\nimport ray\nimport torch\nfrom coati.quant import l"
},
{
"path": "applications/ColossalChat/benchmarks/ray/mmmt_dummy.py",
"chars": 8013,
"preview": "import argparse\nimport os\nimport socket\nfrom functools import partial\n\nimport ray\nimport torch\nfrom coati.quant import l"
},
{
"path": "applications/ColossalChat/coati/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "applications/ColossalChat/coati/dataset/__init__.py",
"chars": 768,
"preview": "from .conversation import Conversation, setup_conversation_template\nfrom .loader import (\n DataCollatorForKTODataset,"
},
{
"path": "applications/ColossalChat/coati/dataset/conversation.py",
"chars": 6456,
"preview": "import dataclasses\nimport json\nimport os\nfrom typing import Any, Dict, List\n\nimport torch.distributed as dist\nfrom trans"
},
{
"path": "applications/ColossalChat/coati/dataset/loader.py",
"chars": 20251,
"preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nDataloader for sft, dpo, ppo\n\"\"\"\n\nimport os\nfrom dataclasses import d"
},
{
"path": "applications/ColossalChat/coati/dataset/tokenization_utils.py",
"chars": 16335,
"preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\ntokenization utils for constructing dataset for ppo, dpo, sft, rm\n\"\"\""
},
{
"path": "applications/ColossalChat/coati/dataset/utils.py",
"chars": 6586,
"preview": "import io\nimport json\nfrom typing import Any, Dict, List\n\nimport torch\nimport torch.distributed as dist\nimport torch.nn."
},
{
"path": "applications/ColossalChat/coati/distributed/README.md",
"chars": 15703,
"preview": "# Distributed RL Framework for Language Model Fine-Tuning\n\nThis repository implements a distributed Reinforcement Learni"
},
{
"path": "applications/ColossalChat/coati/distributed/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "applications/ColossalChat/coati/distributed/comm.py",
"chars": 5947,
"preview": "import copy\nfrom typing import Any, Dict\n\nimport ray\nimport ray.util.collective as cc\nimport torch\nimport torch.distribu"
},
{
"path": "applications/ColossalChat/coati/distributed/consumer.py",
"chars": 21648,
"preview": "from contextlib import nullcontext\nfrom typing import Any, Dict, Optional\n\nimport ray\nimport ray.util.collective as cc\ni"
},
{
"path": "applications/ColossalChat/coati/distributed/grpo_consumer.py",
"chars": 30118,
"preview": "from contextlib import nullcontext\nfrom typing import Any, Optional\n\nimport ray\nimport torch\nimport wandb\nfrom coati.dis"
},
{
"path": "applications/ColossalChat/coati/distributed/inference_backend.py",
"chars": 12144,
"preview": "from typing import Any, Dict\n\nimport torch\nimport torch.nn.functional as F\nfrom transformers import AutoConfig, AutoMode"
},
{
"path": "applications/ColossalChat/coati/distributed/launch.py",
"chars": 7590,
"preview": "import copy\nimport os\nimport uuid\nfrom typing import Any, Dict, Optional\n\nimport ray\n\nfrom .consumer import SimpleConsum"
},
{
"path": "applications/ColossalChat/coati/distributed/launch_zero_bubble.py",
"chars": 13524,
"preview": "import copy\nimport os\nimport uuid\nfrom typing import Any, Dict, Optional\n\nimport ray\n\nfrom .comm import SharedVariableAc"
},
{
"path": "applications/ColossalChat/coati/distributed/loss.py",
"chars": 2377,
"preview": "from typing import Optional\n\nimport torch\nimport torch.nn as nn\nfrom coati.distributed.utils import masked_mean, masked_"
},
{
"path": "applications/ColossalChat/coati/distributed/producer.py",
"chars": 24093,
"preview": "import copy\nimport json\nimport os\nfrom typing import Any, Dict, Optional\n\nimport ray\nimport ray.util.collective as cc\nim"
},
{
"path": "applications/ColossalChat/coati/distributed/profiling_utils.py",
"chars": 1002,
"preview": "import os\nimport time\n\n\nclass CustomProfiler:\n def __init__(self, name, disabled=True):\n self.disabled = disab"
},
{
"path": "applications/ColossalChat/coati/distributed/reward/code_reward/testing_util.py",
"chars": 26522,
"preview": "# Code from the verl Project (https://github.com/agentica-project/rllm),\n# which itself is adapted from Prime (https://g"
},
{
"path": "applications/ColossalChat/coati/distributed/reward/code_reward/utils.py",
"chars": 2725,
"preview": "# Code from the verl Project (https://github.com/agentica-project/rllm),\n# which itself is adapted from Prime (https://g"
},
{
"path": "applications/ColossalChat/coati/distributed/reward/reward_fn.py",
"chars": 11568,
"preview": "# Copyright 2024 ByteDance Group\n\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use th"
},
{
"path": "applications/ColossalChat/coati/distributed/reward/reward_utils.py",
"chars": 4346,
"preview": "# Copyright Unakar\n# Modified from https://github.com/Unakar/Logic-RL/blob/086373176ac198c97277ff50f4b6e7e1bfe669d3/verl"
},
{
"path": "applications/ColossalChat/coati/distributed/reward/verifiable_reward.py",
"chars": 2278,
"preview": "\"\"\"\nFunction-based reward verification module.\n\"\"\"\n\nimport inspect\nfrom typing import Any, Dict, List\n\nimport torch\n\n\ncl"
},
{
"path": "applications/ColossalChat/coati/distributed/utils.py",
"chars": 5834,
"preview": "import json\nimport os\nfrom typing import Any, Dict, List\n\nimport torch\nfrom filelock import FileLock\n\nfrom colossalai.sh"
},
{
"path": "applications/ColossalChat/coati/distributed/zero_bubble/README.md",
"chars": 3443,
"preview": "# Zero Bubble Distributed RL Framework for Language Model Fine-Tuning\n\nThis folder contains code for the Zero Bubble dis"
},
{
"path": "applications/ColossalChat/coati/distributed/zero_bubble/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "applications/ColossalChat/coati/distributed/zero_bubble/consumer.py",
"chars": 18521,
"preview": "import os\nimport threading\nimport time\nfrom typing import Any, Dict, Optional\n\nimport ray\nimport ray.util.collective as "
},
{
"path": "applications/ColossalChat/coati/distributed/zero_bubble/distributor.py",
"chars": 5884,
"preview": "import time\n\nimport ray\nimport ray.util.collective as cc\nimport torch\nfrom coati.distributed.comm import SharedVariableA"
},
{
"path": "applications/ColossalChat/coati/distributed/zero_bubble/grpo_consumer.py",
"chars": 26832,
"preview": "from contextlib import nullcontext\nfrom typing import Any, Optional\n\nimport ray\nimport torch\nimport wandb\nfrom coati.dis"
},
{
"path": "applications/ColossalChat/coati/distributed/zero_bubble/producer.py",
"chars": 25879,
"preview": "import copy\nimport json\nimport os\nimport threading\nimport time\nfrom typing import Any, Dict, Optional\n\nimport ray\nimport"
},
{
"path": "applications/ColossalChat/coati/distributed/zero_bubble/requirements.txt",
"chars": 207,
"preview": "ray==2.49.2\npygloo>=0.2.0 # you need to build from source: https://github.com/ray-project/pygloo commit 82ae2d72222aef"
},
{
"path": "applications/ColossalChat/coati/experience_buffer/__init__.py",
"chars": 133,
"preview": "from .base import ExperienceBuffer\nfrom .naive import NaiveExperienceBuffer\n\n__all__ = [\"ExperienceBuffer\", \"NaiveExperi"
},
{
"path": "applications/ColossalChat/coati/experience_buffer/base.py",
"chars": 1059,
"preview": "from abc import ABC, abstractmethod\nfrom typing import Any\n\nfrom coati.experience_maker.base import Experience\n\n\nclass E"
},
{
"path": "applications/ColossalChat/coati/experience_buffer/naive.py",
"chars": 2632,
"preview": "import random\nfrom typing import List\n\nimport torch\nfrom coati.experience_maker.base import Experience\n\nfrom colossalai."
},
{
"path": "applications/ColossalChat/coati/experience_buffer/utils.py",
"chars": 2416,
"preview": "from dataclasses import dataclass\nfrom typing import List, Optional\n\nimport torch\nimport torch.nn.functional as F\nfrom c"
},
{
"path": "applications/ColossalChat/coati/experience_maker/__init__.py",
"chars": 155,
"preview": "from .base import Experience, ExperienceMaker\nfrom .naive import NaiveExperienceMaker\n\n__all__ = [\"Experience\", \"Experie"
},
{
"path": "applications/ColossalChat/coati/experience_maker/base.py",
"chars": 2922,
"preview": "from abc import ABC, abstractmethod\nfrom dataclasses import dataclass\nfrom typing import Optional\n\nimport torch\nfrom coa"
},
{
"path": "applications/ColossalChat/coati/experience_maker/naive.py",
"chars": 14191,
"preview": "\"\"\"\nexperience maker.\n\"\"\"\n\nfrom typing import Any\n\nimport torch\nimport torch.nn.functional as F\nfrom coati.dataset.utils"
},
{
"path": "applications/ColossalChat/coati/models/__init__.py",
"chars": 792,
"preview": "from .base import BaseModel\nfrom .critic import Critic\nfrom .generation import generate, generate_streaming, prepare_inp"
},
{
"path": "applications/ColossalChat/coati/models/base.py",
"chars": 1996,
"preview": "\"\"\"\nBase class for critic and reward model\n\"\"\"\n\nfrom typing import Optional\n\nimport torch\nimport torch.nn as nn\nfrom tra"
},
{
"path": "applications/ColossalChat/coati/models/critic.py",
"chars": 1386,
"preview": "\"\"\"\nCritic model\n\"\"\"\n\nfrom typing import Optional\n\nimport torch\nimport torch.nn as nn\nfrom coati.models import BaseModel"
},
{
"path": "applications/ColossalChat/coati/models/generation.py",
"chars": 20234,
"preview": "import copy\nfrom typing import Any, Callable, List, Optional\n\nimport torch\nimport torch.distributed as dist\nfrom transfo"
},
{
"path": "applications/ColossalChat/coati/models/lora.py",
"chars": 14600,
"preview": "\"\"\"\nLORA utils\n\"\"\"\n\nimport dataclasses\nimport math\nimport warnings\nfrom typing import List, Optional, Union\n\nimport lora"
},
{
"path": "applications/ColossalChat/coati/models/loss.py",
"chars": 11434,
"preview": "\"\"\"\nloss functions\n\"\"\"\n\nfrom typing import Optional, Tuple\n\nimport torch\nimport torch.distributed as dist\nimport torch.n"
},
{
"path": "applications/ColossalChat/coati/models/reward_model.py",
"chars": 1629,
"preview": "\"\"\"\nreward model\n\"\"\"\n\nfrom typing import Optional\n\nimport torch\nimport torch.nn as nn\nfrom coati.models import BaseModel"
},
{
"path": "applications/ColossalChat/coati/models/rlvr_reward_model.py",
"chars": 1328,
"preview": "\"\"\"\nreward model\n\"\"\"\n\nfrom typing import Callable, List, Optional\n\nimport torch\n\n\nclass RLVRRewardModel:\n \"\"\"\n RLV"
},
{
"path": "applications/ColossalChat/coati/models/utils.py",
"chars": 5281,
"preview": "import json\nimport os\nfrom typing import Any, Dict, Optional, Union\n\nimport torch\nimport torch.nn.functional as F\n\n\ndef "
},
{
"path": "applications/ColossalChat/coati/quant/__init__.py",
"chars": 156,
"preview": "from .llama_gptq import load_quant as llama_load_quant\nfrom .utils import low_resource_init\n\n__all__ = [\n \"llama_load"
},
{
"path": "applications/ColossalChat/coati/quant/llama_gptq/__init__.py",
"chars": 64,
"preview": "from .loader import load_quant\n\n__all__ = [\n \"load_quant\",\n]\n"
},
{
"path": "applications/ColossalChat/coati/quant/llama_gptq/loader.py",
"chars": 671,
"preview": "import torch\nimport torch.nn as nn\n\nfrom .model_utils import find_layers\nfrom .quant import make_quant\n\n\ndef load_quant("
},
{
"path": "applications/ColossalChat/coati/quant/llama_gptq/model_utils.py",
"chars": 416,
"preview": "# copied from https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/past/modelutils.py\n\nimport torch.nn as nn\n\n\ndef find_la"
},
{
"path": "applications/ColossalChat/coati/quant/llama_gptq/quant.py",
"chars": 10820,
"preview": "# copied from https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/past/quant.py\n\nimport math\n\nimport numpy as np\nimport t"
},
{
"path": "applications/ColossalChat/coati/quant/utils.py",
"chars": 811,
"preview": "from contextlib import contextmanager\n\nimport torch\n\n\ndef _noop(*args, **kwargs):\n pass\n\n\n@contextmanager\ndef low_res"
},
{
"path": "applications/ColossalChat/coati/ray/README.md",
"chars": 5062,
"preview": ":warning: **This content may be outdated since the major update of Colossal Chat. We will update this content soon.**\n\n#"
},
{
"path": "applications/ColossalChat/coati/ray/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "applications/ColossalChat/coati/ray/callbacks/__init__.py",
"chars": 286,
"preview": "from .base import MakerCallback, TrainerCallback\nfrom .performance_evaluator import ExperienceMakerPerformanceEvaluator,"
},
{
"path": "applications/ColossalChat/coati/ray/callbacks/base.py",
"chars": 1256,
"preview": "from abc import ABC\n\nfrom coati.experience_maker import Experience\n\n\nclass TrainerCallback(ABC):\n \"\"\"\n Base callba"
},
{
"path": "applications/ColossalChat/coati/ray/callbacks/performance_evaluator.py",
"chars": 8595,
"preview": "from time import time\nfrom typing import Optional\n\nimport torch\nimport torch.distributed as dist\nfrom coati.experience_m"
},
{
"path": "applications/ColossalChat/coati/ray/detached_replay_buffer.py",
"chars": 2526,
"preview": "from typing import List\n\nimport torch\nfrom coati.experience_buffer.utils import BufferItem, make_experience_batch, split"
},
{
"path": "applications/ColossalChat/coati/ray/detached_trainer_base.py",
"chars": 6990,
"preview": "import os\nfrom abc import ABC, abstractmethod\nfrom typing import Any, Dict, List\n\nimport ray\nimport torch\nfrom coati.exp"
},
{
"path": "applications/ColossalChat/coati/ray/detached_trainer_ppo.py",
"chars": 8755,
"preview": "from typing import Callable, Dict, List, Tuple\n\nimport ray\nimport torch\nfrom coati.experience_maker import Experience\nfr"
},
{
"path": "applications/ColossalChat/coati/ray/experience_maker_holder.py",
"chars": 11694,
"preview": "import os\nimport time\nimport tracemalloc\nfrom threading import Lock\nfrom typing import Any, Callable, Dict, Iterable, Li"
},
{
"path": "applications/ColossalChat/coati/ray/lora_constructor.py",
"chars": 4181,
"preview": "from collections import OrderedDict\nfrom dataclasses import dataclass\nfrom typing import Any, Dict\n\nimport torch.nn as n"
},
{
"path": "applications/ColossalChat/coati/ray/utils.py",
"chars": 5383,
"preview": "import os\nfrom collections import OrderedDict\nfrom typing import Any, Dict\n\nimport torch\nimport torch.distributed as dis"
},
{
"path": "applications/ColossalChat/coati/trainer/__init__.py",
"chars": 431,
"preview": "from .base import OLTrainer, SLTrainer\nfrom .dpo import DPOTrainer\nfrom .grpo import GRPOTrainer\nfrom .kto import KTOTra"
},
{
"path": "applications/ColossalChat/coati/trainer/base.py",
"chars": 7128,
"preview": "\"\"\"\nBase trainers for online and offline training\n SLTrainer: supervised learning trainer\n pretrain, sft, dpo,"
},
{
"path": "applications/ColossalChat/coati/trainer/callbacks/__init__.py",
"chars": 131,
"preview": "from .base import Callback\nfrom .performance_evaluator import PerformanceEvaluator\n\n__all__ = [\"Callback\", \"PerformanceE"
},
{
"path": "applications/ColossalChat/coati/trainer/callbacks/base.py",
"chars": 825,
"preview": "from abc import ABC\n\nfrom coati.experience_maker import Experience\n\n\nclass Callback(ABC):\n \"\"\"\n Base callback clas"
},
{
"path": "applications/ColossalChat/coati/trainer/callbacks/performance_evaluator.py",
"chars": 7560,
"preview": "from time import time\nfrom typing import Optional\n\nimport torch\nimport torch.distributed as dist\nfrom coati.experience_m"
},
{
"path": "applications/ColossalChat/coati/trainer/dpo.py",
"chars": 32074,
"preview": "\"\"\"\nDpo trainer\n\"\"\"\n\nimport os\nfrom typing import Any, Optional\n\nimport torch\nimport torch.distributed as dist\nfrom coat"
},
{
"path": "applications/ColossalChat/coati/trainer/grpo.py",
"chars": 17667,
"preview": "\"\"\"\nGRPO trainer\n\"\"\"\n\nimport os\nfrom typing import Dict, List, Optional, Union\n\nimport torch\nimport wandb\nfrom coati.exp"
},
{
"path": "applications/ColossalChat/coati/trainer/kto.py",
"chars": 15865,
"preview": "\"\"\"\nKTO trainer\n\"\"\"\n\nimport os\nfrom typing import Any, Optional\n\nimport torch\nimport torch.distributed as dist\nfrom coat"
},
{
"path": "applications/ColossalChat/coati/trainer/orpo.py",
"chars": 15323,
"preview": "\"\"\"\nOrpo trainer\n\"\"\"\n\nimport os\nfrom typing import Any, Optional\n\nimport torch\nfrom coati.models.loss import OddsRatioLo"
},
{
"path": "applications/ColossalChat/coati/trainer/ppo.py",
"chars": 18414,
"preview": "\"\"\"\nPPO trainer\n\"\"\"\n\nimport os\nfrom typing import Dict, List, Optional, Union\n\nimport torch\nimport wandb\nfrom coati.expe"
},
{
"path": "applications/ColossalChat/coati/trainer/rm.py",
"chars": 10879,
"preview": "\"\"\"\nReward model trianer\n\"\"\"\n\nimport os\nfrom typing import Any, Callable, Optional\n\nimport torch\nimport tqdm\nfrom coati."
},
{
"path": "applications/ColossalChat/coati/trainer/sft.py",
"chars": 10434,
"preview": "\"\"\"\nSFT trainer\n\"\"\"\n\nimport os\nfrom typing import Optional\n\nimport torch\nimport torch.distributed as dist\nfrom coati.tra"
},
{
"path": "applications/ColossalChat/coati/trainer/utils.py",
"chars": 5269,
"preview": "\"\"\"\nTraining utilities for Coati.\n\"\"\"\n\nfrom typing import Any\n\nimport torch\nimport torch.distributed as dist\nfrom torch."
},
{
"path": "applications/ColossalChat/coati/utils/__init__.py",
"chars": 183,
"preview": "from .accumulative_meter import AccumulativeMeanMeter\nfrom .ckpt_io import load_checkpoint, save_checkpoint\n\n__all__ = ["
},
{
"path": "applications/ColossalChat/coati/utils/accumulative_meter.py",
"chars": 1908,
"preview": "\"\"\"\nA class that can be used to calculate the mean of a variable\n\"\"\"\n\n\nclass AccumulativeMeanVariable:\n \"\"\"\n A cla"
},
{
"path": "applications/ColossalChat/coati/utils/ckpt_io.py",
"chars": 2915,
"preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\n\"\"\"\nHelper functions for IO save load checkpoints\n\"\"\"\n\nimport json\nimpor"
},
{
"path": "applications/ColossalChat/coati/utils/reward_score/__init__.py",
"chars": 148,
"preview": "from .competition import math_competition_reward_fn\nfrom .gsm8k import gsm8k_reward_fn\n\n__all__ = [\"gsm8k_reward_fn\", \"m"
},
{
"path": "applications/ColossalChat/coati/utils/reward_score/competition.py",
"chars": 987,
"preview": "import torch\n\nfrom .utils import extract_solution, validate_response_structure\n\n\ndef math_competition_reward_fn(input_id"
},
{
"path": "applications/ColossalChat/coati/utils/reward_score/gsm8k.py",
"chars": 1096,
"preview": "import torch\n\nfrom .utils import extract_solution, validate_response_structure\n\n\ndef gsm8k_reward_fn(input_ids, attentio"
},
{
"path": "applications/ColossalChat/coati/utils/reward_score/utils.py",
"chars": 2769,
"preview": "# Copyright Unakar\n# Modified from https://github.com/Unakar/Logic-RL/blob/086373176ac198c97277ff50f4b6e7e1bfe669d3/verl"
},
{
"path": "applications/ColossalChat/conversation_template/01-ai_Yi-1.5-9B-Chat.json",
"chars": 732,
"preview": "{\n \"chat_template\": \"{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endi"
},
{
"path": "applications/ColossalChat/conversation_template/MiniCPM-2b.json",
"chars": 586,
"preview": "{\n \"chat_template\": \"{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{"
},
{
"path": "applications/ColossalChat/conversation_template/Qwen_Qwen1.5-110B-Chat.json",
"chars": 507,
"preview": "{\n \"chat_template\": \"{% for message in messages %}{{'<|im_start|>' + message['role'] + '\\n' + message['content'] + '<"
},
{
"path": "applications/ColossalChat/conversation_template/Qwen_Qwen1.5-32B-Chat.json",
"chars": 507,
"preview": "{\n \"chat_template\": \"{% for message in messages %}{{'<|im_start|>' + message['role'] + '\\n' + message['content'] + '<"
},
{
"path": "applications/ColossalChat/conversation_template/Qwen_Qwen2.5-3B.json",
"chars": 1422,
"preview": "{\n \"chat_template\": \"{% for message in messages %}{{'<|im_start|>' + message['role'] + '\\n' + message['content'] + '<"
},
{
"path": "applications/ColossalChat/conversation_template/THUDM_chatglm2-6b.json",
"chars": 546,
"preview": "{\n \"chat_template\": \"{% for message in messages %}{{'<|im_start|>' + message['role'] + '\\n' + message['content'] + '<"
},
{
"path": "applications/ColossalChat/conversation_template/THUDM_chatglm3-6b.json",
"chars": 534,
"preview": "{\n \"chat_template\": \"{% for message in messages %}{% if loop.first %}[gMASK]sop<|{{ message['role'] }}|>\\n {{ message"
},
{
"path": "applications/ColossalChat/conversation_template/baichuan-inc_Baichuan2-13B-Chat.json",
"chars": 486,
"preview": "{\n \"chat_template\": \"{% for message in messages %}{{'<|im_start|>' + message['role'] + '\\n' + message['content'] + '<"
},
{
"path": "applications/ColossalChat/conversation_template/colossal-llama2.json",
"chars": 796,
"preview": "{\n \"chat_template\": \"{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{"
},
{
"path": "applications/ColossalChat/conversation_template/deepseek-ai_DeepSeek-V2-Lite.json",
"chars": 764,
"preview": "{\n \"chat_template\": \"{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{"
},
{
"path": "applications/ColossalChat/conversation_template/llama2.json",
"chars": 1100,
"preview": "{\n \"chat_template\": \"{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_mess"
},
{
"path": "applications/ColossalChat/conversation_template/microsoft_phi-2.json",
"chars": 490,
"preview": "{\n \"chat_template\": \"{% for message in messages %}{{'<|im_start|>' + message['role'] + '\\n' + message['content'] + '<"
},
{
"path": "applications/ColossalChat/conversation_template/mistralai_Mixtral-8x7B-Instruct-v0.1.json",
"chars": 593,
"preview": "{\n \"chat_template\": \"{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % "
},
{
"path": "applications/ColossalChat/conversation_template/tiny-llama.json",
"chars": 705,
"preview": "{\n \"chat_template\": \"{% for message in messages %}\\n{% if message['role'] == 'user' %}\\n{{ '<|user|>\\n' + message['co"
},
{
"path": "applications/ColossalChat/examples/README.md",
"chars": 47775,
"preview": "# Examples\n\n\n## Table of Contents\n- [Examples](#examples)\n - [Table of Contents](#table-of-contents)\n - [Install Requi"
}
]
// ... and 1985 more files (download for full content)
About this extraction
This page contains the full source code of the hpcaitech/ColossalAI GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 2185 files (13.0 MB), approximately 3.5M tokens, and a symbol index with 12172 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.