Copy disabled (too large)
Download .txt
Showing preview only (27,413K chars total). Download the full file to get everything.
Repository: Oneflow-Inc/oneflow
Branch: master
Commit: 25c8978c1c8b
Files: 4508
Total size: 25.3 MB
Directory structure:
gitextract_pzk3dhhw/
├── .clang-format
├── .clang-tidy
├── .cmake-format.py
├── .devcontainer/
│ ├── Dockerfile
│ └── devcontainer.json
├── .dockerignore
├── .github/
│ ├── CODEOWNERS
│ ├── ISSUE_TEMPLATE/
│ │ ├── blank_issue.yml
│ │ ├── bug_report.md
│ │ ├── documention_issue.yml
│ │ ├── feature_request.yml
│ │ ├── performance_issue.yml
│ │ └── question.yml
│ ├── PULL_REQUEST_TEMPLATE/
│ │ ├── general_template.md
│ │ └── op_template.md
│ ├── actions/
│ │ ├── mac-build/
│ │ │ └── action.yml
│ │ ├── setup/
│ │ │ └── action.yml
│ │ ├── upload_oss/
│ │ │ └── action.yml
│ │ ├── upload_ssh/
│ │ │ └── action.yml
│ │ └── whl/
│ │ └── action.yml
│ ├── scripts/
│ │ ├── requirements.txt
│ │ └── set_initial_variables.py
│ └── workflows/
│ ├── canary.yml
│ ├── community_release.yml
│ ├── on_merge.yml
│ ├── pr.yml
│ ├── priv_release.yml
│ ├── release.yml
│ ├── simple.yml
│ └── test.yml
├── .gitignore
├── .lsan-suppressions
├── .mergify.yml
├── .tsan-suppressions
├── .ubsan-suppressions
├── CMakeLists.txt
├── LICENSE
├── README.md
├── ci/
│ ├── CMakeLists.txt
│ ├── build/
│ │ ├── ensure_img.py
│ │ └── make.sh
│ ├── check/
│ │ ├── clang_tidy_warnings_as_errors_on_diff
│ │ ├── lintutils.py
│ │ ├── run_clang_format.py
│ │ ├── run_clang_tidy.py
│ │ ├── run_cmake_format.py
│ │ ├── run_license_format.py
│ │ └── run_py_format.py
│ ├── clang/
│ │ └── build-llvm.sh
│ ├── conda/
│ │ ├── build-clang.sh
│ │ └── tuna.condarc
│ ├── fixed-dev-requirements.txt
│ ├── manylinux/
│ │ ├── build-gcc7-xla.sh
│ │ ├── build-gcc9.sh
│ │ └── build.sh
│ ├── requirements.txt
│ ├── reset_submodule.sh
│ ├── setup_submodule.py
│ ├── setup_submodule.sh
│ └── test/
│ ├── 1node_benchmark_test.sh
│ ├── 1node_benchmark_test_fp16.sh
│ ├── 1node_custom_op_test.sh
│ ├── 1node_model_eager_test.sh
│ ├── 1node_model_test.sh
│ ├── 1node_op_test.sh
│ ├── 2node_op_test.sh
│ ├── 2node_op_test_multi_client.sh
│ ├── CMakeLists.txt
│ ├── build_docs.sh
│ ├── distributed_run.py
│ ├── doctest.sh
│ ├── excludelist
│ ├── expensive_generic_test_multi_client.sh
│ ├── generic_test.sh
│ ├── generic_test_multi_client.sh
│ ├── ir_tests.sh
│ ├── multi_client_exception_test.sh
│ ├── multi_launch.py
│ ├── parallel_run.py
│ ├── print_stack_from_core.sh
│ ├── print_stack_in_all_dirs.sh
│ ├── resource-spec/
│ │ ├── 1x-gtx-1080.json
│ │ ├── 2x-rtx-2080.json
│ │ └── 4x-rtx-2080ti.json
│ ├── test_mock_function.sh
│ ├── test_mock_script.sh
│ ├── test_resnet50_graph_ddp.sh
│ ├── test_speed_multi_client.sh
│ └── try_install.sh
├── cmake/
│ ├── caches/
│ │ ├── ci/
│ │ │ ├── canary/
│ │ │ │ └── cuda.cmake
│ │ │ ├── cpu-asan-ubsan.cmake
│ │ │ ├── cpu-tsan.cmake
│ │ │ ├── cpu.cmake
│ │ │ ├── cuda-xla.cmake
│ │ │ ├── cuda.cmake
│ │ │ ├── gh-hosted/
│ │ │ │ ├── cpu-clang.cmake
│ │ │ │ └── cpu-gcc.cmake
│ │ │ ├── llvm/
│ │ │ │ └── cuda-75-clang.cmake
│ │ │ ├── profiler/
│ │ │ │ └── cuda.cmake
│ │ │ ├── release/
│ │ │ │ ├── cpu.cmake
│ │ │ │ ├── cu118.cmake
│ │ │ │ └── cuda.cmake
│ │ │ └── serving/
│ │ │ ├── cuda-75.cmake
│ │ │ └── openvino.cmake
│ │ ├── cn/
│ │ │ ├── cpu.cmake
│ │ │ ├── cuda.cmake
│ │ │ └── fast/
│ │ │ ├── cpu-clang.cmake
│ │ │ ├── cpu.cmake
│ │ │ ├── cuda-61-clang.cmake
│ │ │ ├── cuda-61.cmake
│ │ │ ├── cuda-75-clang.cmake
│ │ │ ├── cuda-75.cmake
│ │ │ ├── cuda-86.cmake
│ │ │ ├── mlir-cpu.cmake
│ │ │ ├── mlir-cuda-61.cmake
│ │ │ ├── mlir-cuda-75.cmake
│ │ │ ├── mlir-cuda-80.cmake
│ │ │ └── mlir-cuda-86.cmake
│ │ └── international/
│ │ ├── cpu.cmake
│ │ └── cuda.cmake
│ ├── cuda.cmake
│ ├── functional.cmake
│ ├── git_version.cmake
│ ├── oneflow-config.cmake
│ ├── oneflow.cmake
│ ├── op_schema.cmake
│ ├── platform.cmake
│ ├── proto2cpp.cmake
│ ├── pybind11.cmake
│ ├── python.cmake
│ ├── third_party/
│ │ ├── FindBFD.cmake
│ │ ├── FindBLAS.cmake
│ │ ├── FindCUDNN.cmake
│ │ ├── FindUnwind.cmake
│ │ ├── absl.cmake
│ │ ├── cares.cmake
│ │ ├── cocoapi.cmake
│ │ ├── cub.cmake
│ │ ├── cutlass.cmake
│ │ ├── eigen.cmake
│ │ ├── flash_attention.cmake
│ │ ├── flatbuffers.cmake
│ │ ├── glog.cmake
│ │ ├── googletest.cmake
│ │ ├── grpc.cmake
│ │ ├── half.cmake
│ │ ├── header_index/
│ │ │ ├── cub_headers.txt
│ │ │ ├── grpc_headers.txt
│ │ │ ├── libpng_headers.txt
│ │ │ └── opencv_headers.txt
│ │ ├── hwloc.cmake
│ │ ├── json.cmake
│ │ ├── libjpeg-turbo.cmake
│ │ ├── nccl.cmake
│ │ ├── oneDNN.cmake
│ │ ├── opencv.cmake
│ │ ├── openssl.cmake
│ │ ├── patches/
│ │ │ └── tensorflow-logging.patch
│ │ ├── protobuf.cmake
│ │ ├── re2.cmake
│ │ ├── trt_flash_attention.cmake
│ │ └── zlib.cmake
│ ├── third_party.cmake
│ ├── threading.cmake
│ └── util.cmake
├── dev-requirements.txt
├── docker/
│ ├── build/
│ │ ├── Dockerfile
│ │ ├── build-ubuntu.sh
│ │ ├── build.sh
│ │ ├── build.ubuntu.dockerfile
│ │ ├── launch.sh
│ │ └── test.sh
│ ├── ci/
│ │ ├── base/
│ │ │ └── Dockerfile
│ │ ├── fmt/
│ │ │ ├── Dockerfile
│ │ │ └── build.sh
│ │ ├── make/
│ │ │ └── Dockerfile
│ │ ├── test/
│ │ │ ├── Dockerfile
│ │ │ ├── build.sh
│ │ │ ├── launch.sh
│ │ │ └── requirements.txt
│ │ ├── test-v2/
│ │ │ ├── Dockerfile
│ │ │ ├── build.sh
│ │ │ ├── requirements.txt
│ │ │ └── sources.list
│ │ └── third_party/
│ │ └── Dockerfile
│ └── package/
│ └── manylinux/
│ ├── CentOS-Base.repo
│ ├── CentOS7-Base-163.repo
│ ├── Dockerfile
│ ├── README.md
│ ├── build_wheel.py
│ └── launch.sh
├── docs/
│ ├── Makefile
│ ├── requirements.txt
│ └── source/
│ ├── _static/
│ │ └── .gitkeep
│ ├── auto_parallel.rst
│ ├── autograd.rst
│ ├── cn/
│ │ ├── __init__.py
│ │ ├── activation.py
│ │ └── math_ops.py
│ ├── conf.py
│ ├── cuda.rst
│ ├── distributed.rst
│ ├── distributions.rst
│ ├── environment_variables.rst
│ ├── graph.rst
│ ├── hub.rst
│ ├── image.rst
│ ├── index.rst
│ ├── linalg.rst
│ ├── nn.functional.rst
│ ├── nn.init.rst
│ ├── nn.rst
│ ├── one_embedding.rst
│ ├── oneflow.rst
│ ├── optim.rst
│ ├── special.rst
│ ├── tensor.rst
│ ├── tensor_attributes.rst
│ ├── troubleshooting.md
│ ├── type_info.rst
│ ├── utils.data.rst
│ ├── utils.global_view.rst
│ └── utils.tensor.rst
├── external/
│ ├── CMakeLists.txt
│ ├── fmt/
│ │ └── CMakeLists.txt
│ ├── kineto/
│ │ └── CMakeLists.txt
│ ├── onetbb/
│ │ └── CMakeLists.txt
│ └── robin-hood-hashing/
│ └── CMakeLists.txt
├── oneflow/
│ ├── api/
│ │ ├── common/
│ │ │ ├── ir_pass.cpp
│ │ │ ├── job_build_and_infer_ctx.h
│ │ │ ├── sbp.h
│ │ │ └── variable_tensor_mgr.h
│ │ ├── cpp/
│ │ │ ├── api.h
│ │ │ ├── embedding/
│ │ │ │ ├── embedding.cpp
│ │ │ │ └── embedding.h
│ │ │ ├── env.cpp
│ │ │ ├── env.h
│ │ │ ├── env_impl.cpp
│ │ │ ├── env_impl.h
│ │ │ ├── framework/
│ │ │ │ ├── device.cpp
│ │ │ │ ├── device.h
│ │ │ │ ├── dtype.cpp
│ │ │ │ ├── dtype.h
│ │ │ │ ├── graph.cpp
│ │ │ │ ├── graph.h
│ │ │ │ ├── ivalue.cpp
│ │ │ │ ├── ivalue.h
│ │ │ │ ├── shape.cpp
│ │ │ │ ├── shape.h
│ │ │ │ ├── tensor.cpp
│ │ │ │ └── tensor.h
│ │ │ ├── framework.h
│ │ │ ├── nn/
│ │ │ │ └── functional/
│ │ │ │ ├── activation.cpp
│ │ │ │ └── activation.h
│ │ │ ├── nn.h
│ │ │ └── tests/
│ │ │ ├── api_test.cpp
│ │ │ ├── api_test.h
│ │ │ ├── graph_test.cpp
│ │ │ ├── graph_test_model/
│ │ │ │ ├── affine_no_parameter/
│ │ │ │ │ └── model.mlir
│ │ │ │ └── affine_with_parameter/
│ │ │ │ ├── model.a/
│ │ │ │ │ ├── meta
│ │ │ │ │ └── out
│ │ │ │ ├── model.b/
│ │ │ │ │ ├── meta
│ │ │ │ │ └── out
│ │ │ │ └── model.mlir
│ │ │ ├── ivalue_test.cpp
│ │ │ ├── nn_test.cpp
│ │ │ ├── one_embedding_test.cpp
│ │ │ └── tensor_test.cpp
│ │ └── python/
│ │ ├── autograd/
│ │ │ ├── autograd.cpp
│ │ │ ├── autograd_engine.cpp
│ │ │ ├── autograd_function.cpp
│ │ │ ├── autograd_function_state.cpp
│ │ │ ├── autograd_function_state.h
│ │ │ ├── autograd_mode.cpp
│ │ │ └── function_node.cpp
│ │ ├── caster/
│ │ │ ├── autograd_function_state.h
│ │ │ ├── common.h
│ │ │ ├── maybe.h
│ │ │ ├── optional.h
│ │ │ ├── size.h
│ │ │ ├── tensor.h
│ │ │ └── test.cpp
│ │ ├── deprecated.cpp
│ │ ├── dlpack/
│ │ │ ├── converter.cpp
│ │ │ ├── converter.h
│ │ │ └── dlpack.h
│ │ ├── eager/
│ │ │ └── eager.cpp
│ │ ├── env/
│ │ │ ├── env.cpp
│ │ │ └── env.h
│ │ ├── ep/
│ │ │ └── cuda_matmul_mode.cpp
│ │ ├── exception/
│ │ │ ├── exception.cpp
│ │ │ └── exception.h
│ │ ├── flags.cpp
│ │ ├── framework/
│ │ │ ├── autocast.cpp
│ │ │ ├── device.cpp
│ │ │ ├── doc.cpp
│ │ │ ├── dtype.cpp
│ │ │ ├── framework.cpp
│ │ │ ├── framework.h
│ │ │ ├── global_mode.cpp
│ │ │ ├── id_state.cpp
│ │ │ ├── id_util.cpp
│ │ │ ├── instructions_builder.cpp
│ │ │ ├── layout.cpp
│ │ │ ├── memory_format.cpp
│ │ │ ├── memory_format.h
│ │ │ ├── nn_graph.cpp
│ │ │ ├── one_embedding.cpp
│ │ │ ├── op_builder.cpp
│ │ │ ├── op_expr.cpp
│ │ │ ├── parallel_conf_util.cpp
│ │ │ ├── py_kernel_registry.cpp
│ │ │ ├── random_generator.cpp
│ │ │ ├── scope_util.cpp
│ │ │ ├── session_util.cpp
│ │ │ ├── shut_down_util.cpp
│ │ │ ├── size.cpp
│ │ │ ├── size.h
│ │ │ ├── stream.cpp
│ │ │ ├── tensor.cpp
│ │ │ ├── tensor.h
│ │ │ ├── tensor_functions.cpp
│ │ │ ├── tensor_functions_util.h
│ │ │ ├── tensor_tuple.cpp
│ │ │ ├── tensortype.cpp
│ │ │ ├── tensortype.h
│ │ │ ├── thread.cpp
│ │ │ ├── thread.h
│ │ │ ├── typeinfo.cpp
│ │ │ ├── typeinfo.h
│ │ │ └── variable_tensor_mgr.cpp
│ │ ├── functional/
│ │ │ ├── common.cpp
│ │ │ ├── common.h
│ │ │ ├── dispatch_stateful_ops.cpp
│ │ │ ├── dispatch_stateful_ops.yaml
│ │ │ ├── function_def.h
│ │ │ ├── indexing.cpp
│ │ │ ├── indexing.h
│ │ │ ├── python_arg.cpp
│ │ │ ├── python_arg.h
│ │ │ ├── python_arg_parser.cpp
│ │ │ ├── python_arg_parser.h
│ │ │ ├── python_return_types.h
│ │ │ ├── tensor_api.cpp
│ │ │ ├── tensor_api.yaml
│ │ │ ├── value_types.cpp
│ │ │ └── value_types.h
│ │ ├── gil_foreign_lock_helper.cpp
│ │ ├── init.cpp
│ │ ├── ir.cpp
│ │ ├── job_build/
│ │ │ ├── job_build_and_infer.cpp
│ │ │ ├── job_build_and_infer.h
│ │ │ └── lazy_mode.cpp
│ │ ├── multiprocessing/
│ │ │ ├── init.cpp
│ │ │ ├── object_ptr.cpp
│ │ │ ├── object_ptr.h
│ │ │ └── shared_memory.cpp
│ │ ├── numpy/
│ │ │ └── init_numpy_c_api.cpp
│ │ ├── of_api_registry.cpp
│ │ ├── of_api_registry.h
│ │ ├── profiler.cpp
│ │ ├── registry/
│ │ │ └── registry.cpp
│ │ ├── remat/
│ │ │ └── remat.cpp
│ │ ├── rpc/
│ │ │ ├── ccl.cpp
│ │ │ └── rank_group.cpp
│ │ ├── session/
│ │ │ └── session.cpp
│ │ ├── stack_getter.cpp
│ │ ├── symbol/
│ │ │ ├── job_conf_symbol.cpp
│ │ │ ├── op_conf_symbol.cpp
│ │ │ ├── placement_symbol.cpp
│ │ │ ├── sbp_symbol.cpp
│ │ │ └── scope_symbol.cpp
│ │ └── utils/
│ │ ├── dataloader.cpp
│ │ ├── tensor_utils.cpp
│ │ └── tensor_utils.h
│ ├── core/
│ │ ├── auto_parallel/
│ │ │ ├── algorithm_util.cpp
│ │ │ ├── algorithm_util.h
│ │ │ ├── auto_memory.cpp
│ │ │ ├── auto_memory.h
│ │ │ ├── binary_set.cpp
│ │ │ ├── binary_set.h
│ │ │ ├── boxing_collector.cpp
│ │ │ ├── boxing_collector.h
│ │ │ ├── sbp_collector.cpp
│ │ │ ├── sbp_collector.h
│ │ │ ├── sbp_constructor.cpp
│ │ │ ├── sbp_constructor.h
│ │ │ ├── sbp_edge.cpp
│ │ │ ├── sbp_edge.h
│ │ │ ├── sbp_graph.cpp
│ │ │ ├── sbp_graph.h
│ │ │ ├── sbp_node.cpp
│ │ │ ├── sbp_node.h
│ │ │ ├── sbp_util.cpp
│ │ │ └── sbp_util.h
│ │ ├── autograd/
│ │ │ ├── autograd_captured_tensor.h
│ │ │ ├── autograd_engine.cpp
│ │ │ ├── autograd_engine.h
│ │ │ ├── autograd_function.cpp
│ │ │ ├── autograd_function.h
│ │ │ ├── autograd_meta.cpp
│ │ │ ├── autograd_meta.h
│ │ │ ├── autograd_mode.cpp
│ │ │ ├── autograd_mode.h
│ │ │ ├── gradient_funcs/
│ │ │ │ ├── activation.cpp
│ │ │ │ ├── adaptive_avg_pool.cpp
│ │ │ │ ├── adaptive_max_pool.cpp
│ │ │ │ ├── add_n.cpp
│ │ │ │ ├── affine_grid.cpp
│ │ │ │ ├── amp_white_identity.cpp
│ │ │ │ ├── as_strided.cpp
│ │ │ │ ├── avg_pool.cpp
│ │ │ │ ├── batch_gather.cpp
│ │ │ │ ├── bias_add.cpp
│ │ │ │ ├── binary_cross_entropy.cpp
│ │ │ │ ├── binary_cross_entropy_with_logits.cpp
│ │ │ │ ├── binary_cross_entropy_with_logits_reduce_mean.cpp
│ │ │ │ ├── broadcast_binary_ops.cpp
│ │ │ │ ├── broadcast_like.cpp
│ │ │ │ ├── cast.cpp
│ │ │ │ ├── clip_by_scalar.cpp
│ │ │ │ ├── clip_by_scalar_max.cpp
│ │ │ │ ├── clip_by_scalar_min.cpp
│ │ │ │ ├── combined_margin_loss.cpp
│ │ │ │ ├── complex.cpp
│ │ │ │ ├── concat.cpp
│ │ │ │ ├── conv.cpp
│ │ │ │ ├── copy.cpp
│ │ │ │ ├── ctc_loss.cpp
│ │ │ │ ├── cublas_fused_mlp.cpp
│ │ │ │ ├── cum_ops.cpp
│ │ │ │ ├── deconv.cpp
│ │ │ │ ├── deform_conv.cpp
│ │ │ │ ├── depand.cpp
│ │ │ │ ├── det.cpp
│ │ │ │ ├── diag.cpp
│ │ │ │ ├── diagonal.cpp
│ │ │ │ ├── dim_gather.cpp
│ │ │ │ ├── dim_scatter.cpp
│ │ │ │ ├── dot.cpp
│ │ │ │ ├── dropout.cpp
│ │ │ │ ├── eager_ccl_broadcast.cpp
│ │ │ │ ├── elementwise_minimum_maximum.cpp
│ │ │ │ ├── embedding.cpp
│ │ │ │ ├── expand.cpp
│ │ │ │ ├── fake_quantization.cpp
│ │ │ │ ├── fft.cpp
│ │ │ │ ├── fill.cpp
│ │ │ │ ├── flatten.cpp
│ │ │ │ ├── flip.cpp
│ │ │ │ ├── fold.cpp
│ │ │ │ ├── fused_bias_add_dropout.cpp
│ │ │ │ ├── fused_bias_add_gelu.cpp
│ │ │ │ ├── fused_bias_add_scale_mask_softmax_dropout.cpp
│ │ │ │ ├── fused_center.cpp
│ │ │ │ ├── fused_cross_interaction.cpp
│ │ │ │ ├── fused_dot_feature_interaction.cpp
│ │ │ │ ├── fused_fast_gelu_mul.cpp
│ │ │ │ ├── fused_get_boundding_boxes_coord.cpp
│ │ │ │ ├── fused_get_ciou_diagonal_angle.cpp
│ │ │ │ ├── fused_get_ciou_result.cpp
│ │ │ │ ├── fused_get_convex_diagonal_squared.cpp
│ │ │ │ ├── fused_get_intersection_area.cpp
│ │ │ │ ├── fused_get_iou.cpp
│ │ │ │ ├── fused_glu.cpp
│ │ │ │ ├── fused_gru_cell.cpp
│ │ │ │ ├── fused_lstm_cell.cpp
│ │ │ │ ├── fused_matmul_bias.cpp
│ │ │ │ ├── fused_matmul_bias_add_relu_dropout.cpp
│ │ │ │ ├── fused_scale_mask_bias_softmax.cpp
│ │ │ │ ├── fused_scale_mask_softmax.cpp
│ │ │ │ ├── fused_scale_mask_softmax_dropout.cpp
│ │ │ │ ├── fused_scale_tril.cpp
│ │ │ │ ├── fused_scale_tril_softmax_mask_scale.cpp
│ │ │ │ ├── fused_self_attention.cpp
│ │ │ │ ├── fused_weighted_sum.cpp
│ │ │ │ ├── gather.cpp
│ │ │ │ ├── gather_nd.cpp
│ │ │ │ ├── global_cast.cpp
│ │ │ │ ├── global_to_global.cpp
│ │ │ │ ├── gradient_accumulation.cpp
│ │ │ │ ├── graph_feed_and_fetch.cpp
│ │ │ │ ├── grid_sample.cpp
│ │ │ │ ├── group_norm.cpp
│ │ │ │ ├── identity.cpp
│ │ │ │ ├── inv.cpp
│ │ │ │ ├── kl_div.cpp
│ │ │ │ ├── l2_normalize.cpp
│ │ │ │ ├── layer_norm.cpp
│ │ │ │ ├── lerp.cpp
│ │ │ │ ├── linalg_cross.cpp
│ │ │ │ ├── log_softmax.cpp
│ │ │ │ ├── masked_fill.cpp
│ │ │ │ ├── math_binary_op.cpp
│ │ │ │ ├── math_unary_op.cpp
│ │ │ │ ├── matmul.cpp
│ │ │ │ ├── matrix_vector_product.cpp
│ │ │ │ ├── max_pool.cpp
│ │ │ │ ├── max_unpool.cpp
│ │ │ │ ├── median.cpp
│ │ │ │ ├── mode.cpp
│ │ │ │ ├── narrow.cpp
│ │ │ │ ├── nll.cpp
│ │ │ │ ├── noncontiguous_binary_op.cpp
│ │ │ │ ├── normalization.cpp
│ │ │ │ ├── normalization_add_relu.cpp
│ │ │ │ ├── one_embedding_fused_lookup.cpp
│ │ │ │ ├── padding.cpp
│ │ │ │ ├── partial_fc_sample.cpp
│ │ │ │ ├── reduce_ops.cpp
│ │ │ │ ├── reduce_sum_like.cpp
│ │ │ │ ├── reshape.cpp
│ │ │ │ ├── rms_norm.cpp
│ │ │ │ ├── roi_align.cpp
│ │ │ │ ├── roll.cpp
│ │ │ │ ├── rrelu.cpp
│ │ │ │ ├── scalar_add.cpp
│ │ │ │ ├── scalar_div.cpp
│ │ │ │ ├── scalar_floordiv.cpp
│ │ │ │ ├── scalar_fmod.cpp
│ │ │ │ ├── scalar_mul.cpp
│ │ │ │ ├── scalar_pow.cpp
│ │ │ │ ├── scalar_truncdiv.cpp
│ │ │ │ ├── scaled_dot_product_attention.cpp
│ │ │ │ ├── scatter_nd.cpp
│ │ │ │ ├── select_top_n.cpp
│ │ │ │ ├── slice.cpp
│ │ │ │ ├── smooth_l1_loss.cpp
│ │ │ │ ├── softmax.cpp
│ │ │ │ ├── softmax_cross_entropy.cpp
│ │ │ │ ├── sparse_cross_entropy.cpp
│ │ │ │ ├── sparse_softmax_cross_entropy.cpp
│ │ │ │ ├── sparse_softmax_cross_entropy_ms.cpp
│ │ │ │ ├── split_like.cpp
│ │ │ │ ├── squeeze.cpp
│ │ │ │ ├── stack.cpp
│ │ │ │ ├── tensor_scalar_binary.cpp
│ │ │ │ ├── tensor_scatter_nd_update.cpp
│ │ │ │ ├── tf_pool.cpp
│ │ │ │ ├── to_contiguous.cpp
│ │ │ │ ├── transpose.cpp
│ │ │ │ ├── tril.cpp
│ │ │ │ ├── triu.cpp
│ │ │ │ ├── trunc.cpp
│ │ │ │ ├── two_stage_reduce.cpp
│ │ │ │ ├── unfold.cpp
│ │ │ │ ├── unfold_tensor.cpp
│ │ │ │ ├── unsqueeze.cpp
│ │ │ │ ├── upsample.cpp
│ │ │ │ ├── variance.cpp
│ │ │ │ ├── vector_matrix_product.cpp
│ │ │ │ └── where.cpp
│ │ │ └── higher_order_gradient_funcs/
│ │ │ ├── activation.cpp
│ │ │ ├── avg_pool.cpp
│ │ │ ├── binary_cross_entropy_loss.cpp
│ │ │ ├── binary_cross_entropy_with_logits.cpp
│ │ │ ├── binary_cross_entropy_with_logits_reduce_mean.cpp
│ │ │ ├── conv.cpp
│ │ │ ├── div.cpp
│ │ │ ├── kl_div_loss.cpp
│ │ │ ├── log_softmax.cpp
│ │ │ ├── math_unary_op.cpp
│ │ │ ├── matmul.cpp
│ │ │ ├── max_pool.cpp
│ │ │ ├── nll_loss.cpp
│ │ │ ├── pow.cpp
│ │ │ ├── scalar_pow.cpp
│ │ │ ├── slice.cpp
│ │ │ ├── smooth_l1_loss.cpp
│ │ │ └── softmax.cpp
│ │ ├── boxing/
│ │ │ ├── asymmetric_broadcast.cpp
│ │ │ ├── boxing_dividor.h
│ │ │ ├── boxing_dividor_util.cpp
│ │ │ ├── boxing_dividor_util.h
│ │ │ ├── boxing_interpreter_status.cpp
│ │ │ ├── boxing_interpreter_status.h
│ │ │ ├── ccl_boxing_function.cpp
│ │ │ ├── cuda_copy_boxing_interpreter.cpp
│ │ │ ├── eager_boxing_interpreter.cpp
│ │ │ ├── eager_boxing_interpreter.h
│ │ │ ├── eager_boxing_interpreter_mgr.cpp
│ │ │ ├── eager_boxing_interpreter_mgr.h
│ │ │ ├── eager_boxing_logger.cpp
│ │ │ ├── eager_boxing_logger.h
│ │ │ ├── flatten_hierarchy.cpp
│ │ │ ├── generic_symmetric_nd_sbp_boxing.cpp
│ │ │ ├── identity_boxing_interpreter.cpp
│ │ │ ├── naive_1_to_p_boxing.cpp
│ │ │ ├── naive_b_to_1_boxing.cpp
│ │ │ ├── naive_b_to_s_boxing.cpp
│ │ │ ├── naive_p_to_b_boxing.cpp
│ │ │ ├── naive_p_to_s_boxing.cpp
│ │ │ ├── naive_s_to_b_boxing.cpp
│ │ │ ├── naive_s_to_p_boxing.cpp
│ │ │ ├── naive_s_to_s_boxing.cpp
│ │ │ ├── nd_sbp_dim_reduce_boxing.cpp
│ │ │ ├── one_to_one_boxing.cpp
│ │ │ ├── slice_boxing_util.cpp
│ │ │ ├── slice_boxing_util.h
│ │ │ ├── symmetric_acyclic_nd_sbp_boxing.cpp
│ │ │ ├── symmetric_b_to_p_boxing.cpp
│ │ │ ├── symmetric_b_to_s_boxing.cpp
│ │ │ ├── symmetric_s_to_p_boxing.cpp
│ │ │ └── unflatten_hierarchy.cpp
│ │ ├── ccl/
│ │ │ ├── ccl.cpp
│ │ │ └── ccl.h
│ │ ├── comm_network/
│ │ │ ├── comm_network.cpp
│ │ │ ├── comm_network.h
│ │ │ ├── epoll/
│ │ │ │ ├── epoll_comm_network.cpp
│ │ │ │ ├── epoll_comm_network.h
│ │ │ │ ├── io_event_poller.cpp
│ │ │ │ ├── io_event_poller.h
│ │ │ │ ├── socket_helper.cpp
│ │ │ │ ├── socket_helper.h
│ │ │ │ ├── socket_memory_desc.h
│ │ │ │ ├── socket_message.h
│ │ │ │ ├── socket_read_helper.cpp
│ │ │ │ ├── socket_read_helper.h
│ │ │ │ ├── socket_write_helper.cpp
│ │ │ │ └── socket_write_helper.h
│ │ │ └── ibverbs/
│ │ │ ├── ibverbs.proto
│ │ │ ├── ibverbs_comm_network.cpp
│ │ │ ├── ibverbs_comm_network.h
│ │ │ ├── ibverbs_memory_desc.cpp
│ │ │ ├── ibverbs_memory_desc.h
│ │ │ ├── ibverbs_qp.cpp
│ │ │ └── ibverbs_qp.h
│ │ ├── common/
│ │ │ ├── array_ref.h
│ │ │ ├── auto_registration_factory.h
│ │ │ ├── balanced_splitter.cpp
│ │ │ ├── balanced_splitter.h
│ │ │ ├── balanced_splitter_test.cpp
│ │ │ ├── bfloat16.h
│ │ │ ├── bfloat16_math.h
│ │ │ ├── bfloat16_test.cpp
│ │ │ ├── blas.h
│ │ │ ├── blocking_counter.cpp
│ │ │ ├── blocking_counter.h
│ │ │ ├── blocking_then_busy.h
│ │ │ ├── buffer.h
│ │ │ ├── buffer_manager.h
│ │ │ ├── cached_caller.cpp
│ │ │ ├── cached_caller.h
│ │ │ ├── cblas.h
│ │ │ ├── channel.h
│ │ │ ├── channel_test.cpp
│ │ │ ├── check.cpp
│ │ │ ├── check.h
│ │ │ ├── check_level.cpp
│ │ │ ├── check_level.h
│ │ │ ├── constant.h
│ │ │ ├── container_util.h
│ │ │ ├── container_util_test.cpp
│ │ │ ├── cost_util.h
│ │ │ ├── cpp_attribute.h
│ │ │ ├── data_type.cpp
│ │ │ ├── data_type.h
│ │ │ ├── data_type.proto
│ │ │ ├── data_type_converter.h
│ │ │ ├── data_type_converter_test.cpp
│ │ │ ├── data_type_converter_test_static.h
│ │ │ ├── data_type_seq.h
│ │ │ ├── decorator.h
│ │ │ ├── decorator_test.cpp
│ │ │ ├── device.proto
│ │ │ ├── device_type.cpp
│ │ │ ├── device_type.h
│ │ │ ├── device_type.proto
│ │ │ ├── dtype_signature.h
│ │ │ ├── dtype_signature.proto
│ │ │ ├── eigen_util.h
│ │ │ ├── either_ptr.h
│ │ │ ├── env_var/
│ │ │ │ ├── bootstrap.h
│ │ │ │ ├── debug_mode.h
│ │ │ │ ├── eager.h
│ │ │ │ ├── env_var.h
│ │ │ │ ├── remat.h
│ │ │ │ ├── stream.h
│ │ │ │ └── vm.h
│ │ │ ├── error.cpp
│ │ │ ├── error.h
│ │ │ ├── error.proto
│ │ │ ├── error_util.cpp
│ │ │ ├── error_util.h
│ │ │ ├── exception.h
│ │ │ ├── flat_shape.cpp
│ │ │ ├── flat_shape.h
│ │ │ ├── foreign_lock_helper.cpp
│ │ │ ├── foreign_lock_helper.h
│ │ │ ├── function_traits.h
│ │ │ ├── hash.h
│ │ │ ├── hash_container.h
│ │ │ ├── hash_eq_trait_ptr.h
│ │ │ ├── high_order_bool.h
│ │ │ ├── just.h
│ │ │ ├── layout_standardize.h
│ │ │ ├── math_util.cpp
│ │ │ ├── math_util.h
│ │ │ ├── maybe.h
│ │ │ ├── maybe_test.cpp
│ │ │ ├── mem_util.cpp
│ │ │ ├── mem_util.h
│ │ │ ├── memory_format.proto
│ │ │ ├── meta_util.hpp
│ │ │ ├── nd_index.cpp
│ │ │ ├── nd_index.h
│ │ │ ├── nd_index_offset_helper.h
│ │ │ ├── nd_index_offset_helper_test.cpp
│ │ │ ├── not_equal_to_previous_adjacent_iterator.h
│ │ │ ├── notifier.cpp
│ │ │ ├── notifier.h
│ │ │ ├── of_unused.h
│ │ │ ├── op_args_reserved_size.h
│ │ │ ├── op_args_vector.h
│ │ │ ├── optional.h
│ │ │ ├── optional_test.cpp
│ │ │ ├── pcheck.h
│ │ │ ├── permutation_iterator.h
│ │ │ ├── platform.h
│ │ │ ├── preprocessor.h
│ │ │ ├── preprocessor_internal.h
│ │ │ ├── preprocessor_test.cpp
│ │ │ ├── process_state.h
│ │ │ ├── protobuf.cpp
│ │ │ ├── protobuf.h
│ │ │ ├── range.cpp
│ │ │ ├── range.h
│ │ │ ├── range.proto
│ │ │ ├── registry_error.cpp
│ │ │ ├── registry_error.h
│ │ │ ├── scalar.cpp
│ │ │ ├── scalar.h
│ │ │ ├── sequential.proto
│ │ │ ├── shape.cpp
│ │ │ ├── shape.h
│ │ │ ├── shape.proto
│ │ │ ├── shape_test.cpp
│ │ │ ├── shape_vec.h
│ │ │ ├── shape_view.cpp
│ │ │ ├── shape_view.h
│ │ │ ├── shared_or_scalar.h
│ │ │ ├── single_thread_obj_pool.h
│ │ │ ├── single_thread_obj_pool_test.cpp
│ │ │ ├── singleton.h
│ │ │ ├── sized_buffer_view.h
│ │ │ ├── small_vector.h
│ │ │ ├── spin_counter.cpp
│ │ │ ├── spin_counter.h
│ │ │ ├── static_check.h
│ │ │ ├── static_global.h
│ │ │ ├── steady_vector.h
│ │ │ ├── steady_vector_test.cpp
│ │ │ ├── str_util.cpp
│ │ │ ├── str_util.h
│ │ │ ├── stream_type.h
│ │ │ ├── stride.cpp
│ │ │ ├── stride.h
│ │ │ ├── switch_func.h
│ │ │ ├── symbol.h
│ │ │ ├── symbol_test.cpp
│ │ │ ├── tensor_buffer.cpp
│ │ │ ├── tensor_buffer.h
│ │ │ ├── tensor_desc.cpp
│ │ │ ├── tensor_desc.h
│ │ │ ├── tensor_meta.cpp
│ │ │ ├── tensor_meta.h
│ │ │ ├── test_util.h
│ │ │ ├── thread_local_guard.h
│ │ │ ├── thread_local_guard_test.cpp
│ │ │ ├── throw.h
│ │ │ ├── to_string.h
│ │ │ ├── tuple_hash.h
│ │ │ ├── type_traits.h
│ │ │ ├── util.cpp
│ │ │ ├── util.h
│ │ │ ├── wrap_dim_utils.h
│ │ │ └── zero_only_zip.h
│ │ ├── control/
│ │ │ ├── bootstrap_client.h
│ │ │ ├── bootstrap_server.h
│ │ │ ├── control.proto
│ │ │ ├── ctrl_bootstrap.cpp
│ │ │ ├── ctrl_bootstrap.h
│ │ │ ├── ctrl_bootstrap.proto
│ │ │ ├── ctrl_call.h
│ │ │ ├── ctrl_client.cpp
│ │ │ ├── ctrl_client.h
│ │ │ ├── ctrl_server.cpp
│ │ │ ├── ctrl_server.h
│ │ │ ├── ctrl_service.cpp
│ │ │ ├── ctrl_service.h
│ │ │ ├── ctrl_test.cpp
│ │ │ ├── ctrl_util.cpp
│ │ │ ├── ctrl_util.h
│ │ │ ├── global_process_ctx.h
│ │ │ ├── host_list_bootstrap_client.cpp
│ │ │ ├── host_list_bootstrap_client.h
│ │ │ ├── host_list_bootstrap_server.cpp
│ │ │ ├── host_list_bootstrap_server.h
│ │ │ ├── rank_info_bootstrap_client.cpp
│ │ │ ├── rank_info_bootstrap_client.h
│ │ │ ├── rank_info_bootstrap_server.cpp
│ │ │ ├── rank_info_bootstrap_server.h
│ │ │ ├── rpc_client.cpp
│ │ │ ├── rpc_client.h
│ │ │ ├── rpc_server.cpp
│ │ │ ├── rpc_server.h
│ │ │ └── worker_process_info.proto
│ │ ├── cuda/
│ │ │ ├── atomic.cuh
│ │ │ ├── elementwise.cuh
│ │ │ ├── layer_norm.cuh
│ │ │ ├── rms_norm.cuh
│ │ │ ├── softmax.cuh
│ │ │ └── unique.cuh
│ │ ├── device/
│ │ │ ├── cuda_pseudo_bfloat16.h
│ │ │ ├── cuda_pseudo_half.h
│ │ │ ├── cuda_util.cpp
│ │ │ ├── cuda_util.h
│ │ │ ├── cudnn_conv_util.cpp
│ │ │ ├── cudnn_conv_util.h
│ │ │ ├── cudnn_util.cpp
│ │ │ ├── cudnn_util.h
│ │ │ ├── device_id.cpp
│ │ │ ├── device_id.h
│ │ │ ├── ep_based_event_record.h
│ │ │ ├── event_record.h
│ │ │ ├── nccl_util.cpp
│ │ │ └── nccl_util.h
│ │ ├── eager/
│ │ │ ├── call_context.cpp
│ │ │ ├── call_context.h
│ │ │ ├── dev_vm_dep_object_consume_mode.h
│ │ │ ├── eager_blob_object.cpp
│ │ │ ├── eager_blob_object.h
│ │ │ ├── local_dep_object.cpp
│ │ │ ├── local_dep_object.h
│ │ │ ├── tensor_storage.cpp
│ │ │ └── tensor_storage.h
│ │ ├── embedding/
│ │ │ ├── cache.cpp
│ │ │ ├── cache.h
│ │ │ ├── cache_test.cpp
│ │ │ ├── cached_key_value_store.cu
│ │ │ ├── cached_key_value_store.h
│ │ │ ├── embedding_manager.cpp
│ │ │ ├── embedding_manager.h
│ │ │ ├── full_cache.cu
│ │ │ ├── full_cache.h
│ │ │ ├── hash_functions.cuh
│ │ │ ├── key_value_store.h
│ │ │ ├── key_value_store_options.h
│ │ │ ├── key_value_store_test.cpp
│ │ │ ├── kv_iterator.h
│ │ │ ├── lru_cache.cu
│ │ │ ├── lru_cache.h
│ │ │ ├── mock_key_value_store.cu
│ │ │ ├── mock_key_value_store.h
│ │ │ ├── persistent_table.cpp
│ │ │ ├── persistent_table.h
│ │ │ ├── persistent_table_key_value_store.cu
│ │ │ ├── persistent_table_key_value_store.h
│ │ │ └── posix_file.h
│ │ ├── ep/
│ │ │ ├── common/
│ │ │ │ ├── active_device_guard.cpp
│ │ │ │ ├── device.cpp
│ │ │ │ ├── device_manager_registry.cpp
│ │ │ │ ├── onednn.h
│ │ │ │ └── primitive/
│ │ │ │ ├── add.cpp
│ │ │ │ ├── batch_matmul.cpp
│ │ │ │ ├── binary_functor.h
│ │ │ │ ├── broadcast_elementwise_binary.h
│ │ │ │ ├── broadcast_elementwise_unary.h
│ │ │ │ ├── broadcast_matmul.h
│ │ │ │ ├── broadcast_simplify_dims_test.cpp
│ │ │ │ ├── constant_pad.h
│ │ │ │ ├── copy_nd.h
│ │ │ │ ├── elementwise_unary.h
│ │ │ │ ├── matmul.cpp
│ │ │ │ ├── permute.h
│ │ │ │ ├── permute_impl.h
│ │ │ │ ├── permute_test.cpp
│ │ │ │ ├── unary_functor.h
│ │ │ │ ├── util.h
│ │ │ │ └── where.h
│ │ │ ├── cpu/
│ │ │ │ ├── cpu_device.cpp
│ │ │ │ ├── cpu_device.h
│ │ │ │ ├── cpu_device_manager.cpp
│ │ │ │ ├── cpu_device_manager.h
│ │ │ │ ├── cpu_device_manager_factory.cpp
│ │ │ │ ├── cpu_event.cpp
│ │ │ │ ├── cpu_event.h
│ │ │ │ ├── cpu_random_generator.cpp
│ │ │ │ ├── cpu_random_generator.h
│ │ │ │ ├── cpu_stream.cpp
│ │ │ │ ├── cpu_stream.h
│ │ │ │ └── primitive/
│ │ │ │ ├── add.cpp
│ │ │ │ ├── binary_functor.h
│ │ │ │ ├── broadcast_elementwise_binary.cpp
│ │ │ │ ├── broadcast_elementwise_unary.cpp
│ │ │ │ ├── broadcast_matmul.cpp
│ │ │ │ ├── cast.cpp
│ │ │ │ ├── constant_pad.cpp
│ │ │ │ ├── copy_nd.cpp
│ │ │ │ ├── elementwise_unary.cpp
│ │ │ │ ├── fill.cpp
│ │ │ │ ├── memcpy.cpp
│ │ │ │ ├── memset.cpp
│ │ │ │ ├── permute.cpp
│ │ │ │ ├── softmax.cpp
│ │ │ │ ├── softmax_backward.cpp
│ │ │ │ ├── tensor_fill.cpp
│ │ │ │ ├── type_seq.h
│ │ │ │ ├── unary_functor.h
│ │ │ │ └── where.cpp
│ │ │ ├── cuda/
│ │ │ │ ├── cuda_device.cpp
│ │ │ │ ├── cuda_device.h
│ │ │ │ ├── cuda_device_manager.cpp
│ │ │ │ ├── cuda_device_manager.h
│ │ │ │ ├── cuda_device_manager_factory.cpp
│ │ │ │ ├── cuda_event.cpp
│ │ │ │ ├── cuda_event.h
│ │ │ │ ├── cuda_matmul_mode.cpp
│ │ │ │ ├── cuda_matmul_mode.h
│ │ │ │ ├── cuda_random_generator.cpp
│ │ │ │ ├── cuda_random_generator.h
│ │ │ │ ├── cuda_stream.cpp
│ │ │ │ ├── cuda_stream.h
│ │ │ │ └── primitive/
│ │ │ │ ├── add.cu
│ │ │ │ ├── binary_functor.cuh
│ │ │ │ ├── broadcast_elementwise_binary.cu
│ │ │ │ ├── broadcast_elementwise_binary.cuh
│ │ │ │ ├── broadcast_elementwise_binary_activation_grad_0.cu
│ │ │ │ ├── broadcast_elementwise_binary_activation_grad_1.cu
│ │ │ │ ├── broadcast_elementwise_binary_activation_grad_2.cu
│ │ │ │ ├── broadcast_elementwise_binary_bitwise.cu
│ │ │ │ ├── broadcast_elementwise_binary_comparision_0.cu
│ │ │ │ ├── broadcast_elementwise_binary_comparision_1.cu
│ │ │ │ ├── broadcast_elementwise_binary_comparision_complex.cu
│ │ │ │ ├── broadcast_elementwise_binary_logical.cu
│ │ │ │ ├── broadcast_elementwise_binary_math_0.cu
│ │ │ │ ├── broadcast_elementwise_binary_math_1.cu
│ │ │ │ ├── broadcast_elementwise_binary_math_2.cu
│ │ │ │ ├── broadcast_elementwise_binary_math_complex.cu
│ │ │ │ ├── broadcast_elementwise_unary.cu
│ │ │ │ ├── broadcast_matmul.cpp
│ │ │ │ ├── cast.cu
│ │ │ │ ├── constant_pad.cu
│ │ │ │ ├── copy_nd.cu
│ │ │ │ ├── elementwise_unary.cu
│ │ │ │ ├── fill.cu
│ │ │ │ ├── math_elementwise_unary_math_grad_0.cu
│ │ │ │ ├── math_elementwise_unary_math_grad_1.cu
│ │ │ │ ├── math_elementwise_unary_math_grad_2.cu
│ │ │ │ ├── math_elementwise_unary_math_grad_3.cu
│ │ │ │ ├── math_elementwise_unary_math_grad_complex.cu
│ │ │ │ ├── memcpy.cpp
│ │ │ │ ├── memset.cpp
│ │ │ │ ├── permute.cu
│ │ │ │ ├── softmax.cu
│ │ │ │ ├── softmax_backward.cu
│ │ │ │ ├── tensor_fill.cu
│ │ │ │ ├── type_seq.h
│ │ │ │ ├── unary_functor.cuh
│ │ │ │ └── where.cu
│ │ │ ├── include/
│ │ │ │ ├── active_device_guard.h
│ │ │ │ ├── allocation_options.h
│ │ │ │ ├── device.h
│ │ │ │ ├── device_manager.h
│ │ │ │ ├── device_manager_factory.h
│ │ │ │ ├── device_manager_registry.h
│ │ │ │ ├── event.h
│ │ │ │ ├── primitive/
│ │ │ │ │ ├── add.h
│ │ │ │ │ ├── batch_matmul.h
│ │ │ │ │ ├── binary_op.h
│ │ │ │ │ ├── blas.h
│ │ │ │ │ ├── broadcast_elementwise_binary.h
│ │ │ │ │ ├── broadcast_elementwise_unary.h
│ │ │ │ │ ├── broadcast_matmul.h
│ │ │ │ │ ├── cast.h
│ │ │ │ │ ├── constant_pad.h
│ │ │ │ │ ├── copy_nd.h
│ │ │ │ │ ├── elementwise_unary.h
│ │ │ │ │ ├── fast_integer_math.h
│ │ │ │ │ ├── fill.h
│ │ │ │ │ ├── log_softmax.h
│ │ │ │ │ ├── log_softmax_backward.h
│ │ │ │ │ ├── matmul.h
│ │ │ │ │ ├── memcpy.h
│ │ │ │ │ ├── memset.h
│ │ │ │ │ ├── one_hot.h
│ │ │ │ │ ├── permute.h
│ │ │ │ │ ├── primitive.h
│ │ │ │ │ ├── softmax.h
│ │ │ │ │ ├── softmax_backward.h
│ │ │ │ │ ├── tensor_fill.h
│ │ │ │ │ ├── unary_op.h
│ │ │ │ │ └── where.h
│ │ │ │ ├── random_generator.h
│ │ │ │ └── stream.h
│ │ │ └── test/
│ │ │ ├── primitive/
│ │ │ │ ├── add_test.cpp
│ │ │ │ ├── batch_matmul_test.cpp
│ │ │ │ ├── binary_test.cpp
│ │ │ │ ├── broadcast_matmul_test.cpp
│ │ │ │ ├── cast_test.cpp
│ │ │ │ ├── constant_pad_test.cpp
│ │ │ │ ├── copy_nd_test.cpp
│ │ │ │ ├── elementwise_unary_test.cpp
│ │ │ │ ├── fill_test.cpp
│ │ │ │ ├── matmul_test.cpp
│ │ │ │ ├── memcpy_test.cpp
│ │ │ │ ├── memset_test.cpp
│ │ │ │ ├── permute_test.cpp
│ │ │ │ ├── primitive_test.h
│ │ │ │ ├── softmax_backward_test.cpp
│ │ │ │ ├── softmax_test.cpp
│ │ │ │ ├── unary_test.cpp
│ │ │ │ └── where_test.cpp
│ │ │ └── test_util.h
│ │ ├── framework/
│ │ │ ├── arg_tuple.cpp
│ │ │ ├── arg_tuple.h
│ │ │ ├── attr_map.cpp
│ │ │ ├── attr_map.h
│ │ │ ├── attr_map_test.cpp
│ │ │ ├── attr_value.cpp
│ │ │ ├── attr_value.h
│ │ │ ├── attr_value_accessor.cpp
│ │ │ ├── attr_value_accessor.h
│ │ │ ├── auto_random_generator.cpp
│ │ │ ├── auto_random_generator.h
│ │ │ ├── autocast.cpp
│ │ │ ├── autocast.h
│ │ │ ├── compute_complexity_fn_context.h
│ │ │ ├── config_def.cpp
│ │ │ ├── config_def.h
│ │ │ ├── config_def.proto
│ │ │ ├── consistency_check.cpp
│ │ │ ├── consistency_check.h
│ │ │ ├── device.cpp
│ │ │ ├── device.h
│ │ │ ├── dtype.cpp
│ │ │ ├── dtype.h
│ │ │ ├── eager_util.h
│ │ │ ├── framework.h
│ │ │ ├── get_nd_sbp_signature_list_context.h
│ │ │ ├── global_param_grad_sync_mode.cpp
│ │ │ ├── global_param_grad_sync_mode.h
│ │ │ ├── global_tensor_infer_cache.cpp
│ │ │ ├── global_tensor_infer_cache.h
│ │ │ ├── id_util.cpp
│ │ │ ├── id_util.h
│ │ │ ├── infer_nd_sbp_fn_context.h
│ │ │ ├── infer_output_blob_time_shape_fn_context.h
│ │ │ ├── infer_util.cpp
│ │ │ ├── infer_util.h
│ │ │ ├── instructions_builder.cpp
│ │ │ ├── instructions_builder.h
│ │ │ ├── layout.cpp
│ │ │ ├── layout.h
│ │ │ ├── load_library.cpp
│ │ │ ├── load_library.h
│ │ │ ├── local_tensor_infer_cache.cpp
│ │ │ ├── local_tensor_infer_cache.h
│ │ │ ├── multi_client_session_context.cpp
│ │ │ ├── multi_client_session_context.h
│ │ │ ├── multi_thread.cpp
│ │ │ ├── multi_thread.h
│ │ │ ├── mutable_attr_map.h
│ │ │ ├── nd_sbp.cpp
│ │ │ ├── nd_sbp.h
│ │ │ ├── nn_graph.cpp
│ │ │ ├── nn_graph.h
│ │ │ ├── nn_graph_if.h
│ │ │ ├── op_builder.cpp
│ │ │ ├── op_builder.h
│ │ │ ├── op_definition.h
│ │ │ ├── op_expr.cpp
│ │ │ ├── op_expr.h
│ │ │ ├── op_expr_grad_function.cpp
│ │ │ ├── op_expr_grad_function.h
│ │ │ ├── op_interpreter/
│ │ │ │ ├── dispatch_frame.cpp
│ │ │ │ ├── dispatch_frame.h
│ │ │ │ ├── eager_global_op_interpreter.cpp
│ │ │ │ ├── eager_local_op_interpreter.cpp
│ │ │ │ ├── eager_local_op_interpreter.h
│ │ │ │ ├── lazy_op_interpreter.cpp
│ │ │ │ ├── lazy_op_interpreter.h
│ │ │ │ ├── op_interpreter.cpp
│ │ │ │ ├── op_interpreter_util.cpp
│ │ │ │ └── op_interpreter_util.h
│ │ │ ├── op_interpreter.h
│ │ │ ├── op_kernel.cpp
│ │ │ ├── op_kernel.h
│ │ │ ├── op_kernel_infer_cache.cpp
│ │ │ ├── op_kernel_infer_cache.h
│ │ │ ├── ordered_string_list.h
│ │ │ ├── parallel_conf_util.cpp
│ │ │ ├── parallel_conf_util.h
│ │ │ ├── parallel_conf_util_test.cpp
│ │ │ ├── placed_nd_sbp.cpp
│ │ │ ├── placed_nd_sbp.h
│ │ │ ├── placement_sbp_util.cpp
│ │ │ ├── placement_sbp_util.h
│ │ │ ├── placement_sbp_util_test.cpp
│ │ │ ├── placement_utils.cpp
│ │ │ ├── placement_utils.h
│ │ │ ├── random_generator.cpp
│ │ │ ├── random_generator.h
│ │ │ ├── rank_group_rpc_util.cpp
│ │ │ ├── rank_group_rpc_util.h
│ │ │ ├── saved_tensor_hooks.h
│ │ │ ├── sbp_context.cpp
│ │ │ ├── sbp_context.h
│ │ │ ├── sbp_infer_util.cpp
│ │ │ ├── sbp_infer_util.h
│ │ │ ├── sbp_infer_util_test.cpp
│ │ │ ├── scope_util.cpp
│ │ │ ├── scope_util.h
│ │ │ ├── session_util.cpp
│ │ │ ├── session_util.h
│ │ │ ├── shut_down_util.cpp
│ │ │ ├── shut_down_util.h
│ │ │ ├── stream.cpp
│ │ │ ├── stream.h
│ │ │ ├── stream_allocator_is_pinned.h
│ │ │ ├── stream_get_stream_type_name.h
│ │ │ ├── stream_guard.cpp
│ │ │ ├── stream_guard.h
│ │ │ ├── stream_is_comm_net_stream.h
│ │ │ ├── stream_mgr.cpp
│ │ │ ├── stream_mgr.h
│ │ │ ├── stream_need_soft_sync.h
│ │ │ ├── stream_on_independent_thread.h
│ │ │ ├── stream_set.cpp
│ │ │ ├── stream_set.h
│ │ │ ├── stream_support_stream_wait.h
│ │ │ ├── symbol_storage_util.cpp
│ │ │ ├── symbol_storage_util.h
│ │ │ ├── sync_symbol_global_tensor_meta.cpp
│ │ │ ├── sync_symbol_global_tensor_meta.h
│ │ │ ├── sync_symbol_nd_sbp.cpp
│ │ │ ├── sync_symbol_nd_sbp.h
│ │ │ ├── sync_symbol_parallel_desc.cpp
│ │ │ ├── sync_symbol_parallel_desc.h
│ │ │ ├── synced_symbol_map.cpp
│ │ │ ├── synced_symbol_map.h
│ │ │ ├── tensor.cpp
│ │ │ ├── tensor.h
│ │ │ ├── tensor_arg.cpp
│ │ │ ├── tensor_arg.h
│ │ │ ├── tensor_global_id.cpp
│ │ │ ├── tensor_global_id.h
│ │ │ ├── tensor_impl.cpp
│ │ │ ├── tensor_impl.h
│ │ │ ├── tensor_methods.cpp
│ │ │ ├── tensor_methods.h
│ │ │ ├── tensor_name_scope.cpp
│ │ │ ├── tensor_name_scope.h
│ │ │ ├── tensor_rpc_util.cpp
│ │ │ ├── tensor_rpc_util.h
│ │ │ ├── tensor_storage.cpp
│ │ │ ├── tensor_storage.h
│ │ │ ├── tensor_tuple.cpp
│ │ │ ├── tensor_tuple.h
│ │ │ ├── tensor_util.cpp
│ │ │ ├── tensor_util.h
│ │ │ ├── to_string.cpp
│ │ │ ├── to_string.h
│ │ │ ├── transport_token.cpp
│ │ │ ├── transport_token.h
│ │ │ ├── transport_util.cpp
│ │ │ ├── transport_util.h
│ │ │ ├── user_op_attr.proto
│ │ │ ├── user_op_conf.cpp
│ │ │ ├── user_op_conf.h
│ │ │ ├── user_op_conf.proto
│ │ │ ├── user_op_def.cpp
│ │ │ ├── user_op_def.h
│ │ │ ├── user_op_def.proto
│ │ │ ├── user_op_hob.h
│ │ │ ├── user_op_kernel_registry.cpp
│ │ │ ├── user_op_kernel_registry.h
│ │ │ ├── user_op_registry.cpp
│ │ │ ├── user_op_registry.h
│ │ │ ├── user_op_registry_manager.cpp
│ │ │ ├── user_op_registry_manager.h
│ │ │ ├── user_op_tensor.h
│ │ │ ├── util.h
│ │ │ ├── variable_meta_info.proto
│ │ │ ├── variable_tensor_mgr.cpp
│ │ │ └── variable_tensor_mgr.h
│ │ ├── functional/
│ │ │ ├── function_library.h
│ │ │ ├── functional.h
│ │ │ ├── functional_api.yaml
│ │ │ ├── impl/
│ │ │ │ ├── activation_functor.cpp
│ │ │ │ ├── array_functor.cpp
│ │ │ │ ├── binary_functor.cpp
│ │ │ │ ├── binary_functor.h
│ │ │ │ ├── binary_grad_functor.cpp
│ │ │ │ ├── comm_functor.cpp
│ │ │ │ ├── common.cpp
│ │ │ │ ├── common.h
│ │ │ │ ├── dataset_functor.cpp
│ │ │ │ ├── eye_functor.cpp
│ │ │ │ ├── fused_attention_functor.cpp
│ │ │ │ ├── global_cast.cpp
│ │ │ │ ├── gradient_accumulation_functor.cpp
│ │ │ │ ├── higher_derivative_functor.cpp
│ │ │ │ ├── linalg_functor.cpp
│ │ │ │ ├── math_functor.cpp
│ │ │ │ ├── nn_functor.cpp
│ │ │ │ ├── nn_grad_functor.cpp
│ │ │ │ ├── quantization.cpp
│ │ │ │ ├── random_functor.cpp
│ │ │ │ ├── rnn_functor.cpp
│ │ │ │ ├── slice_boxing_functor.cpp
│ │ │ │ ├── test_functor.cpp
│ │ │ │ ├── unary_functor.cpp
│ │ │ │ ├── unary_functor.h
│ │ │ │ └── util_ops_functor.cpp
│ │ │ ├── packed_functor.h
│ │ │ ├── sequence_function.h
│ │ │ ├── tensor_index.cpp
│ │ │ ├── tensor_index.h
│ │ │ ├── tensor_processor.cpp
│ │ │ └── tensor_processor.h
│ │ ├── graph/
│ │ │ ├── boxing/
│ │ │ │ ├── b21_sub_task_graph_builder.cpp
│ │ │ │ ├── b21_sub_task_graph_builder.h
│ │ │ │ ├── boxing_logger.cpp
│ │ │ │ ├── boxing_logger.h
│ │ │ │ ├── ccl_sub_task_graph_builder.cpp
│ │ │ │ ├── ccl_sub_task_graph_builder.h
│ │ │ │ ├── chain_sub_task_graph_builder.cpp
│ │ │ │ ├── chain_sub_task_graph_builder.h
│ │ │ │ ├── collective_boxing.proto
│ │ │ │ ├── collective_boxing_sub_task_graph_builder.cpp
│ │ │ │ ├── collective_boxing_sub_task_graph_builder.h
│ │ │ │ ├── collective_boxing_util.cpp
│ │ │ │ ├── collective_boxing_util.h
│ │ │ │ ├── fallback_to_cpu_slice_boxing_sub_task_graph_builder.cpp
│ │ │ │ ├── fallback_to_cpu_slice_boxing_sub_task_graph_builder.h
│ │ │ │ ├── hierarchical_sub_task_graph_builder.h
│ │ │ │ ├── hierarchical_sub_task_graph_builder_impl.cpp
│ │ │ │ ├── hierarchical_sub_task_graph_builder_impl.h
│ │ │ │ ├── hierarchical_sub_task_graph_builder_util.cpp
│ │ │ │ ├── hierarchical_sub_task_graph_builder_util.h
│ │ │ │ ├── naive_b2b_sub_task_graph_builder.cpp
│ │ │ │ ├── naive_b2b_sub_task_graph_builder.h
│ │ │ │ ├── naive_b2p_sub_task_graph_builder.cpp
│ │ │ │ ├── naive_b2p_sub_task_graph_builder.h
│ │ │ │ ├── one_to_one_sub_task_graph_builder.cpp
│ │ │ │ ├── one_to_one_sub_task_graph_builder.h
│ │ │ │ ├── slice_boxing_sub_task_graph_builder.cpp
│ │ │ │ ├── slice_boxing_sub_task_graph_builder.h
│ │ │ │ ├── sub_task_graph_builder.h
│ │ │ │ ├── sub_task_graph_builder_context.cpp
│ │ │ │ ├── sub_task_graph_builder_context.h
│ │ │ │ ├── sub_task_graph_builder_status_util.cpp
│ │ │ │ ├── sub_task_graph_builder_status_util.h
│ │ │ │ ├── sub_task_graph_builder_util.cpp
│ │ │ │ └── sub_task_graph_builder_util.h
│ │ │ ├── boxing_identity_task_node.cpp
│ │ │ ├── boxing_identity_task_node.h
│ │ │ ├── boxing_task_graph.proto
│ │ │ ├── boxing_zeros_task_node.cpp
│ │ │ ├── boxing_zeros_task_node.h
│ │ │ ├── collective_boxing_pack_task_node.cpp
│ │ │ ├── collective_boxing_pack_task_node.h
│ │ │ ├── collective_boxing_task_node.cpp
│ │ │ ├── collective_boxing_task_node.h
│ │ │ ├── collective_boxing_unpack_task_node.cpp
│ │ │ ├── collective_boxing_unpack_task_node.h
│ │ │ ├── compute_task_node.cpp
│ │ │ ├── compute_task_node.h
│ │ │ ├── copy_task_node.cpp
│ │ │ ├── copy_task_node.h
│ │ │ ├── exec_graph.cpp
│ │ │ ├── exec_graph.h
│ │ │ ├── exec_sequence.proto
│ │ │ ├── fake_consumed_regst_provider.h
│ │ │ ├── graph.h
│ │ │ ├── inplace_lbi_graph.cpp
│ │ │ ├── inplace_lbi_graph.h
│ │ │ ├── inplace_regst_graph.cpp
│ │ │ ├── inplace_regst_graph.h
│ │ │ ├── nccl_send_recv_boxing_task_node.cpp
│ │ │ ├── nccl_send_recv_boxing_task_node.h
│ │ │ ├── node.cpp
│ │ │ ├── node.h
│ │ │ ├── normal_forward_compute_task_node.h
│ │ │ ├── op_graph.cpp
│ │ │ ├── op_graph.h
│ │ │ ├── plan_task_graph.cpp
│ │ │ ├── plan_task_graph.h
│ │ │ ├── slice_boxing_task_node.cpp
│ │ │ ├── slice_boxing_task_node.h
│ │ │ ├── straighten_nodes.cpp
│ │ │ ├── straighten_nodes.h
│ │ │ ├── stream_id.cpp
│ │ │ ├── stream_id.h
│ │ │ ├── stream_index_generator.cpp
│ │ │ ├── stream_index_generator.h
│ │ │ ├── task_edge.proto
│ │ │ ├── task_graph.cpp
│ │ │ ├── task_graph.h
│ │ │ ├── task_graph_rebuild_ctx.cpp
│ │ │ ├── task_graph_rebuild_ctx.h
│ │ │ ├── task_id.cpp
│ │ │ ├── task_id.h
│ │ │ ├── task_id_generator.cpp
│ │ │ ├── task_id_generator.h
│ │ │ ├── task_node.cpp
│ │ │ ├── task_node.h
│ │ │ ├── task_stream_id.h
│ │ │ ├── task_stream_index_manager.cpp
│ │ │ ├── task_stream_index_manager.h
│ │ │ ├── task_type_visitor.h
│ │ │ ├── transport_task_node.cpp
│ │ │ └── transport_task_node.h
│ │ ├── graph_impl/
│ │ │ ├── acc_compute_task_node.cpp
│ │ │ ├── acc_ctrl_tick_compute_task_node.cpp
│ │ │ ├── acc_tick_compute_task_node.cpp
│ │ │ ├── callback_notify_compute_task_node.cpp
│ │ │ ├── case_compute_task_node.cpp
│ │ │ ├── critical_section_wait_compute_task_node.cpp
│ │ │ ├── decode_h2d_compute_task_node.cpp
│ │ │ ├── device_tick_compute_task_node.cpp
│ │ │ ├── distribute_concat_compute_task_node.cpp
│ │ │ ├── distribute_split_compute_task_node.cpp
│ │ │ ├── dst_subset_tick_compute_task_node.cpp
│ │ │ ├── esac_compute_task_node.cpp
│ │ │ ├── normal_forward_compute_task_node.cpp
│ │ │ ├── pack_compute_task_node.cpp
│ │ │ ├── reentrant_lock_compute_task_node.cpp
│ │ │ ├── repeat_compute_task_node.cpp
│ │ │ ├── source_tick_compute_task_node.cpp
│ │ │ ├── src_subset_tick_compute_task_node.cpp
│ │ │ ├── ssp_variable_proxy_task_node.cpp
│ │ │ ├── tick_compute_task_node.cpp
│ │ │ ├── unpack_compute_task_node.cpp
│ │ │ └── wait_and_send_ids_compute_task_node.cpp
│ │ ├── hardware/
│ │ │ ├── basic_device_descriptor_list.cpp
│ │ │ ├── basic_device_descriptor_list.h
│ │ │ ├── cuda_device_descriptor.cpp
│ │ │ ├── cuda_device_descriptor.h
│ │ │ ├── cuda_device_descriptor_class.cpp
│ │ │ ├── device_descriptor.h
│ │ │ ├── device_descriptor_class.cpp
│ │ │ ├── device_descriptor_class.h
│ │ │ ├── device_descriptor_list.h
│ │ │ ├── net_ib_device_descriptor.cpp
│ │ │ ├── net_ib_device_descriptor.h
│ │ │ ├── net_ib_device_descriptor_class.cpp
│ │ │ ├── net_socket_device_descriptor.cpp
│ │ │ ├── net_socket_device_descriptor.h
│ │ │ ├── net_socket_device_descriptor_class.cpp
│ │ │ ├── node_device_descriptor.cpp
│ │ │ ├── node_device_descriptor.h
│ │ │ ├── node_device_descriptor_manager.cpp
│ │ │ ├── node_device_descriptor_manager.h
│ │ │ ├── topology_descriptor.cpp
│ │ │ └── topology_descriptor.h
│ │ ├── intrusive/
│ │ │ ├── README.md
│ │ │ ├── base.h
│ │ │ ├── cpp_attribute.h
│ │ │ ├── dss.h
│ │ │ ├── dss_test.cpp
│ │ │ ├── flat_msg.h
│ │ │ ├── flat_msg_test.cpp
│ │ │ ├── flat_msg_view.h
│ │ │ ├── flat_msg_view_test.cpp
│ │ │ ├── for_each.h
│ │ │ ├── force_standard_layout.h
│ │ │ ├── force_standard_layout_test.cpp
│ │ │ ├── head_free_list.h
│ │ │ ├── head_free_list_test.cpp
│ │ │ ├── intrusive.h
│ │ │ ├── intrusive_core_test.cpp
│ │ │ ├── list.h
│ │ │ ├── list_hook.h
│ │ │ ├── list_hook_test.cpp
│ │ │ ├── list_test.cpp
│ │ │ ├── mutexed_list.h
│ │ │ ├── object_pool.h
│ │ │ ├── object_pool_test.cpp
│ │ │ ├── ref.h
│ │ │ ├── reflective.h
│ │ │ ├── shared_ptr.h
│ │ │ ├── skiplist.h
│ │ │ ├── skiplist_hook.h
│ │ │ ├── skiplist_hook_test.cpp
│ │ │ ├── skiplist_test.cpp
│ │ │ ├── static_counter.h
│ │ │ ├── static_counter_test.cpp
│ │ │ ├── struct_traits.h
│ │ │ └── struct_traits_test.cpp
│ │ ├── ipc/
│ │ │ ├── shared_memory.cpp
│ │ │ └── shared_memory.h
│ │ ├── job/
│ │ │ ├── blob_lifetime_signature.proto
│ │ │ ├── checkpointing_config_def.cpp
│ │ │ ├── cluster_instruction.cpp
│ │ │ ├── cluster_instruction.h
│ │ │ ├── cluster_instruction.proto
│ │ │ ├── collective_boxing/
│ │ │ │ ├── coordinator.h
│ │ │ │ ├── executor.cpp
│ │ │ │ ├── executor.h
│ │ │ │ ├── executor_backend.h
│ │ │ │ ├── executor_backend_manager.cpp
│ │ │ │ ├── executor_backend_manager.h
│ │ │ │ ├── nccl_executor_backend.cu
│ │ │ │ ├── request_store.cpp
│ │ │ │ ├── request_store.h
│ │ │ │ ├── runtime_request_info.h
│ │ │ │ ├── scheduler.cpp
│ │ │ │ ├── scheduler.h
│ │ │ │ ├── static_group_coordinator.cpp
│ │ │ │ └── static_group_coordinator.h
│ │ │ ├── compile_mode.cpp
│ │ │ ├── compile_mode.h
│ │ │ ├── compiler.cpp
│ │ │ ├── compiler.h
│ │ │ ├── critical_section.proto
│ │ │ ├── critical_section_desc.cpp
│ │ │ ├── critical_section_desc.h
│ │ │ ├── critical_section_instance.h
│ │ │ ├── distribute_hirarchy.proto
│ │ │ ├── dlnet_conf.proto
│ │ │ ├── eager_ccl_comm_manager.cpp
│ │ │ ├── eager_ccl_comm_manager.h
│ │ │ ├── eager_nccl_comm_manager.cpp
│ │ │ ├── eager_nccl_comm_manager.h
│ │ │ ├── env.proto
│ │ │ ├── env_desc.cpp
│ │ │ ├── env_desc.h
│ │ │ ├── env_global_objects_scope.cpp
│ │ │ ├── env_global_objects_scope.h
│ │ │ ├── function_config_def.cpp
│ │ │ ├── global_for.cpp
│ │ │ ├── global_for.h
│ │ │ ├── global_mode.cpp
│ │ │ ├── global_mode.h
│ │ │ ├── graph_scope_vars.cpp
│ │ │ ├── graph_scope_vars.h
│ │ │ ├── id_manager.cpp
│ │ │ ├── id_manager.h
│ │ │ ├── id_manager_test.cpp
│ │ │ ├── id_state.h
│ │ │ ├── initializer_conf.proto
│ │ │ ├── inter_job_mem_sharing_util.cpp
│ │ │ ├── inter_job_mem_sharing_util.h
│ │ │ ├── inter_user_job_info.proto
│ │ │ ├── intra_job_mem_sharing_util.cpp
│ │ │ ├── intra_job_mem_sharing_util.h
│ │ │ ├── job.proto
│ │ │ ├── job_build_and_infer_ctx.cpp
│ │ │ ├── job_build_and_infer_ctx.h
│ │ │ ├── job_build_and_infer_ctx_mgr.cpp
│ │ │ ├── job_build_and_infer_ctx_mgr.h
│ │ │ ├── job_builder.cpp
│ │ │ ├── job_builder.h
│ │ │ ├── job_conf.proto
│ │ │ ├── job_desc.cpp
│ │ │ ├── job_desc.h
│ │ │ ├── job_instance.h
│ │ │ ├── job_interpreter.cpp
│ │ │ ├── job_interpreter.h
│ │ │ ├── job_ir.cpp
│ │ │ ├── job_ir.h
│ │ │ ├── job_set.proto
│ │ │ ├── job_set_compile_ctx.h
│ │ │ ├── job_set_compile_ctx.proto
│ │ │ ├── lazy_mode.cpp
│ │ │ ├── lazy_mode.h
│ │ │ ├── learning_rate_schedule_conf.proto
│ │ │ ├── local_parallel.proto
│ │ │ ├── local_sig_infer_hint.h
│ │ │ ├── memory_share_strategy.cpp
│ │ │ ├── memory_share_strategy.h
│ │ │ ├── module_conf.proto
│ │ │ ├── nd_sbp_infer_hint.h
│ │ │ ├── nd_sbp_util.cpp
│ │ │ ├── nd_sbp_util.h
│ │ │ ├── oneflow.cpp
│ │ │ ├── oneflow.h
│ │ │ ├── parallel_conf_signature.proto
│ │ │ ├── parallel_desc.cpp
│ │ │ ├── parallel_desc.h
│ │ │ ├── parallel_desc_test.cpp
│ │ │ ├── parallel_signature.proto
│ │ │ ├── pipeline_config_def.cpp
│ │ │ ├── placement.proto
│ │ │ ├── placement_scope.cpp
│ │ │ ├── placement_scope.h
│ │ │ ├── plan.proto
│ │ │ ├── plan_util.cpp
│ │ │ ├── plan_util.h
│ │ │ ├── qat_config_def.cpp
│ │ │ ├── rank_compiler.cpp
│ │ │ ├── rank_compiler.h
│ │ │ ├── rank_group.cpp
│ │ │ ├── rank_group.h
│ │ │ ├── rank_group_scope.cpp
│ │ │ ├── rank_group_scope.h
│ │ │ ├── rank_group_test.cpp
│ │ │ ├── regularizer_conf.proto
│ │ │ ├── resource.proto
│ │ │ ├── resource_desc.cpp
│ │ │ ├── resource_desc.h
│ │ │ ├── runtime.cpp
│ │ │ ├── runtime.h
│ │ │ ├── runtime_buffer_managers_scope.cpp
│ │ │ ├── runtime_buffer_managers_scope.h
│ │ │ ├── runtime_buffers_scope.cpp
│ │ │ ├── runtime_buffers_scope.h
│ │ │ ├── runtime_context.cpp
│ │ │ ├── runtime_context.h
│ │ │ ├── runtime_job_descs.cpp
│ │ │ ├── runtime_job_descs.h
│ │ │ ├── sbp_infer_hint.h
│ │ │ ├── sbp_parallel.cpp
│ │ │ ├── sbp_parallel.h
│ │ │ ├── sbp_parallel.proto
│ │ │ ├── sbp_signature_builder.cpp
│ │ │ ├── sbp_signature_builder.h
│ │ │ ├── scope.cpp
│ │ │ ├── scope.h
│ │ │ ├── scope.proto
│ │ │ ├── session.cpp
│ │ │ ├── session.h
│ │ │ ├── ssp_config_def.cpp
│ │ │ ├── sub_plan.proto
│ │ │ ├── task.proto
│ │ │ ├── utils/
│ │ │ │ ├── progress_bar.cpp
│ │ │ │ └── progress_bar.h
│ │ │ ├── version.cpp
│ │ │ └── version.h
│ │ ├── job_rewriter/
│ │ │ ├── adadelta_optim.cpp
│ │ │ ├── adagrad_optm.cpp
│ │ │ ├── adam_optm.cpp
│ │ │ ├── add_ssp_variable_proxy.cpp
│ │ │ ├── auto_learning_rate.cpp
│ │ │ ├── auto_mixed_precision.cpp
│ │ │ ├── auto_mixed_precision.h
│ │ │ ├── auto_mixed_precision_lists.cpp
│ │ │ ├── auto_mixed_precision_lists.h
│ │ │ ├── auto_parallel.cpp
│ │ │ ├── auto_train_step.cpp
│ │ │ ├── autograd.cpp
│ │ │ ├── autograd.h
│ │ │ ├── autotick.cpp
│ │ │ ├── autotick.h
│ │ │ ├── boxing_with_middle_nodes.cpp
│ │ │ ├── boxing_with_middle_nodes.h
│ │ │ ├── calculation_pass.cpp
│ │ │ ├── calculation_pass.h
│ │ │ ├── checkpointing_pass.cpp
│ │ │ ├── clip_by_global_norm_job_pass_state.h
│ │ │ ├── clone_grad.cpp
│ │ │ ├── clone_grad.h
│ │ │ ├── cudnn_fused_normalization_add_relu_pass.cpp
│ │ │ ├── cutlass_conv_tuning_warmup_pass.cpp
│ │ │ ├── delay_variable_op_execution_pass.cpp
│ │ │ ├── device_tick_autotick.cpp
│ │ │ ├── do_parallel_cast_before_widening_type_cast_pass.cpp
│ │ │ ├── dump_blob_parallel_conf_pass.cpp
│ │ │ ├── dump_variable_info_pass.cpp
│ │ │ ├── dynamic_loss_scale_job_pass_state.h
│ │ │ ├── dynamic_loss_scale_schedule_pass.cpp
│ │ │ ├── eliminate_dead_nodes_pass.cpp
│ │ │ ├── fix_pipeline_stage_id_pass.cpp
│ │ │ ├── ftrl_optm.cpp
│ │ │ ├── fuse_add_to_output_pass.cpp
│ │ │ ├── fuse_bce_reduce_mean_fw_bw_pass.cpp
│ │ │ ├── fuse_cast_scale_pass.cpp
│ │ │ ├── fuse_consecutive_add_pass.cpp
│ │ │ ├── fuse_embedding_interaction_pass.cpp
│ │ │ ├── fuse_model_update_cast_pass.cpp
│ │ │ ├── fuse_update_ops_pass.cpp
│ │ │ ├── generate_optimizer_op_confs.cpp
│ │ │ ├── group_boxing_by_dst_parallel.cpp
│ │ │ ├── group_boxing_by_dst_parallel.h
│ │ │ ├── indexed_slices_optimizer_rewrite_pass.cpp
│ │ │ ├── input_autotick.cpp
│ │ │ ├── insert_nccl_logical_op_pass.cpp
│ │ │ ├── insert_pinned_identity_op_pass.cpp
│ │ │ ├── job_completer.cpp
│ │ │ ├── job_completer.h
│ │ │ ├── job_pass.cpp
│ │ │ ├── job_pass.h
│ │ │ ├── lamb_optm.cpp
│ │ │ ├── lars_optm.cpp
│ │ │ ├── logical_chain_pass.cpp
│ │ │ ├── momentum_optm.cpp
│ │ │ ├── multi_tensor_model_update.cpp
│ │ │ ├── nccl_logical_chain_strict_order_pass.cpp
│ │ │ ├── nccl_logical_op_fusion_pass.cpp
│ │ │ ├── normalization_exponential_average_auto_tick_rewrite_pass.cpp
│ │ │ ├── optimizer.cpp
│ │ │ ├── optimizer.h
│ │ │ ├── optimizer_placement_optimization_pass.cpp
│ │ │ ├── pass_util.cpp
│ │ │ ├── pass_util.h
│ │ │ ├── pipeline_buffer_pass.cpp
│ │ │ ├── prune_amp_white_identity_op_pass.cpp
│ │ │ ├── prune_cast_to_static_shape_op_pass.cpp
│ │ │ ├── prune_depend_op_pass.cpp
│ │ │ ├── prune_parallel_cast_op_pass.cpp
│ │ │ ├── prune_pinned_identity_op_pass.cpp
│ │ │ ├── quantization_aware_training.cpp
│ │ │ ├── replace_embedding_ops_pass.cpp
│ │ │ ├── rmsprop_optm.cpp
│ │ │ ├── sequential_one_embedding_shuffle_ops_pass.cpp
│ │ │ ├── sgd_optm.cpp
│ │ │ ├── source_user_op_auto_tick.cpp
│ │ │ ├── split_sparse_softmax_cross_entropy_op_pass.cpp
│ │ │ ├── system_op_fill_job_name_pass.cpp
│ │ │ ├── tick_autotick.cpp
│ │ │ └── variable_autotick.cpp
│ │ ├── kernel/
│ │ │ ├── assign_kernel.cpp
│ │ │ ├── blob_access_checker_kernel_observer.cpp
│ │ │ ├── blob_access_checker_kernel_observer.h
│ │ │ ├── blob_tensor_view.cpp
│ │ │ ├── blob_tensor_view.h
│ │ │ ├── boxing_kernel.cpp
│ │ │ ├── boxing_zeros_kernel.cpp
│ │ │ ├── broadcast_to_compatible_with_kernel.cpp
│ │ │ ├── callback_notify_kernel.cpp
│ │ │ ├── case_kernel.cpp
│ │ │ ├── case_kernel.h
│ │ │ ├── chain_kernel_observer.cpp
│ │ │ ├── chain_kernel_observer.h
│ │ │ ├── collective_boxing_kernels.cpp
│ │ │ ├── collective_boxing_pack_kernel.cpp
│ │ │ ├── collective_boxing_unpack_kernel.cpp
│ │ │ ├── constant_like_kernel.cpp
│ │ │ ├── cpu_check_numerics_kernel_observer.h
│ │ │ ├── cpu_numerics_kernel_observer.cpp
│ │ │ ├── critical_section_callback_tick_kernel.cpp
│ │ │ ├── critical_section_wait_tick_kernel.cpp
│ │ │ ├── cuda_check_numerics_kernel_observer.cu
│ │ │ ├── cuda_check_numerics_kernel_observer.h
│ │ │ ├── cuda_graph_support.h
│ │ │ ├── distribute_kernels.cpp
│ │ │ ├── dynamic_reshape_kernel.cpp
│ │ │ ├── dynamic_reshape_like_kernel.cpp
│ │ │ ├── esac_kernel.cpp
│ │ │ ├── esac_kernel.h
│ │ │ ├── identity_kernel.cpp
│ │ │ ├── image_decoder_random_crop_resize_kernel.cpp
│ │ │ ├── input_kernel.cpp
│ │ │ ├── kernel.cpp
│ │ │ ├── kernel.h
│ │ │ ├── kernel.proto
│ │ │ ├── kernel_context.h
│ │ │ ├── kernel_observer.h
│ │ │ ├── kernel_registration.cpp
│ │ │ ├── kernel_registration.h
│ │ │ ├── kernel_util.cpp
│ │ │ ├── kernel_util.cuh
│ │ │ ├── kernel_util.h
│ │ │ ├── learning_rate_schedule_kernel.cpp
│ │ │ ├── nccl_send_recv_boxing_kernel.cpp
│ │ │ ├── new_kernel_util.h
│ │ │ ├── nop_kernel.cpp
│ │ │ ├── output_kernel.cpp
│ │ │ ├── profiler_kernel_observer.cpp
│ │ │ ├── profiler_kernel_observer.h
│ │ │ ├── random_generator.cpp
│ │ │ ├── random_generator.cu
│ │ │ ├── random_generator.h
│ │ │ ├── reentrant_lock_kernel.cpp
│ │ │ ├── reentrant_lock_kernel.h
│ │ │ ├── return_kernel.cpp
│ │ │ ├── runtime_blob_shape_infer_helper.cpp
│ │ │ ├── runtime_blob_shape_infer_helper.h
│ │ │ ├── shape_elem_cnt_kernel.cpp
│ │ │ ├── slice_boxing_kernel.cpp
│ │ │ ├── sync_check_kernel_observer.cpp
│ │ │ ├── sync_check_kernel_observer.h
│ │ │ ├── sync_dynamic_resize_kernel.cpp
│ │ │ ├── total_loss_instance_num_kernel.cpp
│ │ │ ├── user_kernel.cpp
│ │ │ ├── user_kernel.h
│ │ │ ├── util/
│ │ │ │ ├── cuda_half_util.h
│ │ │ │ ├── numeric_limits.cuh
│ │ │ │ └── numerics.cuh
│ │ │ ├── wait_and_send_ids_kernel.cpp
│ │ │ └── wait_and_send_ids_kernel.h
│ │ ├── lazy/
│ │ │ ├── actor/
│ │ │ │ ├── acc_actor.cpp
│ │ │ │ ├── acc_ctrl_tick_actor.cpp
│ │ │ │ ├── acc_tick_actor.cpp
│ │ │ │ ├── actor.cpp
│ │ │ │ ├── actor.h
│ │ │ │ ├── actor_base.cpp
│ │ │ │ ├── actor_base.h
│ │ │ │ ├── actor_context.cpp
│ │ │ │ ├── actor_context.h
│ │ │ │ ├── actor_message.cpp
│ │ │ │ ├── actor_message.h
│ │ │ │ ├── actor_message_bus.cpp
│ │ │ │ ├── actor_message_bus.h
│ │ │ │ ├── boxing_zeros_actor.cpp
│ │ │ │ ├── callback_notify_actor.cpp
│ │ │ │ ├── case_actor.cpp
│ │ │ │ ├── collective_boxing_actor_context.cpp
│ │ │ │ ├── collective_boxing_actor_context.h
│ │ │ │ ├── copy_comm_net_actor.cpp
│ │ │ │ ├── esac_actor.cpp
│ │ │ │ ├── generic_actor_context.cpp
│ │ │ │ ├── generic_actor_context.h
│ │ │ │ ├── input_wise_actor.cpp
│ │ │ │ ├── input_wise_actor.h
│ │ │ │ ├── light_actor.cpp
│ │ │ │ ├── light_actor.h
│ │ │ │ ├── naive_actor.cpp
│ │ │ │ ├── naive_actor.h
│ │ │ │ ├── pack_actor.cpp
│ │ │ │ ├── reentrant_lock_actor.cpp
│ │ │ │ ├── register_slot.cpp
│ │ │ │ ├── register_slot.h
│ │ │ │ ├── repeat_actor.cpp
│ │ │ │ ├── sink_actor.cpp
│ │ │ │ ├── sink_actor.h
│ │ │ │ ├── source_tick_actor.cpp
│ │ │ │ ├── ssp_variable_proxy_actor.cpp
│ │ │ │ ├── tick_actor.cpp
│ │ │ │ ├── unpack_actor.cpp
│ │ │ │ └── wait_and_send_ids_actor.cpp
│ │ │ └── stream_context/
│ │ │ ├── common/
│ │ │ │ └── generic_stream_context.cpp
│ │ │ ├── cpu/
│ │ │ │ └── cpu_stream_context.cpp
│ │ │ ├── cuda/
│ │ │ │ └── cuda_stream_context.cpp
│ │ │ └── include/
│ │ │ ├── generic_stream_context.h
│ │ │ └── stream_context.h
│ │ ├── memory/
│ │ │ ├── chunk_manager.cpp
│ │ │ ├── chunk_manager.h
│ │ │ ├── memory_allocator.cpp
│ │ │ ├── memory_allocator.h
│ │ │ ├── memory_block.proto
│ │ │ ├── memory_case.proto
│ │ │ ├── memory_case_util.cpp
│ │ │ ├── memory_case_util.h
│ │ │ ├── memory_zone.cpp
│ │ │ └── memory_zone.h
│ │ ├── ndarray/
│ │ │ ├── binary_func.h
│ │ │ ├── cpu_concat_var_ndarray.h
│ │ │ ├── cpu_concat_var_ndarray_test.cpp
│ │ │ ├── cpu_ndarray.h
│ │ │ ├── cpu_ndarray_builder.h
│ │ │ ├── cpu_ndarray_copy.h
│ │ │ ├── cpu_slice_var_ndarray.h
│ │ │ ├── cpu_slice_var_ndarray_test.cpp
│ │ │ ├── cpu_var_ndarray.h
│ │ │ ├── cpu_var_ndarray_test.cpp
│ │ │ ├── ndarray_apply_binary.h
│ │ │ ├── ndarray_apply_binary_core.cpp
│ │ │ ├── ndarray_apply_binary_core.cu
│ │ │ ├── ndarray_apply_binary_core.h
│ │ │ ├── ndarray_apply_broadcast_binary.h
│ │ │ ├── ndarray_apply_broadcast_binary_core.cpp
│ │ │ ├── ndarray_apply_broadcast_binary_core.cu
│ │ │ ├── ndarray_apply_broadcast_binary_core.h
│ │ │ ├── ndarray_apply_broadcast_unary.h
│ │ │ ├── ndarray_apply_broadcast_unary_core.cpp
│ │ │ ├── ndarray_apply_broadcast_unary_core.cu
│ │ │ ├── ndarray_apply_broadcast_unary_core.h
│ │ │ ├── ndarray_apply_unary.h
│ │ │ ├── ndarray_apply_unary_core.cpp
│ │ │ ├── ndarray_apply_unary_core.cu
│ │ │ ├── ndarray_apply_unary_core.h
│ │ │ ├── ndarray_assign_core.cpp
│ │ │ ├── ndarray_assign_core.cu
│ │ │ ├── ndarray_assign_core.h
│ │ │ ├── ndarray_reduce.h
│ │ │ ├── ndarray_reduce_impl.cpp
│ │ │ ├── ndarray_reduce_impl.cu
│ │ │ ├── ndarray_reduce_impl.h
│ │ │ ├── ndarray_util.h
│ │ │ ├── slice.cpp
│ │ │ ├── slice.h
│ │ │ ├── slice_test.cpp
│ │ │ ├── unary_func.h
│ │ │ ├── xpu_binary_func_ndarray.h
│ │ │ ├── xpu_broadcast_ndarray.h
│ │ │ ├── xpu_ndarray_assign.cu
│ │ │ ├── xpu_ndarray_assign.h
│ │ │ ├── xpu_ndarray_base.h
│ │ │ ├── xpu_reduced_ndarray.h
│ │ │ ├── xpu_reshape_ndarray.h
│ │ │ ├── xpu_shape.cpp
│ │ │ ├── xpu_shape.h
│ │ │ ├── xpu_transpose_ndarray.h
│ │ │ ├── xpu_unary_func_ndarray.h
│ │ │ ├── xpu_util.h
│ │ │ ├── xpu_var_ndarray.h
│ │ │ └── xpu_var_ndarray_builder.h
│ │ ├── operator/
│ │ │ ├── acc_tick_op.cpp
│ │ │ ├── acc_tick_op.h
│ │ │ ├── arg_modifier_signature.proto
│ │ │ ├── assign_op.cpp
│ │ │ ├── boxing_identity_op.cpp
│ │ │ ├── boxing_op.cpp
│ │ │ ├── boxing_op.h
│ │ │ ├── boxing_zeros_op.cpp
│ │ │ ├── broadcast_to_compatible_with_op.cpp
│ │ │ ├── callback_notify_op.cpp
│ │ │ ├── callback_notify_op.h
│ │ │ ├── case_op.cpp
│ │ │ ├── case_op.h
│ │ │ ├── collective_boxing_ops.cpp
│ │ │ ├── collective_boxing_pack_op.cpp
│ │ │ ├── collective_boxing_unpack_op.cpp
│ │ │ ├── constant_like_op.cpp
│ │ │ ├── copy_comm_net_op.cpp
│ │ │ ├── copy_comm_net_op.h
│ │ │ ├── critical_section_callback_tick_op.cpp
│ │ │ ├── critical_section_wait_tick_op.cpp
│ │ │ ├── cwise_op.cpp
│ │ │ ├── cwise_op.h
│ │ │ ├── decode_random_op.h
│ │ │ ├── device_tick_op.cpp
│ │ │ ├── device_tick_op.h
│ │ │ ├── distribute_add_op.cpp
│ │ │ ├── distribute_clone_op.cpp
│ │ │ ├── distribute_concat_op.cpp
│ │ │ ├── distribute_split_op.cpp
│ │ │ ├── dst_subset_tick_op.cpp
│ │ │ ├── dynamic_reshape_op.cpp
│ │ │ ├── esac_op.cpp
│ │ │ ├── esac_op.h
│ │ │ ├── identity_op.cpp
│ │ │ ├── image_decoder_random_crop_resize_op.cpp
│ │ │ ├── input_op.cpp
│ │ │ ├── input_op.h
│ │ │ ├── interface_blob_conf.proto
│ │ │ ├── interface_op_util.cpp
│ │ │ ├── interface_op_util.h
│ │ │ ├── learning_rate_schedule_op.cpp
│ │ │ ├── nccl_send_recv_boxing_op.cpp
│ │ │ ├── nccl_send_recv_boxing_op_util.cpp
│ │ │ ├── nccl_send_recv_boxing_op_util.h
│ │ │ ├── op_attribute.proto
│ │ │ ├── op_conf.proto
│ │ │ ├── op_conf_symbol.cpp
│ │ │ ├── op_conf_symbol.h
│ │ │ ├── op_conf_util.h
│ │ │ ├── op_infer_cache.h
│ │ │ ├── op_node_signature.proto
│ │ │ ├── operator.cpp
│ │ │ ├── operator.h
│ │ │ ├── operator_util.cpp
│ │ │ ├── operator_util.h
│ │ │ ├── output_op.cpp
│ │ │ ├── output_op.h
│ │ │ ├── reduce_sbp_util.cpp
│ │ │ ├── reduce_sbp_util.h
│ │ │ ├── reentrant_lock_op.cpp
│ │ │ ├── reentrant_lock_op.h
│ │ │ ├── return_op.cpp
│ │ │ ├── return_op.h
│ │ │ ├── scalar_op_base.cpp
│ │ │ ├── scalar_op_base.h
│ │ │ ├── shape_elem_cnt_op.cpp
│ │ │ ├── shape_elem_cnt_op.h
│ │ │ ├── sink_tick_op.cpp
│ │ │ ├── sink_tick_op.h
│ │ │ ├── slice_boxing_op.cpp
│ │ │ ├── source_tick_op.cpp
│ │ │ ├── source_tick_op.h
│ │ │ ├── src_subset_tick_op.cpp
│ │ │ ├── sync_dynamic_resize_op.cpp
│ │ │ ├── tick_op.cpp
│ │ │ ├── tick_op.h
│ │ │ ├── total_loss_instance_num_op.cpp
│ │ │ ├── total_loss_instance_num_op.h
│ │ │ ├── user_op.cpp
│ │ │ ├── user_op.h
│ │ │ ├── variable_op.cpp
│ │ │ ├── variable_op.h
│ │ │ ├── wait_and_send_ids_op.cpp
│ │ │ └── wait_and_send_ids_op.h
│ │ ├── persistence/
│ │ │ ├── binary_in_stream.h
│ │ │ ├── binary_in_stream_with_local_copy.cpp
│ │ │ ├── binary_in_stream_with_local_copy.h
│ │ │ ├── binary_in_stream_without_local_copy.cpp
│ │ │ ├── binary_in_stream_without_local_copy.h
│ │ │ ├── file_system.cpp
│ │ │ ├── file_system.h
│ │ │ ├── file_system_test.cpp
│ │ │ ├── hadoop/
│ │ │ │ ├── hadoop_file_system.cpp
│ │ │ │ ├── hadoop_file_system.h
│ │ │ │ └── hdfs.h
│ │ │ ├── persistent_in_stream.cpp
│ │ │ ├── persistent_in_stream.h
│ │ │ ├── persistent_out_stream.cpp
│ │ │ ├── persistent_out_stream.h
│ │ │ ├── posix/
│ │ │ │ ├── posix_file_system.cpp
│ │ │ │ └── posix_file_system.h
│ │ │ ├── stream_scanner.cpp
│ │ │ ├── stream_scanner.h
│ │ │ ├── tee_persistent_log_stream.cpp
│ │ │ └── tee_persistent_log_stream.h
│ │ ├── platform/
│ │ │ ├── include/
│ │ │ │ ├── ibv.h
│ │ │ │ ├── pthread_fork.h
│ │ │ │ └── wrapper.h
│ │ │ └── lib/
│ │ │ ├── ibv_wrapper.cpp
│ │ │ ├── pthread_fork.cpp
│ │ │ └── wrapper.cpp
│ │ ├── profiler/
│ │ │ ├── event.cpp
│ │ │ ├── event.h
│ │ │ ├── event_recorder.cpp
│ │ │ ├── event_recorder.h
│ │ │ ├── kernel.cpp
│ │ │ ├── kernel.h
│ │ │ ├── kineto_shim.cpp
│ │ │ ├── kineto_shim.h
│ │ │ ├── profile_manager.cpp
│ │ │ ├── profile_manager.h
│ │ │ ├── profiler.cpp
│ │ │ ├── profiler.h
│ │ │ └── util.h
│ │ ├── record/
│ │ │ ├── coco.proto
│ │ │ └── record.proto
│ │ ├── register/
│ │ │ ├── blob.cpp
│ │ │ ├── blob.h
│ │ │ ├── blob_desc.cpp
│ │ │ ├── blob_desc.h
│ │ │ ├── blob_desc.proto
│ │ │ ├── logical_blob_id.proto
│ │ │ ├── op_blob_arg.proto
│ │ │ ├── op_blob_arg_info.h
│ │ │ ├── register.cpp
│ │ │ ├── register.h
│ │ │ ├── register_desc.cpp
│ │ │ ├── register_desc.h
│ │ │ ├── register_desc.proto
│ │ │ ├── register_manager.cpp
│ │ │ ├── register_manager.h
│ │ │ ├── runtime_register_desc.cpp
│ │ │ ├── runtime_register_desc.h
│ │ │ ├── tensor_slice_copier.cpp
│ │ │ ├── tensor_slice_copier.h
│ │ │ ├── tensor_slice_view.cpp
│ │ │ ├── tensor_slice_view.h
│ │ │ └── tensor_slice_view.proto
│ │ ├── rpc/
│ │ │ ├── include/
│ │ │ │ ├── base.h
│ │ │ │ ├── ctrl.h
│ │ │ │ ├── global_process_ctx.h
│ │ │ │ ├── grpc.h
│ │ │ │ ├── local.h
│ │ │ │ └── manager.h
│ │ │ └── lib/
│ │ │ ├── global_process_ctx.cpp
│ │ │ ├── grpc.cpp
│ │ │ └── local.cpp
│ │ ├── summary/
│ │ │ ├── event.proto
│ │ │ ├── graph.proto
│ │ │ ├── plugin_data.proto
│ │ │ ├── projector.proto
│ │ │ ├── summary.proto
│ │ │ └── tensor.proto
│ │ ├── thread/
│ │ │ ├── is_main_thread_test.cpp
│ │ │ ├── thread.cpp
│ │ │ ├── thread.h
│ │ │ ├── thread_global_id.cpp
│ │ │ ├── thread_global_id.h
│ │ │ ├── thread_manager.cpp
│ │ │ ├── thread_manager.h
│ │ │ ├── thread_pool.cpp
│ │ │ ├── thread_pool.h
│ │ │ ├── thread_runtime.h
│ │ │ ├── thread_runtime_factory.cpp
│ │ │ └── thread_runtime_factory.h
│ │ ├── transport/
│ │ │ ├── transport.cpp
│ │ │ ├── transport.h
│ │ │ └── transport_message.h
│ │ └── vm/
│ │ ├── access_blob_arg_cb_instruction_policy.h
│ │ ├── allocate_tensor_instruction_policy.cpp
│ │ ├── allocate_tensor_instruction_policy.h
│ │ ├── allocator.h
│ │ ├── barrier_instruction_policy.h
│ │ ├── bin_allocator.h
│ │ ├── bin_allocator_test.cpp
│ │ ├── caching_allocator.h
│ │ ├── control_stream_policy.h
│ │ ├── critical_section_instruction_policy.cpp
│ │ ├── critical_section_instruction_policy.h
│ │ ├── critical_section_status_querier.h
│ │ ├── critical_section_stream_policy.cpp
│ │ ├── critical_section_stream_policy.h
│ │ ├── ep_backend_allocator.cpp
│ │ ├── ep_backend_allocator.h
│ │ ├── ep_backend_host_allocator.cpp
│ │ ├── ep_backend_host_allocator.h
│ │ ├── ep_d2h_stream_policy.cpp
│ │ ├── ep_d2h_stream_policy.h
│ │ ├── ep_event.cpp
│ │ ├── ep_event.h
│ │ ├── ep_optional_event_record_status_querier.cpp
│ │ ├── ep_optional_event_record_status_querier.h
│ │ ├── ep_record_event_instruction_policy.h
│ │ ├── ep_stream_policy.cpp
│ │ ├── ep_stream_policy.h
│ │ ├── ep_stream_policy_base.cpp
│ │ ├── ep_stream_policy_base.h
│ │ ├── event_recorded_ep_stream_policy.cpp
│ │ ├── event_recorded_ep_stream_policy.h
│ │ ├── fuse_instruction_policy.h
│ │ ├── global_sync_instruction_policy.h
│ │ ├── instruction.cpp
│ │ ├── instruction.h
│ │ ├── instruction_fuse_type.h
│ │ ├── instruction_policy.cpp
│ │ ├── instruction_policy.h
│ │ ├── instruction_policy_util.h
│ │ ├── lazy_job_instruction_policy.h
│ │ ├── lazy_job_stream_policy.cpp
│ │ ├── lazy_job_stream_policy.h
│ │ ├── naive_instruction_status_querier.h
│ │ ├── op_call_instruction_policy.cpp
│ │ ├── op_call_instruction_policy.h
│ │ ├── pinned_ep_stream_policy.cpp
│ │ ├── pinned_ep_stream_policy.h
│ │ ├── probe.h
│ │ ├── ref_cnt_instruction_status_querier.h
│ │ ├── release_tensor_instruction_policy.h
│ │ ├── remat/
│ │ │ ├── allocator.cpp
│ │ │ ├── allocator.h
│ │ │ ├── disjoint_set.cpp
│ │ │ ├── disjoint_set.h
│ │ │ ├── env.cpp
│ │ │ ├── env.h
│ │ │ ├── util.cpp
│ │ │ └── util.h
│ │ ├── stream.cpp
│ │ ├── stream.h
│ │ ├── stream_create_stream_policy.h
│ │ ├── stream_get_allocator_stream_type.h
│ │ ├── stream_policy.cpp
│ │ ├── stream_policy.h
│ │ ├── stream_record_event_instruction_policy.cpp
│ │ ├── stream_record_event_instruction_policy.h
│ │ ├── stream_wait_event_instruction_policy.cpp
│ │ ├── stream_wait_event_instruction_policy.h
│ │ ├── stream_wait_instruction_policy.cpp
│ │ ├── stream_wait_instruction_policy.h
│ │ ├── symbol_storage.cpp
│ │ ├── symbol_storage.h
│ │ ├── sync_access_instruction_policy.cpp
│ │ ├── sync_access_instruction_policy.h
│ │ ├── sync_vm_mode_guard.h
│ │ ├── thread_ctx.cpp
│ │ ├── thread_ctx.h
│ │ ├── thread_safe_guard.h
│ │ ├── touch_tensors_instruction_policy.h
│ │ ├── virtual_machine.cpp
│ │ ├── virtual_machine.h
│ │ ├── virtual_machine_engine.cpp
│ │ ├── virtual_machine_engine.h
│ │ ├── virtual_machine_scope.cpp
│ │ ├── virtual_machine_scope.h
│ │ ├── vm_object.cpp
│ │ ├── vm_object.h
│ │ ├── vm_sync.h
│ │ ├── vm_util.cpp
│ │ └── vm_util.h
│ ├── extension/
│ │ ├── python/
│ │ │ ├── numpy.cpp
│ │ │ ├── numpy.h
│ │ │ ├── numpy_internal.h
│ │ │ ├── py_compute.cpp
│ │ │ ├── py_compute.h
│ │ │ ├── py_kernel_caller.cpp
│ │ │ ├── py_kernel_caller.h
│ │ │ ├── py_kernel_registry.cpp
│ │ │ └── py_kernel_registry.h
│ │ └── stack/
│ │ ├── foreign_stack_getter.h
│ │ ├── python/
│ │ │ ├── custom_eval_frame.c
│ │ │ ├── custom_eval_frame.h
│ │ │ ├── stack_getter.cpp
│ │ │ └── stack_getter.h
│ │ └── stacktrace.h
│ ├── ir/
│ │ ├── .gitignore
│ │ ├── CMakeLists.txt
│ │ ├── README.md
│ │ ├── include/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── OneFlow/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ ├── Conversion/
│ │ │ │ │ ├── NVVMToCubin.h
│ │ │ │ │ └── OneFlowToTosa.h
│ │ │ │ ├── Extension.h
│ │ │ │ ├── OKL/
│ │ │ │ │ ├── Conversion/
│ │ │ │ │ │ ├── Conversion.h
│ │ │ │ │ │ └── OKLToLLVM.h
│ │ │ │ │ ├── Kernel/
│ │ │ │ │ │ ├── ComputeContext.h
│ │ │ │ │ │ ├── InferContext.h
│ │ │ │ │ │ ├── InitContext.h
│ │ │ │ │ │ ├── JITEngine.h
│ │ │ │ │ │ ├── JITOpInfer.h
│ │ │ │ │ │ ├── LauncherContext.h
│ │ │ │ │ │ ├── LauncherState.h
│ │ │ │ │ │ ├── README.md
│ │ │ │ │ │ ├── RegContext.h
│ │ │ │ │ │ ├── TmpBufferManager.h
│ │ │ │ │ │ └── WrapperContext.h
│ │ │ │ │ ├── OKLAttributes.h
│ │ │ │ │ ├── OKLAttributes.td
│ │ │ │ │ ├── OKLBase.td
│ │ │ │ │ ├── OKLDialect.h
│ │ │ │ │ ├── OKLDialect.td
│ │ │ │ │ ├── OKLOps.h
│ │ │ │ │ ├── OKLOps.td
│ │ │ │ │ ├── OKLTypes.h
│ │ │ │ │ ├── OKLTypes.td
│ │ │ │ │ └── passes.h
│ │ │ │ ├── OKM/
│ │ │ │ │ ├── Conversion/
│ │ │ │ │ │ └── Conversion.h
│ │ │ │ │ ├── OKMAttributes.h
│ │ │ │ │ ├── OKMAttributes.td
│ │ │ │ │ ├── OKMBase.td
│ │ │ │ │ ├── OKMDialect.h
│ │ │ │ │ ├── OKMDialect.td
│ │ │ │ │ ├── OKMOps.h
│ │ │ │ │ ├── OKMOps.td
│ │ │ │ │ ├── OKMPasses.td
│ │ │ │ │ └── passes.h
│ │ │ │ ├── OneFlowBase.td
│ │ │ │ ├── OneFlowDataTypeConversion.h
│ │ │ │ ├── OneFlowDialect.h
│ │ │ │ ├── OneFlowDialect.td
│ │ │ │ ├── OneFlowEnums.td
│ │ │ │ ├── OneFlowInterfaces.td
│ │ │ │ ├── OneFlowOpGetGen.td
│ │ │ │ ├── OneFlowOpTraits.h
│ │ │ │ ├── OneFlowOps.h
│ │ │ │ ├── OneFlowOps.td
│ │ │ │ ├── OneFlowPDLLPatterns.h
│ │ │ │ ├── OneFlowPasses.td
│ │ │ │ ├── OneFlowPatternUtils.h
│ │ │ │ ├── OneFlowPatterns.td
│ │ │ │ ├── OneFlowSupport.h
│ │ │ │ ├── OneFlowTypes.h
│ │ │ │ ├── OneFlowUserOps.td
│ │ │ │ ├── OneFlowUtils.h
│ │ │ │ ├── Passes.h
│ │ │ │ ├── SBP/
│ │ │ │ │ ├── SBPAttributes.h
│ │ │ │ │ ├── SBPBase.td
│ │ │ │ │ ├── SBPDialect.h
│ │ │ │ │ ├── SBPDialect.td
│ │ │ │ │ ├── SBPImporter.h
│ │ │ │ │ └── SBPOps.td
│ │ │ │ ├── Transform/
│ │ │ │ │ ├── AggregateOps.h
│ │ │ │ │ ├── AutoNhwc.h
│ │ │ │ │ ├── BufferHostRegister.h
│ │ │ │ │ ├── CSEWithAttributesIgnored.h
│ │ │ │ │ ├── ConvertInferenceOp.h
│ │ │ │ │ ├── EliminateAllocOps.h
│ │ │ │ │ ├── FuncOps.h
│ │ │ │ │ ├── OneFlow MLIR CodeGen ABI.md
│ │ │ │ │ ├── OneFlowMemPool.h
│ │ │ │ │ ├── OneFlowStream.h
│ │ │ │ │ ├── OutlineAndFuse.h
│ │ │ │ │ ├── TraitFolder.h
│ │ │ │ │ └── TransposeHelpers.h
│ │ │ │ ├── UserOpConversion.h
│ │ │ │ └── UserOpReflection.h
│ │ │ └── Transform/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── TransformDialectExtension.h
│ │ │ ├── TransformDialectExtension.td
│ │ │ └── TransformStateExtension.h
│ │ ├── install-llvm.cmake
│ │ ├── lib/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── OneFlow/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ ├── Conversion/
│ │ │ │ │ ├── NVVMToCubin.cpp
│ │ │ │ │ ├── OneFlowToLinalg.cpp
│ │ │ │ │ └── OneFlowToTosa.cpp
│ │ │ │ ├── OKL/
│ │ │ │ │ ├── Conversion/
│ │ │ │ │ │ ├── Conversion.cpp
│ │ │ │ │ │ ├── CudaGraphSupport.cpp
│ │ │ │ │ │ └── OKLToLLVM.cpp
│ │ │ │ │ ├── Kernel/
│ │ │ │ │ │ ├── ComputeContext.cpp
│ │ │ │ │ │ ├── InferContext.cpp
│ │ │ │ │ │ ├── JITEngine.cpp
│ │ │ │ │ │ ├── JITOpInfer.cpp
│ │ │ │ │ │ ├── KernelLaunchOp.cpp
│ │ │ │ │ │ ├── LauncherContext.cpp
│ │ │ │ │ │ ├── LauncherState.cpp
│ │ │ │ │ │ ├── RegContext.cpp
│ │ │ │ │ │ └── TmpBufferManager.cpp
│ │ │ │ │ ├── OKLDialect.cpp
│ │ │ │ │ ├── OKLOps.cpp
│ │ │ │ │ ├── OKLTypes.cpp
│ │ │ │ │ └── README-OriginVersion.md
│ │ │ │ ├── OKM/
│ │ │ │ │ ├── Conversion/
│ │ │ │ │ │ └── Conversion.cpp
│ │ │ │ │ ├── OKMDialect.cpp
│ │ │ │ │ └── passes.cpp
│ │ │ │ ├── OneFlowCanonicalizers.cpp
│ │ │ │ ├── OneFlowDataTypeConversion.cpp
│ │ │ │ ├── OneFlowDialect.cpp
│ │ │ │ ├── OneFlowInferReturnTypes.cpp
│ │ │ │ ├── OneFlowOpFolders.cpp
│ │ │ │ ├── OneFlowOpGetGen.cpp.in
│ │ │ │ ├── OneFlowOpTraits.cpp
│ │ │ │ ├── OneFlowOps.cpp
│ │ │ │ ├── OneFlowRewrites.cpp
│ │ │ │ ├── OneFlowSupport.cpp
│ │ │ │ ├── OneFlowTypes.cpp
│ │ │ │ ├── OneFlowUtils.cpp
│ │ │ │ ├── PDLL/
│ │ │ │ │ ├── AllocEliminationPatterns.cpp
│ │ │ │ │ ├── AllocEliminationPatterns.pdll
│ │ │ │ │ ├── CMakeLists.txt
│ │ │ │ │ ├── ForwardOpPatterns.cpp
│ │ │ │ │ ├── ForwardOpPatterns.pdll
│ │ │ │ │ ├── FuseConv2DBatchNormPattern.cpp
│ │ │ │ │ ├── FuseConv2DBatchNormPattern.pdll
│ │ │ │ │ ├── FuseOpsWithBackwardImplPattern.cpp
│ │ │ │ │ ├── FuseOpsWithBackwardImplPattern.pdll
│ │ │ │ │ ├── NormalizationPatterns.cpp
│ │ │ │ │ ├── NormalizationPatterns.pdll
│ │ │ │ │ └── OneFlowPDLLUtils.pdll
│ │ │ │ ├── Passes.cpp
│ │ │ │ ├── SBP/
│ │ │ │ │ ├── SBPAttributes.cpp
│ │ │ │ │ ├── SBPDialect.cpp
│ │ │ │ │ └── SBPImporter.cpp
│ │ │ │ ├── Transform/
│ │ │ │ │ ├── AggregateOps.cpp
│ │ │ │ │ ├── AutoNHWCOps.cpp
│ │ │ │ │ ├── AutoNhwc.cpp
│ │ │ │ │ ├── BufferHostRegister.cpp
│ │ │ │ │ ├── CSEWithAttributesIgnored.cpp
│ │ │ │ │ ├── ConvertInferenceOp.cpp
│ │ │ │ │ ├── EliminateAllocOps.cpp
│ │ │ │ │ ├── FuncOps.cpp
│ │ │ │ │ ├── GroupMatMulOps.cpp
│ │ │ │ │ ├── JITPasses.cpp
│ │ │ │ │ ├── OneFlowMemPool.cpp
│ │ │ │ │ ├── OneFlowStream.cpp
│ │ │ │ │ ├── OutlineAndFuse.cpp
│ │ │ │ │ └── TraitFolder.cpp
│ │ │ │ ├── TransposeHelpers.cpp
│ │ │ │ ├── UserOpConversion.cpp
│ │ │ │ └── UserOpReflection.cpp
│ │ │ └── Transform/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── TransformDialectExtension.cpp
│ │ │ ├── TransformDialectInterpreter.cpp
│ │ │ └── TransformStateExtension.cpp
│ │ ├── llvm-in-tree.cmake
│ │ ├── oneflow-extension/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── README.md
│ │ │ ├── include/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ ├── OneFlow/
│ │ │ │ │ ├── CMakeLists.txt
│ │ │ │ │ ├── JITOpInfer.h
│ │ │ │ │ ├── OneFlowLRJITRegistry.h
│ │ │ │ │ └── OneFlowRoundTrip.h
│ │ │ │ └── PyAst/
│ │ │ │ ├── Ast.h
│ │ │ │ └── AstMlirGen.h
│ │ │ ├── ir_pass.cpp
│ │ │ ├── lr_jit.cpp
│ │ │ ├── mlir_gen.cpp
│ │ │ ├── mlir_jit_op.cpp
│ │ │ └── mlir_jit_op_kernel.cpp
│ │ ├── oneflow-lite/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── OneFlowLiteCompileMain.cpp
│ │ │ ├── include/
│ │ │ │ └── OneFlow/
│ │ │ │ ├── ConvertToLiteExecutable.h
│ │ │ │ ├── FlatbufferUtils.h
│ │ │ │ ├── OneFlowLiteUtils.h
│ │ │ │ └── Transform/
│ │ │ │ ├── FoldVariable.h
│ │ │ │ ├── InferPlacement.h
│ │ │ │ ├── InsertTransferOp.h
│ │ │ │ ├── Lowering/
│ │ │ │ │ ├── LoweringAscend.h
│ │ │ │ │ └── LoweringAscendUtils.h
│ │ │ │ ├── LoweringLaunchJob.h
│ │ │ │ ├── MemoryPlanning.h
│ │ │ │ └── PartitionLaunchJob.h
│ │ │ ├── lib/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ └── OneFlow/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ ├── ConvertToLiteExecutable.cpp
│ │ │ │ ├── FlatbufferUtils.cpp
│ │ │ │ ├── OneFlowLiteUtils.cpp
│ │ │ │ ├── Transform/
│ │ │ │ │ ├── FoldVariable.cpp
│ │ │ │ │ ├── InferPlacement.cpp
│ │ │ │ │ ├── InsertTransferOp.cpp
│ │ │ │ │ ├── Lowering/
│ │ │ │ │ │ └── LoweringAscend.cpp
│ │ │ │ │ ├── LoweringLaunchJob.cpp
│ │ │ │ │ ├── MemoryPlanning.cpp
│ │ │ │ │ └── PartitionLaunchJob.cpp
│ │ │ │ └── cmake/
│ │ │ │ └── FindAscendSdk.cmake
│ │ │ └── schemas/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── attributes/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ ├── bool.fbs
│ │ │ │ ├── f32.fbs
│ │ │ │ ├── f32s.fbs
│ │ │ │ ├── f64.fbs
│ │ │ │ ├── i32.fbs
│ │ │ │ ├── i32s.fbs
│ │ │ │ ├── i64.fbs
│ │ │ │ ├── i64s.fbs
│ │ │ │ ├── shape.fbs
│ │ │ │ ├── shapes.fbs
│ │ │ │ ├── str.fbs
│ │ │ │ └── strs.fbs
│ │ │ ├── executable.fbs
│ │ │ └── install_flatcc.cmake
│ │ ├── oneflow-opt/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── README.md
│ │ │ └── oneflow-opt.cpp
│ │ ├── oneflow-runner/
│ │ │ ├── CMakeLists.txt
│ │ │ └── oneflow-runner.cpp
│ │ ├── oneflow-runtime/
│ │ │ ├── CMakeLists.txt
│ │ │ └── lib/
│ │ │ ├── CMakeLists.txt
│ │ │ └── Runtime.cpp
│ │ ├── oneflow-translate/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── README.md
│ │ │ ├── include/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ └── OneFlow/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ └── MLIROneFlowTranslation.h
│ │ │ ├── lib/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ └── OneFlow/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ ├── Importer.cpp
│ │ │ │ └── MLIROneFlowTranslation.cpp
│ │ │ └── oneflow-translate.cpp
│ │ └── test/
│ │ ├── CMakeLists.txt
│ │ ├── Frontend/
│ │ │ ├── lit.local.cfg
│ │ │ ├── oneflow_to_iree.mlir
│ │ │ └── tosa_to_elf.mlir
│ │ ├── GPU/
│ │ │ ├── lit.local.cfg
│ │ │ └── nvvm_to_cubin.mlir
│ │ ├── OneFlow/
│ │ │ ├── auto_nhwc/
│ │ │ │ ├── lit.local.cfg
│ │ │ │ ├── test_nhwc_batchnorm_relu.py
│ │ │ │ ├── test_nhwc_bias_add.py
│ │ │ │ ├── test_nhwc_conv.py
│ │ │ │ ├── test_nhwc_conv2d_maxpool2d.py
│ │ │ │ ├── test_nhwc_conv_relu_add.py
│ │ │ │ ├── test_nhwc_lenet.py
│ │ │ │ ├── test_nhwc_maxpool_2d.py
│ │ │ │ ├── test_nhwc_resnet.py
│ │ │ │ ├── test_nhwc_transpose_eliminate.py
│ │ │ │ └── test_resnet101_benchmark.py
│ │ │ ├── conversion/
│ │ │ │ ├── lower_to_tosa.mlir
│ │ │ │ ├── lower_to_tosa_signed.mlir
│ │ │ │ └── oneflow_to_tosa.mlir
│ │ │ ├── cse.mlir
│ │ │ ├── cuda_code_gen/
│ │ │ │ ├── gpu_copy_arg.mlir
│ │ │ │ ├── lit.local.cfg
│ │ │ │ ├── test_append_oneflow_stream.mlir
│ │ │ │ ├── test_cast_ops_to_signless.mlir
│ │ │ │ ├── test_fold_alloc_to_subview.mlir
│ │ │ │ ├── test_fuser_cast_scale.py
│ │ │ │ ├── test_gpu_all_reduce.mlir
│ │ │ │ ├── test_insert_ofmempool.mlir
│ │ │ │ ├── test_matmul.py
│ │ │ │ ├── test_mgpu_to_oneflow_stream.mlir
│ │ │ │ └── tosa_to_linalg.mlir
│ │ │ ├── folding/
│ │ │ │ ├── test_conv_bn.py
│ │ │ │ └── test_simple_multiply.py
│ │ │ ├── fuse/
│ │ │ │ ├── fuse_forward_ops.mlir
│ │ │ │ ├── test_cast_optimal_pass.py
│ │ │ │ └── test_fuse_pad_conv.py
│ │ │ ├── group_matmul.mlir
│ │ │ ├── jit_outline_func.mlir
│ │ │ ├── kernel_launch/
│ │ │ │ ├── OKLPass/
│ │ │ │ │ ├── lower_launcher_to_llvm_ptr.mlir
│ │ │ │ │ ├── lower_okl_to_llvm_call.mlir
│ │ │ │ │ └── tag_cuda_graph_support.mlir
│ │ │ │ ├── OKMPass/
│ │ │ │ │ ├── extract_okm_tensor.mlir
│ │ │ │ │ ├── okm_to_okl.mlir
│ │ │ │ │ ├── opt_okm_memref.mlir
│ │ │ │ │ └── wrap_okm_kernel.mlir
│ │ │ │ ├── OneFlowPass/
│ │ │ │ │ ├── aggregate_compute_ops.mlir
│ │ │ │ │ └── wrap_ops_to_kernel_launch/
│ │ │ │ │ ├── cuda_graph.mlir
│ │ │ │ │ ├── lit.local.cfg
│ │ │ │ │ └── simple.mlir
│ │ │ │ └── test_resnet.py
│ │ │ ├── networks/
│ │ │ │ ├── __init__.py
│ │ │ │ └── resnet50.py
│ │ │ ├── oneflow-opt.mlir
│ │ │ ├── oneflow-translate.mlir
│ │ │ ├── psig/
│ │ │ │ ├── error_parse.mlir
│ │ │ │ ├── sbp_parse.mlir
│ │ │ │ ├── test_2nd_basic_parse.py
│ │ │ │ └── test_basic_parse.py
│ │ │ ├── traits.mlir
│ │ │ └── with_cuda/
│ │ │ ├── lit.local.cfg
│ │ │ ├── test_conv_bn_auto_nhwc.py
│ │ │ ├── test_fuse_bias_add_dropout.py
│ │ │ ├── test_fuse_bias_add_gelu.py
│ │ │ ├── test_fuse_bn_add_relu.py
│ │ │ ├── test_fuse_gelu.py
│ │ │ ├── test_fuse_scale_tril.py
│ │ │ ├── test_fused_matmul_bias.py
│ │ │ ├── test_fused_multi_head_attention_inference.py
│ │ │ └── test_graph_save_and_load.py
│ │ ├── Transform/
│ │ │ ├── lit.local.cfg
│ │ │ ├── matmul.mlir
│ │ │ ├── softmax.mlir
│ │ │ ├── softmax_codegen_spec.mlir
│ │ │ ├── softmax_codegen_spec_no_vectorize.mlir
│ │ │ └── test_dialect.mlir
│ │ ├── lit.cfg.py
│ │ └── lit.site.cfg.py.in
│ ├── maybe/
│ │ ├── config.h
│ │ ├── error.h
│ │ ├── error_test.cpp
│ │ ├── just.h
│ │ ├── just_test.cpp
│ │ ├── maybe.h
│ │ ├── maybe_test.cpp
│ │ ├── optional.h
│ │ ├── optional_test.cpp
│ │ ├── type_traits.h
│ │ ├── type_traits_test.cpp
│ │ ├── utility.h
│ │ ├── utility_test.cpp
│ │ ├── variant.h
│ │ └── variant_test.cpp
│ └── user/
│ ├── data/
│ │ ├── batch_dataset.h
│ │ ├── batch_random_shuffle_dataset.h
│ │ ├── coco_data_reader.cpp
│ │ ├── coco_data_reader.h
│ │ ├── coco_dataset.cpp
│ │ ├── coco_dataset.h
│ │ ├── coco_parser.cpp
│ │ ├── coco_parser.h
│ │ ├── data_reader.h
│ │ ├── dataset.h
│ │ ├── distributed_training_dataset.h
│ │ ├── distributed_util.h
│ │ ├── gpt_dataset.cpp
│ │ ├── gpt_dataset.h
│ │ ├── group_batch_dataset.h
│ │ ├── ofrecord_data_reader.h
│ │ ├── ofrecord_dataset.h
│ │ ├── ofrecord_image_classification_data_reader.h
│ │ ├── ofrecord_image_classification_dataset.cpp
│ │ ├── ofrecord_image_classification_dataset.h
│ │ ├── ofrecord_image_classification_parser.h
│ │ ├── ofrecord_parser.h
│ │ ├── parser.h
│ │ └── random_shuffle_dataset.h
│ ├── image/
│ │ ├── crop_window.h
│ │ ├── image_util.cpp
│ │ ├── image_util.h
│ │ ├── jpeg_decoder.cpp
│ │ ├── jpeg_decoder.h
│ │ ├── jpeg_decoder_test.cpp
│ │ ├── random_crop_generator.cpp
│ │ └── random_crop_generator.h
│ ├── kernels/
│ │ ├── acc_kernel.cpp
│ │ ├── activation_kernels.cpp
│ │ ├── adaptive_avg_pool_cpu_kernel.cpp
│ │ ├── adaptive_avg_pool_gpu_kernel.cu
│ │ ├── adaptive_max_pool_cpu_kernel.cpp
│ │ ├── adaptive_max_pool_gpu_kernel.cu
│ │ ├── adaptive_pool_kernel_util.h
│ │ ├── add_n_kernel.cpp
│ │ ├── affine_grid_kernel.cpp
│ │ ├── affine_grid_kernel.cu
│ │ ├── affine_grid_kernel.h
│ │ ├── arange_kernel.cpp
│ │ ├── arange_kernel_util.cpp
│ │ ├── arange_kernel_util.cu
│ │ ├── arange_kernel_util.h
│ │ ├── arg_sort_kernel.cpp
│ │ ├── arg_sort_kernel.cu
│ │ ├── arg_where_kernel.cpp
│ │ ├── arg_where_kernel_util.cpp
│ │ ├── arg_where_kernel_util.cu
│ │ ├── arg_where_kernel_util.h
│ │ ├── argmax_kernel.cpp
│ │ ├── argmax_kernel.cu
│ │ ├── as_strided_kernel.cpp
│ │ ├── as_strided_kernel.cu
│ │ ├── assign_if_kernel.cpp
│ │ ├── assign_if_kernel.cu
│ │ ├── assign_kernel.cpp
│ │ ├── avg_pool_kernel.cpp
│ │ ├── avg_pool_kernel.cu
│ │ ├── avg_pool_kernel_util.cpp
│ │ ├── avg_pool_kernel_util.h
│ │ ├── batch_gather_kernel.cpp
│ │ ├── batch_gather_kernel_util.cpp
│ │ ├── batch_gather_kernel_util.cu
│ │ ├── batch_gather_kernel_util.h
│ │ ├── batch_norm_backward_elemt_kernel.cu
│ │ ├── batch_norm_backward_reduce_kernel.cu
│ │ ├── batch_norm_elemt_kernel.cu
│ │ ├── batch_norm_gather_stats_with_counts_kernel.cu
│ │ ├── batch_norm_kernel_utils.h
│ │ ├── batch_norm_stats_kernel.cu
│ │ ├── bernoulli_kernel.cpp
│ │ ├── bias_add_kernel.cpp
│ │ ├── binary_concat_kernel.cu
│ │ ├── binary_cross_entropy_kernel.cpp
│ │ ├── binary_cross_entropy_kernel.cu
│ │ ├── binary_cross_entropy_with_logits_kernel.cpp
│ │ ├── binary_cross_entropy_with_logits_kernel.cu
│ │ ├── binary_cross_entropy_with_logits_mean_kernel.cu
│ │ ├── binary_cross_entropy_with_logits_mean_kernel_util.h
│ │ ├── binary_cross_entropy_with_logits_reduce_mean.cpp
│ │ ├── bincount_kernel.cpp
│ │ ├── bincount_kernel.cu
│ │ ├── broadcast_div_grad_kernel.cpp
│ │ ├── broadcast_like_kernel.cpp
│ │ ├── cast_kernel.cpp
│ │ ├── cast_to_static_shape_kernel.cpp
│ │ ├── categorical_ordinal_encode_kernel.cpp
│ │ ├── categorical_ordinal_encode_kernel_util.cpp
│ │ ├── categorical_ordinal_encode_kernel_util.cu
│ │ ├── categorical_ordinal_encode_kernel_util.h
│ │ ├── clip_by_value_kernel.cpp
│ │ ├── clip_by_value_kernel.cu
│ │ ├── clip_by_value_kernel.h
│ │ ├── coco_reader_kernel.cpp
│ │ ├── collective_communication/
│ │ │ ├── cpu/
│ │ │ │ ├── cpu_all_gather.cpp
│ │ │ │ ├── cpu_all_reduce.cpp
│ │ │ │ ├── cpu_broadcast.cpp
│ │ │ │ ├── cpu_collective_communication_util.h
│ │ │ │ ├── cpu_communication_context.cpp
│ │ │ │ ├── cpu_communication_context.h
│ │ │ │ ├── cpu_recv.cpp
│ │ │ │ ├── cpu_reduce.cpp
│ │ │ │ ├── cpu_reduce_scatter.cpp
│ │ │ │ └── cpu_send.cpp
│ │ │ ├── cuda/
│ │ │ │ ├── cuda_all_gather.cpp
│ │ │ │ ├── cuda_all_reduce.cpp
│ │ │ │ ├── cuda_all_to_all.cpp
│ │ │ │ ├── cuda_broadcast.cpp
│ │ │ │ ├── cuda_communication_context.cpp
│ │ │ │ ├── cuda_communication_context.h
│ │ │ │ ├── cuda_recv.cpp
│ │ │ │ ├── cuda_reduce.cpp
│ │ │ │ ├── cuda_reduce_scatter.cpp
│ │ │ │ ├── cuda_send.cpp
│ │ │ │ ├── cuda_send_recv_util.cpp
│ │ │ │ └── cuda_send_recv_util.h
│ │ │ └── include/
│ │ │ ├── all_gather.h
│ │ │ ├── all_reduce.h
│ │ │ ├── all_to_all.h
│ │ │ ├── broadcast.h
│ │ │ ├── collective_communication.h
│ │ │ ├── communication_context.h
│ │ │ ├── recv.h
│ │ │ ├── reduce.h
│ │ │ ├── reduce_scatter.h
│ │ │ └── send.h
│ │ ├── combined_margin_loss_kernel.cpp
│ │ ├── combined_margin_loss_kernel.cu
│ │ ├── communicate_util.cpp
│ │ ├── communicate_util.h
│ │ ├── complex_kernels.cpp
│ │ ├── concat_kernel.cpp
│ │ ├── constant_kernel.cpp
│ │ ├── conv_cudnn_kernels.cpp
│ │ ├── conv_cutlass_kernels.cu
│ │ ├── conv_kernels.cpp
│ │ ├── convert_memory_format_kernel.cpp
│ │ ├── convert_memory_format_util.cpp
│ │ ├── convert_memory_format_util.h
│ │ ├── copy_data_content_kernel.cpp
│ │ ├── copy_hd_kernel.cpp
│ │ ├── copy_kernel.cpp
│ │ ├── count_not_finite_kernel.cpp
│ │ ├── count_not_finite_kernel.cu
│ │ ├── ctc_greedy_decoder.cpp
│ │ ├── ctc_greedy_decoder.cu
│ │ ├── ctc_greedy_decoder.h
│ │ ├── ctc_loss_kernel.cpp
│ │ ├── ctc_loss_kernel_util.cpp
│ │ ├── ctc_loss_kernel_util.cu
│ │ ├── ctc_loss_kernel_util.h
│ │ ├── cublas_bias_add_relu_matmul_grad_kernel.cu
│ │ ├── cublas_fused_matmul_bias_add_grad.cu
│ │ ├── cublas_fused_mlp_grad_kernel.cu
│ │ ├── cublas_fused_mlp_kernel.cu
│ │ ├── cublas_fused_mlp_util.cuh
│ │ ├── cufft_plan_cache.h
│ │ ├── cum_backward_kernel.cpp
│ │ ├── cum_backward_kernel.cu
│ │ ├── cum_forward_kernel.cpp
│ │ ├── cum_forward_kernel.cu
│ │ ├── cutlass_conv_tuner.cpp
│ │ ├── cutlass_conv_tuner.h
│ │ ├── data_shuffle_kernel.cu
│ │ ├── deconv_cpu_kernel.cpp
│ │ ├── deconv_cudnn_kernel.cpp
│ │ ├── deform_conv_kernel.cpp
│ │ ├── deform_conv_kernel.cu
│ │ ├── det_kernel.cpp
│ │ ├── diag_kernel.cpp
│ │ ├── diag_kernel.cu
│ │ ├── diag_kernel.h
│ │ ├── diagonal_kernel.cpp
│ │ ├── diagonal_kernel.cu
│ │ ├── dim_gather_kernel_util.cpp
│ │ ├── dim_gather_kernel_util.cu
│ │ ├── dim_gather_kernel_util.h
│ │ ├── dim_gather_kernels.cpp
│ │ ├── dim_scatter_kernel_util.cpp
│ │ ├── dim_scatter_kernel_util.cu
│ │ ├── dim_scatter_kernel_util.h
│ │ ├── dim_scatter_kernels.cpp
│ │ ├── dim_scatter_scalar_kernel_util.cpp
│ │ ├── dim_scatter_scalar_kernel_util.cu
│ │ ├── dim_scatter_scalar_kernel_util.h
│ │ ├── dim_scatter_scalar_kernels.cpp
│ │ ├── distributions/
│ │ │ ├── common.h
│ │ │ ├── distribution_template_util.cuh
│ │ │ ├── exponential_distribution.cpp
│ │ │ ├── exponential_distribution.cu
│ │ │ ├── exponential_distribution.h
│ │ │ ├── exponential_kernel.cpp
│ │ │ ├── exponential_kernel.h
│ │ │ ├── multinomial_with_replacement_kernel.cpp
│ │ │ ├── multinomial_with_replacement_kernel.cu
│ │ │ ├── normal_distribution.cpp
│ │ │ ├── normal_distribution.cu
│ │ │ ├── normal_distribution.h
│ │ │ ├── normal_kernel.cpp
│ │ │ ├── normal_kernel.h
│ │ │ ├── uniform_distribution.cpp
│ │ │ ├── uniform_distribution.cu
│ │ │ ├── uniform_distribution.h
│ │ │ ├── uniform_int_distribution.cpp
│ │ │ ├── uniform_int_distribution.cu
│ │ │ ├── uniform_int_distribution.h
│ │ │ ├── uniform_int_kernel.cpp
│ │ │ ├── uniform_int_kernel.h
│ │ │ ├── uniform_kernel.cpp
│ │ │ └── uniform_kernel.h
│ │ ├── dot_kernel.cpp
│ │ ├── dropout_kernel.cpp
│ │ ├── dropout_kernel.cu
│ │ ├── dropout_kernel.h
│ │ ├── dynamic_loss_scale_schedule_kernel.cpp
│ │ ├── dynamic_loss_scale_schedule_kernel.cu
│ │ ├── eager_b_to_s_kernel.cpp
│ │ ├── eager_ccl_kernel.cpp
│ │ ├── eager_nccl_s2s_kernel.cu
│ │ ├── eager_p_to_b_kernel.cpp
│ │ ├── eager_p_to_s_kernel.cpp
│ │ ├── eager_s_to_b_kernel.cpp
│ │ ├── eager_s_to_p_kernel.cpp
│ │ ├── eager_s_to_s_kernel.cpp
│ │ ├── eager_symmetric_s_to_p_kernel.cpp
│ │ ├── elementwise_maximum_minimum_kernel.cpp
│ │ ├── elementwise_maximum_minimum_kernel.cu
│ │ ├── elementwise_maximum_minimum_kernel.h
│ │ ├── elementwise_primitive_kernel.h
│ │ ├── embedding_kernel.cpp
│ │ ├── embedding_kernel.cu
│ │ ├── embedding_kernel_util.cpp
│ │ ├── embedding_kernel_util.cu
│ │ ├── embedding_kernel_util.h
│ │ ├── empty_kernel.cpp
│ │ ├── erfinv_kernel.cpp
│ │ ├── erfinv_kernel.cu
│ │ ├── expand_kernel.cpp
│ │ ├── eye_kernel.cpp
│ │ ├── eye_kernel_util.cpp
│ │ ├── eye_kernel_util.cu
│ │ ├── eye_kernel_util.h
│ │ ├── fake_quantization_kernel.cpp
│ │ ├── fake_quantization_kernel.cu
│ │ ├── fft_kernel_util.cpp
│ │ ├── fft_kernel_util.cu
│ │ ├── fft_kernel_util.h
│ │ ├── fft_kernels.cpp
│ │ ├── fill_kernel.cpp
│ │ ├── fill_kernel.cu
│ │ ├── flip_kernel.cpp
│ │ ├── flip_kernel.cu
│ │ ├── fold_kernel.cpp
│ │ ├── fold_kernel_util.cpp
│ │ ├── fold_kernel_util.cu
│ │ ├── fold_kernel_util.h
│ │ ├── frac_kernel.cpp
│ │ ├── frac_kernel.cu
│ │ ├── fused_attention_kernels.cu
│ │ ├── fused_bias_add_kernel.cu
│ │ ├── fused_bias_add_scale_mask_softmax_dropout.cu
│ │ ├── fused_cast_scale_kernel.cpp
│ │ ├── fused_cast_scale_kernel.cu
│ │ ├── fused_center_kernel.cu
│ │ ├── fused_clip_grad.cu
│ │ ├── fused_clip_grad.h
│ │ ├── fused_clip_grad_util.h
│ │ ├── fused_codegeex_qkv_reshape_kernel.cu
│ │ ├── fused_cross_feature_interaction.cu
│ │ ├── fused_cross_feature_interaction_grad.cu
│ │ ├── fused_dot_feature_interaction_kernel.cu
│ │ ├── fused_gelu_mul_kernel.cu
│ │ ├── fused_get_bounding_boxes_coord_kernel.cu
│ │ ├── fused_get_ciou_diagonal_angle_kernel.cu
│ │ ├── fused_get_ciou_result_kernel.cu
│ │ ├── fused_get_convex_diagonal_squared_kernel.cu
│ │ ├── fused_get_intersection_area_kernel.cu
│ │ ├── fused_get_iou_kernel.cu
│ │ ├── fused_glu_kernel.cu
│ │ ├── fused_glu_without_linear_grad_kernel.cu
│ │ ├── fused_gru_cell_kernel.cu
│ │ ├── fused_lstm_cell_kernel.cu
│ │ ├── fused_matmul_bias_add_relu_dropout.cu
│ │ ├── fused_matmul_bias_kernel.cu
│ │ ├── fused_relu_dropout_grad_kernel.cu
│ │ ├── fused_rnn_cell_kernel_util.h
│ │ ├── fused_scale_mask_bias_softmax.cu
│ │ ├── fused_scale_mask_softmax.cu
│ │ ├── fused_scale_mask_softmax_dropout.cu
│ │ ├── fused_self_attention_query_mul_key_and_value_kernel.cu
│ │ ├── fused_softmax.cuh
│ │ ├── fused_tril_scale_softmax_mask_scale_kernel.cu
│ │ ├── fused_weighted_sum_kernel.cpp
│ │ ├── fused_weighted_sum_kernel.cu
│ │ ├── gather_kernel.cpp
│ │ ├── gather_kernel_util.cpp
│ │ ├── gather_kernel_util.cu
│ │ ├── gather_kernel_util.h
│ │ ├── generate_random_batch_permutation_indices_kernel.cpp
│ │ ├── generate_random_batch_permutation_indices_kernel.cu
│ │ ├── gpt_data_loader_kernel.cpp
│ │ ├── greater_inplace_kernel.cpp
│ │ ├── greater_inplace_kernel_util.cpp
│ │ ├── greater_inplace_kernel_util.cu
│ │ ├── greater_inplace_kernel_util.h
│ │ ├── grid_sample_kernel.cpp
│ │ ├── grid_sample_kernel_util.cpp
│ │ ├── grid_sample_kernel_util.cu
│ │ ├── grid_sample_kernel_util.h
│ │ ├── group_conv_kernel.cpp
│ │ ├── group_deconv_kernel.cpp
│ │ ├── group_norm_kernel.cu
│ │ ├── grouped_matmul_bias.cu
│ │ ├── groupwise_quantization_kernels.cu
│ │ ├── host_scalar_add_by_tensor_kernel.cu
│ │ ├── image_batch_align_kernel.cpp
│ │ ├── image_decode_kernel.cpp
│ │ ├── image_object_preprocess_kernels.cpp
│ │ ├── image_preprocess_kernels.cpp
│ │ ├── image_preprocess_kernels.cu
│ │ ├── image_resize_kernels.cpp
│ │ ├── image_target_resize_kernel.cpp
│ │ ├── in_top_k_kernel.cpp
│ │ ├── in_top_k_kernel_util.cpp
│ │ ├── in_top_k_kernel_util.cu
│ │ ├── in_top_k_kernel_util.h
│ │ ├── index_add_kernel.cpp
│ │ ├── index_add_kernel.cu
│ │ ├── indexed_slices_reduce_sum_kernel.cpp
│ │ ├── indexed_slices_reduce_sum_kernel_util.cpp
│ │ ├── indexed_slices_reduce_sum_kernel_util.h
│ │ ├── inv_kernels.cpp
│ │ ├── inv_kernels.cu
│ │ ├── kl_div_kernel.cpp
│ │ ├── kl_div_kernel.cu
│ │ ├── l1_l2_regularize_gradient_kernel.cpp
│ │ ├── l1_l2_regularize_gradient_kernel_util.cpp
│ │ ├── l1_l2_regularize_gradient_kernel_util.cu
│ │ ├── l1_l2_regularize_gradient_kernel_util.h
│ │ ├── l2_normalize_kernel.cpp
│ │ ├── l2_normalize_kernel.cu
│ │ ├── layer_norm_cpu_kernel.cpp
│ │ ├── layer_norm_gpu_kernel.cu
│ │ ├── lerp_kernel.cpp
│ │ ├── lerp_kernel_util.cpp
│ │ ├── lerp_kernel_util.cu
│ │ ├── lerp_kernel_util.h
│ │ ├── linalg_cross_kernel.cpp
│ │ ├── linalg_cross_kernel.cu
│ │ ├── log_softmax_kernel.cpp
│ │ ├── logical_not_kernel.cpp
│ │ ├── loss_kernel_util.h
│ │ ├── lu_decomposition_kernel.cu
│ │ ├── masked_fill_kernel.cpp
│ │ ├── math_binary_broadcast_kernels.cpp
│ │ ├── math_binary_elementwise_func.h
│ │ ├── math_binary_elementwise_kernel.cpp
│ │ ├── math_binary_elementwise_kernel.cu
│ │ ├── math_unary_elementwise_func.h
│ │ ├── math_unary_elementwise_primitive_kernel.cpp
│ │ ├── matmul_kernels.cpp
│ │ ├── matrix_vector_product_kernel.cpp
│ │ ├── max_pool_kernel.cpp
│ │ ├── max_pool_kernel.cu
│ │ ├── max_pool_kernel_util.cpp
│ │ ├── max_pool_kernel_util.h
│ │ ├── max_unpool_kernel.cpp
│ │ ├── max_unpool_kernel.cu
│ │ ├── max_unpool_kernel_util.cpp
│ │ ├── max_unpool_kernel_util.h
│ │ ├── median_kernel.cpp
│ │ ├── median_kernel.cu
│ │ ├── median_with_indices_kernel.cpp
│ │ ├── median_with_indices_kernel.cu
│ │ ├── min_max_observer_kernel.cpp
│ │ ├── min_max_observer_kernel.cu
│ │ ├── mode_kernel.cpp
│ │ ├── model_update_kernel_util.cpp
│ │ ├── model_update_kernel_util.cu
│ │ ├── model_update_kernel_util.h
│ │ ├── model_update_kernels.cpp
│ │ ├── moving_average_min_max_observer_kernel.cpp
│ │ ├── moving_average_min_max_observer_kernel.cu
│ │ ├── multi_reduce_kernel_util.h
│ │ ├── multi_reduce_kernels.cpp
│ │ ├── multi_reduce_kernels.cu
│ │ ├── multi_reduce_kernels.h
│ │ ├── multi_tensor_model_update_kernel.cpp
│ │ ├── multi_tensor_model_update_kernel_util.cu
│ │ ├── multi_tensor_model_update_kernel_util.h
│ │ ├── mutable_cast_once_kernel.cpp
│ │ ├── narrow_kernel.cpp
│ │ ├── nccl_logical_2d_sbp_kernels.cpp
│ │ ├── nccl_logical_fusion_kernel.cpp
│ │ ├── nccl_logical_kernels.cpp
│ │ ├── nccl_logical_send_recv_kernel.cpp
│ │ ├── nd_index_slice_kernels.cpp
│ │ ├── nd_index_slice_kernels.cu
│ │ ├── nd_index_slice_kernels.h
│ │ ├── nd_index_slice_util.h
│ │ ├── nll_kernel.cpp
│ │ ├── nll_kernel_util.cpp
│ │ ├── nll_kernel_util.cu
│ │ ├── nll_kernel_util.h
│ │ ├── nms_kernel.cpp
│ │ ├── nms_kernel.cu
│ │ ├── noncontiguous_binary_op.cu
│ │ ├── nop_kernel.cpp
│ │ ├── normalization_kernel.cpp
│ │ ├── normalization_kernel.cu
│ │ ├── nvtx_range_kernel.cu
│ │ ├── ofrecord_decoder_kernels.cpp
│ │ ├── ofrecord_image_classification_reader_kernel.cpp
│ │ ├── ofrecord_reader_kernel.cpp
│ │ ├── one_embedding_data_shuffle.cuh
│ │ ├── one_embedding_embedding_gradient_shuffle_p2p_kernel.cu
│ │ ├── one_embedding_embedding_shuffle_p2p_kernel.cu
│ │ ├── one_embedding_id_shuffle_p2p_kernel.cu
│ │ ├── one_embedding_kernels.cu
│ │ ├── one_embedding_update_kernels.cu
│ │ ├── one_hot_kernel.cpp
│ │ ├── one_hot_kernel.cu
│ │ ├── ones_like_kernel.cpp
│ │ ├── op_kernel_wrapper.h
│ │ ├── p2p_comm_kernel.cpp
│ │ ├── pack_kernel.cpp
│ │ ├── pad_kernel.cpp
│ │ ├── partial_fc_sample_kernel.cu
│ │ ├── pocketfft_hdronly.h
│ │ ├── pocketfftplan.h
│ │ ├── prelu_kernel.cpp
│ │ ├── prelu_kernel.cu
│ │ ├── quantization_kernel.cpp
│ │ ├── quantization_kernel.cu
│ │ ├── radix_sort.cuh
│ │ ├── random_crop_kernel_state.cpp
│ │ ├── random_crop_kernel_state.h
│ │ ├── random_mask_generator.cpp
│ │ ├── random_mask_generator.cu
│ │ ├── random_mask_generator.h
│ │ ├── random_mask_like_kernel.cpp
│ │ ├── random_mask_like_kernel.h
│ │ ├── random_seed_util.cpp
│ │ ├── random_seed_util.h
│ │ ├── randperm_kernel.cpp
│ │ ├── randperm_kernel.cu
│ │ ├── raw_reader_kernel.cpp
│ │ ├── reduce_kernel.cpp
│ │ ├── reduce_like_kernels.cpp
│ │ ├── reflection_pad_kernels.cpp
│ │ ├── reflection_pad_kernels_util.cpp
│ │ ├── reflection_pad_kernels_util.cu
│ │ ├── reflection_pad_kernels_util.h
│ │ ├── repeat_interleave_kernel.cpp
│ │ ├── repeat_interleave_kernel.cu
│ │ ├── replication_pad_kernels.cpp
│ │ ├── replication_pad_kernels_util.cpp
│ │ ├── replication_pad_kernels_util.cu
│ │ ├── replication_pad_kernels_util.h
│ │ ├── rms_norm_gpu_kernel.cu
│ │ ├── roc_auc_score_kernel.cpp
│ │ ├── roi_align_kernel.cu
│ │ ├── roll_kernel.cpp
│ │ ├── roll_kernel.cu
│ │ ├── roll_kernel_utils.h
│ │ ├── rrelu_kernel.cpp
│ │ ├── rrelu_kernel.cu
│ │ ├── same_padding_kernel.cpp
│ │ ├── scalar_bitwise_kernels.cpp
│ │ ├── scalar_by_tensor_kernel.cpp
│ │ ├── scalar_logical_kernels.cpp
│ │ ├── scalar_math_kernels.cpp
│ │ ├── scaled_dot_product_attention_grad_kernel.cu
│ │ ├── scaled_dot_product_attention_kernel.cu
│ │ ├── scaled_dot_product_attention_kernel.h
│ │ ├── scaled_dot_product_attention_util.h
│ │ ├── search_sorted_kernel.cpp
│ │ ├── search_sorted_kernel.cu
│ │ ├── search_sorted_kernel_util.h
│ │ ├── sigmoid_cross_entropy_kernel.cpp
│ │ ├── sigmoid_cross_entropy_kernel.cu
│ │ ├── sigmoid_cross_entropy_kernel.h
│ │ ├── skip_layer_norm_kernel.cu
│ │ ├── skip_rms_norm_kernel.cu
│ │ ├── slice_kernel.cpp
│ │ ├── slice_util.cpp
│ │ ├── slice_util.cu
│ │ ├── slice_util.h
│ │ ├── smooth_l1_loss_kernel.cpp
│ │ ├── smooth_l1_loss_kernel.cu
│ │ ├── softmax_cross_entropy_kernel.cpp
│ │ ├── softmax_cross_entropy_kernel.cu
│ │ ├── softmax_cross_entropy_kernel.h
│ │ ├── softmax_kernel.cpp
│ │ ├── sort_kernel.cpp
│ │ ├── sort_kernel.cu
│ │ ├── sparse_cross_entropy_kernel.cpp
│ │ ├── sparse_cross_entropy_kernel_util.cpp
│ │ ├── sparse_cross_entropy_kernel_util.cu
│ │ ├── sparse_cross_entropy_kernel_util.h
│ │ ├── sparse_softmax_cross_entropy_kernel.cpp
│ │ ├── sparse_softmax_cross_entropy_kernel.cu
│ │ ├── sparse_softmax_cross_entropy_kernel_util.cpp
│ │ ├── sparse_softmax_cross_entropy_kernel_util.cu
│ │ ├── sparse_softmax_cross_entropy_kernel_util.h
│ │ ├── split_like_kernel.cpp
│ │ ├── sqrt_square_sum_kernel.cpp
│ │ ├── sqrt_square_sum_kernel_util.cpp
│ │ ├── sqrt_square_sum_kernel_util.cu
│ │ ├── sqrt_square_sum_kernel_util.h
│ │ ├── square_sum_kernel.cpp
│ │ ├── square_sum_kernel_util.cpp
│ │ ├── square_sum_kernel_util.cu
│ │ ├── square_sum_kernel_util.h
│ │ ├── ssp_variable_proxy_kernel.cpp
│ │ ├── stack_kernel.cpp
│ │ ├── stateful_opkernel.cpp
│ │ ├── stateful_opkernel.h
│ │ ├── summary_kernels.cpp
│ │ ├── tensor_buffer_kernels.cpp
│ │ ├── tensor_constant_kernel.cpp
│ │ ├── tf_pool_cpu_kernel.cpp
│ │ ├── tf_pool_gpu_kernel.cpp
│ │ ├── tf_prelu_kernel.cpp
│ │ ├── tf_prelu_kernel.cu
│ │ ├── throw_error_kernel.cpp
│ │ ├── to_contiguous_kernel.cpp
│ │ ├── to_contiguous_kernel.cu
│ │ ├── to_contiguous_kernel.h
│ │ ├── top_k_kernel.cpp
│ │ ├── top_k_kernel.cu
│ │ ├── transpose_kernel.cpp
│ │ ├── tril_kernel.cpp
│ │ ├── tril_kernel.cu
│ │ ├── triu_kernel.cpp
│ │ ├── triu_kernel.cu
│ │ ├── tuple_identity_kernel.cpp
│ │ ├── two_stage_reduce_kernel.cpp
│ │ ├── two_stage_reduce_kernel_util.cpp
│ │ ├── two_stage_reduce_kernel_util.cu
│ │ ├── two_stage_reduce_kernel_util.h
│ │ ├── unfold_kernel.cpp
│ │ ├── unfold_kernel_util.cpp
│ │ ├── unfold_kernel_util.cu
│ │ ├── unfold_kernel_util.h
│ │ ├── unfold_tensor_kernel.cpp
│ │ ├── unfold_tensor_kernel.cu
│ │ ├── unfold_tensor_kernel_utils.h
│ │ ├── unique_kernel.cpp
│ │ ├── unique_kernel_util.cpp
│ │ ├── unique_kernel_util.cu
│ │ ├── unique_kernel_util.h
│ │ ├── unique_with_counts_kernel.cpp
│ │ ├── unpack_kernel.cpp
│ │ ├── unsorted_batch_segment_sum_kernel.cpp
│ │ ├── unsorted_segment_sum_kernel.cpp
│ │ ├── unsorted_segment_sum_kernel_util.cpp
│ │ ├── unsorted_segment_sum_kernel_util.cu
│ │ ├── unsorted_segment_sum_kernel_util.h
│ │ ├── upsample_bicubic_2d_kernel.cpp
│ │ ├── upsample_bicubic_2d_kernel.cu
│ │ ├── upsample_bilinear_2d_kernel.cpp
│ │ ├── upsample_bilinear_2d_kernel.cu
│ │ ├── upsample_kernel.h
│ │ ├── upsample_linear_1d_kernel.cpp
│ │ ├── upsample_linear_1d_kernel.cu
│ │ ├── upsample_nearest_kernel.cpp
│ │ ├── upsample_nearest_kernel.cu
│ │ ├── upsample_trilinear_3d_kernel.cpp
│ │ ├── upsample_trilinear_3d_kernel.cu
│ │ ├── util_ops_kernels.cpp
│ │ ├── variance_kernel.cpp
│ │ ├── variance_kernel_util.cpp
│ │ ├── variance_kernel_util.cu
│ │ ├── variance_kernel_util.h
│ │ ├── vector_matrix_product_kernel.cpp
│ │ ├── where_kernel.cpp
│ │ ├── where_kernel_util.cpp
│ │ ├── where_kernel_util.cu
│ │ ├── where_kernel_util.h
│ │ └── zero_like_kernel.cpp
│ ├── ops/
│ │ ├── acc_ctrl_tick_op.cpp
│ │ ├── acc_op.cpp
│ │ ├── adaptive_max_pool_op.cpp
│ │ ├── adaptive_pool_op.cpp
│ │ ├── add_n_op.cpp
│ │ ├── affine_grid_op.cpp
│ │ ├── amp_white_identity_op.cpp
│ │ ├── arange_op.cpp
│ │ ├── arg_sort_op.cpp
│ │ ├── arg_where_op.cpp
│ │ ├── argmax_op.cpp
│ │ ├── as_strided_op.cpp
│ │ ├── assign_op.cpp
│ │ ├── avg_pool_op.cpp
│ │ ├── batch_gather_op.cpp
│ │ ├── batch_norm_backward_elemt_op.cpp
│ │ ├── batch_norm_backward_reduce_op.cpp
│ │ ├── batch_norm_elemt_op.cpp
│ │ ├── batch_norm_gather_stats_with_counts_op.cpp
│ │ ├── batch_norm_stats_op.cpp
│ │ ├── bernoulli_op.cpp
│ │ ├── bias_add_op.cpp
│ │ ├── binary_cross_entropy_op.cpp
│ │ ├── binary_cross_entropy_with_logits_op.cpp
│ │ ├── binary_cross_entropy_with_logits_reduce_mean_op.cpp
│ │ ├── bincount_op.cpp
│ │ ├── broadcast_div_grad_op.cpp
│ │ ├── broadcast_like_op.cpp
│ │ ├── buffer_op.cpp
│ │ ├── cast_like_op.cpp
│ │ ├── cast_op.cpp
│ │ ├── cast_to_static_shape_op.cpp
│ │ ├── cast_to_tick_op.cpp
│ │ ├── categorical_ordinal_encode_op.cpp
│ │ ├── celu_op.cpp
│ │ ├── clip_by_value_op.cpp
│ │ ├── coco_reader_op.cpp
│ │ ├── combined_margin_loss_op.cpp
│ │ ├── comm_net_device_infer_util.cpp
│ │ ├── comm_net_device_infer_util.h
│ │ ├── complex_ops.cpp
│ │ ├── concat_op.cpp
│ │ ├── constant_op.cpp
│ │ ├── conv_op.cpp
│ │ ├── convert_memory_format_op.cpp
│ │ ├── convert_memory_format_op.h
│ │ ├── copy_hd_op.cpp
│ │ ├── copy_op.cpp
│ │ ├── count_not_finite_op.cpp
│ │ ├── ctc_loss_op.cpp
│ │ ├── cublas_bias_add_relu_matmul_grad_op.cpp
│ │ ├── cublas_fused_matmul_bias_add_grad_op.cpp
│ │ ├── cublas_fused_mlp_grad_op.cpp
│ │ ├── cublas_fused_mlp_op.cpp
│ │ ├── cum_ops.cpp
│ │ ├── data_shuffle_op.cpp
│ │ ├── deconv_op.cpp
│ │ ├── deform_conv_op.cpp
│ │ ├── depend_op.cpp
│ │ ├── det_op.cpp
│ │ ├── diag_op.cpp
│ │ ├── diagonal_op.cpp
│ │ ├── dim_gather_op.cpp
│ │ ├── dim_scatter_ops.cpp
│ │ ├── distributions/
│ │ │ ├── exponential_op.cpp
│ │ │ ├── multinomial_with_replacement_op.cpp
│ │ │ ├── normal_op.cpp
│ │ │ ├── uniform_int_op.cpp
│ │ │ └── uniform_op.cpp
│ │ ├── dot_op.cpp
│ │ ├── dropout_op.cpp
│ │ ├── dynamic_loss_scale_schedule_op.cpp
│ │ ├── eager_b_to_s_op.cpp
│ │ ├── eager_ccl_ops.cpp
│ │ ├── eager_p_to_b_op.cpp
│ │ ├── eager_p_to_s_op.cpp
│ │ ├── eager_s_to_b_op.cpp
│ │ ├── eager_s_to_p_op.cpp
│ │ ├── eager_s_to_s_op.cpp
│ │ ├── eager_symmetric_s_to_p_op.cpp
│ │ ├── elementwise_maximum_minimum_ops.cpp
│ │ ├── elu_op.cpp
│ │ ├── embedding_op.cpp
│ │ ├── empty_op.cpp
│ │ ├── erfinv_op.cpp
│ │ ├── expand_dims_op.cpp
│ │ ├── expand_op.cpp
│ │ ├── eye_op.cpp
│ │ ├── fake_quantization_op.cpp
│ │ ├── fft_ops.cpp
│ │ ├── fill_op.cpp
│ │ ├── flip_op.cpp
│ │ ├── frac_op.cpp
│ │ ├── fused_attention_ops.cpp
│ │ ├── fused_bias_add_op.cpp
│ │ ├── fused_bias_add_scale_mask_softmax_dropout_op.cpp
│ │ ├── fused_cast_scale_op.cpp
│ │ ├── fused_center_op.cpp
│ │ ├── fused_clip_grad_ops.cpp
│ │ ├── fused_codegeex_qkv_reshape.cpp
│ │ ├── fused_cross_feature_interaction_op.cpp
│ │ ├── fused_dot_feature_interaction_op.cpp
│ │ ├── fused_get_boundding_boxes_coord_op.cpp
│ │ ├── fused_get_ciou_diagonal_angle_op.cpp
│ │ ├── fused_get_ciou_result_op.cpp
│ │ ├── fused_get_convex_diagonal_squared_op.cpp
│ │ ├── fused_get_intersection_area_op.cpp
│ │ ├── fused_get_iou_op.cpp
│ │ ├── fused_glu_op.cpp
│ │ ├── fused_glu_without_linear_grad_op.cpp
│ │ ├── fused_gru_cell_op.cpp
│ │ ├── fused_linear_with_groupwise_quantized_weight_op.cpp
│ │ ├── fused_lstm_cell_op.cpp
│ │ ├── fused_matmul_bias_add_relu_dropout_op.cpp
│ │ ├── fused_matmul_bias_op.cpp
│ │ ├── fused_relu_dropout_grad_op.cpp
│ │ ├── fused_scale_mask_bias_softmax_op.cpp
│ │ ├── fused_scale_mask_softmax_dropout_op.cpp
│ │ ├── fused_scale_mask_softmax_op.cpp
│ │ ├── fused_scale_tril_softmax_mask_scale_op.cpp
│ │ ├── fused_self_attention_query_mul_key_and_value_ops.cpp
│ │ ├── fused_weighted_sum_op.cpp
│ │ ├── gather_op.cpp
│ │ ├── gelu_op.cpp
│ │ ├── generate_random_batch_permutation_indices_op.cpp
│ │ ├── gpt_data_loader_op.cpp
│ │ ├── greater_inplace_op.cpp
│ │ ├── grid_sample_op.cpp
│ │ ├── group_norm_op.cpp
│ │ ├── grouped_matmul_bias_op.cpp
│ │ ├── groupwise_dequantize_op.cpp
│ │ ├── hardshrink_op.cpp
│ │ ├── hardsigmoid_op.cpp
│ │ ├── hardswish_op.cpp
│ │ ├── hardtanh_op.cpp
│ │ ├── hierarchical_parallel_cast_op.cpp
│ │ ├── identity_op.cpp
│ │ ├── image_batch_align_op.cpp
│ │ ├── image_decode_op.cpp
│ │ ├── image_object_preprocess_ops.cpp
│ │ ├── image_preprocess_ops.cpp
│ │ ├── image_resize_ops.cpp
│ │ ├── image_target_resize_op.cpp
│ │ ├── in_top_k_op.cpp
│ │ ├── index_add_op.cpp
│ │ ├── indexed_slices_reduce_sum_op.cpp
│ │ ├── inv_op.cpp
│ │ ├── kl_div_op.cpp
│ │ ├── l1_l2_regularize_gradient_op.cpp
│ │ ├── l2_normalize_op.cpp
│ │ ├── layer_norm_op.cpp
│ │ ├── leaky_relu_op.cpp
│ │ ├── lerp_op.cpp
│ │ ├── linalg_cross_op.cpp
│ │ ├── log_softmax_op.cpp
│ │ ├── logical_not_op.cpp
│ │ ├── loss_op_util.cpp
│ │ ├── loss_op_util.h
│ │ ├── lu_composition_op.cpp
│ │ ├── masked_fill_op.cpp
│ │ ├── math_binary_broadcast_ops.cpp
│ │ ├── math_binary_broadcast_seq.h
│ │ ├── math_binary_elementwise_ops.cpp
│ │ ├── math_binary_elementwise_seq.h
│ │ ├── math_unary_elementwise_op.cpp
│ │ ├── math_unary_elementwise_seq.h
│ │ ├── matmul_op.cpp
│ │ ├── matrix_vector_product_op.cpp
│ │ ├── max_pool_op.cpp
│ │ ├── max_unpool_op.cpp
│ │ ├── median_op.cpp
│ │ ├── median_with_indices_op.cpp
│ │ ├── min_max_observer_op.cpp
│ │ ├── mish_op.cpp
│ │ ├── mode_op.cpp
│ │ ├── model_update_ops.cpp
│ │ ├── moving_average_min_max_observer_op.cpp
│ │ ├── multi_reduce_ops.cpp
│ │ ├── multi_tensor_model_update_ops.cpp
│ │ ├── mutable_cast_once_op.cpp
│ │ ├── narrow_op.cpp
│ │ ├── nccl_logical_2d_sbp_ops.cpp
│ │ ├── nccl_logical_fusion_op.cpp
│ │ ├── nccl_logical_ops.cpp
│ │ ├── nccl_logical_util.cpp
│ │ ├── nccl_logical_util.h
│ │ ├── nd_index_slice_ops.cpp
│ │ ├── nll_op.cpp
│ │ ├── nms_op.cpp
│ │ ├── nn_util.cpp
│ │ ├── nn_util.h
│ │ ├── noncontiguous_binary_op.cpp
│ │ ├── normalization_op.cpp
│ │ ├── nvtx_range_op.cpp
│ │ ├── ofrecord_decoder_ops.cpp
│ │ ├── ofrecord_image_classification_reader_op.cpp
│ │ ├── ofrecord_reader_op.cpp
│ │ ├── one_embedding_ops.cpp
│ │ ├── one_hot_op.cpp
│ │ ├── ones_like_op.cpp
│ │ ├── p2p_comm_op.cpp
│ │ ├── pack_op.cpp
│ │ ├── pad_op.cpp
│ │ ├── parallel_cast_op.cpp
│ │ ├── partial_fc_sample_op.cpp
│ │ ├── pinned_identity_op.cpp
│ │ ├── prelu_op.cpp
│ │ ├── quantization_op.cpp
│ │ ├── quick_gelu_op.cpp
│ │ ├── randperm_op.cpp
│ │ ├── raw_reader_op.cpp
│ │ ├── reduce_like_ops.cpp
│ │ ├── reduce_ops.cpp
│ │ ├── reflection_pad_op.cpp
│ │ ├── relu_op.cpp
│ │ ├── repeat_interleave_op.cpp
│ │ ├── repeat_op.cpp
│ │ ├── replication_pad_op.cpp
│ │ ├── reshape_like_op.cpp
│ │ ├── reshape_op.cpp
│ │ ├── reshape_user_op_util.cpp
│ │ ├── reshape_user_op_util.h
│ │ ├── reshape_user_op_util_test.cpp
│ │ ├── rms_norm_op.cpp
│ │ ├── roc_auc_score_op.cpp
│ │ ├── roi_align_op.cpp
│ │ ├── roll_op.cpp
│ │ ├── rrelu_op.cpp
│ │ ├── same_padding_op.cpp
│ │ ├── scalar_bitwise_op.cpp
│ │ ├── scalar_by_tensor_op.cpp
│ │ ├── scalar_logical_op.cpp
│ │ ├── scalar_math_op.cpp
│ │ ├── scaled_dot_product_flash_attention_op.cpp
│ │ ├── search_sorted_op.cpp
│ │ ├── selu_op.cpp
│ │ ├── sigmoid_cross_entropy_op.cpp
│ │ ├── silu_op.cpp
│ │ ├── skip_layer_norm_op.cpp
│ │ ├── skip_rms_norm_op.cpp
│ │ ├── slice_op.cpp
│ │ ├── smooth_l1_loss_op.cpp
│ │ ├── softmax_cross_entropy_op.cpp
│ │ ├── softmax_op.cpp
│ │ ├── softplus_op.cpp
│ │ ├── softshrink_op.cpp
│ │ ├── softsign_op.cpp
│ │ ├── sort_op.cpp
│ │ ├── sparse_cross_entropy_op.cpp
│ │ ├── sparse_softmax_cross_entropy_op.cpp
│ │ ├── split_like_op.cpp
│ │ ├── sqrt_square_sum_op.cpp
│ │ ├── square_relu_op.cpp
│ │ ├── square_sum_op.cpp
│ │ ├── squeeze_op.cpp
│ │ ├── ssp_variable_proxy_op.cpp
│ │ ├── stack_op.cpp
│ │ ├── stft_op.cpp
│ │ ├── summary_ops.cpp
│ │ ├── tanh_op.cpp
│ │ ├── tensor_buffer_ops.cpp
│ │ ├── tensor_constant_op.cpp
│ │ ├── tf_pool_op.cpp
│ │ ├── tf_prelu_op.cpp
│ │ ├── threshold_op.cpp
│ │ ├── throw_error_op.cpp
│ │ ├── to_contiguous_op.cpp
│ │ ├── top_k_op.cpp
│ │ ├── transpose_ops.cpp
│ │ ├── tril_op.cpp
│ │ ├── triu_op.cpp
│ │ ├── trunc_op.cpp
│ │ ├── tuple_identity_op.cpp
│ │ ├── two_stage_reduce_ops.cpp
│ │ ├── unfold_fold_op.cpp
│ │ ├── unfold_tensor_op.cpp
│ │ ├── unique_op.cpp
│ │ ├── unique_with_counts_op.cpp
│ │ ├── unpack_op.cpp
│ │ ├── unsorted_batch_segment_sum_op.cpp
│ │ ├── unsorted_segment_sum_op.cpp
│ │ ├── upsample_op.cpp
│ │ ├── util_ops.cpp
│ │ ├── variance_op.cpp
│ │ ├── vector_matrix_product_op.cpp
│ │ ├── where_op.cpp
│ │ └── zero_like_op.cpp
│ ├── summary/
│ │ ├── crc32c.h
│ │ ├── env_time.h
│ │ ├── event_writer_helper.cpp
│ │ ├── event_writer_helper.h
│ │ ├── events_writer.cpp
│ │ ├── events_writer.h
│ │ ├── histogram.cpp
│ │ ├── histogram.h
│ │ ├── plan_to_physical_graph.cpp
│ │ ├── plan_to_physical_graph.h
│ │ └── summary_converter.h
│ └── utils/
│ ├── pool_util.cpp
│ └── pool_util.h
├── python/
│ ├── .gitignore
│ ├── oneflow/
│ │ ├── _C/
│ │ │ ├── __init__.py
│ │ │ └── _nn.py
│ │ ├── __init__.py
│ │ ├── __main__.py
│ │ ├── _dynamo/
│ │ │ └── __init__.py
│ │ ├── _utils.py
│ │ ├── amp/
│ │ │ ├── __init__.py
│ │ │ ├── autocast_mode.py
│ │ │ └── grad_scaler.py
│ │ ├── ao/
│ │ │ └── quantization.py
│ │ ├── asyncs/
│ │ │ ├── __init__.py
│ │ │ └── thread.py
│ │ ├── autograd/
│ │ │ ├── __init__.py
│ │ │ ├── autograd.py
│ │ │ ├── autograd_function.py
│ │ │ ├── autograd_mode.py
│ │ │ ├── functional.py
│ │ │ ├── graph.py
│ │ │ └── profiler.py
│ │ ├── autoprof/
│ │ │ ├── __init__.py
│ │ │ ├── __main__.py
│ │ │ └── util.py
│ │ ├── backends/
│ │ │ ├── __init__.py
│ │ │ ├── cuda/
│ │ │ │ └── __init__.py
│ │ │ ├── cudnn/
│ │ │ │ └── __init__.py
│ │ │ └── mps/
│ │ │ └── __init__.py
│ │ ├── boxing/
│ │ │ ├── __init__.py
│ │ │ └── nccl/
│ │ │ └── __init__.py
│ │ ├── comm/
│ │ │ ├── __init__.py
│ │ │ └── comm_ops.py
│ │ ├── cuda/
│ │ │ ├── __init__.py
│ │ │ ├── _utils.py
│ │ │ ├── amp/
│ │ │ │ ├── __init__.py
│ │ │ │ └── autocast_mode.py
│ │ │ ├── random.py
│ │ │ └── type_tensor.py
│ │ ├── data.py
│ │ ├── distributed/
│ │ │ ├── __init__.py
│ │ │ ├── constants.py
│ │ │ └── launch.py
│ │ ├── distributions/
│ │ │ ├── __init__.py
│ │ │ ├── categorical.py
│ │ │ ├── distribution.py
│ │ │ └── utils.py
│ │ ├── env.py
│ │ ├── experimental/
│ │ │ └── load_mnist.py
│ │ ├── fft/
│ │ │ └── __init__.py
│ │ ├── framework/
│ │ │ ├── __init__.py
│ │ │ ├── args_tree.py
│ │ │ ├── attr_util.py
│ │ │ ├── balanced_splitter.py
│ │ │ ├── c_api_util.py
│ │ │ ├── check_point_v2.py
│ │ │ ├── config_util.py
│ │ │ ├── distribute.py
│ │ │ ├── docstr/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── activation.py
│ │ │ │ ├── addcdiv.py
│ │ │ │ ├── amax.py
│ │ │ │ ├── amin.py
│ │ │ │ ├── arange.py
│ │ │ │ ├── argsort.py
│ │ │ │ ├── array_ops.py
│ │ │ │ ├── as_tensor.py
│ │ │ │ ├── autograd.py
│ │ │ │ ├── baddbmm.py
│ │ │ │ ├── bitwise_ops.py
│ │ │ │ ├── bmm.py
│ │ │ │ ├── broadcast_like.py
│ │ │ │ ├── cast.py
│ │ │ │ ├── chunk.py
│ │ │ │ ├── clamp.py
│ │ │ │ ├── comm.py
│ │ │ │ ├── comparison.py
│ │ │ │ ├── constant.py
│ │ │ │ ├── conv.py
│ │ │ │ ├── convolution.py
│ │ │ │ ├── ctc_decode.py
│ │ │ │ ├── dataset.py
│ │ │ │ ├── deconv.py
│ │ │ │ ├── depend.py
│ │ │ │ ├── distance.py
│ │ │ │ ├── dropout.py
│ │ │ │ ├── einsum.py
│ │ │ │ ├── erfinv.py
│ │ │ │ ├── expand.py
│ │ │ │ ├── flatten.py
│ │ │ │ ├── flip.py
│ │ │ │ ├── hann_window.py
│ │ │ │ ├── in_top_k.py
│ │ │ │ ├── index_add.py
│ │ │ │ ├── index_select.py
│ │ │ │ ├── inv.py
│ │ │ │ ├── is_floating_point.py
│ │ │ │ ├── lerp.py
│ │ │ │ ├── linalg.py
│ │ │ │ ├── logaddexp.py
│ │ │ │ ├── logical_ops.py
│ │ │ │ ├── loss.py
│ │ │ │ ├── masked_fill.py
│ │ │ │ ├── math_ops.py
│ │ │ │ ├── meshgrid.py
│ │ │ │ ├── module.py
│ │ │ │ ├── nms.py
│ │ │ │ ├── nonzero.py
│ │ │ │ ├── norm.py
│ │ │ │ ├── normalization.py
│ │ │ │ ├── oneflow.py
│ │ │ │ ├── onehot.py
│ │ │ │ ├── pooling.py
│ │ │ │ ├── quantile.py
│ │ │ │ ├── random.py
│ │ │ │ ├── reduce_ops.py
│ │ │ │ ├── repeat.py
│ │ │ │ ├── repeat_interleave.py
│ │ │ │ ├── roc_auc_score.py
│ │ │ │ ├── searchsorted.py
│ │ │ │ ├── sort.py
│ │ │ │ ├── special_ops.py
│ │ │ │ ├── split.py
│ │ │ │ ├── swapaxes.py
│ │ │ │ ├── swapdims.py
│ │ │ │ ├── tensor.py
│ │ │ │ ├── tensor_attributes.py
│ │ │ │ ├── tensor_ops.py
│ │ │ │ ├── tensor_t.py
│ │ │ │ ├── tensordot.py
│ │ │ │ ├── tile.py
│ │ │ │ ├── topk.py
│ │ │ │ ├── trigonometric_ops.py
│ │ │ │ ├── unbind.py
│ │ │ │ ├── util_ops.py
│ │ │ │ ├── utils.py
│ │ │ │ ├── vision.py
│ │ │ │ └── where.py
│ │ │ ├── dtype.py
│ │ │ ├── env_util.py
│ │ │ ├── function_desc.py
│ │ │ ├── function_util.py
│ │ │ ├── generator.py
│ │ │ ├── graph_build_util.py
│ │ │ ├── hob.py
│ │ │ ├── id_util.py
│ │ │ ├── infer_compiler/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── import_tools/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── format_utils.py
│ │ │ │ │ └── importer.py
│ │ │ │ ├── transform/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── builtin_transform.py
│ │ │ │ │ ├── custom_transform.py
│ │ │ │ │ └── manager.py
│ │ │ │ ├── utils/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── args_tree_util.py
│ │ │ │ │ ├── cost_util.py
│ │ │ │ │ ├── log_utils.py
│ │ │ │ │ ├── oneflow_exec_mode.py
│ │ │ │ │ ├── param_utils.py
│ │ │ │ │ ├── patch_for_compiler.py
│ │ │ │ │ └── patch_for_diffusers.py
│ │ │ │ ├── with_fx_graph.py
│ │ │ │ ├── with_fx_interpreter.py
│ │ │ │ ├── with_oneflow_backend.py
│ │ │ │ └── with_oneflow_compile.py
│ │ │ ├── job_set_util.py
│ │ │ ├── model.py
│ │ │ ├── multi_client_session.py
│ │ │ ├── register_class_method_util.py
│ │ │ ├── scope_util.py
│ │ │ ├── session_context.py
│ │ │ ├── sysconfig.py
│ │ │ ├── tensor.py
│ │ │ ├── tensor_str.py
│ │ │ ├── tensor_str_util.py
│ │ │ ├── tensor_tuple_util.py
│ │ │ ├── type_tensor.py
│ │ │ └── unittest.py
│ │ ├── fx/
│ │ │ └── __init__.py
│ │ ├── hub.py
│ │ ├── ir/
│ │ │ ├── __main__.py
│ │ │ ├── ast_gen_transformer.py
│ │ │ ├── bisect_transformer.py
│ │ │ ├── lr_jit.py
│ │ │ ├── math_params_transformer.py
│ │ │ └── self_params_transformer.py
│ │ ├── jit/
│ │ │ ├── __init__.py
│ │ │ └── annotations.py
│ │ ├── library.py
│ │ ├── linalg.py
│ │ ├── mock_torch/
│ │ │ ├── __init__.py
│ │ │ ├── __main__.py
│ │ │ ├── dyn_mock_mod.py
│ │ │ ├── mock_importer.py
│ │ │ ├── mock_modules.py
│ │ │ ├── mock_utils.py
│ │ │ └── torch/
│ │ │ └── __init__.py
│ │ ├── model.py
│ │ ├── multiprocessing/
│ │ │ ├── __init__.py
│ │ │ ├── _atfork.py
│ │ │ ├── pool.py
│ │ │ ├── queue.py
│ │ │ ├── reductions.py
│ │ │ ├── shared_memory/
│ │ │ │ └── __init__.py
│ │ │ └── spawn.py
│ │ ├── nn/
│ │ │ ├── __init__.py
│ │ │ ├── common_types.py
│ │ │ ├── functional/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── batch_norm.py
│ │ │ │ ├── ctc_loss.py
│ │ │ │ ├── deform_conv.py
│ │ │ │ ├── depend.py
│ │ │ │ ├── maxpool.py
│ │ │ │ ├── pad.py
│ │ │ │ └── softmax.py
│ │ │ ├── graph/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── cache.py
│ │ │ │ ├── graph.py
│ │ │ │ ├── graph_block.py
│ │ │ │ ├── graph_config.py
│ │ │ │ ├── optimizer.py
│ │ │ │ ├── proxy.py
│ │ │ │ └── util.py
│ │ │ ├── image.py
│ │ │ ├── init.py
│ │ │ ├── modules/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── _functions.py
│ │ │ │ ├── activation.py
│ │ │ │ ├── affine_grid.py
│ │ │ │ ├── all_reduce.py
│ │ │ │ ├── arange.py
│ │ │ │ ├── argsort.py
│ │ │ │ ├── argwhere.py
│ │ │ │ ├── as_tensor.py
│ │ │ │ ├── batchnorm.py
│ │ │ │ ├── batchnorm_fused.py
│ │ │ │ ├── broadcast_ops.py
│ │ │ │ ├── constant.py
│ │ │ │ ├── container.py
│ │ │ │ ├── conv.py
│ │ │ │ ├── dataset.py
│ │ │ │ ├── distance.py
│ │ │ │ ├── distributed_partial_fc_sample.py
│ │ │ │ ├── dropout.py
│ │ │ │ ├── einsum.py
│ │ │ │ ├── empty.py
│ │ │ │ ├── expand.py
│ │ │ │ ├── fake_quantization.py
│ │ │ │ ├── flatten.py
│ │ │ │ ├── fold.py
│ │ │ │ ├── fused_mlp.py
│ │ │ │ ├── global_cast.py
│ │ │ │ ├── grid_sample.py
│ │ │ │ ├── instancenorm.py
│ │ │ │ ├── interpolate.py
│ │ │ │ ├── is_tensor.py
│ │ │ │ ├── linear.py
│ │ │ │ ├── linspace.py
│ │ │ │ ├── logspace.py
│ │ │ │ ├── loss.py
│ │ │ │ ├── masked_select.py
│ │ │ │ ├── math_ops.py
│ │ │ │ ├── meshgrid.py
│ │ │ │ ├── min_max_observer.py
│ │ │ │ ├── module.py
│ │ │ │ ├── moving_average_min_max_observer.py
│ │ │ │ ├── nms.py
│ │ │ │ ├── nonzero.py
│ │ │ │ ├── norm.py
│ │ │ │ ├── normalization.py
│ │ │ │ ├── numel.py
│ │ │ │ ├── padding.py
│ │ │ │ ├── pixelshuffle.py
│ │ │ │ ├── pooling.py
│ │ │ │ ├── quantization.py
│ │ │ │ ├── reshape.py
│ │ │ │ ├── rnn.py
│ │ │ │ ├── roll.py
│ │ │ │ ├── scatter.py
│ │ │ │ ├── slice.py
│ │ │ │ ├── sparse.py
│ │ │ │ ├── sparse_softmax_cross_entropy.py
│ │ │ │ ├── tensor_buffer.py
│ │ │ │ ├── tensordot.py
│ │ │ │ ├── trigonometric_ops.py
│ │ │ │ ├── unique.py
│ │ │ │ ├── upsampling.py
│ │ │ │ ├── utils.py
│ │ │ │ └── where.py
│ │ │ ├── optimizer/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── adadelta.py
│ │ │ │ ├── adagrad.py
│ │ │ │ ├── adam.py
│ │ │ │ ├── adamw.py
│ │ │ │ ├── chained_scheduler.py
│ │ │ │ ├── constant_lr.py
│ │ │ │ ├── cosine_annealing_lr.py
│ │ │ │ ├── cosine_annealing_warm_restarts.py
│ │ │ │ ├── cosine_decay_lr.py
│ │ │ │ ├── exponential_lr.py
│ │ │ │ ├── lamb.py
│ │ │ │ ├── lambda_lr.py
│ │ │ │ ├── lbfgs.py
│ │ │ │ ├── linear_lr.py
│ │ │ │ ├── lr_scheduler.py
│ │ │ │ ├── multiplicative_lr.py
│ │ │ │ ├── multistep_lr.py
│ │ │ │ ├── polynomial_lr.py
│ │ │ │ ├── reduce_lr_on_plateau.py
│ │ │ │ ├── rmsprop.py
│ │ │ │ ├── sequential_lr.py
│ │ │ │ ├── sgd.py
│ │ │ │ ├── step_lr.py
│ │ │ │ ├── swa_utils.py
│ │ │ │ └── warmup_lr.py
│ │ │ ├── parallel/
│ │ │ │ ├── __init__.py
│ │ │ │ └── distributed.py
│ │ │ ├── parameter.py
│ │ │ ├── qat/
│ │ │ │ ├── __init__.py
│ │ │ │ └── conv.py
│ │ │ └── utils/
│ │ │ ├── __init__.py
│ │ │ ├── clip_grad.py
│ │ │ ├── container.py
│ │ │ ├── convert_parameters.py
│ │ │ ├── parameters_grouping.py
│ │ │ ├── prune.py
│ │ │ ├── rnn.py
│ │ │ ├── skip_init.py
│ │ │ └── weight_norm.py
│ │ ├── one_embedding.py
│ │ ├── onnx/
│ │ │ ├── __init__.py
│ │ │ └── symbolic_helper.py
│ │ ├── ops/
│ │ │ ├── __init__.py
│ │ │ ├── array_ops.py
│ │ │ ├── stateful_ops.py
│ │ │ ├── transpose_util.py
│ │ │ └── util/
│ │ │ ├── __init__.py
│ │ │ └── initializer_util.py
│ │ ├── optim/
│ │ │ ├── __init__.py
│ │ │ ├── lr_scheduler.py
│ │ │ ├── optimizer.py
│ │ │ └── swa_utils.py
│ │ ├── profiler/
│ │ │ ├── __init__.py
│ │ │ ├── events.py
│ │ │ ├── profiler.py
│ │ │ └── util.py
│ │ ├── remat/
│ │ │ └── __init__.py
│ │ ├── sbp.py
│ │ ├── special/
│ │ │ ├── __init__.py
│ │ │ └── special_ops.py
│ │ ├── support/
│ │ │ ├── __init__.py
│ │ │ ├── async_util.py
│ │ │ ├── box.py
│ │ │ ├── enable_if.py
│ │ │ ├── env_var_util.py
│ │ │ ├── func_inspect_util.py
│ │ │ ├── high_order_bool.py
│ │ │ ├── lazy.py
│ │ │ ├── pb_util.py
│ │ │ ├── scope_stack.py
│ │ │ └── traceinfo.py
│ │ ├── sysconfig.py
│ │ ├── test/
│ │ │ ├── README.md
│ │ │ ├── dataloader/
│ │ │ │ ├── data_utils.py
│ │ │ │ ├── test_cifar_dataset_multiprocess.py
│ │ │ │ ├── test_cifar_dataset_singleprocess.py
│ │ │ │ ├── test_fashion_mnist_dataset.py
│ │ │ │ ├── test_lenet.py
│ │ │ │ ├── test_mnist_dataset.py
│ │ │ │ ├── test_numpy_dataset.py
│ │ │ │ ├── test_tensor_dataset.py
│ │ │ │ └── test_transforms.py
│ │ │ ├── exceptions/
│ │ │ │ ├── test_activation.py
│ │ │ │ ├── test_add_n_op.py
│ │ │ │ ├── test_arg_sort_op.py
│ │ │ │ ├── test_array_functor.py
│ │ │ │ ├── test_autograd.py
│ │ │ │ ├── test_batch_gather_op.py
│ │ │ │ ├── test_bias_add_op.py
│ │ │ │ ├── test_binary_functor_exception.py
│ │ │ │ ├── test_bmm.py
│ │ │ │ ├── test_broadcast_ops.py
│ │ │ │ ├── test_chunk.py
│ │ │ │ ├── test_cosine_similarity.py
│ │ │ │ ├── test_deform_conv2d_op.py
│ │ │ │ ├── test_device.py
│ │ │ │ ├── test_dot.py
│ │ │ │ ├── test_error_reported_in_thread.py
│ │ │ │ ├── test_gird_sample_op.py
│ │ │ │ ├── test_global_branch_error_local_to_global_with_broadcast_sbp_1n2d.py
│ │ │ │ ├── test_global_branch_error_local_to_global_with_broadcast_sbp_1n4d.py
│ │ │ │ ├── test_global_branch_error_local_to_global_with_split_sbp.py
│ │ │ │ ├── test_global_branch_error_with_global_mean.py
│ │ │ │ ├── test_hann_window.py
│ │ │ │ ├── test_in_top_k.py
│ │ │ │ ├── test_inv.py
│ │ │ │ ├── test_layernorm.py
│ │ │ │ ├── test_linalg.py
│ │ │ │ ├── test_local_global_convert_error.py
│ │ │ │ ├── test_median.py
│ │ │ │ ├── test_mm.py
│ │ │ │ ├── test_mode.py
│ │ │ │ ├── test_multi_input_with_diff_device_or_placement.py
│ │ │ │ ├── test_mv.py
│ │ │ │ ├── test_nn_functor.py
│ │ │ │ ├── test_optim_add_param_group.py
│ │ │ │ ├── test_pad.py
│ │ │ │ ├── test_placement.py
│ │ │ │ ├── test_randperm_op.py
│ │ │ │ ├── test_reduce_like_ops.py
│ │ │ │ ├── test_reduce_ops.py
│ │ │ │ ├── test_repeat_interleave.py
│ │ │ │ ├── test_reshape.py
│ │ │ │ ├── test_reshape_like_op.py
│ │ │ │ ├── test_roi_align_op.py
│ │ │ │ ├── test_save_load.py
│ │ │ │ ├── test_saved_tensor_hooks.py
│ │ │ │ ├── test_slice_op.py
│ │ │ │ ├── test_smooth_l1_loss_op.py
│ │ │ │ ├── test_softmax_cross_entropy_op.py
│ │ │ │ ├── test_sparse_cross_entropy_op.py
│ │ │ │ ├── test_sparse_softmax_cross_entropy_op.py
│ │ │ │ ├── test_split_like_op.py
│ │ │ │ ├── test_stft_op.py
│ │ │ │ ├── test_tensor_index.py
│ │ │ │ ├── test_tensordot.py
│ │ │ │ ├── test_to_global_error.py
│ │ │ │ ├── test_view.py
│ │ │ │ └── throw_error.py
│ │ │ ├── expensive/
│ │ │ │ ├── README.md
│ │ │ │ ├── _internally_replaced_utils.py
│ │ │ │ ├── _test_remat.py
│ │ │ │ ├── pytorch_alexnet.py
│ │ │ │ ├── pytorch_convmixer.py
│ │ │ │ ├── pytorch_convnext.py
│ │ │ │ ├── pytorch_crossformer.py
│ │ │ │ ├── pytorch_densenet.py
│ │ │ │ ├── pytorch_efficientnet.py
│ │ │ │ ├── pytorch_ghostnet.py
│ │ │ │ ├── pytorch_googlenet.py
│ │ │ │ ├── pytorch_inception_v3.py
│ │ │ │ ├── pytorch_levit.py
│ │ │ │ ├── pytorch_mnasnet.py
│ │ │ │ ├── pytorch_poolformer.py
│ │ │ │ ├── pytorch_pvt.py
│ │ │ │ ├── pytorch_res2net.py
│ │ │ │ ├── pytorch_resmlp.py
│ │ │ │ ├── pytorch_resnet.py
│ │ │ │ ├── pytorch_rexnet.py
│ │ │ │ ├── pytorch_rexnetv1_lite.py
│ │ │ │ ├── pytorch_senet.py
│ │ │ │ ├── pytorch_shufflenetv2.py
│ │ │ │ ├── pytorch_squeezenet.py
│ │ │ │ ├── pytorch_swin_transformer.py
│ │ │ │ ├── pytorch_uniformer.py
│ │ │ │ ├── pytroch_mlp_mixer.py
│ │ │ │ ├── resnet50_model.py
│ │ │ │ ├── test_compatibility.py
│ │ │ │ ├── test_conv3d.py
│ │ │ │ ├── test_convtranspose.py
│ │ │ │ ├── test_dynamic_allocation_gradient_shuffle.py
│ │ │ │ ├── test_einsum.py
│ │ │ │ ├── test_global_tensor_offload.py
│ │ │ │ ├── test_graph_multi_graph_v2.py
│ │ │ │ ├── test_id_shuffle.py
│ │ │ │ ├── test_id_shuffle_global.py
│ │ │ │ ├── test_layernorm.py
│ │ │ │ ├── test_oneembedding.py
│ │ │ │ ├── test_oneembedding_padding_idx.py
│ │ │ │ ├── test_permute.py
│ │ │ │ ├── test_remat.py
│ │ │ │ ├── test_resnet50_with_bn.py
│ │ │ │ ├── test_resnet50_without_bn.py
│ │ │ │ ├── test_rnn.py
│ │ │ │ ├── test_rnn_cell.py
│ │ │ │ ├── test_rnn_pack_sequence.py
│ │ │ │ ├── test_rnn_utils.py
│ │ │ │ ├── test_sqrt_square_sum.py
│ │ │ │ ├── test_tensor_offload.py
│ │ │ │ ├── test_tensor_str.py
│ │ │ │ └── test_util.py
│ │ │ ├── gen_ops_process.py
│ │ │ ├── graph/
│ │ │ │ ├── alexnet_model.py
│ │ │ │ ├── ofrecord_data_utils.py
│ │ │ │ ├── optimizer_test_util.py
│ │ │ │ ├── test_alexnet_auto_parallel.py
│ │ │ │ ├── test_alexnet_graph.py
│ │ │ │ ├── test_comb1to2d.py
│ │ │ │ ├── test_comb2d.py
│ │ │ │ ├── test_forward_graph.py
│ │ │ │ ├── test_free_tensor_not_in_job.py
│ │ │ │ ├── test_fx_fuse.py
│ │ │ │ ├── test_fx_replace_ops.py
│ │ │ │ ├── test_fx_symbolic_trace_module.py
│ │ │ │ ├── test_gbc1to2d.py
│ │ │ │ ├── test_gbc2d.py
│ │ │ │ ├── test_gbc2to1d.py
│ │ │ │ ├── test_gbc2to2d.py
│ │ │ │ ├── test_graph.py
│ │ │ │ ├── test_graph_activation_checkpoint.py
│ │ │ │ ├── test_graph_arange.py
│ │ │ │ ├── test_graph_asymmetric_io.py
│ │ │ │ ├── test_graph_block.py
│ │ │ │ ├── test_graph_buffer_limit.py
│ │ │ │ ├── test_graph_clip_grad_norm.py
│ │ │ │ ├── test_graph_copy.py
│ │ │ │ ├── test_graph_debug.py
│ │ │ │ ├── test_graph_depend.py
│ │ │ │ ├── test_graph_eye.py
│ │ │ │ ├── test_graph_free_eager_tensor.py
│ │ │ │ ├── test_graph_grad_acc.py
│ │ │ │ ├── test_graph_image_gpu_decoder.py
│ │ │ │ ├── test_graph_inplace_add.py
│ │ │ │ ├── test_graph_io_check.py
│ │ │ │ ├── test_graph_linear.py
│ │ │ │ ├── test_graph_linear_train.py
│ │ │ │ ├── test_graph_loss.py
│ │ │ │ ├── test_graph_lr_scale.py
│ │ │ │ ├── test_graph_lr_scheduler.py
│ │ │ │ ├── test_graph_lr_with_warmup.py
│ │ │ │ ├── test_graph_lrs.py
│ │ │ │ ├── test_graph_masked_fill.py
│ │ │ │ ├── test_graph_nccl_logical_fusion.py
│ │ │ │ ├── test_graph_non_contiguous_tensors.py
│ │ │ │ ├── test_graph_normal_inplace.py
│ │ │ │ ├── test_graph_ofrecord_reader.py
│ │ │ │ ├── test_graph_optim_adadelta.py
│ │ │ │ ├── test_graph_optim_adagrad.py
│ │ │ │ ├── test_graph_optim_adam.py
│ │ │ │ ├── test_graph_optim_adamw.py
│ │ │ │ ├── test_graph_optim_ftrl.py
│ │ │ │ ├── test_graph_optim_lamb.py
│ │ │ │ ├── test_graph_optim_rmsprop.py
│ │ │ │ ├── test_graph_optim_sgd.py
│ │ │ │ ├── test_graph_optimizer.py
│ │ │ │ ├── test_graph_pipeline.py
│ │ │ │ ├── test_graph_pipeline_delay.py
│ │ │ │ ├── test_graph_random_seed.py
│ │ │ │ ├── test_graph_relu.py
│ │ │ │ ├── test_graph_reshape_acc.py
│ │ │ │ ├── test_graph_reuse_var.py
│ │ │ │ ├── test_graph_save_load.py
│ │ │ │ ├── test_graph_save_load_global_b_s.py
│ │ │ │ ├── test_graph_scalar.py
│ │ │ │ ├── test_graph_separate_compile.py
│ │ │ │ ├── test_graph_session_env_destruct.py
│ │ │ │ ├── test_graph_session_env_destruct1.py
│ │ │ │ ├── test_graph_sparse_optimizer.py
│ │ │ │ ├── test_graph_sparse_softmax_cross_entropy.py
│ │ │ │ ├── test_graph_tensor_clone.py
│ │ │ │ ├── test_graph_tensor_detach.py
│ │ │ │ ├── test_graph_with_global.py
│ │ │ │ ├── test_graph_zero.py
│ │ │ │ ├── test_input_op_expr.py
│ │ │ │ ├── test_long_add_n_pass.py
│ │ │ │ ├── test_modify_module_forward.py
│ │ │ │ ├── test_multi_client_session.py
│ │ │ │ ├── test_multi_graph.py
│ │ │ │ ├── test_multi_tensor_adam_update_with_cast.py
│ │ │ │ ├── test_multi_tensor_sgd_update_with_cast.py
│ │ │ │ ├── test_nccl_logical_send_recv.py
│ │ │ │ ├── test_neq_device_process_num.py
│ │ │ │ ├── test_oneflow_compiler.py
│ │ │ │ ├── test_optimization_conf.py
│ │ │ │ ├── test_output_op_expr.py
│ │ │ │ ├── test_run_global_graph_by_vm.py
│ │ │ │ ├── test_run_graph_by_vm.py
│ │ │ │ ├── test_to_global.py
│ │ │ │ ├── test_tvm_frontend_dependency_on_graph.py
│ │ │ │ ├── test_user_op_expr.py
│ │ │ │ ├── test_util.py
│ │ │ │ └── test_variable_op_expr.py
│ │ │ ├── misc/
│ │ │ │ ├── mock_example.py
│ │ │ │ ├── test_autograd_functional.py
│ │ │ │ ├── test_distributed_env_vars.py
│ │ │ │ ├── test_empty_cache.py
│ │ │ │ ├── test_env_cuda.py
│ │ │ │ ├── test_manual_seed_api.py
│ │ │ │ ├── test_mock_diffusers.py
│ │ │ │ ├── test_mock_scope.py
│ │ │ │ ├── test_np_dtype_converter.py
│ │ │ │ ├── test_placement.py
│ │ │ │ └── test_pybind11_caster.py
│ │ │ ├── modules/
│ │ │ │ ├── image_test_util.py
│ │ │ │ ├── optimizer_test_util.py
│ │ │ │ ├── save_load_test_data/
│ │ │ │ │ ├── 3x3_i3o3_conv2d/
│ │ │ │ │ │ ├── pickled_data
│ │ │ │ │ │ ├── tensor_3/
│ │ │ │ │ │ │ ├── meta
│ │ │ │ │ │ │ └── out
│ │ │ │ │ │ └── tensor_4/
│ │ │ │ │ │ ├── meta
│ │ │ │ │ │ └── out
│ │ │ │ │ └── 3x3_i3o3_conv2d_params/
│ │ │ │ │ ├── pickled_data
│ │ │ │ │ ├── tensor_5/
│ │ │ │ │ │ ├── meta
│ │ │ │ │ │ └── out
│ │ │ │ │ └── tensor_6/
│ │ │ │ │ ├── meta
│ │ │ │ │ └── out
│ │ │ │ ├── sync_batchnorm_test_util.py
│ │ │ │ ├── test_0_dim_tensor.py
│ │ │ │ ├── test_TripletMarginLoss.py
│ │ │ │ ├── test_abs.py
│ │ │ │ ├── test_activation.py
│ │ │ │ ├── test_adaptive_max_pool.py
│ │ │ │ ├── test_adaptive_pool.py
│ │ │ │ ├── test_adaptive_pool_fp16.py
│ │ │ │ ├── test_add.py
│ │ │ │ ├── test_addcdiv.py
│ │ │ │ ├── test_addcmul.py
│ │ │ │ ├── test_addmm.py
│ │ │ │ ├── test_affine_grid.py
│ │ │ │ ├── test_allclose.py
│ │ │ │ ├── test_allreduce.py
│ │ │ │ ├── test_amax.py
│ │ │ │ ├── test_amin.py
│ │ │ │ ├── test_arange.py
│ │ │ │ ├── test_argmax.py
│ │ │ │ ├── test_argmin.py
│ │ │ │ ├── test_argsort.py
│ │ │ │ ├── test_argwhere.py
│ │ │ │ ├── test_as_strided.py
│ │ │ │ ├── test_as_tensor.py
│ │ │ │ ├── test_asyncs_thread.py
│ │ │ │ ├── test_atleast.py
│ │ │ │ ├── test_auto_to_global.py
│ │ │ │ ├── test_autograd.py
│ │ │ │ ├── test_autograd_function.py
│ │ │ │ ├── test_autograd_mode.py
│ │ │ │ ├── test_avgpool.py
│ │ │ │ ├── test_baddbmm.py
│ │ │ │ ├── test_batch_gather.py
│ │ │ │ ├── test_batchnorm.py
│ │ │ │ ├── test_batchnorm_add_relu.py
│ │ │ │ ├── test_bernoulli.py
│ │ │ │ ├── test_binary_math_ops_dtype.py
│ │ │ │ ├── test_bincount.py
│ │ │ │ ├── test_bitwise.py
│ │ │ │ ├── test_bmm.py
│ │ │ │ ├── test_broadcast_like.py
│ │ │ │ ├── test_broadcast_ops.py
│ │ │ │ ├── test_cast.py
│ │ │ │ ├── test_ceil.py
│ │ │ │ ├── test_check_meta_consistency.py
│ │ │ │ ├── test_checkpointing.py
│ │ │ │ ├── test_chunk.py
│ │ │ │ ├── test_clamp.py
│ │ │ │ ├── test_clip_grad.py
│ │ │ │ ├── test_clone.py
│ │ │ │ ├── test_coco_reader.py
│ │ │ │ ├── test_coin_flip.py
│ │ │ │ ├── test_comb2to2d.py
│ │ │ │ ├── test_combined_margin_loss.py
│ │ │ │ ├── test_comm.py
│ │ │ │ ├── test_comm_ops.py
│ │ │ │ ├── test_concat.py
│ │ │ │ ├── test_constant.py
│ │ │ │ ├── test_constant_pad.py
│ │ │ │ ├── test_contiguous.py
│ │ │ │ ├── test_conv1d.py
│ │ │ │ ├── test_conv2d.py
│ │ │ │ ├── test_copy.py
│ │ │ │ ├── test_cosine_similarity.py
│ │ │ │ ├── test_ctc_greedy_decoder.py
│ │ │ │ ├── test_ctc_loss.py
│ │ │ │ ├── test_cublas_fused_mlp.py
│ │ │ │ ├── test_cum_ops.py
│ │ │ │ ├── test_dataset.py
│ │ │ │ ├── test_ddp.py
│ │ │ │ ├── test_ddp_multi_outputs.py
│ │ │ │ ├── test_deconv2d.py
│ │ │ │ ├── test_default_dtype.py
│ │ │ │ ├── test_deform_conv2d.py
│ │ │ │ ├── test_det.py
│ │ │ │ ├── test_diag.py
│ │ │ │ ├── test_diagonal.py
│ │ │ │ ├── test_div.py
│ │ │ │ ├── test_dlpack.py
│ │ │ │ ├── test_dot.py
│ │ │ │ ├── test_dropout.py
│ │ │ │ ├── test_dynamic_allocation_gradient_shuffle_shuffle_global.py
│ │ │ │ ├── test_eager_boxing.py
│ │ │ │ ├── test_eager_boxing_exhaustive.py
│ │ │ │ ├── test_empty.py
│ │ │ │ ├── test_eq.py
│ │ │ │ ├── test_equal.py
│ │ │ │ ├── test_erf.py
│ │ │ │ ├── test_erfc.py
│ │ │ │ ├── test_erfinv.py
│ │ │ │ ├── test_expand.py
│ │ │ │ ├── test_expand_stride.py
│ │ │ │ ├── test_expm1.py
│ │ │ │ ├── test_eye.py
│ │ │ │ ├── test_fake_quantization.py
│ │ │ │ ├── test_fft.py
│ │ │ │ ├── test_flatten.py
│ │ │ │ ├── test_flip.py
│ │ │ │ ├── test_floor.py
│ │ │ │ ├── test_fmod.py
│ │ │ │ ├── test_fold.py
│ │ │ │ ├── test_fork_sub_process.py
│ │ │ │ ├── test_frac.py
│ │ │ │ ├── test_from_numpy.py
│ │ │ │ ├── test_from_torch.py
│ │ │ │ ├── test_functional_docstr.py
│ │ │ │ ├── test_functional_scalar_tensor_param.py
│ │ │ │ ├── test_fused_attention_ops.py
│ │ │ │ ├── test_fused_bias_add_dropout.py
│ │ │ │ ├── test_fused_bias_add_gelu.py
│ │ │ │ ├── test_fused_bias_add_scale_mask_softmax_dropout.py
│ │ │ │ ├── test_fused_center.py
│ │ │ │ ├── test_fused_codegeex_qkv_reshape.py
│ │ │ │ ├── test_fused_cross_interaction.py
│ │ │ │ ├── test_fused_dot_feature_interaction.py
│ │ │ │ ├── test_fused_gelu_mul.py
│ │ │ │ ├── test_fused_get_boundding_boxes_coord.py
│ │ │ │ ├── test_fused_get_ciou_diagonal_angle.py
│ │ │ │ ├── test_fused_get_ciou_result.py
│ │ │ │ ├── test_fused_get_convex_diagonal_squared.py
│ │ │ │ ├── test_fused_get_intersection_area.py
│ │ │ │ ├── test_fused_get_iou.py
│ │ │ │ ├── test_fused_glu.py
│ │ │ │ ├── test_fused_matmul_bias.py
│ │ │ │ ├── test_fused_matmul_bias_add_relu_dropout.py
│ │ │ │ ├── test_fused_rotary_embedding.py
│ │ │ │ ├── test_fused_scale_mask_bias_softmax.py
│ │ │ │ ├── test_fused_scale_mask_softmax.py
│ │ │ │ ├── test_fused_scale_mask_softmax_dropout.py
│ │ │ │ ├── test_fused_scale_tril.py
│ │ │ │ ├── test_fused_self_attention.py
│ │ │ │ ├── test_fused_tril_softmax_mask_scale.py
│ │ │ │ ├── test_fused_weighted_sum.py
│ │ │ │ ├── test_gather.py
│ │ │ │ ├── test_gather_nd.py
│ │ │ │ ├── test_gelu_approximate.py
│ │ │ │ ├── test_generator.py
│ │ │ │ ├── test_global_0_dim_tensor.py
│ │ │ │ ├── test_global_TripletMarginLoss.py
│ │ │ │ ├── test_global_abs.py
│ │ │ │ ├── test_global_activation.py
│ │ │ │ ├── test_global_adaptive_pool.py
│ │ │ │ ├── test_global_add.py
│ │ │ │ ├── test_global_addcdiv.py
│ │ │ │ ├── test_global_addcmul.py
│ │ │ │ ├── test_global_addmm.py
│ │ │ │ ├── test_global_affine_grid.py
│ │ │ │ ├── test_global_argmax.py
│ │ │ │ ├── test_global_argmin.py
│ │ │ │ ├── test_global_argsort.py
│ │ │ │ ├── test_global_argwhere.py
│ │ │ │ ├── test_global_atleast.py
│ │ │ │ ├── test_global_avgpool.py
│ │ │ │ ├── test_global_batch_gather.py
│ │ │ │ ├── test_global_bincount.py
│ │ │ │ ├── test_global_bitwise.py
│ │ │ │ ├── test_global_broadcase_like.py
│ │ │ │ ├── test_global_broadcast_matmul.py
│ │ │ │ ├── test_global_broadcast_ops.py
│ │ │ │ ├── test_global_cast.py
│ │ │ │ ├── test_global_chunk.py
│ │ │ │ ├── test_global_clone.py
│ │ │ │ ├── test_global_coin_flip.py
│ │ │ │ ├── test_global_concat.py
│ │ │ │ ├── test_global_constant.py
│ │ │ │ ├── test_global_ctc_loss.py
│ │ │ │ ├── test_global_cumprod.py
│ │ │ │ ├── test_global_cumsum.py
│ │ │ │ ├── test_global_deconv2d.py
│ │ │ │ ├── test_global_deform_conv2d.py
│ │ │ │ ├── test_global_det.py
│ │ │ │ ├── test_global_diag.py
│ │ │ │ ├── test_global_diagonal.py
│ │ │ │ ├── test_global_div.py
│ │ │ │ ├── test_global_dot.py
│ │ │ │ ├── test_global_dropout.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase1.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase10.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase11.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase2.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase3.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase4.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase5.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase6.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase7.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase8.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase9.py
│ │ │ │ ├── test_global_einsum_attention.py
│ │ │ │ ├── test_global_einsum_batch_matmul.py
│ │ │ │ ├── test_global_einsum_batch_matmul2.py
│ │ │ │ ├── test_global_einsum_batch_matmul3.py
│ │ │ │ ├── test_global_einsum_batch_matmul4.py
│ │ │ │ ├── test_global_einsum_batch_matrix_vector_multiply.py
│ │ │ │ ├── test_global_einsum_batch_permute.py
│ │ │ │ ├── test_global_einsum_bilinear_transformation.py
│ │ │ │ ├── test_global_einsum_eltwise_mul_sum_row.py
│ │ │ │ ├── test_global_einsum_eltwise_mul_then_reduce_sum.py
│ │ │ │ ├── test_global_einsum_eltwise_multiply.py
│ │ │ │ ├── test_global_einsum_get_diagonal.py
│ │ │ │ ├── test_global_einsum_matmul.py
│ │ │ │ ├── test_global_einsum_matmul2.py
│ │ │ │ ├── test_global_einsum_matrix_column_sum.py
│ │ │ │ ├── test_global_einsum_matrix_transpose.py
│ │ │ │ ├── test_global_einsum_matrix_vector_multiply.py
│ │ │ │ ├── test_global_einsum_reduce_sum.py
│ │ │ │ ├── test_global_einsum_tensor_contraction.py
│ │ │ │ ├── test_global_einsum_tensor_contraction2.py
│ │ │ │ ├── test_global_einsum_vector_inner_product.py
│ │ │ │ ├── test_global_einsum_vector_outer_product.py
│ │ │ │ ├── test_global_empty.py
│ │ │ │ ├── test_global_eq.py
│ │ │ │ ├── test_global_erf.py
│ │ │ │ ├── test_global_erfc.py
│ │ │ │ ├── test_global_expand_op.py
│ │ │ │ ├── test_global_expm1.py
│ │ │ │ ├── test_global_eye.py
│ │ │ │ ├── test_global_fill.py
│ │ │ │ ├── test_global_flatten.py
│ │ │ │ ├── test_global_flip.py
│ │ │ │ ├── test_global_floor.py
│ │ │ │ ├── test_global_fmod.py
│ │ │ │ ├── test_global_fold.py
│ │ │ │ ├── test_global_frac.py
│ │ │ │ ├── test_global_full.py
│ │ │ │ ├── test_global_full_like.py
│ │ │ │ ├── test_global_greater.py
│ │ │ │ ├── test_global_greater_equal.py
│ │ │ │ ├── test_global_grid_sample.py
│ │ │ │ ├── test_global_groupnorm.py
│ │ │ │ ├── test_global_gru_cell.py
│ │ │ │ ├── test_global_hann_window.py
│ │ │ │ ├── test_global_higher_derivative_activation.py
│ │ │ │ ├── test_global_higher_derivative_conv.py
│ │ │ │ ├── test_global_higher_derivative_div.py
│ │ │ │ ├── test_global_higher_derivative_loss.py
│ │ │ │ ├── test_global_higher_derivative_matmul.py
│ │ │ │ ├── test_global_higher_derivative_neg.py
│ │ │ │ ├── test_global_higher_derivative_pool.py
│ │ │ │ ├── test_global_higher_derivative_pow.py
│ │ │ │ ├── test_global_higher_derivative_scalar_pow.py
│ │ │ │ ├── test_global_higher_derivative_slice.py
│ │ │ │ ├── test_global_higher_derivative_softmax.py
│ │ │ │ ├── test_global_inv.py
│ │ │ │ ├── test_global_lerp.py
│ │ │ │ ├── test_global_linalg_cross.py
│ │ │ │ ├── test_global_linear.py
│ │ │ │ ├── test_global_linspace.py
│ │ │ │ ├── test_global_logspace.py
│ │ │ │ ├── test_global_lstm_cell.py
│ │ │ │ ├── test_global_masked_fill.py
│ │ │ │ ├── test_global_masked_select.py
│ │ │ │ ├── test_global_math_op_higher_derivative.py
│ │ │ │ ├── test_global_math_ops.py
│ │ │ │ ├── test_global_matmul.py
│ │ │ │ ├── test_global_max.py
│ │ │ │ ├── test_global_maximum_minimum.py
│ │ │ │ ├── test_global_maxpool.py
│ │ │ │ ├── test_global_maxunpool.py
│ │ │ │ ├── test_global_mean.py
│ │ │ │ ├── test_global_median.py
│ │ │ │ ├── test_global_meshgrid.py
│ │ │ │ ├── test_global_min.py
│ │ │ │ ├── test_global_min_max_observer.py
│ │ │ │ ├── test_global_movedim.py
│ │ │ │ ├── test_global_moving_average_max_min_observer.py
│ │ │ │ ├── test_global_mul.py
│ │ │ │ ├── test_global_mv.py
│ │ │ │ ├── test_global_nansum.py
│ │ │ │ ├── test_global_narrow.py
│ │ │ │ ├── test_global_ne.py
│ │ │ │ ├── test_global_negative.py
│ │ │ │ ├── test_global_nms.py
│ │ │ │ ├── test_global_normal.py
│ │ │ │ ├── test_global_normalize.py
│ │ │ │ ├── test_global_nozero.py
│ │ │ │ ├── test_global_ones_like.py
│ │ │ │ ├── test_global_pad.py
│ │ │ │ ├── test_global_partical_fc.py
│ │ │ │ ├── test_global_permute.py
│ │ │ │ ├── test_global_rand.py
│ │ │ │ ├── test_global_randint.py
│ │ │ │ ├── test_global_randint_like.py
│ │ │ │ ├── test_global_randn.py
│ │ │ │ ├── test_global_random_op_data.py
│ │ │ │ ├── test_global_randperm.py
│ │ │ │ ├── test_global_reciprocal.py
│ │ │ │ ├── test_global_reflection_pad2d.py
│ │ │ │ ├── test_global_repeat.py
│ │ │ │ ├── test_global_replication_pad2d.py
│ │ │ │ ├── test_global_reshape.py
│ │ │ │ ├── test_global_rnn.py
│ │ │ │ ├── test_global_rnn_cell.py
│ │ │ │ ├── test_global_roi_align.py
│ │ │ │ ├── test_global_roll.py
│ │ │ │ ├── test_global_round.py
│ │ │ │ ├── test_global_scatter_nd.py
│ │ │ │ ├── test_global_scatter_ops.py
│ │ │ │ ├── test_global_searchsorted.py
│ │ │ │ ├── test_global_sign.py
│ │ │ │ ├── test_global_slice.py
│ │ │ │ ├── test_global_slice_update.py
│ │ │ │ ├── test_global_sort.py
│ │ │ │ ├── test_global_sparse.py
│ │ │ │ ├── test_global_sparse_softmax_cross_entropy.py
│ │ │ │ ├── test_global_split.py
│ │ │ │ ├── test_global_sqrt_square_sum.py
│ │ │ │ ├── test_global_squeeze.py
│ │ │ │ ├── test_global_stack.py
│ │ │ │ ├── test_global_stateful_kernel_with_cache.py
│ │ │ │ ├── test_global_std.py
│ │ │ │ ├── test_global_sub.py
│ │ │ │ ├── test_global_sum.py
│ │ │ │ ├── test_global_tensor_new.py
│ │ │ │ ├── test_global_tensor_ops.py
│ │ │ │ ├── test_global_tensor_scatter_nd_update.py
│ │ │ │ ├── test_global_tensordot.py
│ │ │ │ ├── test_global_tile.py
│ │ │ │ ├── test_global_transpose.py
│ │ │ │ ├── test_global_tril.py
│ │ │ │ ├── test_global_triu.py
│ │ │ │ ├── test_global_unbind.py
│ │ │ │ ├── test_global_unfold.py
│ │ │ │ ├── test_global_unfold_tensor.py
│ │ │ │ ├── test_global_unique.py
│ │ │ │ ├── test_global_unsqueeze.py
│ │ │ │ ├── test_global_upsample.py
│ │ │ │ ├── test_global_var.py
│ │ │ │ ├── test_global_vector_matrix_product.py
│ │ │ │ ├── test_global_view.py
│ │ │ │ ├── test_global_weight_norm.py
│ │ │ │ ├── test_global_where.py
│ │ │ │ ├── test_global_zeropad2d.py
│ │ │ │ ├── test_global_zeros_like.py
│ │ │ │ ├── test_glu.py
│ │ │ │ ├── test_gpt_data_loader.py
│ │ │ │ ├── test_greater.py
│ │ │ │ ├── test_greater_equal.py
│ │ │ │ ├── test_grid_sample.py
│ │ │ │ ├── test_grouped_matmul_bias.py
│ │ │ │ ├── test_groupnorm.py
│ │ │ │ ├── test_groupwise_quantization.py
│ │ │ │ ├── test_gumbel_softmax.py
│ │ │ │ ├── test_hann_window.py
│ │ │ │ ├── test_higher_derivative_activation.py
│ │ │ │ ├── test_higher_derivative_conv.py
│ │ │ │ ├── test_higher_derivative_div.py
│ │ │ │ ├── test_higher_derivative_loss.py
│ │ │ │ ├── test_higher_derivative_matmul.py
│ │ │ │ ├── test_higher_derivative_neg.py
│ │ │ │ ├── test_higher_derivative_pool.py
│ │ │ │ ├── test_higher_derivative_pow.py
│ │ │ │ ├── test_higher_derivative_scalar_pow.py
│ │ │ │ ├── test_higher_derivative_slice.py
│ │ │ │ ├── test_higher_derivative_softmax.py
│ │ │ │ ├── test_host_memory_input.py
│ │ │ │ ├── test_hsplit.py
│ │ │ │ ├── test_hub.py
│ │ │ │ ├── test_image_batch_align.py
│ │ │ │ ├── test_image_decode.py
│ │ │ │ ├── test_image_flip.py
│ │ │ │ ├── test_image_normalize.py
│ │ │ │ ├── test_image_resize.py
│ │ │ │ ├── test_in_top_k.py
│ │ │ │ ├── test_index_add.py
│ │ │ │ ├── test_index_select.py
│ │ │ │ ├── test_info.py
│ │ │ │ ├── test_initializer.py
│ │ │ │ ├── test_instancenorm.py
│ │ │ │ ├── test_interpolate.py
│ │ │ │ ├── test_inv.py
│ │ │ │ ├── test_isclose.py
│ │ │ │ ├── test_jit_script_api.py
│ │ │ │ ├── test_layer_norm.py
│ │ │ │ ├── test_lerp.py
│ │ │ │ ├── test_less.py
│ │ │ │ ├── test_less_equal.py
│ │ │ │ ├── test_linalg_cross.py
│ │ │ │ ├── test_linear.py
│ │ │ │ ├── test_linspace.py
│ │ │ │ ├── test_log1p.py
│ │ │ │ ├── test_logaddexp.py
│ │ │ │ ├── test_logical_and.py
│ │ │ │ ├── test_logical_not.py
│ │ │ │ ├── test_logical_or.py
│ │ │ │ ├── test_logical_reduce.py
│ │ │ │ ├── test_logical_xor.py
│ │ │ │ ├── test_logspace.py
│ │ │ │ ├── test_logsumexp.py
│ │ │ │ ├── test_loss.py
│ │ │ │ ├── test_loss_global.py
│ │ │ │ ├── test_lr_scheduler.py
│ │ │ │ ├── test_masked_fill.py
│ │ │ │ ├── test_masked_select.py
│ │ │ │ ├── test_math_op_higher_derivative.py
│ │ │ │ ├── test_math_ops.py
│ │ │ │ ├── test_matmul.py
│ │ │ │ ├── test_max.py
│ │ │ │ ├── test_maxpool.py
│ │ │ │ ├── test_maxunpool.py
│ │ │ │ ├── test_mean.py
│ │ │ │ ├── test_median.py
│ │ │ │ ├── test_meshgrid.py
│ │ │ │ ├── test_min.py
│ │ │ │ ├── test_min_max_observer.py
│ │ │ │ ├── test_mock.py
│ │ │ │ ├── test_mode.py
│ │ │ │ ├── test_module.py
│ │ │ │ ├── test_module_to.py
│ │ │ │ ├── test_module_to_global_or_local.py
│ │ │ │ ├── test_module_to_half.py
│ │ │ │ ├── test_movedim.py
│ │ │ │ ├── test_moving_average_min_max_observer.py
│ │ │ │ ├── test_mul.py
│ │ │ │ ├── test_multi_tensor_yolov5_weight_update.py
│ │ │ │ ├── test_multinomial.py
│ │ │ │ ├── test_nansum.py
│ │ │ │ ├── test_narrow.py
│ │ │ │ ├── test_ne.py
│ │ │ │ ├── test_negative.py
│ │ │ │ ├── test_nll_loss.py
│ │ │ │ ├── test_nms.py
│ │ │ │ ├── test_noncontiguous_binary_op.py
│ │ │ │ ├── test_nonzero.py
│ │ │ │ ├── test_norm.py
│ │ │ │ ├── test_normalize.py
│ │ │ │ ├── test_ofrecord_reader.py
│ │ │ │ ├── test_one_embedding_adagrad.py
│ │ │ │ ├── test_one_embedding_adam.py
│ │ │ │ ├── test_one_embedding_ftrl.py
│ │ │ │ ├── test_one_embedding_sgd.py
│ │ │ │ ├── test_one_hot.py
│ │ │ │ ├── test_ones_like.py
│ │ │ │ ├── test_optim_adadelta.py
│ │ │ │ ├── test_optim_adagrad.py
│ │ │ │ ├── test_optim_adam.py
│ │ │ │ ├── test_optim_adamw.py
│ │ │ │ ├── test_optim_add_param_group.py
│ │ │ │ ├── test_optim_ftrl.py
│ │ │ │ ├── test_optim_lamb.py
│ │ │ │ ├── test_optim_lbfgs.py
│ │ │ │ ├── test_optim_rmsprop.py
│ │ │ │ ├── test_optim_sgd.py
│ │ │ │ ├── test_pairwise_distance.py
│ │ │ │ ├── test_param_group.py
│ │ │ │ ├── test_parameters_grouping.py
│ │ │ │ ├── test_parital_fc.py
│ │ │ │ ├── test_pixel_shuffle.py
│ │ │ │ ├── test_prelu.py
│ │ │ │ ├── test_prod.py
│ │ │ │ ├── test_pruning.py
│ │ │ │ ├── test_qat_conv_modules.py
│ │ │ │ ├── test_quantile.py
│ │ │ │ ├── test_quantization.py
│ │ │ │ ├── test_quick_gelu.py
│ │ │ │ ├── test_rand.py
│ │ │ │ ├── test_randint.py
│ │ │ │ ├── test_randint_like.py
│ │ │ │ ├── test_randn.py
│ │ │ │ ├── test_randn_like.py
│ │ │ │ ├── test_random_generator_and_seed.py
│ │ │ │ ├── test_randperm.py
│ │ │ │ ├── test_reciprocal.py
│ │ │ │ ├── test_reduce.py
│ │ │ │ ├── test_reduce_sum_like.py
│ │ │ │ ├── test_reflection_pad.py
│ │ │ │ ├── test_repeat.py
│ │ │ │ ├── test_repeat_interleave.py
│ │ │ │ ├── test_replication_pad.py
│ │ │ │ ├── test_reshape.py
│ │ │ │ ├── test_reshape_sbp.py
│ │ │ │ ├── test_resnet_load_torch_weight_compatibile.py
│ │ │ │ ├── test_rmsnorm.py
│ │ │ │ ├── test_roc_auc_score.py
│ │ │ │ ├── test_roi_align.py
│ │ │ │ ├── test_roll.py
│ │ │ │ ├── test_round.py
│ │ │ │ ├── test_rrelu.py
│ │ │ │ ├── test_save_load.py
│ │ │ │ ├── test_saved_tensor_hooks.py
│ │ │ │ ├── test_sbp_symbol.py
│ │ │ │ ├── test_scatter_nd.py
│ │ │ │ ├── test_scatter_ops.py
│ │ │ │ ├── test_searchsorted.py
│ │ │ │ ├── test_select.py
│ │ │ │ ├── test_shutting_down.py
│ │ │ │ ├── test_sign.py
│ │ │ │ ├── test_single_threaded_vm.py
│ │ │ │ ├── test_skip_layer_norm.py
│ │ │ │ ├── test_skip_rms_norm.py
│ │ │ │ ├── test_slice.py
│ │ │ │ ├── test_softmax.py
│ │ │ │ ├── test_softplus.py
│ │ │ │ ├── test_sort.py
│ │ │ │ ├── test_sparse.py
│ │ │ │ ├── test_sparse_softmax_cross_entropy.py
│ │ │ │ ├── test_special_ops.py
│ │ │ │ ├── test_split.py
│ │ │ │ ├── test_square_relu.py
│ │ │ │ ├── test_squeeze.py
│ │ │ │ ├── test_stack.py
│ │ │ │ ├── test_stateful_kernel_with_cache.py
│ │ │ │ ├── test_stateful_local_opkernel.py
│ │ │ │ ├── test_std.py
│ │ │ │ ├── test_stft.py
│ │ │ │ ├── test_sub.py
│ │ │ │ ├── test_sum.py
│ │ │ │ ├── test_swapaxes.py
│ │ │ │ ├── test_swapdims.py
│ │ │ │ ├── test_swautils.py
│ │ │ │ ├── test_sync_and_async_allreduce.py
│ │ │ │ ├── test_sync_batchnorm.py
│ │ │ │ ├── test_t.py
│ │ │ │ ├── test_t5_layernorm.py
│ │ │ │ ├── test_tensor_buffer.py
│ │ │ │ ├── test_tensor_ops.py
│ │ │ │ ├── test_tensor_scatter_nd_update.py
│ │ │ │ ├── test_tensor_split.py
│ │ │ │ ├── test_tensor_to.py
│ │ │ │ ├── test_tensordot.py
│ │ │ │ ├── test_tile.py
│ │ │ │ ├── test_to_torch.py
│ │ │ │ ├── test_topk.py
│ │ │ │ ├── test_transpose.py
│ │ │ │ ├── test_tril.py
│ │ │ │ ├── test_triu.py
│ │ │ │ ├── test_trunc.py
│ │ │ │ ├── test_trunc_divide.py
│ │ │ │ ├── test_type_tensor.py
│ │ │ │ ├── test_unbind.py
│ │ │ │ ├── test_unfold.py
│ │ │ │ ├── test_unfold_tensor.py
│ │ │ │ ├── test_unique.py
│ │ │ │ ├── test_unsqueeze.py
│ │ │ │ ├── test_upsample.py
│ │ │ │ ├── test_util_ops.py
│ │ │ │ ├── test_utils.py
│ │ │ │ ├── test_var.py
│ │ │ │ ├── test_view.py
│ │ │ │ ├── test_vsplit.py
│ │ │ │ ├── test_weight_norm.py
│ │ │ │ ├── test_where.py
│ │ │ │ └── test_zeropad2d.py
│ │ │ ├── profiler/
│ │ │ │ ├── test_events.py
│ │ │ │ └── test_profile_lenet.py
│ │ │ └── tensor/
│ │ │ ├── test_autocast.py
│ │ │ ├── test_bfloat16_activation.py
│ │ │ ├── test_complex.py
│ │ │ ├── test_data_ptr.py
│ │ │ ├── test_global_tensor.py
│ │ │ ├── test_global_tensor_and_ndarray_compatibility.py
│ │ │ ├── test_global_tensor_indexing.py
│ │ │ ├── test_lazy_tensor_indexing.py
│ │ │ ├── test_meta_tensor.py
│ │ │ ├── test_new_tensor.py
│ │ │ ├── test_parameter.py
│ │ │ ├── test_safetensors.py
│ │ │ ├── test_tensor_and_ndarray_compatibility.py
│ │ │ ├── test_tensor_exponential.py
│ │ │ ├── test_tensor_indexing.py
│ │ │ ├── test_tensor_indexing2.py
│ │ │ ├── test_tensor_is_view.py
│ │ │ ├── test_tensor_part_1.py
│ │ │ ├── test_tensor_part_2.py
│ │ │ ├── test_tensor_part_3.py
│ │ │ ├── test_tensor_pin_memory.py
│ │ │ └── test_tensor_to_memory_format.py
│ │ ├── test_utils/
│ │ │ ├── __init__.py
│ │ │ ├── automated_test_util/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── generators.py
│ │ │ │ ├── global_scope.py
│ │ │ │ ├── profiler.py
│ │ │ │ ├── torch_flow_dual_object.py
│ │ │ │ └── util.py
│ │ │ ├── oneflow_pytorch_compatibility/
│ │ │ │ ├── __init__.py
│ │ │ │ └── oneflow_pytorch_compatiblity_test.py
│ │ │ ├── test_util.py
│ │ │ └── throttle.py
│ │ ├── unittest/
│ │ │ ├── __init__.py
│ │ │ ├── dataset.py
│ │ │ ├── env.py
│ │ │ └── mlir.py
│ │ └── utils/
│ │ ├── __init__.py
│ │ ├── checkpoint.py
│ │ ├── data/
│ │ │ ├── __init__.py
│ │ │ ├── _utils/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── collate.py
│ │ │ │ ├── fetch.py
│ │ │ │ ├── pin_memory.py
│ │ │ │ ├── signal_handling.py
│ │ │ │ └── worker.py
│ │ │ ├── dataloader.py
│ │ │ ├── dataset.py
│ │ │ ├── decorator.py
│ │ │ ├── distributed.py
│ │ │ └── sampler.py
│ │ ├── global_view/
│ │ │ ├── __init__.py
│ │ │ ├── global_mode.py
│ │ │ ├── global_utils.py
│ │ │ ├── to_global.py
│ │ │ └── to_local.py
│ │ ├── hooks.py
│ │ ├── insight/
│ │ │ ├── README.md
│ │ │ ├── requirements.txt
│ │ │ └── sqlite_to_google_trace_event.py
│ │ ├── model_zoo.py
│ │ └── tensor/
│ │ ├── __init__.py
│ │ └── from_or_to_torch_tensor.py
│ └── setup.py
└── tools/
├── check_src.py
├── clean_generated_api.py
├── create_pip_index.py
├── flags_from_git_diff.py
├── functional/
│ ├── generate_dispatch_stateful_ops.py
│ ├── generate_functional_api.py
│ ├── generate_tensor_api.py
│ └── generator.py
├── generate_header_list.py
├── generate_pip_version.py
├── oneflow-tblgen/
│ ├── CMakeLists.txt
│ ├── backends.h
│ ├── example/
│ │ └── constant.td
│ ├── op_schema_emitter.cpp
│ ├── op_schema_header.inc
│ ├── op_schema_source.inc
│ ├── op_schema_types.inc
│ └── tablegen.cpp
├── oss_file_exist.py
└── package_mirror.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .clang-format
================================================
---
Language: Cpp
AccessModifierOffset: -1
AlignAfterOpenBracket: Align
AlignConsecutiveAssignments: false
AlignConsecutiveDeclarations: false
AlignEscapedNewlinesLeft: true
AlignOperands: true
AlignTrailingComments: true
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortBlocksOnASingleLine: true
AllowShortCaseLabelsOnASingleLine: true
AllowShortFunctionsOnASingleLine: All
AllowShortIfStatementsOnASingleLine: true
AllowShortLoopsOnASingleLine: true
AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: false
AlwaysBreakTemplateDeclarations: true
BinPackArguments: true
BinPackParameters: true
BraceWrapping:
AfterClass: true
AfterControlStatement: false
AfterEnum: false
AfterFunction: false
AfterNamespace: false
AfterObjCDeclaration: false
AfterStruct: false
AfterUnion: false
BeforeCatch: false
BeforeElse: false
IndentBraces: false
BreakBeforeBinaryOperators: NonAssignment
BreakBeforeBraces: Attach
BreakBeforeTernaryOperators: true
BreakConstructorInitializersBeforeComma: false
BreakAfterJavaFieldAnnotations: false
BreakStringLiterals: true
ColumnLimit: 100
CommentPragmas: '^ IWYU pragma:'
BreakBeforeInheritanceComma: false
ConstructorInitializerAllOnOneLineOrOnePerLine: true
ConstructorInitializerIndentWidth: 4
ContinuationIndentWidth: 4
Cpp11BracedListStyle: true
DisableFormat: false
ExperimentalAutoDetectBinPacking: false
FixNamespaceComments: true
ForEachMacros: [ foreach, Q_FOREACH, BOOST_FOREACH ]
IncludeCategories:
- Regex: '^<.*\.h>'
Priority: 1
- Regex: '^<.*'
Priority: 2
- Regex: '.*'
Priority: 3
IncludeIsMainRegex: '([-_](test|unittest))?$'
IndentCaseLabels: true
IndentWidth: 2
IndentWrappedFunctionNames: false
JavaScriptQuotes: Leave
JavaScriptWrapImports: true
KeepEmptyLinesAtTheStartOfBlocks: false
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
NamespaceIndentation: None
ObjCBlockIndentWidth: 2
ObjCSpaceAfterProperty: false
ObjCSpaceBeforeProtocolList: false
PenaltyBreakBeforeFirstCallParameter: 1
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakString: 1000
PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 200
PointerAlignment: Left
ReflowComments: true
SortIncludes: false
SpaceAfterCStyleCast: false
SpaceAfterTemplateKeyword: false
SpaceBeforeAssignmentOperators: true
SpaceBeforeParens: ControlStatements
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 2
SpacesInAngles: false
SpacesInContainerLiterals: true
SpacesInCStyleCastParentheses: false
SpacesInParentheses: false
SpacesInSquareBrackets: false
Standard: Auto
TabWidth: 8
UseTab: Never
...
================================================
FILE: .clang-tidy
================================================
# `maybe-*` checks are only available on OneFlow custom clang-tidy and clangd
# `-allow-enabling-analyzer-alpha-checkers` should be passed to clang-tidy for CSA checkers named `clang-analyzer-alpha.*` (or `-allow-enabling-alpha-checkers` for run-clang-tidy.py)
# `aggressive-binary-operation-simplification` should be enabled (via `-Xclang -analyzer-config -Xclang aggressive-binary-operation-simplification=true` in clang)
# there is some problem in `clang-analyzer-alpha.clone.*`, so do not enable it
# `clang-analyzer-alpha.deadcode.*` is just too verbose to enable
Checks: >-
-*,
clang-diagnostic-*,
maybe-*,
clang-analyzer-core.*,
clang-analyzer-cplusplus.*,
clang-analyzer-nullability.*,
clang-analyzer-deadcode.*,
clang-analyzer-security.*,
clang-analyzer-optin.cplusplus.*,
clang-analyzer-optin.performance.*,
clang-analyzer-alpha.core.*,
clang-analyzer-alpha.cplusplus.*,
clang-analyzer-alpha.security.*,
cppcoreguidelines-avoid-goto,
cppcoreguidelines-init-variables,
cppcoreguidelines-interfaces-global-init,
cppcoreguidelines-no-malloc,
cppcoreguidelines-prefer-member-initializer,
cppcoreguidelines-pro-type-member-init,
cppcoreguidelines-pro-type-static-cast-downcast,
cppcoreguidelines-slicing,
cppcoreguidelines-special-member-functions,
performance-unnecessary-value-param,
performance-unnecessary-copy-initialization,
performance-noexcept-move-constructor,
performance-no-automatic-move,
performance-move-const-arg,
performance-implicit-conversion-in-loop,
performance-for-range-copy,
google-default-arguments,
google-global-names-in-headers,
google-explicit-constructor,
modernize-use-emplace
# TODO: treat all maybe warnings as errors when existing warnings are all fixed
# `clang-analyzer-cplusplus.NewDelete` cannot model reference counting properly for ObjectMsg
WarningsAsErrors: >-
maybe-unused,
clang-analyzer-nullability.*,
clang-analyzer-cplusplus.*,
performance-implicit-conversion-in-loop,
performance-move-const-arg,
performance-no-automatic-move,
performance-noexcept-move-constructor,
google-default-arguments,
google-global-names-in-headers,
-clang-analyzer-cplusplus.NewDelete,
modernize-use-emplace
CheckOptions:
# `cppcoreguidelines-special-member-functions` is enabled, refer to https://en.cppreference.com/w/cpp/language/rule_of_three
- key: cppcoreguidelines-special-member-functions.AllowSoleDefaultDtor
value: True
- key: performance-move-const-arg.CheckTriviallyCopyableMove
value: False
- key: cppcoreguidelines-special-member-functions.AllowMissingMoveFunctionsWhenCopyIsDeleted
value: True
================================================
FILE: .cmake-format.py
================================================
# ----------------------------------
# Options affecting listfile parsing
# ----------------------------------
with section("parse"):
# Specify structure for custom cmake functions
additional_commands = {
"cc_binary": {
"flags": ["ADD_RUNTARGET"],
"kwargs": {
"DEPS": "*",
"INC": {
"kwargs": {"INTERFACE": "*", "PRIVATE": "*", "PUBLIC": "*"},
"pargs": 0,
},
"LIBDIRS": {
"kwargs": {"INTERFACE": "*", "PRIVATE": "*", "PUBLIC": "*"},
"pargs": "*",
},
"PKGDEPS": "*",
"PROPERTIES": {"kwargs": {"EXPORT_NAME": 1, "OUTPUT_NAME": 1}},
"SRCS": "*",
},
"pargs": "1+",
},
"cc_library": {
"flags": ["STATIC", "SHARED"],
"kwargs": {
"DEPS": {
"kwargs": {"INTERFACE": "*", "PRIVATE": "*", "PUBLIC": "*"},
"pargs": "*",
},
"INC": {
"kwargs": {"INTERFACE": "*", "PRIVATE": "*", "PUBLIC": "*"},
"pargs": 0,
},
"LIBDIRS": {
"kwargs": {"INTERFACE": "*", "PRIVATE": "*", "PUBLIC": "*"},
"pargs": "*",
},
"PKGDEPS": "*",
"PROPERTIES": {
"kwargs": {
"ARCHIVE_OUTPUT_NAME": 1,
"EXPORT_NAME": 1,
"INTERFACE_INCLUDE_DIRECTORIES": 1,
"LIBRARY_OUTPUT_NAME": 1,
"OUTPUT_NAME": 1,
"SOVERSION": 1,
"SUFFIX": 1,
"VERSION": 1,
}
},
"SRCS": "*",
},
"pargs": "1+",
},
"cc_test": {
"kwargs": {
"ARGV": "*",
"DEPS": "*",
"LABELS": "*",
"PKGDEPS": "*",
"SRCS": "*",
"TEST_DEPS": "*",
"WORKING_DIRECTORY": "*",
},
"pargs": 1,
},
"check_call": {
"flags": [
"OUTPUT_QUIET",
"ERROR_QUIET",
"OUTPUT_STRIP_TRAILING_WHITESPACE",
"ERROR_STRIP_TRAILING_WHITESPACE",
],
"kwargs": {
"COMMAND": "*",
"ENCODING": "1",
"ERROR_FILE": "1",
"ERROR_VARIABLE": "1",
"INPUT_FILE": "1",
"OUTPUT_FILE": "1",
"OUTPUT_VARIABLE": "1",
"RESULTS_VARIABLE": "1",
"RESULT_VARIABLE": "1",
"TIMEOUT": "1",
"WORKING_DIRECTORY": "1",
},
},
"check_pyoneline": {
"kwargs": {"ERROR_VARIABLE": 1, "OUTPUT_VARIABLE": 1},
"pargs": "+",
},
"create_debian_binary_packages": {
"kwargs": {"DEPS": "*", "OUTPUTS": "*"},
"pargs": [3, "+"],
},
"create_debian_depsrepo": {"pargs": [3, "+"]},
"create_debian_packages": {
"kwargs": {"DEPS": "*", "OUTPUTS": "*"},
"pargs": [{"flags": ["FORCE_PBUILDER"], "nargs": "+"}],
},
"debhelp": {"pargs": ["1+"], "spelling": "DEBHELP"},
"exportvars": {
"kwargs": {"VARS": "+"},
"pargs": "1+",
"spelling": "EXPORTVARS",
},
"format_and_lint": {
"kwargs": {"CC": "*", "CMAKE": "*", "JS": "*", "PY": "*", "SHELL": "*"}
},
"get_debs": {"pargs": [3, "*"]},
"gresource": {"kwargs": {"DEPENDS": "+", "SRCDIR": 1}, "pargs": 2},
"gtk_doc_add_module": {
"kwargs": {
"FIXREFOPTS": "*",
"IGNOREHEADERS": "*",
"LIBRARIES": "*",
"LIBRARY_DIRS": "*",
"SOURCE": "*",
"SUFFIXES": "*",
"XML": 1,
},
"pargs": 1,
},
"importvars": {
"kwargs": {"VARS": "+"},
"pargs": "1+",
"spelling": "IMPORTVARS",
},
"join": {"kwargs": {"GLUE": 1}, "pargs": [1, "+"]},
"pkg_find": {"kwargs": {"PKG": "*"}},
"stage_files": {
"kwargs": {"FILES": "*", "LIST": 1, "SOURCEDIR": 1, "STAGE": 1}
},
"tangent_addtest": {
"kwargs": {
"COMMAND": "+",
"CONFIGURATIONS": "+",
"DEPENDS": "+",
"LABELS": "+",
"NAME": 1,
"WORKING_DIRECTORY": 1,
}
},
"tangent_extract_svg": {"kwargs": {"EXPORT": 1, "OUTPUT": 1, "SRC": 1}},
"tangent_fetchobj": {"kwargs": {"OUTDIR": 1}, "pargs": 2},
"tangent_rmark_render": {
"kwargs": {"DEPENDS": 1, "FORMAT": 1, "OUTPUT": 1, "PAGENO": 1, "UUID": 1},
"pargs": 1,
},
"tangent_unzip": {
"kwargs": {"OUTPUT": "1+", "WORKING_DIRECTORY": 1},
"pargs": "1+",
},
"travis_decrypt": {"kwargs": {}, "pargs": [3]},
}
# Override configurations per-command where available
override_spec = {}
# Specify variable tags.
vartags = []
# Specify property tags.
proptags = []
# -----------------------------
# Options affecting formatting.
# -----------------------------
with section("format"):
# Disable formatting entirely, making cmake-format a no-op
disable = False
# How wide to allow formatted cmake files
line_width = 100
# How many spaces to tab for indent
tab_size = 2
# If true, lines are indented using tab characters (utf-8 0x09) instead of
# <tab_size> space characters (utf-8 0x20). In cases where the layout would
# require a fractional tab character, the behavior of the fractional
# indentation is governed by <fractional_tab_policy>
use_tabchars = False
# If <use_tabchars> is True, then the value of this variable indicates how
# fractional indentations are handled during whitespace replacement. If set to
# 'use-space', fractional indentation is left as spaces (utf-8 0x20). If set
# to `round-up` fractional indentation is replaced with a single tab character
# (utf-8 0x09) effectively shifting the column to the next tabstop
fractional_tab_policy = "use-space"
# If an argument group contains more than this many sub-groups (parg or kwarg
# groups) then force it to a vertical layout.
max_subgroups_hwrap = 3
# If a positional argument group contains more than this many arguments, then
# force it to a vertical layout.
max_pargs_hwrap = 6
# If a cmdline positional group consumes more than this many lines without
# nesting, then invalidate the layout (and nest)
max_rows_cmdline = 3
# If true, separate flow control names from their parentheses with a space
separate_ctrl_name_with_space = False
# If true, separate function names from parentheses with a space
separate_fn_name_with_space = False
# If a statement is wrapped to more than one line, than dangle the closing
# parenthesis on its own line.
dangle_parens = False
# If the trailing parenthesis must be 'dangled' on its own line, then align it
# to this reference: `prefix`: the start of the statement, `prefix-indent`:
# the start of the statement, plus one indentation level, `child`: align to
# the column of the arguments
dangle_align = "prefix"
# If the statement spelling length (including space and parenthesis) is
# smaller than this amount, then force reject nested layouts.
min_prefix_chars = 4
# If the statement spelling length (including space and parenthesis) is larger
# than the tab width by more than this amount, then force reject un-nested
# layouts.
max_prefix_chars = 10
# If a candidate layout is wrapped horizontally but it exceeds this many
# lines, then reject the layout.
max_lines_hwrap = 2
# What style line endings to use in the output.
line_ending = "unix"
# Format command names consistently as 'lower' or 'upper' case
command_case = "canonical"
# Format keywords consistently as 'lower' or 'upper' case
keyword_case = "unchanged"
# A list of command names which should always be wrapped
always_wrap = []
# If true, the argument lists which are known to be sortable will be sorted
# lexicographically
enable_sort = True
# If true, the parsers may infer whether or not an argument list is sortable
# (without annotation).
autosort = False
# By default, if cmake-format cannot successfully fit everything into the
# desired linewidth it will apply the last, most aggressive attempt that it
# made. If this flag is True, however, cmake-format will print error, exit
# with non-zero status code, and write-out nothing
require_valid_layout = False
# A dictionary mapping layout nodes to a list of wrap decisions. See the
# documentation for more information.
layout_passes = {}
# ------------------------------------------------
# Options affecting comment reflow and formatting.
# ------------------------------------------------
with section("markup"):
# What character to use for bulleted lists
bullet_char = "*"
# What character to use as punctuation after numerals in an enumerated list
enum_char = "."
# If comment markup is enabled, don't reflow the first comment block in each
# listfile. Use this to preserve formatting of your copyright/license
# statements.
first_comment_is_literal = False
# If comment markup is enabled, don't reflow any comment block which matches
# this (regex) pattern. Default is `None` (disabled).
literal_comment_pattern = None
# Regular expression to match preformat fences in comments default=
# ``r'^\s*([`~]{3}[`~]*)(.*)$'``
fence_pattern = "^\\s*([`~]{3}[`~]*)(.*)$"
# Regular expression to match rulers in comments default=
# ``r'^\s*[^\w\s]{3}.*[^\w\s]{3}$'``
ruler_pattern = "^\\s*[^\\w\\s]{3}.*[^\\w\\s]{3}$"
# If a comment line starts with this pattern then it is explicitly a
# trailing comment for the preceding argument. Default is '#<'
explicit_trailing_pattern = "#<"
# If a comment line starts with at least this many consecutive hash
# characters, then don't lstrip() them off. This allows for lazy hash rulers
# where the first hash char is not separated by space
hashruler_min_length = 10
# If true, then insert a space between the first hash char and remaining hash
# chars in a hash ruler, and normalize its length to fill the column
canonicalize_hashrulers = True
# enable comment markup parsing and reflow
enable_markup = False
# ----------------------------
# Options affecting the linter
# ----------------------------
with section("lint"):
# a list of lint codes to disable
disabled_codes = ["C0113"]
# regular expression pattern describing valid function names
function_pattern = "[0-9a-z_]+"
# regular expression pattern describing valid macro names
macro_pattern = "[0-9A-Z_]+"
# regular expression pattern describing valid names for variables with global
# (cache) scope
global_var_pattern = "[A-Z][0-9A-Z_]+"
# regular expression pattern describing valid names for variables with global
# scope (but internal semantic)
internal_var_pattern = "_[A-Z][0-9A-Z_]+"
# regular expression pattern describing valid names for variables with local
# scope
local_var_pattern = "[a-z][a-z0-9_]+"
# regular expression pattern describing valid names for private directory
# variables
private_var_pattern = "_[0-9a-z_]+"
# regular expression pattern describing valid names for public directory
# variables
public_var_pattern = "[A-Z][0-9A-Z_]+"
# regular expression pattern describing valid names for function/macro
# arguments and loop variables.
argument_var_pattern = "[a-z][a-z0-9_]+"
# regular expression pattern describing valid names for keywords used in
# functions or macros
keyword_pattern = "[A-Z][0-9A-Z_]+"
# In the heuristic for C0201, how many conditionals to match within a loop
# before considering the loop a custom parser.
max_conditionals_custom_parser = 2
# Require at least this many newlines between statements
min_statement_spacing = 1
# Require no more than this many newlines between statements
max_statement_spacing = 2
max_returns = 6
max_branches = 12
max_arguments = 5
max_localvars = 15
max_statements = 50
# -------------------------------
# Options affecting file encoding
# -------------------------------
with section("encode"):
# If true, emit the unicode byte-order mark (BOM) at the start of the file
emit_byteorder_mark = False
# Specify the encoding of the input file. Defaults to utf-8
input_encoding = "utf-8"
# Specify the encoding of the output file. Defaults to utf-8. Note that cmake
# only claims to support utf-8 so be careful when using anything else
output_encoding = "utf-8"
# -------------------------------------
# Miscellaneous configurations options.
# -------------------------------------
with section("misc"):
# A dictionary containing any per-command configuration overrides. Currently
# only `command_case` is supported.
per_command = {}
================================================
FILE: .devcontainer/Dockerfile
================================================
# See here for image contents: https://github.com/Oneflow-Inc/docker-images/blob/main/oneflow/Dockerfile
# [Choice] llvm12 llvm13 cuda11.1
ARG VARIANT="llvm13"
ARG REPO="oneflowinc/devcontainer"
FROM ${REPO}:${VARIANT}
================================================
FILE: .devcontainer/devcontainer.json
================================================
// For format details, see https://aka.ms/devcontainer.json. For config options, see the README at:
// https://github.com/microsoft/vscode-dev-containers/tree/v0.209.6/containers/cpp
// workaround for EACCES: permission denied, mkdir '/tmp/vsch.....
// https://github.com/microsoft/vscode-remote-release/issues/2347
// sudo chmod 777 /tmp/vsch/container-features
{
"name": "oneflow-devel",
"image": "oneflowinc/manylinux2014_x86_64_cuda11.2",
"runArgs": [
"--cap-add=SYS_PTRACE",
"--privileged",
"--shm-size=8g",
"--security-opt",
"seccomp=unconfined",
"--network=host",
// "--gpus",
// "all",
],
"remoteEnv": {
"PATH": "${containerEnv:PATH}:/opt/python/cp37-cp37m/bin",
"ONEFLOW_CI_PYTHON_EXE": "/opt/python/cp37-cp37m/bin/python3",
"ONEFLOW_CI_SRC_DIR": "${containerWorkspaceFolder}",
"ONEFLOW_CI_BUILD_DIR": "${containerWorkspaceFolder}/build",
"ONEFLOW_CI_CMAKE_INIT_CACHE": "${containerWorkspaceFolder}/cmake/caches/ci/cuda.cmake",
"ONEFLOW_CI_BUILD_PARALLEL": "20"
},
"initializeCommand": "mkdir -p ${localWorkspaceFolder}/devcontainer-cache/dot/ccache && mkdir -p ${localWorkspaceFolder}/devcontainer-cache/dot/local && mkdir -p ${localWorkspaceFolder}/devcontainer-cache/dot/cache",
"mounts": [
"source=${localWorkspaceFolder}/devcontainer-cache/dot/ccache,target=/root/.ccache,type=bind,consistency=cached",
"source=${localWorkspaceFolder}/devcontainer-cache/dot/local,target=/root/.local,type=bind,consistency=cached",
"source=${localWorkspaceFolder}/devcontainer-cache/dot/cache,target=/root/.cache,type=bind,consistency=cached",
"source=/dataset,target=/dataset,type=bind,consistency=cached,readonly",
"source=/model_zoo,target=/model_zoo,type=bind,consistency=cached,readonly",
],
// Set *default* container specific settings.json values on container create.
"settings": {
"files.insertFinalNewline": true,
"files.trimFinalNewlines": true,
"files.trimTrailingWhitespace": true,
"files.eol": "\n",
"clangd.arguments": [
"-j",
"8",
"-header-insertion=never"
],
},
// Add the IDs of extensions you want installed when the container is created.
"extensions": [
"llvm-vs-code-extensions.vscode-clangd",
"ms-vscode.cmake-tools",
"ms-python.python"
],
// Comment out connect as root instead. More info: https://aka.ms/vscode-remote/containers/non-root.
"remoteUser": "root",
}
================================================
FILE: .dockerignore
================================================
**/.git
/build
/build-*
/docs/build
/cmake-build-*
/third_party
/examples/**/oneflow
/benchmark/**/oneflow
/.vscode
/.idea
/.clangd
/dist
/wheelhouse*
/.DS_Store
/tmp_wheel
/manylinux*
**/__pycache__
**/*.pyc
**/log
**/.ipynb_checkpoints
**/core.0*
**/core.1*
**/core.2*
**/core.3*
**/core.4*
**/core.5*
**/core.6*
**/core.7*
**/core.8*
**/core.9*
/.cache
/oneflow-src.zip
/distributed-tmp
/serving-tmp
================================================
FILE: .github/CODEOWNERS
================================================
*.cu @liujuncheng
*.py @BBuf @daquexian
/oneflow/core/cuda @liujuncheng
/oneflow/core/eager @daquexian
/oneflow/core/framework @chengtbf @strint
/oneflow/core/functional @hjchen2
/oneflow/core/graph @chengtbf
/oneflow/core/ndarray @daquexian
/oneflow/core/object_msg @daquexian
/oneflow/core/platform @jackalcooper
/oneflow/core/ep @liujuncheng
/oneflow/core/rpc @jackalcooper
/oneflow/core/stream @liujuncheng
/oneflow/core/hardware @liujuncheng
/oneflow/core/transport @chengtbf
/oneflow/core/vm @daquexian
/oneflow/xrt @hjchen2
/oneflow/ir @hjchen2 @BBuf @jackalcooper
/ci @jackalcooper
/python/oneflow/test_utils @daquexian @BBuf
/cmake @daquexian @jackalcooper
CMakeLists.txt @daquexian @jackalcooper
/.github @jackalcooper
/tools @jackalcooper
/docs @doombeaker
================================================
FILE: .github/ISSUE_TEMPLATE/blank_issue.yml
================================================
name: Blank Issue
description: Submit an issue about OneFlow.
labels: [Blank Issue]
body:
- type: textarea
id: description
attributes:
label: Description
description: Please describe the issue here.
placeholder: Description
validations:
required: false
================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: bug, community
assignees: ''
---
## Summary
A short description about the bug/issue
## Code to reproduce bug
Please post a minimal example to repro the bug. GitHub Gist or repo is highly recommended.
## System Information
- What is your OneFlow installation (pip, source, dockerhub):
- OS:
- OneFlow version (run `python3 -m oneflow --doctor`):
- Python version:
- CUDA driver version:
- GPU models:
- Other info:
================================================
FILE: .github/ISSUE_TEMPLATE/documention_issue.yml
================================================
name: Documentation Issue
description: Report an issue about OneFlow documentation or request new documentation.
title: "[Documention Issue]: "
labels: [Documention Issue]
body:
- type: markdown
attributes:
value: |
Welcome! Suggestions for the OneFlow documentation are appreciated. This template will help us gather the information we need to improve it.
- type: textarea
id: brief-description
attributes:
label: Brief Description
description: Please describe the problem or the request for new documentation here.
placeholder: Description
validations:
required: true
- type: textarea
id: alternatives
attributes:
label: Alternatives
description: |
Please provide some alternative information here, if any.
placeholder: Alternatives
validations:
required: false
- type: markdown
attributes:
value: |
Thanks for your contribution!
================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.yml
================================================
name: Feature Request
description: Request/Propose a new OneFlow feature.
title: "[Feature Request]: "
labels: [feature-request]
body:
- type: markdown
attributes:
value: |
We welcome feature proposal/request! This template will help us gather the information we need to review the proposal/request.
- type: textarea
id: background
attributes:
label: Background and motivation
description: Please describe the purpose and value of the new feature here. If the feature is linked to a specific problem, please describe it or put the link here.
placeholder: Purpose
validations:
required: true
- type: textarea
id: api-proposal
attributes:
label: API Proposal
description: |
Please provide the specific public API signature diff that you are proposing. If a new API is not required, please provide the current API related to the feature, or note that there is no related public API.
placeholder: API declaration (no method bodies)
value: |
```py
def new_api(value: Tensor) -> Tensor:
pass
```
validations:
required: true
- type: textarea
id: api-usage
attributes:
label: API Usage
description: |
Please provide code examples that highlight how the proposed API additions are meant to be consumed. This will help suggest whether the API has the right shape to be functional, performant and usable.
If there is not a new API in step 2, please skip it.
placeholder: API usage
validations:
required: false
- type: textarea
id: alternatives
attributes:
label: Alternatives
description: |
Please provide some alternative information of the feature, if any. For example, if you request a feature which depends on a specific device, please provide the device information.
placeholder: Alternatives
validations:
required: false
- type: textarea
id: risks
attributes:
label: Risks
description: |
Please mention any risks that to your knowledge the API proposal might entail, such as breaking changes, performance regressions, etc.
placeholder: Risks
validations:
required: false
- type: markdown
attributes:
value: |
Thanks for your contribution!
================================================
FILE: .github/ISSUE_TEMPLATE/performance_issue.yml
================================================
name: Performance Issue
description: Submit an issue about performance problem or regression of OneFlow.
title: "[Performance Issue]: "
labels: [Performance Issue]
body:
- type: markdown
attributes:
value: |
We welcome issues about OneFlow performance! This template will help us gather the information we need to locate the problem and improve the performance.
- type: textarea
id: brief-description
attributes:
label: Brief Description
description: Please give a brief description about the performance issue here.
placeholder: Description
validations:
required: true
- type: textarea
id: device-and-context
attributes:
label: Device and Context
description: |
Please describe the device and context you used when you encounter the performance problem/regression.
placeholder: Device and Context
validations:
required: true
- type: textarea
id: benchmark
attributes:
label: Benchmark
description: |
We will appreciate it if you'd like to provide benchmark comparison of the performance issue.
placeholder: Benchmark
validations:
required: false
- type: textarea
id: alternatives
attributes:
label: Alternatives
description: |
Please provide some alternative information of the performance issue here, if any.
placeholder: Alternatives
validations:
required: false
- type: markdown
attributes:
value: |
Thanks for your contribution!
================================================
FILE: .github/ISSUE_TEMPLATE/question.yml
================================================
name: Question
description: Ask a question about OneFlow and discuss with community members.
title: "[Question]: "
labels: [Question]
body:
- type: markdown
attributes:
value: |
Welcome to ask questions about OneFlow! This template will help us get your point.
- type: textarea
id: description
attributes:
label: Description
description: Please describe your question here.
placeholder: Description
validations:
required: true
- type: textarea
id: alternatives
attributes:
label: Alternatives
description: |
Please provide some alternative information here, if any.
placeholder: Alternatives
validations:
required: false
- type: markdown
attributes:
value: |
We are always willing to answer your questions!
================================================
FILE: .github/PULL_REQUEST_TEMPLATE/general_template.md
================================================
## 概述
## PR Checklist
- [ ] PR 标题语句通畅,明确表达 PR 内容,适合直接作为新版本发布时的 changelog
- [ ] 代码格式化
- [ ] 已经本地编译通过
- [ ] 已本地针对改动测试
- [ ] 已添加 type 标签:(填写 type 标签名,如 `bug, enhancement, purge, feature, documentation`)
- [ ] 已添加 component 标签:(填写 component 标签名,如 `op, system, eager, build, xla, python, ci, test, tooling`)
- [ ] Draft 转正式 PR 前已请人 Review
================================================
FILE: .github/PULL_REQUEST_TEMPLATE/op_template.md
================================================
## 概述
描述 op 的功能、公式等。若参考了其它框架的接口,应列出超链接。
## 功能 CheckList
**注意** : 功能复选框均为可选项,若未选择,说明理由即可。例如:该 Op 由 Python 接口拼接而成,因此无 `SetBatchAxisInferFn` Op 注册;再比如:该 Op 无输入,因此无 `SetInputArgModifyFn`。
模板中自带的复选框可留空,但是不能删除。可根据实际情况增加复选框选项。
### Op
- [ ] Op SetBatchAxisInferFn
- [ ] Op SetGetSbpFn
- [ ] Op SetInputArgModifyFn
- [ ] Op 反向梯度注册
### Kernel
- [ ] CPU in:float32
- [ ] CPU in:float64
- [ ] CPU in:int32
- [ ] CPU in:int64
- [ ] CPU in:int8
- [ ] GPU in:float32
- [ ] GPU in:float64
- [ ] GPU in:int32
- [ ] GPU in:int64
- [ ] GPU in:float16
- [ ] GPU in:int8
### Python Wrapper
- [ ] Python API 参数检查及异常提示
- [ ] 接口注释
- [ ] Example
### 测试
- [ ] 单机单卡 CPU Test Case
- [ ] 单机单卡 GPU Test Case
- [ ] 单机多卡 CPU Test Case
- [ ] 单机多卡 GPU Test Case
- [ ] 分布式 CPU Test Case
- [ ] 分布式 GPU Test Case
## GPU 有效带宽
带 GPU 的 Op,请参考 https://github.com/Oneflow-Inc/OneTeam/issues/167 测试有效带宽,并附带测试报告。
以下是报告样例:
理论带宽:
```text
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 250798.5
```
实际带宽:
```
PROFILER::KERNEL::CUDA_MEMORY_BANDWIDTH op_name: sqrt_2 elapsed(ms): 0.196064 memory_size(Byte): 50331648 bandwidth(GB/s): 239.08
PROFILER::KERNEL::CUDA_MEMORY_BANDWIDTH op_name: sqrt_2_grad elapsed(ms): 0.29072 memory_size(Byte): 75497472 bandwidth(GB/s): 241.856
```
## PR Checklist
- [ ] PR 标题语句通畅,明确表达 PR 内容,适合直接作为新版本发布时的 changelog
- [ ] 代码格式化
- [ ] 已经本地编译通过
- [ ] 已本地针对改动测试
- [ ] 已添加 type 标签:(填写 type 标签名,如 `bug, enhancement, purge, feature, documentation`)
- [ ] 已添加 component 标签:(填写 component 标签名,如 `op, system, eager, build, xla, python, ci, test, tooling`)
- [ ] Draft 转正式 PR 前已请人 Review
================================================
FILE: .github/actions/mac-build/action.yml
================================================
name: "Build OneFlow on macOS"
description: ""
runs:
  using: "composite"
  steps:
    - name: Install dependencies
      run: |
        brew install nasm
      shell: bash
    - name: Set environment variables
      run: |
        set -x
        cmake_flags=""
        cmake_flags+=" -DPython3_EXECUTABLE=$(which python3)"
        cmake_flags+=" -DRPC_BACKEND=LOCAL"
        cmake_flags+=" -DCMAKE_BUILD_TYPE=Release"
        cmake_flags+=" -DBUILD_CUDA=OFF"
        echo "cmake_flags=${cmake_flags}" >> $GITHUB_ENV
      shell: bash
    - name: Build (third party)
      run: |
        mkdir -p build
        cd build
        cmake .. $cmake_flags -DTHIRD_PARTY=ON -DONEFLOW=OFF
        # nproc is a GNU coreutils tool and is not available on stock macOS;
        # query the CPU count through sysctl instead.
        make -j $(sysctl -n hw.ncpu)
      shell: bash
    - name: Build (oneflow)
      run: |
        mkdir -p build
        cd build
        cmake .. $cmake_flags -DTHIRD_PARTY=OFF -DONEFLOW=ON
        make -j 2 oneflow
      shell: bash
    - name: Build (oneflow_internal)
      run: |
        mkdir -p build
        cd build
        cmake .. $cmake_flags -DTHIRD_PARTY=OFF -DONEFLOW=ON
        make -j 2 oneflow_internal
      shell: bash
    - name: Build (generate_api)
      run: |
        mkdir -p build
        cd build
        cmake .. $cmake_flags -DTHIRD_PARTY=OFF -DONEFLOW=ON
        make -j 2 generate_api
      shell: bash
================================================
FILE: .github/actions/setup/action.yml
================================================
inputs:
name:
description: 'Placeholder'
default: 'Placeholder'
runs:
using: "composite"
steps:
- run: |
echo $HOSTNAME
rm -rf build/third_party
bash ci/setup_submodule.sh
auth_header="$(git config --local --get http.https://github.com/.extraheader)"
git -c "http.extraheader=$auth_header" -c protocol.version=2 submodule update --init --recursive
shell: bash
================================================
FILE: .github/actions/upload_oss/action.yml
================================================
inputs:
  src_path:
    required: true
  oss_dst_path:
    required: true
  oss_access_key_id:
    required: true
  oss_access_key_secret:
    required: true
  upload_core:
    required: false
runs:
  using: "composite"
  steps:
    - run: |
        # Skip the upload entirely when no OSS credentials are available
        # (e.g. runs triggered from forks).
        if [ -z "$OSS_ACCESS_KEY_ID" ]
        then
          exit 0
        fi
        if [ ! -f "$HOME/ossutil64" ]; then
          curl http://gosspublic.alicdn.com/ossutil/1.7.15/ossutil64 -o $HOME/ossutil64
        fi
        chmod 755 $HOME/ossutil64
        $HOME/ossutil64 config -e oss-cn-beijing.aliyuncs.com -i ${{ inputs.oss_access_key_id }} -k ${{ inputs.oss_access_key_secret }} -L EN -c $HOME/.ossutilconfig
        # Collect optional flags in a bash array so quoting is preserved.
        # The previous string-concatenation approach passed literal double
        # quotes inside the --exclude pattern ("core*"), so core dumps were
        # never actually excluded.
        extra_args=()
        if [ -d "${{ inputs.src_path }}" ]; then
          extra_args+=(--recursive)
        fi
        if [ "${{ inputs.upload_core }}" == "true" ]; then
          echo "will upload core files"
        else
          extra_args+=(--exclude "core*")
        fi
        set -x
        $HOME/ossutil64 cp --disable-ignore-error --update "${extra_args[@]}" ${{ inputs.src_path }} ${{ inputs.oss_dst_path }}
      shell: bash
      env:
        OSS_ACCESS_KEY_ID: ${{ inputs.oss_access_key_id }}
        OSS_ACCESS_KEY_SECRET: ${{ inputs.oss_access_key_secret }}
================================================
FILE: .github/actions/upload_ssh/action.yml
================================================
name: "Upload via ssh"
description: ""
inputs:
src_path:
required: true
description: ""
dst_host:
required: true
description: ""
dst_path:
required: true
description: ""
runs:
using: "composite"
steps:
- run: |
set -x
dir_arg=""
if [ -d "${{ inputs.src_path }}" ]; then
dir_arg="-r"
fi
parent_dir=$(dirname ${{ inputs.dst_path }})
ssh -o StrictHostKeyChecking=no ${{ inputs.dst_host }} mkdir -p $parent_dir
ssh ${{ inputs.dst_host }} rm -rf ${{ inputs.dst_path }}
scp ${dir_arg} ${{ inputs.src_path }} ${{ inputs.dst_host }}:${{ inputs.dst_path }}
shell: bash
================================================
FILE: .github/actions/whl/action.yml
================================================
inputs:
  tmp_dir:
    description: "tmp dir"
    required: true
  cuda_version:
    description: "cuda_version"
    default: "10.2"
  python_version:
    description: "python_version"
    default: "3.8"
  extra_flags:
    description: "flags like --xla"
    default: ""
  extra_docker_args:
    description: ""
    default: ""
runs:
  using: "composite"
  steps:
    - run: |
        set -x
        src_dir=${PWD}
        tmp_dir="${{ inputs.tmp_dir }}"
        mkdir -p ${tmp_dir}
        cd ${tmp_dir}
        # -w expects a plain container path; the host:container mapping
        # belongs to -v only. Clean any previous wheelhouse as root via
        # busybox so permissions never block the removal.
        docker run --rm -v $PWD:/p -w /p busybox rm -rf /p/wheelhouse
        # Pass the extra_docker_args input through explicitly; a bare shell
        # variable of that name is never defined in this step.
        python3 ${src_dir}/docker/package/manylinux/build_wheel.py \
          --cuda_version=${{ inputs.cuda_version }} \
          --python_version=${{ inputs.python_version }} \
          --use_tuna --use_system_proxy --use_aliyun_mirror \
          --wheel_house_dir=${tmp_dir}/wheelhouse \
          --oneflow_src_dir=${src_dir} ${{ inputs.extra_flags }} \
          --retry=1 \
          --extra_docker_args "${{ inputs.extra_docker_args }}"
      shell: bash
================================================
FILE: .github/scripts/requirements.txt
================================================
PyYAML>=5.1
parsec
================================================
FILE: .github/scripts/set_initial_variables.py
================================================
import json
def create_one(name=None, allow_fail=None):
    """Build a single build-matrix entry for the given test suite.

    Only the suite name and the allow-fail flag vary; every other field
    is either a placeholder ("N/A") or the fixed set of runner labels.
    """
    entry = dict(
        test_suite=name,
        cuda_version="N/A",
        extra_flags="N/A",
        os=["self-hosted", "linux", "build"],
        allow_fail=allow_fail,
        python_version="N/A",
    )
    return entry
def create_conda(name=None):
    """Return a matrix entry for a conda/clang build; these are never
    allowed to fail."""
    conda_entry = create_one(name=name, allow_fail=False)
    return conda_entry
def print_github_action_output(name=None, value=None):
    """Publish a step output for GitHub Actions.

    The ``::set-output`` workflow command is deprecated and has been
    disabled by GitHub; when the runner provides a ``GITHUB_OUTPUT``
    environment file, append ``name=value`` to it instead. The legacy
    stdout command is kept as a fallback for older runners.
    """
    import os

    output_file = os.environ.get("GITHUB_OUTPUT")
    if output_file:
        with open(output_file, "a") as f:
            f.write(f"{name}={value}\n")
    else:
        print(f"::set-output name={name}::{value}")
def print_result(build_matrix=None, test_matrix=None, out=None):
    """Validate the matrices, emit them as GitHub Actions outputs, and
    optionally dump them to a JSON file.

    Args:
        build_matrix: build matrix dict; must be truthy and pass
            check_include on its "test_suite" key.
        test_matrix: test matrix dict; may be {} (validation skipped)
            but must not be None.
        out: optional path; when given, both matrices are also written
            there as indented JSON.
    """
    # Validate the arguments before using them (the original called
    # check_include first, so a None matrix crashed with a confusing
    # TypeError instead of a clear assertion).
    assert build_matrix
    assert test_matrix is not None
    check_include(include_key="test_suite", matrix=build_matrix)
    if test_matrix != {}:
        check_include(include_key="test_suite", matrix=test_matrix)
    root = {
        "build_matrix": build_matrix,
        "test_matrix": test_matrix,
    }
    for k, v in root.items():
        print_github_action_output(
            name=k, value=json.dumps(v),
        )
    if out:
        # "w" suffices here; the file is only written, never read back.
        with open(out, "w") as f:
            json.dump(root, f, indent=4)
def check_include(include_key=None, matrix: dict = None):
    """Assert that the values declared under ``matrix[include_key]`` and
    the values carried by the entries of ``matrix["include"]`` are exactly
    the same set; raises AssertionError on any mismatch."""
    assert include_key in matrix
    declared = set(matrix[include_key])
    included = {entry[include_key] for entry in matrix["include"]}
    assert declared == included, {
        "in_declare": declared,
        "in_include": included,
    }
# Command-line entry point: given the PR's labels, compute the CI build and
# test matrices and publish them as GitHub Actions outputs (optionally also
# dumping them to a JSON file for inspection).
if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser()
    # --labels takes a comma-separated list; all spaces are stripped first.
    parser.add_argument(
        "--labels", type=lambda x: (str(x).replace(" ", "").split(",")), required=True,
    )
    # Optional path to also write the resulting matrices as JSON.
    parser.add_argument("--out", type=str, required=False)
    args = parser.parse_args()
    if "need-clang-only" in args.labels:
        # Label shortcut: build only the clang CPU variant and run no tests.
        print_result(
            build_matrix={
                "test_suite": ["cpu-clang"],
                "include": [create_conda("cpu-clang")],
            },
            test_matrix={},
            out=args.out,
        )
    else:
        # Full build matrix: every suite listed in "test_suite" must have a
        # matching entry under "include" (enforced by check_include).
        full_build_matrix = {
            "test_suite": ["cuda", "cpu", "xla", "xla_cpu", "cpu-clang"],
            "include": [
                {
                    "test_suite": "cuda",
                    # NOTE(review): cuda_version/python_version mix floats and
                    # strings across entries — presumably consumed verbatim by
                    # the workflow; confirm before normalizing.
                    "cuda_version": 10.2,
                    "extra_flags": "--extra_oneflow_cmake_args=-DCUDA_ARCHITECTURES=61 --extra_oneflow_cmake_args=-DRPC_BACKEND=GRPC,LOCAL --extra_oneflow_cmake_args=-DPIP_INDEX_MIRROR=https://pypi.tuna.tsinghua.edu.cn/simple",
                    "os": ["self-hosted", "linux", "build"],
                    "allow_fail": False,
                    "python_version": "3.6,3.7",
                },
                {
                    "test_suite": "cpu",
                    "cuda_version": 10.2,
                    "extra_flags": "--extra_oneflow_cmake_args=-DBUILD_SHARED_LIBS=OFF --extra_oneflow_cmake_args=-DRPC_BACKEND=LOCAL --cpu",
                    "os": ["self-hosted", "linux", "build"],
                    "allow_fail": False,
                    "python_version": "3.6,3.7",
                },
                # XLA builds are experimental and allowed to fail.
                {
                    "test_suite": "xla",
                    "cuda_version": 10.1,
                    "extra_flags": "--extra_oneflow_cmake_args=-DCUDA_ARCHITECTURES=61 --extra_oneflow_cmake_args=-DRPC_BACKEND=GRPC,LOCAL --xla --extra_oneflow_cmake_args=-DPIP_INDEX_MIRROR=https://pypi.tuna.tsinghua.edu.cn/simple",
                    "os": ["self-hosted", "linux", "build"],
                    "allow_fail": True,
                    "python_version": 3.6,
                },
                {
                    "test_suite": "xla_cpu",
                    "cuda_version": 10.1,
                    "extra_flags": "--extra_oneflow_cmake_args=-DRPC_BACKEND=GRPC,LOCAL --xla --cpu --extra_oneflow_cmake_args=-DPIP_INDEX_MIRROR=https://pypi.tuna.tsinghua.edu.cn/simple",
                    "os": ["self-hosted", "linux", "build"],
                    "allow_fail": True,
                    "python_version": 3.6,
                },
                create_conda("cpu-clang"),
            ],
        }
        # Full test matrix: each entry names the build whose artifacts it
        # consumes via "build_env" and the runner labels it needs.
        full_test_matrix = {
            "test_suite": [
                "cuda",
                "cuda_op",
                "cuda_new_interface",
                "cpu_new_interface",
                "cpu",
                "xla",
                "xla_cpu",
            ],
            "include": [
                {
                    "test_suite": "cuda",
                    "os": ["self-hosted", "linux", "gpu"],
                    "allow_fail": False,
                    "build_env": "build.cuda.env",
                },
                {
                    "test_suite": "cuda_op",
                    "os": ["self-hosted", "linux", "gpu"],
                    "allow_fail": False,
                    "build_env": "build.cuda.env",
                },
                {
                    "test_suite": "cuda_new_interface",
                    "os": ["self-hosted", "linux", "gpu"],
                    "allow_fail": False,
                    "build_env": "build.cuda.env",
                },
                {
                    "test_suite": "cpu",
                    "os": ["self-hosted", "linux", "cpu"],
                    "allow_fail": False,
                    "build_env": "build.cpu.env",
                },
                {
                    "test_suite": "cpu_new_interface",
                    "os": ["self-hosted", "linux", "cpu"],
                    "allow_fail": False,
                    "build_env": "build.cpu.env",
                },
                {
                    "test_suite": "xla",
                    "os": ["self-hosted", "linux", "gpu"],
                    "allow_fail": True,
                    "build_env": "build.xla.env",
                },
                {
                    "test_suite": "xla_cpu",
                    "os": ["self-hosted", "linux", "cpu"],
                    "allow_fail": True,
                    "build_env": "build.xla_cpu.env",
                },
            ],
        }
        print_result(
            build_matrix=full_build_matrix, test_matrix=full_test_matrix, out=args.out,
        )
================================================
FILE: .github/workflows/canary.yml
================================================
name: Canary
on:
push:
branches:
- master
- "canary/*"
workflow_dispatch:
inputs:
oneflow-ref:
description: ""
default: "master"
required: true
concurrency:
group: canary-${{ github.ref }}
cancel-in-progress: false
jobs:
canary_release:
name: Canary Release
timeout-minutes: 120
runs-on: [self-hosted, linux, release]
if: github.repository == 'Oneflow-Inc/oneflow'
strategy:
max-parallel: 1
fail-fast: false
matrix:
entry: ["canary", "profiler"]
include:
- entry: "canary"
cmake-init-cache: "cmake/caches/ci/canary/cuda.cmake"
- entry: "profiler"
cmake-init-cache: "cmake/caches/ci/profiler/cuda.cmake"
env:
ONEFLOW_SRC: .
MANYLINUX_CACHE_DIR: ~/manylinux-cache-dir/canary-cu112
WHEELHOUSE_DIR: manylinux-wheelhouse
COMPUTE_PLATFORM: cu118
OSS_BUCKET: oneflow-staging
OSS_WHEEL_HOUSE_DIR: ${{ matrix.entry }}/commit/${{ github.sha }}
OSS_GITHUB_REF_DIR: ${{ matrix.entry }}/${{ github.ref }}
steps:
- name: Fix permissions
run: |
set -x
docker run --rm -v $PWD:$PWD -w $PWD busybox rm -rf *
- name: Remove leftover cuda-installer.log
run: |
docker run --rm -v /tmp:/host/tmp -w /p busybox rm -f /host/tmp/cuda-installer.log
- name: Checkout Oneflow-Inc/oneflow
if: ${{ github.event.inputs.oneflow-ref != '' }}
uses: actions/checkout@v2
with:
ref: ${{ github.event.inputs.oneflow-ref }}
- name: Checkout Oneflow-Inc/oneflow
if: ${{ github.event.inputs.oneflow-ref == '' }}
uses: actions/checkout@v2
- uses: Oneflow-Inc/get-oneflow@ci-test-with-cu118
name: Build manylinux
id: build-cuda
with:
cmake-init-cache: ${{ env.ONEFLOW_SRC }}/${{ matrix.cmake-init-cache }}
build-script: ${{ env.ONEFLOW_SRC }}/ci/manylinux/build-gcc9.sh
oneflow-src: ${{ env.ONEFLOW_SRC }}
oneflow-build-env: manylinux
wheelhouse-dir: ${{ env.WHEELHOUSE_DIR }}
clear-wheelhouse-dir: true
self-hosted: true
manylinux-cache-dir: ${{ env.MANYLINUX_CACHE_DIR }}
docker-run-use-system-http-proxy: false
docker-run-use-lld: true
retry-failed-build: true
clean-ccache: true
compute-platform: ${{ env.COMPUTE_PLATFORM }}
python-versions: |
3.8
3.10
- name: Upload wheelhouse
uses: ./.github/actions/upload_oss
with:
src_path: ${{ env.WHEELHOUSE_DIR }}
oss_dst_path: oss://${{ env.OSS_BUCKET }}/${{ env.OSS_WHEEL_HOUSE_DIR }}/${{ env.COMPUTE_PLATFORM }}
oss_access_key_id: ${{ secrets.OSS_ACCESS_KEY_ID }}
oss_access_key_secret: ${{ secrets.OSS_ACCESS_KEY_SECRET }}
- name: Update pip index
env:
OSS_ACCESS_KEY_ID: ${{ secrets.OSS_ACCESS_KEY_ID }}
OSS_ACCESS_KEY_SECRET: ${{ secrets.OSS_ACCESS_KEY_SECRET }}
run: |
python3 -m pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
python3 -m pip install oss2 beautifulsoup4 --user
python3 tools/create_pip_index.py -b ${{ env.OSS_BUCKET }} \
--dir_key ${{ env.OSS_WHEEL_HOUSE_DIR }}/${{ env.COMPUTE_PLATFORM }} \
--index_key=${{ env.OSS_WHEEL_HOUSE_DIR }}/${{ env.COMPUTE_PLATFORM }}/index.html \
--index_key=${{ env.OSS_GITHUB_REF_DIR}}/${{ env.COMPUTE_PLATFORM }}/index.html
================================================
FILE: .github/workflows/community_release.yml
================================================
name: Community Release
on:
push:
branches:
- "community/*"
schedule:
# beijing: 6 pm.
# utc: 10 am.
- cron: "0 10 * * sat"
workflow_dispatch:
inputs:
priv_branch:
required: false
default: "main"
concurrency:
group: community-release-${{ github.ref }}-${{ inputs.priv_branch }}
cancel-in-progress: true
jobs:
release:
name: Release pip
permissions:
contents: read
pull-requests: write
uses: ./.github/workflows/release.yml
with:
is_priv: true
branch: ${{ inputs.priv_branch || 'main' }}
upload_override_branch: "community"
cuda_cmake_cache: cmake/caches/ci/release/cuda_community.cmake
secrets:
ONEFLOW_PRIV_ORG: ${{ secrets.ONEFLOW_PRIV_ORG }}
ONEFLOW_PRIV_GH_TOKEN: ${{ secrets.ONEFLOW_PRIV_GH_TOKEN }}
ONEFLOW_PRIV_OSS_BUCKET: ${{ secrets.ONEFLOW_PRIV_OSS_BUCKET }}
OSS_ACCESS_KEY_ID: ${{ secrets.OSS_ACCESS_KEY_ID }}
OSS_ACCESS_KEY_SECRET: ${{ secrets.OSS_ACCESS_KEY_SECRET }}
ONEFLOW_CI_HTTP_PROXY: ${{ secrets.ONEFLOW_CI_HTTP_PROXY }}
================================================
FILE: .github/workflows/on_merge.yml
================================================
name: Update Benchmark History
on:
pull_request:
types:
- closed
branches:
- master
env:
OSS_ACCESS_KEY_ID: ${{ secrets.OSS_ACCESS_KEY_ID }}
OSS_ACCESS_KEY_SECRET: ${{ secrets.OSS_ACCESS_KEY_SECRET }}
jobs:
if_merged:
if: github.event.pull_request.merged == true
runs-on: ubuntu-latest
steps:
- uses: Oneflow-Inc/get-oneflow/update-benchmark-history@ci-test-with-cu118
name: Update benchmark history
timeout-minutes: 10
================================================
FILE: .github/workflows/pr.yml
================================================
name: Check PR
on:
pull_request:
types: [opened, labeled, unlabeled, synchronize]
jobs:
check_labels:
runs-on: ubuntu-22.04
name: Labels
if: github.event.pull_request.draft == false && github.base_ref == 'master'
steps:
- name: Check type labels 'bug, enhancement, purge, feature, documentation'
if: (contains(github.event.pull_request.labels.*.name, 'bug') || contains(github.event.pull_request.labels.*.name, 'enhancement') || contains(github.event.pull_request.labels.*.name, 'purge') || contains(github.event.pull_request.labels.*.name, 'feature') || contains(github.event.pull_request.labels.*.name, 'documentation')) == false
run: |
exit 1
- name: Check component labels 'op, system, eager, build, xla, python, ci, test, tooling, quantization, graph, ir, serving'
if: (contains(github.event.pull_request.labels.*.name, 'op') || contains(github.event.pull_request.labels.*.name, 'system') || contains(github.event.pull_request.labels.*.name, 'eager') || contains(github.event.pull_request.labels.*.name, 'build') || contains(github.event.pull_request.labels.*.name, 'xla') || contains(github.event.pull_request.labels.*.name, 'python') || contains(github.event.pull_request.labels.*.name, 'ci') || contains(github.event.pull_request.labels.*.name, 'test') || contains(github.event.pull_request.labels.*.name, 'tooling') || contains(github.event.pull_request.labels.*.name, 'quantization') || contains(github.event.pull_request.labels.*.name, 'graph') || contains(github.event.pull_request.labels.*.name, 'ir') || contains(github.event.pull_request.labels.*.name, 'serving')) == false
run: |
exit 2
================================================
FILE: .github/workflows/priv_release.yml
================================================
name: Priv Release
on:
push:
branches:
- "pro/*"
schedule:
# beijing: 12 pm.
# utc: 4 am.
- cron: "0 4 * * sun"
workflow_dispatch:
inputs:
priv_branch:
required: false
default: "main"
concurrency:
group: priv-release-${{ github.ref }}-${{ inputs.priv_branch }}
cancel-in-progress: true
jobs:
release:
name: Release pip
permissions:
contents: read
pull-requests: write
uses: ./.github/workflows/release.yml
with:
is_priv: true
branch: ${{ inputs.priv_branch || 'main' }}
cuda_cmake_cache: cmake/caches/ci/release/cuda_pro.cmake
secrets:
ONEFLOW_PRIV_ORG: ${{ secrets.ONEFLOW_PRIV_ORG }}
ONEFLOW_PRIV_GH_TOKEN: ${{ secrets.ONEFLOW_PRIV_GH_TOKEN }}
ONEFLOW_PRIV_OSS_BUCKET: ${{ secrets.ONEFLOW_PRIV_OSS_BUCKET }}
OSS_ACCESS_KEY_ID: ${{ secrets.OSS_ACCESS_KEY_ID }}
OSS_ACCESS_KEY_SECRET: ${{ secrets.OSS_ACCESS_KEY_SECRET }}
ONEFLOW_CI_HTTP_PROXY: ${{ secrets.ONEFLOW_CI_HTTP_PROXY }}
================================================
FILE: .github/workflows/release.yml
================================================
name: Release
on:
push:
branches:
- "release/*"
schedule:
# beijing: 2 am.
# utc: 6 pm.
- cron: "0 18 * * *"
workflow_dispatch:
inputs:
placeholder:
description: "update .github/workflows/release.yml to config your build"
required: false
workflow_call:
inputs:
is_priv:
required: true
type: boolean
branch:
required: false
type: string
default: "main"
upload_override_branch:
required: false
type: string
cuda_cmake_cache:
required: false
type: string
secrets:
ONEFLOW_PRIV_ORG:
required: true
ONEFLOW_PRIV_GH_TOKEN:
required: true
ONEFLOW_PRIV_OSS_BUCKET:
required: true
OSS_ACCESS_KEY_ID:
required: true
OSS_ACCESS_KEY_SECRET:
required: true
ONEFLOW_CI_HTTP_PROXY:
required: false
concurrency:
group: release-${{ github.ref }}-${{ inputs.branch }}
cancel-in-progress: ${{ github.ref != 'refs/heads/master' }}
env:
ONEFLOW_SRC: .
jobs:
generate-build-matrix:
name: "Generate build matrix"
runs-on: ubuntu-latest
env:
ONEFLOW_SRC: .
outputs:
matrix: ${{ steps.find-cache.outputs.matrix }}
formatted_date: ${{ steps.date.outputs.formatted_date }}
steps:
- name: Checkout Oneflow-Inc/oneflow
uses: actions/checkout@v2
if: ${{ !inputs.is_priv }}
with:
ref: ${{ github.event.pull_request.head.sha }}
repository: ${{github.event.pull_request.head.repo.full_name}}
- name: Checkout oneflow
uses: actions/checkout@v2
if: ${{ inputs.is_priv }}
with:
ref: ${{ inputs.branch }}
repository: ${{ secrets.ONEFLOW_PRIV_ORG }}/oneflow
token: ${{ secrets.ONEFLOW_PRIV_GH_TOKEN }}
- uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/build@ci-test-with-cu118
name: Find build cache
id: find-cache
timeout-minutes: 5
with:
delete-cache: ${{ contains(github.event.pull_request.labels.*.name, 'need-clean-ccache') }}
runner-labels: |
self-hosted
linux
release
oneflow-src: ${{ env.ONEFLOW_SRC }}
entries: |
cu122
cu121
cu118
cpu
- name: Get current date
id: date
run: echo "formatted_date=$(date +'%Y%m%d')" >> $GITHUB_OUTPUT
# Job: build "staging" release wheels for every matrix entry produced by
# generate-build-matrix (cu122 / cu121 / cu118 / cpu), upload them to an OSS
# bucket, refresh the pip index pages, and (for master+cpu) poke ReadTheDocs.
staging_release:
  env:
    # Per-entry build cache kept on the self-hosted release runner.
    MANYLINUX_CACHE_DIR: ~/manylinux-cache-dir/release/${{ matrix.entry }}
    WHEELHOUSE_DIR: manylinux_wheelhouse
    # Default OSS destination; overridden by "Set Private env" for priv builds.
    OSS_DIR: branch/${{ github.ref_name }}/${{ matrix.entry }}/${{ github.sha }}
    GITHUB_REF_NAME: ${{ github.ref_name }}
    GITHUB_SHA: ${{ github.sha }}
    ONEFLOW_OSS_BUCKET: oneflow-staging
    https_proxy: ${{ secrets.ONEFLOW_CI_HTTP_PROXY }}
  needs: [generate-build-matrix]
  name: Staging Release
  timeout-minutes: 240
  runs-on: [self-hosted, linux, release]
  # Only run on the canonical repo, or when dispatched for a private build.
  if: github.repository == 'Oneflow-Inc/oneflow' || inputs.is_priv
  strategy:
    fail-fast: false
    max-parallel: 6
    matrix: ${{ fromJson(needs.generate-build-matrix.outputs.matrix) }}
  steps:
    # Leftovers from previous runs can be root-owned (created inside docker),
    # so wipe the workspace from a container instead of as the runner user.
    - name: Fix permissions
      run: |
        docker run --rm -v $PWD:/p -w /p busybox rm -rf *
    - name: Install dependencies
      run: |
        python3 -m pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
        python3 -m pip install -U setuptools wheel --user
        python3 -m pip install oss2 --user
    # Public path: check out the PR head (possibly from a fork).
    - name: Checkout Oneflow-Inc/oneflow
      uses: actions/checkout@v2
      if: ${{ !inputs.is_priv }}
      with:
        ref: ${{ github.event.pull_request.head.sha }}
        repository: ${{github.event.pull_request.head.repo.full_name}}
    # Private path: check out the requested branch of the private org's fork.
    - name: Checkout private oneflow
      uses: actions/checkout@v2
      if: ${{ inputs.is_priv }}
      with:
        ref: ${{ inputs.branch }}
        repository: ${{ secrets.ONEFLOW_PRIV_ORG }}/oneflow
        token: ${{ secrets.ONEFLOW_PRIV_GH_TOKEN }}
    # Private builds additionally need the cutlass extension sources.
    - name: Checkout cutlass_extension
      uses: actions/checkout@v2
      if: ${{ inputs.is_priv }}
      with:
        repository: ${{ secrets.ONEFLOW_PRIV_ORG }}/cutlass-extension
        token: ${{ secrets.ONEFLOW_PRIV_GH_TOKEN }}
        path: cutlass-extension
    # For private builds the github.* context points at the public repo, so
    # recompute GITHUB_SHA from the actual checkout and redirect the OSS
    # destination (bucket + branch dir) to the private locations.
    - name: Set Private env
      if: ${{ inputs.is_priv }}
      run: |
        GITHUB_SHA=$(git rev-parse HEAD)
        echo "OSS_DIR=branch/${{ inputs.upload_override_branch || inputs.branch }}/${{ matrix.entry }}/${GITHUB_SHA}" >> $GITHUB_ENV
        echo "GITHUB_REF_NAME=${{ inputs.upload_override_branch || inputs.branch }}" >> $GITHUB_ENV
        echo "GITHUB_SHA=${GITHUB_SHA}" >> $GITHUB_ENV
        echo "ONEFLOW_OSS_BUCKET=${{ secrets.ONEFLOW_PRIV_OSS_BUCKET }}" >> $GITHUB_ENV
    - name: Print env
      if: ${{ inputs.is_priv }}
      run: |
        env
    # Build variant 1: cu118 and cu12x entries — gcc9 toolchain, cu118 cmake
    # cache (overridable via inputs.cuda_cmake_cache).
    - uses: Oneflow-Inc/get-oneflow@ci-test-with-cu118
      name: Build ${{ matrix.entry }}
      if: ${{ matrix.entry =='cu118' || startsWith(matrix.entry, 'cu12') }}
      with:
        cmake-init-cache: ${{ env.ONEFLOW_SRC }}/${{ inputs.cuda_cmake_cache || 'cmake/caches/ci/release/cu118.cmake' }}
        build-script: ${{ env.ONEFLOW_SRC }}/ci/manylinux/build-gcc9.sh
        oneflow-src: ${{ env.ONEFLOW_SRC }}
        oneflow-build-env: manylinux
        wheelhouse-dir: ${{ env.WHEELHOUSE_DIR }}
        clear-wheelhouse-dir: true
        self-hosted: true
        compute-platform: ${{ matrix.entry }}
        manylinux-cache-dir: ${{ env.MANYLINUX_CACHE_DIR }}
        docker-run-use-system-http-proxy: false
        docker-run-use-lld: false
        retry-failed-build: true
        clean-ccache: true
        # Nightly-style wheel naming for private builds, the cron schedule,
        # and the dedicated nightly branch.
        nightly: ${{ inputs.is_priv || github.event_name == 'schedule' || github.ref == 'refs/heads/release/add_nightly_date_index'}}
        nightly-date: ${{ needs.generate-build-matrix.outputs.formatted_date }}
        use-nvidia-wheels: ${{ matrix.entry !='cu112' }}
        python-versions: |
          3.12
          3.11
          3.10
          3.9
          3.8
    # Build variant 2: remaining (older) CUDA entries — generic cuda cmake cache.
    - uses: Oneflow-Inc/get-oneflow@ci-test-with-cu118
      name: Build ${{ matrix.entry }}
      if: ${{ startsWith(matrix.entry, 'cu') && matrix.entry !='cu118' && !startsWith(matrix.entry, 'cu12') }}
      with:
        cmake-init-cache: ${{ env.ONEFLOW_SRC }}/cmake/caches/ci/release/cuda.cmake
        build-script: ${{ env.ONEFLOW_SRC }}/ci/manylinux/build-gcc9.sh
        oneflow-src: ${{ env.ONEFLOW_SRC }}
        oneflow-build-env: manylinux
        wheelhouse-dir: ${{ env.WHEELHOUSE_DIR }}
        clear-wheelhouse-dir: true
        self-hosted: true
        compute-platform: ${{ matrix.entry }}
        manylinux-cache-dir: ${{ env.MANYLINUX_CACHE_DIR }}
        docker-run-use-system-http-proxy: false
        docker-run-use-lld: false
        retry-failed-build: true
        clean-ccache: true
        nightly: ${{ inputs.is_priv || github.event_name == 'schedule' || github.ref == 'refs/heads/release/add_nightly_date_index'}}
        nightly-date: ${{ needs.generate-build-matrix.outputs.formatted_date }}
        use-nvidia-wheels: ${{ matrix.entry !='cu112' }}
        python-versions: |
          3.12
          3.11
          3.10
          3.9
          3.8
    # Build variant 3: CPU-only entry. Note clean-ccache is false here,
    # unlike the CUDA variants above.
    - uses: Oneflow-Inc/get-oneflow@ci-test-with-cu118
      name: Build ${{ matrix.entry }}
      if: ${{ matrix.entry =='cpu' }}
      with:
        cmake-init-cache: ${{ env.ONEFLOW_SRC }}/cmake/caches/ci/release/cpu.cmake
        build-script: ${{ env.ONEFLOW_SRC }}/ci/manylinux/build.sh
        oneflow-src: ${{ env.ONEFLOW_SRC }}
        oneflow-build-env: manylinux
        wheelhouse-dir: ${{ env.WHEELHOUSE_DIR }}
        clear-wheelhouse-dir: true
        self-hosted: true
        compute-platform: ${{ matrix.entry }}
        manylinux-cache-dir: ${{ env.MANYLINUX_CACHE_DIR }}
        docker-run-use-system-http-proxy: false
        docker-run-use-lld: false
        retry-failed-build: true
        clean-ccache: false
        nightly: ${{ inputs.is_priv || github.event_name == 'schedule' || github.ref == 'refs/heads/release/add_nightly_date_index'}}
        nightly-date: ${{ needs.generate-build-matrix.outputs.formatted_date }}
        python-versions: |
          3.12
          3.11
          3.10
          3.9
          3.8
    # Push the whole wheelhouse to the OSS bucket under OSS_DIR.
    - name: Upload wheel
      uses: ./.github/actions/upload_oss
      with:
        src_path: ${{ env.WHEELHOUSE_DIR }}
        oss_dst_path: oss://${{ env.ONEFLOW_OSS_BUCKET }}/${{ env.OSS_DIR }}
        oss_access_key_id: ${{ secrets.OSS_ACCESS_KEY_ID }}
        oss_access_key_secret: ${{ secrets.OSS_ACCESS_KEY_SECRET }}
    # Regenerate the static pip index.html pages (per branch, per date,
    # per commit) so `pip install -f <index>` picks up the new wheels.
    - name: Update pip index
      env:
        OSS_ACCESS_KEY_ID: ${{ secrets.OSS_ACCESS_KEY_ID }}
        OSS_ACCESS_KEY_SECRET: ${{ secrets.OSS_ACCESS_KEY_SECRET }}
      run: |
        python3 -m pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
        python3 -m pip install oss2 beautifulsoup4 --user
        python3 tools/create_pip_index.py --dir_key ${{ env.OSS_DIR }} -b ${{ env.ONEFLOW_OSS_BUCKET }} \
          --index_key=branch/${{ env.GITHUB_REF_NAME }}/${{ matrix.entry }}/index.html \
          --index_key=branch/${{ env.GITHUB_REF_NAME }}/date/${{ needs.generate-build-matrix.outputs.formatted_date }}/${{ matrix.entry }}/index.html \
          --index_key=${{ env.OSS_DIR }}/index.html \
          --index_key=commit/${{ env.GITHUB_SHA }}/${{ matrix.entry }}/index.html
    # Trigger a ReadTheDocs rebuild once per release run (master, cpu entry,
    # public builds only).
    - name: Update API docs
      if: github.ref == 'refs/heads/master' && matrix.entry == 'cpu' && !inputs.is_priv
      env:
        READTHEDOCS_TOKEN: ${{ secrets.READTHEDOCS_TOKEN }}
      run: |
        curl -X POST -d "branches=master" -d "token=${READTHEDOCS_TOKEN}" https://readthedocs.org/api/v2/webhook/oneflow/135376/
================================================
FILE: .github/workflows/simple.yml
================================================
# Workflow: Simple CI — GitHub-hosted smoke builds (CPU-only) plus a
# clang-tidy pass. Triggered when a PR review is requested, on pushes to
# master, or manually via workflow_dispatch.
name: Simple CI
on:
  pull_request:
    types: [review_requested]
    branches:
      - "*"
  push:
    branches:
      - master
  workflow_dispatch:
    inputs:
      placeholder:
        description: "placeholder, no effect"
        required: false
# One run per ref; in-flight runs are cancelled on new pushes except on master.
concurrency:
  group: simple-ci-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/master' }}
jobs:
# Job: run OneFlow's custom clang-tidy over every translation unit.
# Gated on master pushes, or on non-draft PRs that requested review from
# oneflow-ci-bot AND carry the 'need-simple-ci' label.
static_analysis_with_clang:
  name: Static analysis with clang
  runs-on: ubuntu-22.04
  if: github.ref == 'refs/heads/master' || (github.event.pull_request.draft == false && contains(github.event.pull_request.requested_reviewers.*.login, 'oneflow-ci-bot') && contains(github.event.pull_request.labels.*.name, 'need-simple-ci'))
  steps:
    - name: Check out OneFlow
      uses: actions/checkout@v2
      with:
        ref: ${{ github.event.pull_request.head.ref }}
        repository: ${{github.event.pull_request.head.repo.full_name}}
    - name: Install dependencies
      run: |
        sudo apt-get update
        sudo apt-get install -y libopenblas-dev nasm python3-pip ninja-build
    # Prebuilt clang-tidy 14 AppImage and the matching run-clang-tidy driver
    # from Oneflow's llvm-project fork (branch "maybe").
    - name: Download OneFlow custom clang-tidy
      run: |
        wget https://github.com/Oneflow-Inc/llvm-project/releases/download/maybe-14.0.4/clang-tidy-14.AppImage
        wget https://raw.githubusercontent.com/oneflow-inc/llvm-project/maybe/clang-tools-extra/clang-tidy/tool/run-clang-tidy.py
        chmod +x clang-tidy-14.AppImage run-clang-tidy.py
    # First configure+build only produces the generated sources clang-tidy
    # needs (protobuf, functional, op schema) — not a full oneflow build.
    - name: Build third party libs and generate files
      run: |
        mkdir build
        cd build
        cmake .. -C ../cmake/caches/international/cpu.cmake \
          -DCMAKE_BUILD_TYPE=Release \
          -DBUILD_TESTING=ON
        cmake --build . -j$(nproc) --target oneflow_deps of_protoobj of_functional_obj of_functional_tensor_obj of_op_schema
    - name: Run clang-tidy for all translation units
      # use clang as compiler for correct compiler flags
      run: |
        cd build
        rm CMakeCache.txt
        cmake .. -C ../cmake/caches/international/cpu.cmake \
          -DCMAKE_C_COMPILER=clang-12 \
          -DCMAKE_CXX_COMPILER=clang++-12 \
          -DCMAKE_BUILD_TYPE=Release \
          -DBUILD_TESTING=ON \
          -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
        cd ..
        # The trailing regex excludes files under build/ (generated code).
        ./run-clang-tidy.py -clang-tidy-binary ./clang-tidy-14.AppImage -p build -quiet -allow-enabling-alpha-checkers -extra-arg="-Xclang" -extra-arg="-analyzer-config" -extra-arg="-Xclang" -extra-arg="aggressive-binary-operation-simplification=true" "^(?!$(pwd)/build)"
# Job: CPU-only matrix builds on GitHub-hosted mac/ubuntu runners, covering
# both CMake generators, Debug/Release, and shared/static library builds,
# then running the test executable and the CPU op tests.
hosted:
  name: CPU-only
  if: github.ref == 'refs/heads/master' || (github.event.pull_request.draft == false && contains(github.event.pull_request.requested_reviewers.*.login, 'oneflow-ci-bot') && contains(github.event.pull_request.labels.*.name, 'need-simple-ci'))
  runs-on: ${{ matrix.os }}
  env:
    # Silence all compiler warnings — these legs only check that the build works.
    CFLAGS: "-w"
    CXXFLAGS: "-w"
  strategy:
    fail-fast: true
    max-parallel: 1
    matrix:
      test_suite: ["mac", "ubuntu"]
      cmake_generator: ["Ninja", "Unix Makefiles"]
      cmake_build_type: ["Debug", "Release"]
      build_shared_libs: ["ON", "OFF"]
      include:
        - test_suite: mac
          # FIX: macos-10.15 runners were retired by GitHub and can no longer
          # be scheduled; macos-13 is the oldest image still offered.
          os: "macos-13"
          make_concurrency: 2
        - test_suite: ubuntu
          os: "ubuntu-22.04"
          make_concurrency: 2
      # Prune the matrix to the combinations that are actually interesting.
      exclude:
        - test_suite: mac
          cmake_build_type: "Debug"
        - test_suite: mac
          cmake_generator: "Ninja"
        - test_suite: ubuntu
          cmake_generator: "Ninja"
          cmake_build_type: "Debug"
        - test_suite: ubuntu
          cmake_generator: "Ninja"
          build_shared_libs: "OFF"
        - test_suite: ubuntu
          cmake_build_type: "Debug"
          build_shared_libs: "OFF"
        - test_suite: ubuntu
          cmake_generator: "Unix Makefiles"
          cmake_build_type: "Release"
  steps:
    # Hosted runners have limited RAM; add swap so linking doesn't OOM.
    - name: Set Swap Space
      uses: pierotofy/set-swap-space@master
      with:
        swap-size-gb: 5
    - uses: actions/checkout@v2
      with:
        ref: ${{ github.event.pull_request.head.sha }}
    - name: Install dependencies (homebrew)
      if: matrix.test_suite == 'mac'
      run: |
        brew install nasm ninja
    - name: Install dependencies (apt)
      if: matrix.test_suite == 'ubuntu'
      run: |
        # FIX: refresh package indexes first and pass -y — without it a stale
        # index or an interactive prompt aborts this non-interactive step.
        sudo apt-get update
        sudo apt-get install -y libopenblas-dev nasm g++ gcc python3-pip ninja-build
    - name: Cache pip (Linux)
      if: startsWith(runner.os, 'Linux')
      uses: actions/cache@v4
      with:
        path: ~/.cache/pip
        key: ${{ matrix.os }}-pip-${{ hashFiles('**/requirements.txt') }}
    - name: Cache pip (macOS)
      if: startsWith(runner.os, 'macOS')
      uses: actions/cache@v4
      with:
        path: ~/Library/Caches/pip
        key: ${{ matrix.os }}-pip-${{ hashFiles('**/requirements.txt') }}
    - name: Install dependencies (pip)
      run: |
        python3 -m pip install -r ci/requirements.txt
        python3 -m pip install -r dev-requirements.txt
    # Assemble the shared cmake flag string once and export it for every
    # subsequent configure step.
    - name: Set environment variables
      run: |
        set -x
        cmake_flags=""
        cmake_flags+=" -DBUILD_CUDA=OFF"
        cmake_flags+=" -DBUILD_TESTING=ON"
        cmake_flags+=" -G '${{ matrix.cmake_generator }}'"
        cmake_flags+=" -DCMAKE_BUILD_TYPE=${{ matrix.cmake_build_type }}"
        cmake_flags+=" -DBUILD_SHARED_LIBS=${{ matrix.build_shared_libs }}"
        cmake_flags+=" -DCMAKE_MACOSX_RPATH=FALSE"
        cmake_flags+=" -DCMAKE_BUILD_WITH_INSTALL_RPATH=FALSE"
        echo "cmake_flags=${cmake_flags}" >> $GITHUB_ENV
    # Non-Ninja legs build third-party deps and oneflow targets one by one
    # (with `if: always()` so later targets are attempted even after a
    # failure); the Ninja legs rely solely on the final "Build (ALL)" step.
    - name: Build (third party)
      if: matrix.cmake_generator != 'Ninja'
      run: |
        set -x
        mkdir -p build-third_party
        mkdir -p third_party_install
        cd build-third_party
        cmake .. ${{ env.cmake_flags }} -DTHIRD_PARTY=ON -DONEFLOW=OFF -DTHIRD_PARTY_DIR=$PWD/../third_party_install
        cmake --build . -j $(nproc)
    - name: Build (oneflow)
      if: matrix.cmake_generator != 'Ninja'
      run: |
        mkdir -p build
        cd build
        cmake .. ${{ env.cmake_flags }} -DTHIRD_PARTY=OFF -DONEFLOW=ON -DTHIRD_PARTY_DIR=$PWD/../third_party_install
        cmake --build . -j ${{ matrix.make_concurrency }} --target oneflow
    - name: Build (oneflow_internal)
      if: always() && matrix.cmake_generator != 'Ninja'
      run: |
        mkdir -p build
        cd build
        cmake .. ${{ env.cmake_flags }} -DTHIRD_PARTY=OFF -DONEFLOW=ON
        cmake --build . -j ${{ matrix.make_concurrency }} --target oneflow_internal
    - name: Build (oneflow_py)
      if: always() && matrix.cmake_generator != 'Ninja'
      run: |
        mkdir -p build
        cd build
        cmake .. ${{ env.cmake_flags }} -DTHIRD_PARTY=OFF -DONEFLOW=ON
        cmake --build . -j ${{ matrix.make_concurrency }} --target oneflow_py
    - name: Build (oneflow_testexe)
      if: always() && matrix.cmake_generator != 'Ninja'
      run: |
        mkdir -p build
        cd build
        cmake .. ${{ env.cmake_flags }} -DTHIRD_PARTY=OFF -DONEFLOW=ON
        cmake --build . -j ${{ matrix.make_concurrency }} --target oneflow_testexe
    - name: Build (ALL)
      if: always()
      continue-on-error: ${{ startsWith(runner.os, 'macOS') && matrix.cmake_generator == 'Ninja' && matrix.build_shared_libs == 'ON' }}
      run: |
        mkdir -p build
        cd build
        cmake .. ${{ env.cmake_flags }}
        cmake --build . -j ${{ matrix.make_concurrency }}
    # Enable core dumps, then run the gtest executable. Failures don't fail
    # the job (continue-on-error) — these legs are informational.
    - name: Exe test
      if: always()
      continue-on-error: true
      run: |
        ulimit -c
        ulimit -c unlimited
        ulimit -c
        mkdir -p build
        cd build
        ./bin/oneflow_testexe
    - name: Op test
      if: always()
      continue-on-error: true
      run: |
        ulimit -c
        ulimit -c unlimited
        ulimit -c
        source build/source.sh
        ONEFLOW_TEST_GITHUB_HOSTED=1 ONEFLOW_TEST_CPU_ONLY=1 bash ci/test/1node_op_test.sh
    # Archive test scratch dirs as artifacts when the PR opts in via label.
    - name: "Tar logs"
      if: always() && contains(github.event.pull_request.labels.*.name, 'need-simple-ci-upload-artifact')
      continue-on-error: true
      run: |
        set -ex
        if [[ -d "${HOME}/oneflow_temp" ]]
        then
          tar -cvf home_oneflow_temp.tar ${HOME}/oneflow_temp
        fi
        if [[ -d "${PWD}/test_tmp_dir" ]]
        then
          tar -cvf cwd_test_tmp_dir.tar ${PWD}/test_tmp_dir
        fi
    - name: Upload logs
      if: always() && contains(github.event.pull_request.labels.*.name, 'need-simple-ci-upload-artifact')
      uses: actions/upload-artifact@v4
      with:
        name: logs-${{ matrix.test_suite }}-${{ matrix.cmake_generator }}-${{ matrix.cmake_build_type }}-shared-${{ matrix.build_shared_libs }}
        path: |
          home_oneflow_temp.tar
          cwd_test_tmp_dir.tar
# Job: verify OneFlow builds inside the two supported conda dev environments
# (gcc7 and clang10), using env specs pinned to a fixed conda-env commit.
conda:
  name: Build with conda
  if: github.ref == 'refs/heads/master' || (github.event.pull_request.draft == false && contains(github.event.pull_request.requested_reviewers.*.login, 'oneflow-ci-bot') && contains(github.event.pull_request.labels.*.name, 'need-simple-ci'))
  runs-on: ubuntu-latest
  strategy:
    fail-fast: true
    max-parallel: 1
    matrix:
      build-type: ["gcc7", "clang10"]
  steps:
    - name: Checkout Oneflow-Inc/oneflow
      uses: actions/checkout@v2
    # Pinned commit keeps the conda environment reproducible.
    - name: Checkout Oneflow-Inc/conda-env
      uses: actions/checkout@v2
      with:
        repository: Oneflow-Inc/conda-env
        ref: 30a7f00eb48ee9009d85a848e720823e5054c66b
        path: conda-env
    - uses: Oneflow-Inc/get-oneflow@ci-test-with-cu118
      name: Build with gcc7
      if: ${{ matrix.build-type == 'gcc7'}}
      with:
        cmake-init-cache: cmake/caches/ci/gh-hosted/cpu-gcc.cmake
        oneflow-src: .
        oneflow-build-env: conda
        conda-env-file: conda-env/dev/gcc7/environment-v2.yml
        conda-env-name: oneflow-dev-gcc7-v2
    - uses: Oneflow-Inc/get-oneflow@ci-test-with-cu118
      name: Build with clang10
      if: ${{ matrix.build-type == 'clang10'}}
      with:
        cmake-init-cache: cmake/caches/ci/gh-hosted/cpu-clang.cmake
        oneflow-src: .
        oneflow-build-env: conda
        conda-env-file: conda-env/dev/clang10/environment-v2.yml
        conda-env-name: oneflow-dev-clang10-v2
================================================
FILE: .github/workflows/test.yml
================================================
# Workflow: Build and Test CI — the main PR pipeline. Builds manylinux wheels
# on self-hosted runners and runs the test suites in docker containers.
name: Build and Test CI
on:
  pull_request:
    types: [opened, review_requested, ready_for_review, synchronize, unlocked]
  merge_group:
    types: [checks_requested]
# One run per ref; new pushes cancel in-flight runs.
concurrency:
  group: build-and-test-${{ github.ref }}
  cancel-in-progress: true
env:
  OSS_ACCESS_KEY_ID: ${{ secrets.OSS_ACCESS_KEY_ID }}
  OSS_ACCESS_KEY_SECRET: ${{ secrets.OSS_ACCESS_KEY_SECRET }}
  ONEFLOW_TIMEOUT_SECONDS: 90
  # NOTE(review): "THRAED" looks like a typo for "THREAD", but this name must
  # match the env var the oneflow runtime actually reads (see its use at the
  # docker run step below) — confirm against the runtime before renaming.
  ONEFLOW_THRAED_LOCAL_CACHED_SIZE: 16384
  # Companion repos checked out at pinned commits for integration tests.
  FLOW_VISION_SRC: flow_vision
  FLOW_VISION_COMMIT: ca8ebc663b58667cf8cd1b6ef0c861522780b7bb
  LIBAI_SRC: libai
  LIBAI_COMMIT: 94eb85ff0131e8dfce953a3a916de7a4f897c647
  ONEFLOW_FACE_SRC: oneflow_face
  ONEFLOW_FACE_COMMIT: 110a97e8d5737a1f1856281a7df556a5ac8f06de
  ONEFLOW_IREE_SRC: oneflow_iree
  ONEFLOW_IREE_COMMIT: 42fd479de7047e6af1d42c6e62b9b056e0a762aa
  ONE_FX_SRC: one-fx
  ONE_FX_COMMIT: da4051c7f1ace7a20b3f54395b580cd102fc99da
  # Test image bundling PyTorch for oneflow-vs-torch comparison tests.
  TEST_WITH_TORCH_IMG_TAG: registry.cn-beijing.aliyuncs.com/oneflow/test-with-pytorch-1.10.0-cuda11.3-cudnn8-runtime:25817b5c0e1dd79bef8fdd43d729b98af381e7d5
  MLIR_DOCKER_ARGS: "-e ONEFLOW_MLIR_ENABLE_ROUND_TRIP=1 -e ONEFLOW_MLIR_PREFER_NHWC=0 -e ONEFLOW_MLIR_ENABLE_INFERENCE_OPTIMIZATION=1"
  # Internal artifact storage host reachable from the self-hosted runners.
  SSH_TANK_HOST: 192.168.1.40
  SSH_TANK_PATH: /data/tank
jobs:
# Job: detect which python test files this PR changed (relative to its base)
# and expose them to downstream jobs as outputs.
source_info:
  name: Collect information about PR and source
  runs-on: ubuntu-22.04
  if: github.event.pull_request.draft == false && github.base_ref == 'master'
  steps:
    - name: Check out OneFlow
      uses: actions/checkout@v2
      with:
        ref: ${{ github.event.pull_request.head.sha }}
        repository: ${{github.event.pull_request.head.repo.full_name}}
        # Full history is needed so `git diff <base sha>` works.
        fetch-depth: 0
    - name: Python diff
      id: py-diff
      run: |
        # Added/modified (not deleted) test files under python/oneflow/test;
        # `grep -v expensive` drops the expensive suite, `|| true` keeps the
        # pipeline from failing when nothing matches. xargs collapses the
        # newline-separated list into one space-separated line.
        ONEFLOW_TEST_FILES="$(git diff --diff-filter=d --name-only ${{ github.event.pull_request.base.sha }} -- python/oneflow/test/**/test_*.py | { grep -v expensive || true; })"
        ONEFLOW_TEST_FILES=$(echo "${ONEFLOW_TEST_FILES}" | xargs)
        if [ -z "$ONEFLOW_TEST_FILES" ]; then
          echo "no changed python tests"
          echo "has_changed_python_tests=false" >> $GITHUB_OUTPUT
        else
          echo "changed python tests: ${ONEFLOW_TEST_FILES}"
          echo "has_changed_python_tests=true" >> $GITHUB_OUTPUT
        fi
        echo "changed_python_tests=${ONEFLOW_TEST_FILES}" >> $GITHUB_OUTPUT
  outputs:
    changed_python_tests: ${{ steps.py-diff.outputs.changed_python_tests }}
    has_changed_python_tests: ${{ steps.py-diff.outputs.has_changed_python_tests }}
# Job: mirror third-party build dependencies to an aliyun OSS bucket so
# self-hosted builds don't depend on upstream availability.
mirror_third_party:
  name: Mirror third party dependencies
  runs-on: ubuntu-22.04
  if: github.event.pull_request.draft == false && github.base_ref == 'master'
  steps:
    - uses: actions/checkout@v2
    # Skip for fork PRs (they can't read the OSS secrets anyway).
    - name: Mirror dependencies to aliyun
      if: github.event.pull_request.head.repo.full_name == github.repository
      run: |
        set -x
        # Secrets are unavailable in some contexts; treat that as a no-op
        # rather than a failure.
        if [ -z "$OSS_ACCESS_KEY_ID" ]
        then
          exit 0
        fi
        python3 -m pip install -U pip "setuptools<=68.2.2" wheel
        python3 -m pip install 'cryptography<=3.4' oss2
        python3 tools/package_mirror.py -i $PWD
# Job: enforce license headers and code formatting. Each "Check" step is
# paired with a fix-up step that only runs `if: failure()`; when anything
# fails, the fixes are committed back to the PR branch (or exported as a
# patch artifact for fork PRs) and the job is forced red so the author
# re-requests CI.
check_license_and_format:
  name: License and format
  runs-on: ubuntu-22.04
  if: github.event.pull_request.draft == false
  steps:
    # Checks out the branch (not a detached sha) so `git push` below works.
    - uses: actions/checkout@v2
      with:
        repository: ${{github.event.pull_request.head.repo.full_name}}
        ref: ${{ github.head_ref }}
    - name: Check license
      id: license_check
      run: |
        python3 ci/check/run_license_format.py -i oneflow -c
        python3 ci/check/run_license_format.py -i python -c
    - name: Add license
      id: license_fmt
      if: ${{ failure() }}
      run: |
        python3 ci/check/run_license_format.py -i oneflow --fix
        python3 ci/check/run_license_format.py -i python --fix
    - name: Check C++/CUDA format
      id: cpp_check
      run: |
        # FIX: use apt-get with -y — plain `apt install` without -y can
        # prompt, and a prompt aborts this non-interactive runner step.
        # libtinfo5 is a runtime dependency of the clang-format binary.
        sudo apt-get install -y libtinfo5
        python3 ci/check/run_clang_format.py --clang_format_binary clang-format --source_dir oneflow
    - name: Run C++/CUDA format
      id: cpp_fmt
      if: ${{ failure() }}
      run: |
        sudo apt-get install -y libtinfo5
        python3 ci/check/run_clang_format.py --clang_format_binary clang-format --source_dir oneflow --fix
    - name: Check Python format
      id: py_check
      run: |
        python3 -m pip install black==19.10b0 click==8.0.0
        python3 ci/check/run_py_format.py --source_dir $PWD
    - name: Run Python Format
      id: py_fmt
      if: ${{ failure() }}
      run: |
        python3 -m pip install black==19.10b0 --user
        python3 ci/check/run_py_format.py --source_dir $PWD --fix
    - name: Check CMake format
      id: cmake_check
      run: |
        python3 -m pip install cmakelang
        python3 ci/check/run_cmake_format.py --source_dir $PWD
    - name: Run CMake Format
      id: cmake_fmt
      if: ${{ failure() }}
      run: |
        python3 -m pip install cmakelang
        python3 ci/check/run_cmake_format.py --source_dir $PWD --fix
    # Commit whatever the fixers changed back to the PR branch. This push
    # fails for fork PRs (no write token) — that case is covered by the
    # patch artifact below.
    - name: Git push
      id: git_push
      if: ${{ failure() }}
      run: |
        git diff -p > license_and_format.patch
        cat license_and_format.patch
        git config --global user.email "ci-bot@oneflow.org"
        git config --global user.name "oneflow-ci-bot"
        git add -u
        git commit -m "auto format by CI"
        git push
    - name: Upload patch
      if: ${{ failure() && steps.git_push.outcome == 'failure' }}
      uses: actions/upload-artifact@v4
      with:
        name: license_and_format-${{ github.sha }}.patch
        path: license_and_format.patch
    - name: Add comment
      if: ${{ failure() }}
      uses: actions/github-script@v4
      with:
        script: |
          github.issues.createComment({
            issue_number: context.issue.number,
            owner: context.repo.owner,
            repo: context.repo.repo,
            body: 'Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.'
          })
    # Keep the job red after auto-formatting so the merge is blocked until
    # CI is re-run on the formatted code.
    - name: Please request CI again
      if: ${{ failure() }}
      run: |
        exit 1
    - name: Check source code (prevent creating files at wrong places)
      run: |
        python3 tools/check_src.py
# Job: compute the build matrix (one entry per build flavor) and per-entry
# cache-hit status, consumed by build-oneflow via `needs`.
find-build-cache:
  name: "Find build cache"
  if: github.event.pull_request.draft == false && github.base_ref == 'master'
  runs-on: ubuntu-latest
  env:
    ONEFLOW_SRC: .
  outputs:
    matrix: ${{ steps.find-cache.outputs.matrix }}
  steps:
    - name: Checkout Oneflow-Inc/oneflow
      uses: actions/checkout@v2
      with:
        ref: ${{ github.event.pull_request.head.sha }}
        repository: ${{github.event.pull_request.head.repo.full_name}}
    - uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/build@ci-test-with-cu118
      name: find cache
      id: find-cache
      timeout-minutes: 5
      with:
        # The 'need-clean-ccache' PR label forces a cache wipe.
        delete-cache: ${{ contains(github.event.pull_request.labels.*.name, 'need-clean-ccache') }}
        runner-labels: |
          self-hosted
          linux
          builder
        oneflow-src: ${{ env.ONEFLOW_SRC }}
        entries: |
          cu118
          cpu
          cpu-asan-ubsan
          cpu-tsan
          llvm15
# Job: build OneFlow wheels for each matrix entry (cpu, cu118, llvm15, and
# — currently disabled — sanitizer flavors), then upload the wheelhouse and
# packed liboneflow to the internal "tank" host keyed by build digest.
build-oneflow:
  name: "Build OneFlow"
  if: github.event.pull_request.draft == false && github.base_ref == 'master'
  runs-on: ${{ matrix.runs-on }}
  needs: [find-build-cache]
  timeout-minutes: 80
  strategy:
    fail-fast: true
    max-parallel: 5
    matrix: ${{ fromJson(needs.find-build-cache.outputs.matrix) }}
  env:
    ONEFLOW_SRC: .
    MANYLINUX_CACHE_DIR: ~/manylinux-cache-dir/${{ matrix.entry }}
    WHEELHOUSE_DIR: manylinux-wheelhouse
  steps:
    - name: Set proxy
      if: ${{ contains(matrix.runs-on, 'self-hosted') }}
      run: |
        echo "https_proxy=${{ secrets.ONEFLOW_CI_HTTP_PROXY }}" >> $GITHUB_ENV
    # Delete root-owned leftovers (created inside docker) via a container.
    - name: Fix permissions
      if: ${{ contains(matrix.runs-on, 'self-hosted') }}
      run: |
        set -x
        docker run --rm -v $PWD:$PWD -w $PWD busybox rm -rf *
    - name: Checkout Oneflow-Inc/oneflow
      uses: actions/checkout@v2
      with:
        ref: ${{ github.event.pull_request.head.sha }}
        repository: ${{github.event.pull_request.head.repo.full_name}}
    # Marks this entry's build digest as completed (only for same-repo PRs
    # on self-hosted runners) so identical future builds can be skipped.
    - uses: Oneflow-Inc/get-oneflow/cache-complete@ci-test-with-cu118
      name: Save cache if successful
      id: save-cache
      timeout-minutes: 5
      with:
        oneflow-src: ${{ env.ONEFLOW_SRC }}
        entry: ${{ matrix.entry }}
        digest-type: build
        mark-as-completed: ${{ contains(matrix.runs-on, 'self-hosted') && github.event.pull_request.head.repo.full_name == github.repository }}
    - name: Check digest cache result. If this step failed, usually it is caused by new commits pushed when this CI run is running.
      if: ${{ fromJSON(steps.save-cache.outputs.cache-hit) != matrix.cache-hit }}
      run: |
        # NOTE(review): the file/line in this annotation is hard-coded and
        # likely stale relative to the current file layout — verify.
        echo "::error file=test.yml,line=204,col=10::steps.save-cache.outputs.cache-hit != matrix.cache-hit"
        exit 1
    - uses: Oneflow-Inc/get-oneflow@ci-test-with-cu118
      name: Build manylinux ${{ matrix.entry }}
      id: build-cpu
      if: ${{ matrix.entry =='cpu' && !matrix.cache-hit }}
      with:
        cmake-init-cache: ${{ env.ONEFLOW_SRC }}/cmake/caches/ci/cpu.cmake
        build-script: ${{ env.ONEFLOW_SRC }}/ci/manylinux/build.sh
        run-lit: true
        oneflow-src: ${{ env.ONEFLOW_SRC }}
        oneflow-build-env: manylinux
        wheelhouse-dir: ${{ env.WHEELHOUSE_DIR }}
        clear-wheelhouse-dir: true
        self-hosted: ${{ contains(matrix.runs-on, 'self-hosted') }}
        cuda-version: none
        manylinux-cache-dir: ${{ env.MANYLINUX_CACHE_DIR }}
        docker-run-use-system-http-proxy: false
        docker-run-use-lld: true
        retry-failed-build: true
        clean-ccache: ${{ contains(github.event.pull_request.labels.*.name, 'need-clean-ccache') }}
        python-versions: |
          3.7
          3.8
    # Sanitizer builds: the trailing `&& false` in this condition permanently
    # disables the step — asan/ubsan/tsan builds are currently switched off.
    - uses: Oneflow-Inc/get-oneflow@ci-test-with-cu118
      name: Build manylinux ${{ matrix.entry }}
      id: build-cpu-sanitizers
      if: ${{ (matrix.entry == 'cpu-asan-ubsan' || matrix.entry == 'cpu-tsan') && !matrix.cache-hit && false }}
      with:
        cmake-init-cache: ${{ env.ONEFLOW_SRC }}/cmake/caches/ci/${{ matrix.entry }}.cmake
        build-script: ${{ env.ONEFLOW_SRC }}/ci/manylinux/build.sh
        run-lit: false
        oneflow-src: ${{ env.ONEFLOW_SRC }}
        oneflow-build-env: manylinux
        wheelhouse-dir: ${{ env.WHEELHOUSE_DIR }}
        clear-wheelhouse-dir: true
        self-hosted: ${{ contains(matrix.runs-on, 'self-hosted') }}
        cuda-version: none
        manylinux-cache-dir: ${{ env.MANYLINUX_CACHE_DIR }}
        docker-run-use-system-http-proxy: false
        docker-run-use-lld: true
        retry-failed-build: true
        clean-ccache: ${{ contains(github.event.pull_request.labels.*.name, 'need-clean-ccache') }}
        python-versions: |
          3.8
    - uses: Oneflow-Inc/get-oneflow@ci-test-with-cu118
      name: Build manylinux ${{ matrix.entry }}
      id: build-cuda
      if: ${{ matrix.entry =='cu118' && !matrix.cache-hit }}
      with:
        cmake-init-cache: ${{ env.ONEFLOW_SRC }}/cmake/caches/ci/cuda.cmake
        build-script: ${{ env.ONEFLOW_SRC }}/ci/manylinux/build-gcc9.sh
        oneflow-src: ${{ env.ONEFLOW_SRC }}
        oneflow-build-env: manylinux
        wheelhouse-dir: ${{ env.WHEELHOUSE_DIR }}
        clear-wheelhouse-dir: true
        self-hosted: ${{ contains(matrix.runs-on, 'self-hosted') }}
        cuda-version: "11.8"
        manylinux-cache-dir: ${{ env.MANYLINUX_CACHE_DIR }}
        docker-run-use-system-http-proxy: false
        docker-run-use-lld: false
        retry-failed-build: true
        clean-ccache: ${{ contains(github.event.pull_request.labels.*.name, 'need-clean-ccache') }}
        python-versions: |
          3.7
    - uses: Oneflow-Inc/get-oneflow@ci-test-with-cu118
      name: Build ${{ matrix.entry }}
      if: ${{ matrix.entry == 'llvm15' && !matrix.cache-hit }}
      with:
        cmake-init-cache: ${{ env.ONEFLOW_SRC }}/cmake/caches/ci/llvm/cuda-75-clang.cmake
        build-script: ${{ env.ONEFLOW_SRC }}/ci/clang/build-llvm.sh
        oneflow-src: ${{ env.ONEFLOW_SRC }}
        oneflow-build-env: llvm
        wheelhouse-dir: ${{ env.WHEELHOUSE_DIR }}
        clear-wheelhouse-dir: true
        self-hosted: true
        # NOTE(review): env.CUDA_VERSION is not defined in this job's visible
        # env block, so this likely expands empty — confirm intended value.
        cuda-version: ${{ env.CUDA_VERSION }}
        manylinux-cache-dir: ${{ env.MANYLINUX_CACHE_DIR }}
        docker-run-use-system-http-proxy: false
        docker-run-use-lld: false
        retry-failed-build: true
        clean-ccache: ${{ contains(github.event.pull_request.labels.*.name, 'need-clean-ccache') }}
        wheel-audit: false
        python-versions: |
          3.8
    # On build failure, strip the 'automerge' label and explain why.
    - name: Remove automerge
      if: ${{ failure() && contains(matrix.runs-on, 'self-hosted') && cancelled() == false && contains(github.event.pull_request.labels.*.name, 'automerge') }}
      uses: actions/github-script@v4
      with:
        script: |
          github.issues.removeLabel({
            issue_number: context.issue.number,
            owner: context.repo.owner,
            repo: context.repo.repo,
            name: 'automerge'
          })
          github.issues.createComment({
            issue_number: context.issue.number,
            owner: context.repo.owner,
            repo: context.repo.repo,
            body: 'CI failed when running job: Build ${{ matrix.entry }}. PR label automerge has been removed'
          })
    # Ship artifacts to the tank host, keyed by build digest, for the test
    # jobs to download. Skipped for entries that don't produce wheels.
    - name: Upload packed liboneflow
      if: ${{ !fromJson(matrix.cache-hit) && matrix.entry != 'llvm15' && matrix.entry != 'cpu-asan-ubsan' && matrix.entry != 'cpu-tsan' }}
      uses: Oneflow-Inc/get-oneflow/digest/upload@ci-test-with-cu118
      timeout-minutes: 10
      with:
        digest: ${{ steps.save-cache.outputs.build-digest }}
        entry: ${{ matrix.entry }}
        ssh-tank-host: ${{ env.SSH_TANK_HOST }}
        ssh-tank-path: ${{ env.SSH_TANK_PATH }}
        src-dir: ${{ env.MANYLINUX_CACHE_DIR }}/build/cpack
        dst-dir: cpack
    - name: Upload whl
      if: ${{ !fromJson(matrix.cache-hit) && matrix.entry != 'llvm15' && matrix.entry != 'cpu-asan-ubsan' && matrix.entry != 'cpu-tsan' }}
      uses: Oneflow-Inc/get-oneflow/digest/upload@ci-test-with-cu118
      timeout-minutes: 10
      with:
        digest: ${{ steps.save-cache.outputs.build-digest }}
        entry: ${{ matrix.entry }}
        ssh-tank-host: ${{ env.SSH_TANK_HOST }}
        ssh-tank-path: ${{ env.SSH_TANK_PATH }}
        src-dir: ${{ env.WHEELHOUSE_DIR }}
        dst-dir: whl
# Job: compute the matrix for the 2-node distributed test suite (cuda module
# tests). Only runs when the PR carries the 'need-test-distributed' label.
find-test-cache-distributed:
  name: "Find test cache (distributed)"
  if: github.event.pull_request.draft == false && github.base_ref == 'master' && contains(github.event.pull_request.labels.*.name, 'need-test-distributed')
  runs-on: ubuntu-latest
  needs: [build-oneflow]
  env:
    ONEFLOW_SRC: .
  outputs:
    matrix: ${{ steps.find-cache.outputs.matrix }}
  steps:
    - name: Checkout Oneflow-Inc/oneflow
      uses: actions/checkout@v2
      with:
        ref: ${{ github.event.pull_request.head.sha }}
        repository: ${{github.event.pull_request.head.repo.full_name}}
    - uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/test@ci-test-with-cu118
      name: find cache
      id: find-cache
      timeout-minutes: 5
      with:
        runner-labels: |
          self-hosted
          linux
        oneflow-src: ${{ env.ONEFLOW_SRC }}
        include-distributed: true
        # Two ranks per distributed test entry.
        world-size: 2
        devices: |
          cuda
        tests: |
          module
# Job: compute the matrix for the single-node test suites (module / misc /
# speed-test, on cuda and cpu), consumed by the downstream test job.
find-test-cache:
  name: "Find test cache"
  if: github.event.pull_request.draft == false && github.base_ref == 'master'
  runs-on: ubuntu-latest
  needs: [build-oneflow]
  env:
    ONEFLOW_SRC: .
  outputs:
    matrix: ${{ steps.find-cache.outputs.matrix }}
  steps:
    - name: Checkout Oneflow-Inc/oneflow
      uses: actions/checkout@v2
      with:
        ref: ${{ github.event.pull_request.head.sha }}
        repository: ${{github.event.pull_request.head.repo.full_name}}
    - uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/test@ci-test-with-cu118
      name: find cache
      id: find-cache
      timeout-minutes: 5
      with:
        runner-labels: |
          self-hosted
          linux
        oneflow-src: ${{ env.ONEFLOW_SRC }}
        devices: |
          cuda
          cpu
        tests: |
          module
          misc
          speed-test
test-distributed:
name: Distributed test suite
needs: [find-test-cache-distributed, test]
runs-on: ${{ matrix.runs-on }}
timeout-minutes: 120
if: github.event.pull_request.draft == false && github.base_ref == 'master' && contains(github.event.pull_request.labels.*.name, 'need-test-distributed')
concurrency:
group: distributed-test-${{ matrix.entry }}-rank-${{ matrix.rank }}
cancel-in-progress: false
strategy:
fail-fast: true
max-parallel: 2
matrix: ${{ fromJson(needs.find-test-cache-distributed.outputs.matrix) }}
env:
ONEFLOW_SRC: .
TEST_CONTAINER_NAME: "ci-test-distributed"
steps:
- name: Fix permissions
if: ${{ contains(matrix.runs-on, 'self-hosted') }}
run: |
set -x
docker run --rm -v $PWD:$PWD -w $PWD busybox rm -rf *
docker run --rm -v $PWD:$PWD -w $PWD busybox rm -rf .pytest_cache
- name: Checkout Oneflow-Inc/oneflow
uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
repository: ${{github.event.pull_request.head.repo.full_name}}
- name: Checkout Oneflow-Inc/vision
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: actions/checkout@v2
with:
repository: Oneflow-Inc/vision
# please use a commit here
ref: ${{ env.FLOW_VISION_COMMIT}}
path: ${{ env.FLOW_VISION_SRC}}
- name: Checkout Oneflow-Inc/one-fx
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: actions/checkout@v2
with:
repository: Oneflow-Inc/one-fx
# please use a commit here
ref: ${{ env.ONE_FX_COMMIT}}
path: ${{ env.ONE_FX_SRC}}
- name: Checkout Oneflow-Inc/libai
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: actions/checkout@v2
with:
repository: Oneflow-Inc/libai
# please use a commit here
ref: ${{ env.LIBAI_COMMIT}}
path: ${{ env.LIBAI_SRC}}
- name: Checkout Oneflow-Inc/oneflow_iree
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: actions/checkout@v2
with:
repository: Oneflow-Inc/oneflow_iree
# please use a commit here
ref: ${{ env.ONEFLOW_IREE_COMMIT}}
path: ${{ env.ONEFLOW_IREE_SRC}}
- name: Remove container
timeout-minutes: 45
if: ${{ contains(matrix.runs-on, 'self-hosted') }}
run: |
docker rm -f ${{ env.TEST_CONTAINER_NAME }} || true
- uses: Oneflow-Inc/get-oneflow/cache-complete@ci-test-with-cu118
name: Save cache if successful
id: save-cache
timeout-minutes: 5
with:
oneflow-src: ${{ env.ONEFLOW_SRC }}
entry: ${{ matrix.entry }}
digest-type: ${{ matrix.digest-type }}
mark-as-completed: ${{ contains(matrix.runs-on, 'self-hosted') && github.event.pull_request.head.repo.full_name == github.repository }}
- name: Check digest cache result. If this step failed, usually it is caused by new commits pushed when this CI run is running.
if: ${{ fromJSON(steps.save-cache.outputs.cache-hit) != matrix.cache-hit }}
run: |
echo "::error file=test.yml,line=204,col=10::steps.save-cache.outputs.cache-hit != matrix.cache-hit"
exit 1
# Fetch the wheel and packed liboneflow produced by the build job for this
# compute platform, addressed by the digest reported by the save-cache step.
- name: Download wheel and packed liboneflow
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: Oneflow-Inc/get-oneflow/digest/download@ci-test-with-cu118
id: download-digest
timeout-minutes: 10
with:
digest: ${{ steps.save-cache.outputs.build-digest }}
entry: ${{ matrix.compute-platform }}
ssh-tank-host: ${{ env.SSH_TANK_HOST }}
ssh-tank-path: ${{ env.SSH_TANK_PATH }}
# Resolve the primary (master) node address for this rank's multi-node run.
- name: Get primary node
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: Oneflow-Inc/get-oneflow/master-address@ci-test-with-cu118
id: get-primary-node
with:
rank: ${{ matrix.rank }}
# Export docker args and artifact/cache paths via GITHUB_ENV for later steps.
- name: Set environment variables
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
run: |
set -x
extra_docker_args=""
# For cpu entries, force CPU-only tests and hide all GPUs in the container.
if [ "${{ matrix.device }}" == "cpu" ]; then
extra_docker_args+=" --env ONEFLOW_TEST_CPU_ONLY=1"
extra_docker_args+=" --env CUDA_VISIBLE_DEVICES=-1"
fi
echo "EXTRA_DOCKER_ARGS=${extra_docker_args}" >> $GITHUB_ENV
echo "ONEFLOW_TEST_CACHE_DIR=$HOME/ci-cache/test_cache" >> $GITHUB_ENV
echo "ONEFLOW_TEST_DATASET_DIR=$HOME/dataset" >> $GITHUB_ENV
echo "ONEFLOW_WHEEL_PATH=${{ steps.download-digest.outputs.entry-dir }}/whl" >> $GITHUB_ENV
echo "ONEFLOW_CPACK_PATH=${{ steps.download-digest.outputs.entry-dir }}/cpack" >> $GITHUB_ENV
# Multi-node runs use host networking so ranks can reach each other directly.
- name: Set environment variables (distributed)
if: ${{ fromJson(matrix.is-distributed) }}
run: |
set -x
EXTRA_DOCKER_ARGS+=" --network host "
echo "EXTRA_DOCKER_ARGS=${EXTRA_DOCKER_ARGS}" >> $GITHUB_ENV
# Opt-in verbose test output via the 'need-test-verbose' PR label.
- name: Enable ONEFLOW_TEST_VERBOSE
if: ${{ contains(github.event.pull_request.labels.*.name, 'need-test-verbose') }}
run: |
EXTRA_DOCKER_ARGS+=" --env ONEFLOW_TEST_VERBOSE=1"
echo "EXTRA_DOCKER_ARGS=${EXTRA_DOCKER_ARGS}" >> $GITHUB_ENV
# Start a long-lived detached container ("sleep 5400" keeps it alive);
# subsequent steps run their commands inside it via `docker exec`.
# NOTE(review): ONEFLOW_THRAED_LOCAL_CACHED_SIZE spells "THREAD" as "THRAED";
# presumably the runtime reads this exact name -- confirm before renaming it.
- name: Start container
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
working-directory: ${{ env.ONEFLOW_SRC }}
run: |
docker run --gpus=all -d --rm --privileged --shm-size=8g \
--pids-limit 2000 \
--cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
-v ${ONEFLOW_TEST_DATASET_DIR}:${ONEFLOW_TEST_DATASET_DIR}:ro \
-v ${ONEFLOW_WHEEL_PATH}:${ONEFLOW_WHEEL_PATH}:ro \
-v $HOME/test-container-cache/dot-local:/root/.local \
-v $HOME/test-container-cache/dot-cache:/root/.cache \
-e NODE_RANK=${{ matrix.rank }} \
-e _MASTER_ADDR=${{ steps.get-primary-node.outputs.master-address }} \
-e ONEFLOW_WHEEL_PATH=${ONEFLOW_WHEEL_PATH} \
-e ONEFLOW_CI=1 \
-v $PWD:$PWD \
-w $PWD \
-v ${ONEFLOW_TEST_CACHE_DIR}:${ONEFLOW_TEST_CACHE_DIR} \
-e ONEFLOW_TEST_CACHE_DIR=${ONEFLOW_TEST_CACHE_DIR} \
-e ONEFLOW_TEST_DATASET_DIR=${ONEFLOW_TEST_DATASET_DIR} \
-e ONEFLOW_TIMEOUT_SECONDS=${{ env.ONEFLOW_TIMEOUT_SECONDS }} \
-e ONEFLOW_THRAED_LOCAL_CACHED_SIZE=${{ env.ONEFLOW_THRAED_LOCAL_CACHED_SIZE }} \
${{ env.MLIR_DOCKER_ARGS }} \
--name ${TEST_CONTAINER_NAME} \
${{ env.EXTRA_DOCKER_ARGS }} \
${{ env.TEST_WITH_TORCH_IMG_TAG }} \
sleep 5400
# Smoke-check that the container is up and list its python packages.
- name: Test container
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
run: |
docker exec ${{ env.TEST_CONTAINER_NAME }} ls
docker exec ${{ env.TEST_CONTAINER_NAME }} python3 -m pip list
# Install the freshly built wheel from the downloaded artifact directory.
# Fix: add -U so that an oneflow package preinstalled in the test image is
# upgraded to the wheel under ONEFLOW_WHEEL_PATH instead of pip treating the
# requirement as already satisfied and silently testing a stale build. This
# matches the "Install OneFlow" step of the single-node test job.
- name: Install OneFlow
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
run: |
ls ${ONEFLOW_WHEEL_PATH}
docker exec ${TEST_CONTAINER_NAME} python3 -m pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -U --find-links=${ONEFLOW_WHEEL_PATH} oneflow
# Editable-install downstream projects (checked out earlier) into the container.
- name: Install downstream libs
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
run: |
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.FLOW_VISION_SRC}}
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install pybind11 --user
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install tensorboardX==2.6 --user
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.LIBAI_SRC}}
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.ONEFLOW_IREE_SRC}}
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.ONE_FX_SRC}}
# Two-node module test suite (cuda + distributed entries only).
- name: Module API test (distributed)
timeout-minutes: 90
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'module' && matrix.device == 'cuda' && fromJson(matrix.is-distributed) }}
continue-on-error: false
run: |
docker exec -e ONEFLOW_TEST_DIR=$PWD/python/oneflow/test/modules ${{ env.TEST_CONTAINER_NAME }} bash ci/test/2node_op_test_multi_client.sh
# Same suite with ibverbs disabled (invalid lib path) -- label-gated.
- name: Module API test (distributed, without IB)
timeout-minutes: 60
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'module' && matrix.device == 'cuda' && fromJson(matrix.is-distributed) && contains(github.event.pull_request.labels.*.name, 'need-distributed-without-ib')}}
continue-on-error: false
run: |
docker exec -e ONEFLOW_TEST_DIR=$PWD/python/oneflow/test/modules \
-e ONEFLOW_LIBIBVERBS_PATH=invalid_lib \
-e ONEFLOW_CI_DEVICE_NUMS="4" \
${{ env.TEST_CONTAINER_NAME }} bash ci/test/2node_op_test_multi_client.sh
# Post-mortem on failure: dump stacks from any core files left by tests.
- name: Print stacks in all core files
timeout-minutes: 45
if: ${{ failure() && contains(matrix.runs-on, 'self-hosted') }}
run: |
docker exec ${{ env.TEST_CONTAINER_NAME }} bash ci/test/print_stack_in_all_dirs.sh || true
# On failure, drop the 'automerge' label and comment which job failed.
- name: Remove automerge
if: ${{ failure() && contains(matrix.runs-on, 'self-hosted') && cancelled() == false && contains(github.event.pull_request.labels.*.name, 'automerge') }}
uses: actions/github-script@v4
with:
script: |
github.issues.removeLabel({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
name: 'automerge'
})
github.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: 'CI failed when running job: ${{ matrix.entry }}. PR label automerge has been removed'
})
# Always tear down the container and wipe the workspace via a busybox
# container (presumably so root-owned files are removable too -- confirm).
- name: Remove container
timeout-minutes: 45
if: ${{ always() && contains(matrix.runs-on, 'self-hosted') }}
run: |
docker rm -f ${{ env.TEST_CONTAINER_NAME }} || true
docker run --rm -v $PWD:$PWD -w $PWD busybox rm -rf *
# Main per-entry test job: the matrix (computed by find-test-cache) spans
# test-type / device / runner combinations.
test:
name: Test suite
needs: [find-test-cache, source_info]
timeout-minutes: 120
runs-on: ${{ matrix.runs-on }}
if: github.event.pull_request.draft == false && github.base_ref == 'master'
strategy:
# Keep running the other entries after a failure only when the PR carries
# the 'need-all-tests-even-fail' label.
fail-fast: ${{ !contains(github.event.pull_request.labels.*.name, 'need-all-tests-even-fail') }}
max-parallel: 10
matrix: ${{ fromJson(needs.find-test-cache.outputs.matrix) }}
env:
ONEFLOW_SRC: .
# Container names embed PR number, run id and matrix entry so concurrent
# runs on the same self-hosted machine cannot collide.
TEST_CONTAINER_NAME: "pr-${{ github.event.pull_request.number }}-run-id-${{ github.run_id }}-${{ matrix.entry }}-test"
TEST_MANYLINUX_CONTAINER_NAME: "pr-${{ github.event.pull_request.number }}-run-id-${{ github.run_id }}-${{ matrix.entry }}-test-manylinux"
TEST_WITH_TF_IMG_TAG: registry.cn-beijing.aliyuncs.com/oneflow/test-with-tf-2.3.0:2f831e9354298a11447578e869d983959feb046f
TEST_MANYLINUX_IMG_TAG: registry.cn-beijing.aliyuncs.com/oneflow/manylinux2014_x86_64_cuda11.8:6455f9b8154333333e6285fde3747aaac4a92929
METRICS_DIR: metrics
steps:
- name: Set proxy
if: ${{ contains(matrix.runs-on, 'self-hosted') }}
run: |
echo "https_proxy=${{ secrets.ONEFLOW_CI_HTTP_PROXY }}" >> $GITHUB_ENV
# Wipe leftovers from previous runs via a busybox container (presumably so
# root-owned files from earlier containers are removable too -- confirm).
- name: Fix permissions
if: ${{ contains(matrix.runs-on, 'self-hosted') }}
run: |
set -x
docker run --rm -v $PWD:$PWD -w $PWD busybox rm -rf *
docker run --rm -v $PWD:$PWD -w $PWD busybox rm -rf .pytest_cache
- name: Checkout Oneflow-Inc/oneflow
uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
repository: ${{github.event.pull_request.head.repo.full_name}}
# Downstream repos are pinned to fixed commits via *_COMMIT env vars and
# checked out into their *_SRC paths for the install steps below.
- name: Checkout Oneflow-Inc/vision
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: actions/checkout@v2
with:
repository: Oneflow-Inc/vision
# please use a commit here
ref: ${{ env.FLOW_VISION_COMMIT}}
path: ${{ env.FLOW_VISION_SRC}}
- name: Checkout Oneflow-Inc/libai
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: actions/checkout@v2
with:
repository: Oneflow-Inc/libai
# please use a commit here
ref: ${{ env.LIBAI_COMMIT}}
path: ${{ env.LIBAI_SRC}}
- name: Checkout Oneflow-Inc/oneflow_face
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: actions/checkout@v2
with:
repository: Oneflow-Inc/oneflow_face
# please use a commit here
ref: ${{ env.ONEFLOW_FACE_COMMIT}}
path: ${{ env.ONEFLOW_FACE_SRC}}
- name: Checkout Oneflow-Inc/oneflow_iree
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: actions/checkout@v2
with:
repository: Oneflow-Inc/oneflow_iree
# please use a commit here
ref: ${{ env.ONEFLOW_IREE_COMMIT}}
path: ${{ env.ONEFLOW_IREE_SRC}}
- name: Checkout Oneflow-Inc/one-fx
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: actions/checkout@v2
with:
repository: Oneflow-Inc/one-fx
# please use a commit here
ref: ${{ env.ONE_FX_COMMIT}}
path: ${{ env.ONE_FX_SRC}}
# Remove stale containers from a previous (possibly aborted) run up front.
- name: Remove container
timeout-minutes: 45
if: ${{ contains(matrix.runs-on, 'self-hosted') }}
run: |
docker rm -f ${{ env.TEST_CONTAINER_NAME }} || true
- name: Remove manylinux container
timeout-minutes: 45
if: ${{ contains(matrix.runs-on, 'self-hosted') }}
run: |
docker rm -f ${{ env.TEST_MANYLINUX_CONTAINER_NAME }} || true
# Record this entry's digest as completed so identical future runs can skip.
- uses: Oneflow-Inc/get-oneflow/cache-complete@ci-test-with-cu118
name: Save cache if successful
id: save-cache
timeout-minutes: 5
with:
oneflow-src: ${{ env.ONEFLOW_SRC }}
entry: ${{ matrix.entry }}
digest-type: ${{ matrix.digest-type }}
mark-as-completed: ${{ contains(matrix.runs-on, 'self-hosted') && github.event.pull_request.head.repo.full_name == github.repository }}
# Guard against racing commits: if the digest recomputed at job end no longer
# matches the digest this matrix entry was scheduled with, fail the job.
# Fix: the previous annotation hard-coded "file=test.yml,line=204,col=10",
# which pointed at an unrelated line; emit a plain ::error:: with both values
# so the mismatch is actually debuggable from the log.
- name: Check digest cache result. If this step failed, usually it is caused by new commits pushed when this CI run is running.
if: ${{ fromJSON(steps.save-cache.outputs.cache-hit) != matrix.cache-hit }}
run: |
echo "::error::digest cache-hit mismatch: steps.save-cache.outputs.cache-hit='${{ steps.save-cache.outputs.cache-hit }}' != matrix.cache-hit='${{ matrix.cache-hit }}' (usually caused by new commits pushed while this run was in progress)"
exit 1
# Fetch the wheel and packed liboneflow built for this compute platform.
- name: Download wheel and packed liboneflow
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: Oneflow-Inc/get-oneflow/digest/download@ci-test-with-cu118
id: download-digest
timeout-minutes: 10
with:
digest: ${{ steps.save-cache.outputs.build-digest }}
entry: ${{ matrix.compute-platform }}
ssh-tank-host: ${{ env.SSH_TANK_HOST }}
ssh-tank-path: ${{ env.SSH_TANK_PATH }}
# Sanitizer artifact downloads are currently disabled via the trailing
# '&& false' in their conditions; remove it to re-enable.
- name: Download ASAN and UBSAN wheel and packed liboneflow
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') && matrix.device == 'cpu' && false }}
uses: Oneflow-Inc/get-oneflow/digest/download@ci-test-with-cu118
id: asan-ubsan-download-digest
timeout-minutes: 10
with:
digest: ${{ steps.save-cache.outputs.build-digest }}
entry: cpu-asan-ubsan
ssh-tank-host: ${{ env.SSH_TANK_HOST }}
ssh-tank-path: ${{ env.SSH_TANK_PATH }}
- name: Download TSAN wheel and packed liboneflow
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') && matrix.device == 'cpu' && false }}
uses: Oneflow-Inc/get-oneflow/digest/download@ci-test-with-cu118
id: tsan-download-digest
timeout-minutes: 10
with:
digest: ${{ steps.save-cache.outputs.build-digest }}
entry: cpu-tsan
ssh-tank-host: ${{ env.SSH_TANK_HOST }}
ssh-tank-path: ${{ env.SSH_TANK_PATH }}
# Pick the test image: TF image for single-client entries, Torch otherwise.
- name: Enable TF container
if: ${{ fromJSON(matrix.is-single-client) }}
run: |
echo "TEST_IMG_TAG=${TEST_WITH_TF_IMG_TAG}" >> $GITHUB_ENV
- name: Enable Pytorch container
if: ${{ !fromJSON(matrix.is-single-client) }}
run: |
echo "TEST_IMG_TAG=${TEST_WITH_TORCH_IMG_TAG}" >> $GITHUB_ENV
# Export docker args and artifact/cache paths via GITHUB_ENV for later steps.
- name: Set environment variables
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
run: |
set -x
extra_docker_args=""
# For cpu entries, force CPU-only tests and hide all GPUs in the container.
if [ "${{ matrix.device }}" == "cpu" ]; then
extra_docker_args+=" --env ONEFLOW_TEST_CPU_ONLY=1"
extra_docker_args+=" --env CUDA_VISIBLE_DEVICES=-1"
fi
echo "EXTRA_DOCKER_ARGS=${extra_docker_args}" >> $GITHUB_ENV
echo "ONEFLOW_TEST_CACHE_DIR=$HOME/ci-cache/test_cache" >> $GITHUB_ENV
echo "ONEFLOW_TEST_DATASET_DIR=$HOME/dataset" >> $GITHUB_ENV
echo "ONEFLOW_WHEEL_PATH=${{ steps.download-digest.outputs.entry-dir }}/whl" >> $GITHUB_ENV
echo "ONEFLOW_CPACK_PATH=${{ steps.download-digest.outputs.entry-dir }}/cpack" >> $GITHUB_ENV
echo "DOCS_PATH=docs/${{ github.repository }}/pr/${{ github.event.pull_request.number }}" >> $GITHUB_ENV
# Experimental matrix entries additionally exercise these feature flags.
- name: Set environment variables (experimental flags)
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') && fromJson(matrix.is-experimental) }}
run: |
EXTRA_DOCKER_ARGS+=" --env ONEFLOW_KERNEL_ENABLE_CUDA_GRAPH=1"
EXTRA_DOCKER_ARGS+=" --env ONEFLOW_THREAD_ENABLE_LOCAL_MESSAGE_QUEUE=1"
EXTRA_DOCKER_ARGS+=" --env ONEFLOW_KERNEL_DISABLE_BLOB_ACCESS_CHECKER=1"
echo "EXTRA_DOCKER_ARGS=${EXTRA_DOCKER_ARGS}" >> $GITHUB_ENV
# Per-device pids-limit passed to `docker run` below.
- name: Set Thread Limit (CPU)
if: ${{ !fromJson(matrix.cache-hit) && matrix.device == 'cpu' }}
run: |
echo "THREAD_LIMIT=25000" >> $GITHUB_ENV
- name: Set Thread Limit (CUDA)
if: ${{ !fromJson(matrix.cache-hit) && matrix.device == 'cuda' }}
run: |
echo "THREAD_LIMIT=20000" >> $GITHUB_ENV
# Opt-in verbose test output via the 'need-test-verbose' PR label.
- name: Enable ONEFLOW_TEST_VERBOSE
if: ${{ contains(github.event.pull_request.labels.*.name, 'need-test-verbose') }}
run: |
EXTRA_DOCKER_ARGS+=" --env ONEFLOW_TEST_VERBOSE=1"
echo "EXTRA_DOCKER_ARGS=${EXTRA_DOCKER_ARGS}" >> $GITHUB_ENV
# Best-effort image pull; `docker run` below will pull again if this failed.
- name: Pull image
continue-on-error: true
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
run: |
docker pull ${{ env.TEST_IMG_TAG }}
# Unpack the C++ binaries (test exes) from the cpack artifact.
- name: Unzip packed liboneflow
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') && !fromJson(matrix.is-xla) }}
run: |
unzip ${{ env.ONEFLOW_CPACK_PATH }}/liboneflow-ci-linux.zip
# Disabled along with the sanitizer download steps above ('&& false').
- name: Unzip packed sanitized liboneflow
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') && !fromJson(matrix.is-xla) && matrix.device == 'cpu' && false }}
run: |
unzip ${{ steps.asan-ubsan-download-digest.outputs.entry-dir }}/cpack/liboneflow-ci-linux.zip -d asan-ubsan
unzip ${{ steps.tsan-download-digest.outputs.entry-dir }}/cpack/liboneflow-ci-linux.zip -d tsan
# Start a long-lived detached container ("sleep 7200" keeps it alive);
# subsequent steps run their commands inside it via `docker exec`.
# NOTE(review): ONEFLOW_THRAED_LOCAL_CACHED_SIZE spells "THREAD" as "THRAED";
# presumably the runtime reads this exact name -- confirm before renaming it.
- name: Start container
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
working-directory: ${{ env.ONEFLOW_SRC }}
run: |
docker run --gpus=all -d --rm --privileged --shm-size=8g \
--pids-limit ${{ env.THREAD_LIMIT }} \
--cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
-v ${ONEFLOW_TEST_DATASET_DIR}:${ONEFLOW_TEST_DATASET_DIR}:ro \
-v ${ONEFLOW_WHEEL_PATH}:${ONEFLOW_WHEEL_PATH}:ro \
-v $HOME/test-container-cache/dot-local:/root/.local \
-v $HOME/test-container-cache/dot-cache:/root/.cache \
-e ONEFLOW_WHEEL_PATH=${ONEFLOW_WHEEL_PATH} \
-e ONEFLOW_CI=1 \
-e NVIDIA_TF32_OVERRIDE=0 \
-e NCCL_P2P_DISABLE=1 \
-v $PWD:$PWD \
-w $PWD \
-v ${ONEFLOW_TEST_CACHE_DIR}:${ONEFLOW_TEST_CACHE_DIR} \
-e ONEFLOW_TEST_CACHE_DIR=${ONEFLOW_TEST_CACHE_DIR} \
-e ONEFLOW_TEST_DATASET_DIR=${ONEFLOW_TEST_DATASET_DIR} \
-e ONEFLOW_TIMEOUT_SECONDS=${{ env.ONEFLOW_TIMEOUT_SECONDS }} \
-e ONEFLOW_THRAED_LOCAL_CACHED_SIZE=${{ env.ONEFLOW_THRAED_LOCAL_CACHED_SIZE }} \
${{ env.MLIR_DOCKER_ARGS }} \
--name ${TEST_CONTAINER_NAME} \
${{ env.EXTRA_DOCKER_ARGS }} \
${{ env.TEST_IMG_TAG }} \
sleep 7200
# Second container from the manylinux build image, used to run the C++ test
# executables unpacked from the cpack artifact.
- name: Start manylinux container
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
working-directory: ${{ env.ONEFLOW_SRC }}
# For unknown reason we need to disable the requirement from nvidia docker
# by -e NVIDIA_DISABLE_REQUIRE=true
run: |
docker run --gpus=all -d --rm --privileged --shm-size=8g \
--pids-limit ${{ env.THREAD_LIMIT }} \
--cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
-v ${ONEFLOW_TEST_DATASET_DIR}:${ONEFLOW_TEST_DATASET_DIR}:ro \
-v ${ONEFLOW_WHEEL_PATH}:${ONEFLOW_WHEEL_PATH}:ro \
-v $HOME/test-container-cache/dot-local:/root/.local \
-v $HOME/test-container-cache/dot-cache:/root/.cache \
-e NVIDIA_DISABLE_REQUIRE=true \
-e ONEFLOW_WHEEL_PATH=${ONEFLOW_WHEEL_PATH} \
-e ONEFLOW_CI=1 \
-v $PWD:$PWD \
-w $PWD \
-v ${ONEFLOW_TEST_CACHE_DIR}:${ONEFLOW_TEST_CACHE_DIR} \
-e ONEFLOW_TEST_CACHE_DIR=${ONEFLOW_TEST_CACHE_DIR} \
-e ONEFLOW_TEST_DATASET_DIR=${ONEFLOW_TEST_DATASET_DIR} \
-e ONEFLOW_TIMEOUT_SECONDS=${{ env.ONEFLOW_TIMEOUT_SECONDS }} \
-e ONEFLOW_THRAED_LOCAL_CACHED_SIZE=${{ env.ONEFLOW_THRAED_LOCAL_CACHED_SIZE }} \
${{ env.MLIR_DOCKER_ARGS }} \
--name ${TEST_MANYLINUX_CONTAINER_NAME} \
${{ env.EXTRA_DOCKER_ARGS }} \
${{ env.TEST_MANYLINUX_IMG_TAG }} \
sleep 7200
# Run the gtest binaries from the unpacked liboneflow package.
- name: Exe test
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' }}
timeout-minutes: 20
run: |
docker exec ${{ env.TEST_MANYLINUX_CONTAINER_NAME }} ./liboneflow-ci-linux/bin/oneflow_testexe
# C++ API tests; Api.embedding* is excluded via the gtest filter.
- name: Exe test (C++ API)
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' }}
timeout-minutes: 20
run: |
docker exec -e ONEFLOW_SERVING_DEBUG=1 ${{ env.TEST_MANYLINUX_CONTAINER_NAME }} ./liboneflow-ci-linux/bin/oneflow_cpp_api_testexe --gtest_filter=-Api.embedding*
# Disabled ('&& false'), matching the disabled sanitizer artifact steps.
- name: Exe test (C++ API with sanitizers)
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' && matrix.device == 'cpu' && false }}
timeout-minutes: 10
run: |
docker exec -e UBSAN_OPTIONS=suppressions=.ubsan-suppressions -e ASAN_OPTIONS=strict_string_checks=1:detect_stack_use_after_return=1 -e LSAN_OPTIONS=suppressions=.lsan-suppressions ${{ env.TEST_MANYLINUX_CONTAINER_NAME }} ./asan-ubsan/liboneflow-ci-linux/bin/oneflow_cpp_api_testexe --gtest_filter=Api.graph_\*
# Run 5 times to avoid false positive because of occasional lack of stack info
docker exec -e TSAN_OPTIONS="history_size=7 suppressions=.tsan-suppressions" ${{ env.TEST_MANYLINUX_CONTAINER_NAME }} bash -c "./tsan/liboneflow-ci-linux/bin/oneflow_cpp_api_testexe || ./tsan/liboneflow-ci-linux/bin/oneflow_cpp_api_testexe || ./tsan/liboneflow-ci-linux/bin/oneflow_cpp_api_testexe || ./tsan/liboneflow-ci-linux/bin/oneflow_cpp_api_testexe || ./tsan/liboneflow-ci-linux/bin/oneflow_cpp_api_testexe"
# Smoke-check that the container is up and list its python packages.
- name: Test container
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
run: |
docker exec ${{ env.TEST_CONTAINER_NAME }} ls
docker exec ${{ env.TEST_CONTAINER_NAME }} python3 -m pip list
# Install the freshly built wheel; -U upgrades over any preinstalled oneflow.
- name: Install OneFlow
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
run: |
ls ${ONEFLOW_WHEEL_PATH}
docker exec ${TEST_CONTAINER_NAME} python3 -m pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -U --find-links=${ONEFLOW_WHEEL_PATH} oneflow
# Editable-install downstream projects (checked out earlier) into the container.
- name: Install downstream libs
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
run: |
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.FLOW_VISION_SRC}}
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install pybind11 --user
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install tensorboardX==2.6 --user
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.LIBAI_SRC}}
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.ONEFLOW_FACE_SRC}}
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.ONEFLOW_IREE_SRC}}
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.ONE_FX_SRC}}
# Sanity-check the installed wheel before running any suites.
- name: Run OneFlow doctor
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
run: |
docker exec ${{ env.TEST_CONTAINER_NAME }} python3 -m oneflow --doctor
# Build the API docs (cpu misc entry only) and upload a preview to OSS.
- name: Build documentation
timeout-minutes: 10
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' && matrix.device == 'cpu' }}
run: |
docker exec ${{ env.TEST_CONTAINER_NAME }} bash ci/test/build_docs.sh
- name: Upload documentation
id: upload-docs
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' && matrix.device == 'cpu' && github.repository == 'Oneflow-Inc/oneflow' }}
continue-on-error: true
uses: ./.github/actions/upload_oss
with:
src_path: build-docs/build/html
oss_dst_path: oss://oneflow-staging/${{ env.DOCS_PATH }}
oss_access_key_id: ${{ secrets.OSS_ACCESS_KEY_ID }}
oss_access_key_secret: ${{ secrets.OSS_ACCESS_KEY_SECRET }}
# Comment the docs preview URL on the PR (best-effort).
- name: Post docs url
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' && matrix.device == 'cpu' && github.repository == 'Oneflow-Inc/oneflow' && steps.upload-docs.outcome == 'success' }}
continue-on-error: true
uses: actions/github-script@v4
with:
script: |
github.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: "View latest API docs preview at: https://oneflow-staging.oss-cn-beijing.aliyuncs.com/${{ env.DOCS_PATH }}/"
})
- name: Doctest
timeout-minutes: 45
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' && matrix.device == 'cuda' }}
run: |
docker exec ${{ env.TEST_CONTAINER_NAME }} bash ci/test/doctest.sh
# Check out the models repo (pinned commit) for speed / DDP tests.
- name: Checkout Oneflow-Inc/models
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'speed-test' && matrix.device == 'cuda' }}
uses: actions/checkout@v2
with:
repository: Oneflow-Inc/models
ref: d6b2b8260e87541726ed87361171438d258e6a4d
path: oneflow-models
- name: ResNet50 Graph DDP test
id: models-resnet50
timeout-minutes: 20
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'speed-test' && matrix.device == 'cuda' }}
run: |
docker exec -e NCCL_DEBUG=INFO -e ONEFLOW_MODELS_DIR=$PWD/oneflow-models ${{ env.TEST_CONTAINER_NAME }} bash ci/test/test_resnet50_graph_ddp.sh
# Speed regressions only fail the job when labeled 'need-pass-speed-test'.
- name: Speed test
id: speed
timeout-minutes: 20
continue-on-error: ${{ !contains(github.event.pull_request.labels.*.name, 'need-pass-speed-test') }}
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'speed-test' && matrix.device == 'cuda' }}
run: |
docker exec -e ONEFLOW_MODELS_DIR=$PWD/oneflow-models ${{ env.TEST_CONTAINER_NAME }} bash ci/test/test_speed_multi_client.sh
# Persist the stats emitted by the speed step, upload them to OSS, and
# post them on the PR as a collapsible comment.
- name: Save speed stats
if: ${{ always() && !fromJson(matrix.cache-hit) && matrix.test-type == 'speed-test' && matrix.device == 'cuda' }}
run: |
mkdir -p ${{ env.METRICS_DIR }}
echo "${{ steps.speed.outputs.stats }}" >> ${{ env.METRICS_DIR }}/speed_stats.txt
- name: Upload speed stats
if: ${{ always() && !fromJson(matrix.cache-hit) && matrix.test-type == 'speed-test' && matrix.device == 'cuda' }}
# must succeed if it is a branch of Oneflow-Inc/oneflow
continue-on-error: ${{ !(github.repository == 'Oneflow-Inc/oneflow') }}
uses: ./.github/actions/upload_oss
with:
src_path: ${{ env.METRICS_DIR }}
oss_dst_path: oss://oneflow-log/${{ github.repository }}/metrics/pr/${{ github.event.pull_request.number }}/${{ github.event.pull_request.head.sha }}/${{github.run_id}}
oss_access_key_id: ${{ secrets.OSS_ACCESS_KEY_ID }}
oss_access_key_secret: ${{ secrets.OSS_ACCESS_KEY_SECRET }}
- name: Post speed stats
if: ${{ always() && !fromJson(matrix.cache-hit) && matrix.test-type == 'speed-test' && matrix.device == 'cuda' }}
continue-on-error: true
uses: actions/github-script@v4
with:
script: |
github.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: "<details>\n <summary>Speed stats:</summary>\n\n ``` \n${{ steps.speed.outputs.stats }}\n ``` \n\n</details>".replace(/\\n/g, '\n')
})
# Stress-run the python test files this PR touched (as reported by the
# source_info job) to flush out flakiness.
# Fix: the condition referenced steps.py-diff.outputs.has_changed_python_tests,
# but no step with id 'py-diff' exists in this job, so that expression was
# always empty and the step could never run. Gate on the same source_info
# output the run body already consumes.
- name: Run tests in changed files compared to default branch 100 times
timeout-minutes: 60
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'module' && !fromJson(matrix.is-distributed) && needs.source_info.outputs.changed_python_tests }}
run: |
docker exec -e ONEFLOW_TEST_DIR=diff \
-e ONEFLOW_TEST_FILES="${{needs.source_info.outputs.changed_python_tests}}" \
${{ env.TEST_CONTAINER_NAME }} bash ci/test/generic_test_multi_client.sh
# Suites that need a large tensor budget or exclusive GPU access.
- name: Expensive tests (models, cases require exclusive access to GPU)
timeout-minutes: 45
if: ${{ !fromJson(matrix.cache-hit) && (matrix.test-type == 'speed-test' || (matrix.test-type == 'misc' && matrix.device == 'cuda')) && !fromJson(matrix.is-distributed) }}
run: |
docker exec \
-e ONEFLOW_TEST_TENSOR_SIZE_LIMIT_MB=1024 \
-e ONEFLOW_TEST_DIR=$PWD/python/oneflow/test/expensive \
${{ env.TEST_CONTAINER_NAME }} bash ci/test/expensive_generic_test_multi_client.sh
- name: Module API test
timeout-minutes: 60
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'module' && !fromJson(matrix.is-distributed) }}
run: |
docker exec -e ONEFLOW_TEST_DIR=$PWD/python/oneflow/test/modules ${{ env.TEST_CONTAINER_NAME }} bash ci/test/generic_test_multi_client.sh
# Graph tests plus an 8-process launch exercising device/process mismatch.
- name: Graph API test
timeout-minutes: 45
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' }}
run: |
docker exec -e ONEFLOW_TEST_DIR=$PWD/python/oneflow/test/graph ${{ env.TEST_CONTAINER_NAME }} bash ci/test/generic_test_multi_client.sh
docker exec ${{ env.TEST_CONTAINER_NAME }} python3 -m oneflow.distributed.launch --nproc_per_node 8 $PWD/python/oneflow/test/graph/test_neq_device_process_num.py
# 4-GPU model tests of the pinned libai checkout.
- name: libai test
timeout-minutes: 45
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' && matrix.device == 'cuda' }}
run: |
docker exec -e ONEFLOW_TEST_DEVICE_NUM=4 -w $PWD/${{ env.LIBAI_SRC }} ${{ env.TEST_CONTAINER_NAME }} python3 -m oneflow.distributed.launch --nproc_per_node 4 -m unittest -f tests/models/test_bert.py
docker exec -e ONEFLOW_TEST_DEVICE_NUM=4 -w $PWD/${{ env.LIBAI_SRC }} ${{ env.TEST_CONTAINER_NAME }} python3 -m oneflow.distributed.launch --nproc_per_node 4 -m unittest -f tests/models/test_gpt.py
docker exec -e ONEFLOW_TEST_DEVICE_NUM=4 -w $PWD/${{ env.LIBAI_SRC }} ${{ env.TEST_CONTAINER_NAME }} python3 -m oneflow.distributed.launch --nproc_per_node 4 -m unittest -f tests/models/test_t5.py
docker exec -e ONEFLOW_TEST_DEVICE_NUM=4 -w $PWD/${{ env.LIBAI_SRC }} ${{ env.TEST_CONTAINER_NAME }} python3 -m oneflow.distributed.launch --nproc_per_node 4 -m unittest -f tests/models/test_vit.py
- name: oneflow_face test
timeout-minutes: 30
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' && matrix.device == 'cuda' }}
run: |
docker exec -e ONEFLOW_TEST_DEVICE_NUM=4 -w $PWD/${{ env.ONEFLOW_FACE_SRC }} ${{ env.TEST_CONTAINER_NAME }} python3 -m oneflow.distributed.launch --nproc_per_node 4 -m pytest tests/train/test_train.py
# Disabled via '&& false'; remove it to re-enable.
- name: oneflow_iree test
timeout-minutes: 45
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' && false }}
run: |
docker exec -w $PWD/${{ env.ONEFLOW_IREE_SRC }} ${{ env.TEST_CONTAINER_NAME }} python3 -m pytest examples
- name: IR tests
timeout-minutes: 45
if: ${{ !fromJson(matrix.cache-hit) && (matrix.test-type == 'misc' && matrix.device == 'cuda') && !fromJson(matrix.is-distributed) }}
run: |
docker exec \
-e ONEFLOW_TEST_TENSOR_SIZE_LIMIT_MB=1024 \
${{ env.TEST_CONTAINER_NAME }} bash ci/test/ir_tests.sh
# Disabled via '&& false'; remove it to re-enable.
- name: Exception API test
timeout-minutes: 45
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' && false }}
run: docker exec ${{ env.TEST_CONTAINER_NAME }} bash ci/test/multi_client_exception_test.sh
- name: Misc test
timeout-minutes: 45
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' }}
run: |
docker exec -e ONEFLOW_TEST_DIR=$PWD/python/oneflow/test/misc ${{ env.TEST_CONTAINER_NAME }} bash ci/test/generic_test_multi_client.sh
- name: Dataloader API test
timeout-minutes: 45
# TODO(luyang): dataset check fails
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' && false}}
run: |
docker exec -e ONEFLOW_TEST_DIR=$PWD/python/oneflow/test/dataloader ${{ env.TEST_CONTAINER_NAME }} bash ci/test/generic_test_multi_client.sh
- name: Tensor API test
timeout-minutes: 45
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' }}
run: |
docker exec -e ONEFLOW_TEST_DIR=$PWD/python/oneflow/test/tensor ${{ env.TEST_CONTAINER_NAME }} bash ci/test/generic_test_multi_client.sh
# Verify oneflow's torch-mocking entry points (script and function forms).
- name: Test mocking torch by script
timeout-minutes: 45
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'module' }}
run: |
docker exec ${{ env.TEST_CONTAINER_NAME }} bash -x ci/test/test_mock_script.sh
- name: Test mocking torch by function
timeout-minutes: 45
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'module' }}
run: |
docker exec ${{ env.TEST_CONTAINER_NAME }} bash -x ci/test/test_mock_function.sh
# pytest-benchmark run over flowvision benchmarks with regression thresholds.
- name: Benchmark Test
timeout-minutes: 100
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'benchmark' && matrix.device == 'cuda' }}
uses: Oneflow-Inc/get-oneflow/pytest-benchmark@ci-test-with-cu118
with:
collect-path: ${{ env.FLOW_VISION_SRC }}/benchmark
container-name: ${{ env.TEST_CONTAINER_NAME }}
unknown-threshold: 30
error-threshold: 40
# On failure, drop the 'automerge' label and comment which job failed.
- name: Remove automerge
if: ${{ failure() && contains(matrix.runs-on, 'self-hosted') && cancelled() == false && contains(github.event.pull_request.labels.*.name, 'automerge') }}
uses: actions/github-script@v4
with:
script: |
github.issues.removeLabel({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
name: 'automerge'
})
github.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: 'CI failed when running job: ${{ matrix.entry }}. PR label automerge has been removed'
})
# Post-mortem on failure: dump stacks from core files and host status.
- name: Print stacks in all core files
timeout-minutes: 45
if: ${{ failure() && contains(matrix.runs-on, 'self-hosted') }}
run: |
docker exec ${{ env.TEST_CONTAINER_NAME }} bash ci/test/print_stack_in_all_dirs.sh || true
- name: Query system status
timeout-minutes: 45
if: ${{ failure() && contains(matrix.runs-on, 'self-hosted') }}
run: |
nvidia-smi || true
docker ps || true
# Always tear down both containers and wipe the workspace via busybox.
- name: Remove container
timeout-minutes: 45
if: ${{ always() && contains(matrix.runs-on, 'self-hosted') }}
run: |
docker rm -f ${{ env.TEST_CONTAINER_NAME }} || true
- name: Remove manylinux container
timeout-minutes: 45
if: ${{ always() && contains(matrix.runs-on, 'self-hosted') }}
run: |
docker rm -f ${{ env.TEST_MANYLINUX_CONTAINER_NAME }} || true
- name: Clean workspace
timeout-minutes: 45
if: ${{ always() && contains(matrix.runs-on, 'self-hosted') }}
run: |
docker run --rm -v $PWD:$PWD -w $PWD busybox rm -rf *
# Job: run OneFlow's custom clang-tidy (build of llvm-project @ maybe-16.0.0)
# against ONLY the lines touched by the pull request diff.
static_analysis_with_clang_on_diff:
  name: Static analysis with clang on diff
  runs-on: ubuntu-22.04
  # Skip draft PRs and PRs that do not target master.
  if: github.event.pull_request.draft == false && github.base_ref == 'master'
  steps:
    - name: Check out OneFlow
      uses: actions/checkout@v2
      with:
        ref: ${{ github.event.pull_request.head.sha }}
        repository: ${{github.event.pull_request.head.repo.full_name}}
        # Full history so `git diff <base sha>` below can resolve the base commit.
        fetch-depth: 0
    # cache-complete reports whether this exact source digest already passed;
    # every later step is skipped on a cache hit.
    - uses: Oneflow-Inc/get-oneflow/cache-complete@ci-test-with-cu118
      name: Save cache if successful
      id: save-cache
      timeout-minutes: 5
      with:
        oneflow-src: .
        entry: static_analysis_with_clang_on_diff
        digest-type: build
        # Only mark completion for same-repo PRs, not forks.
        mark-as-completed: ${{ github.event.pull_request.head.repo.full_name == github.repository }}
    - name: Install dependencies
      if: ${{ !fromJSON(steps.save-cache.outputs.cache-hit) }}
      run: |
        sudo apt-get update
        sudo apt-get install -y libopenblas-dev nasm python3-pip ninja-build ccache
    # Fetch the prebuilt clang-tidy binary plus the diff driver script that
    # feeds it only the changed lines.
    - name: Download OneFlow custom clang-tidy
      if: ${{ !fromJSON(steps.save-cache.outputs.cache-hit) }}
      run: |
        wget https://github.com/Oneflow-Inc/llvm-project/releases/download/maybe-16.0.0/oneflow-clang-tidy-16
        wget https://raw.githubusercontent.com/oneflow-inc/llvm-project/maybe/clang-tools-extra/clang-tidy/tool/clang-tidy-diff.py
        chmod +x oneflow-clang-tidy-16 clang-tidy-diff.py
    # ccache for the third-party build, keyed on the CMake inputs.
    - name: Cache third party dir
      uses: actions/cache@v4
      if: ${{ !fromJSON(steps.save-cache.outputs.cache-hit) }}
      with:
        path: ~/.ccache
        key: clang-tidy-diff-third-party-ccache-${{ hashFiles('**/CMakeLists.txt') }}-${{ hashFiles('**/*.cmake') }}
        restore-keys: |
          clang-tidy-diff-third-party-ccache-${{ hashFiles('**/CMakeLists.txt') }}-
          clang-tidy-diff-third-party-ccache-
    # Generated sources (proto, functional, op schema) must exist before
    # clang-tidy can parse the tree.
    - name: Build third party libs and generate files
      if: ${{ !fromJSON(steps.save-cache.outputs.cache-hit) }}
      run: |
        export CCACHE_COMPRESS=true
        export CCACHE_MAXSIZE=500M
        mkdir build
        cd build
        cmake .. -C ../cmake/caches/international/cpu.cmake \
          -DCMAKE_BUILD_TYPE=Release \
          -DBUILD_TESTING=OFF \
          -DCMAKE_C_COMPILER_LAUNCHER=ccache \
          -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
        cmake --build . -j$(nproc) --target oneflow_deps of_protoobj of_functional_obj of_functional_tensor_obj of_op_schema
    # Fork PRs need the upstream remote so the base sha is fetchable.
    - name: Fetch upstream
      if: ${{ !fromJSON(steps.save-cache.outputs.cache-hit) && github.event.pull_request.head.repo.full_name != github.event.pull_request.base.repo.full_name }}
      run: |
        git remote add upstream https://github.com/Oneflow-Inc/oneflow
        git fetch upstream
    - name: Run clang-tidy for modified files
      # use clang as compiler for correct compiler flags
      if: ${{ !fromJSON(steps.save-cache.outputs.cache-hit) }}
      run: |
        sudo apt install clang-12 lldb-12 lld-12 libfuse2
        cd build
        rm CMakeCache.txt
        cmake .. -C ../cmake/caches/international/cpu.cmake \
          -DCMAKE_C_COMPILER=clang-12 \
          -DCMAKE_CXX_COMPILER=clang++-12 \
          -DCMAKE_BUILD_TYPE=Release \
          -DBUILD_TESTING=OFF \
          -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
        cd ..
        git diff -U0 ${{ github.event.pull_request.base.sha }} | ./clang-tidy-diff.py -clang-tidy-binary ./oneflow-clang-tidy-16 -path build -allow-enabling-alpha-checkers -j $(nproc) -p1 -extra-arg="-Xclang" -extra-arg="-analyzer-config" -extra-arg="-Xclang" -extra-arg="aggressive-binary-operation-simplification=true" -warnings-as-errors="$(cat ./ci/check/clang_tidy_warnings_as_errors_on_diff)"
    # Extra pass with the maybe-need-error-msg check, opt-in via PR label.
    - name: Check error message absence in changed files
      if: ${{ !fromJSON(steps.save-cache.outputs.cache-hit) && contains(github.event.pull_request.labels.*.name, 'need-check-error-message') }}
      run: |
        git diff -U0 ${{ github.event.pull_request.base.sha }} | ./clang-tidy-diff.py -clang-tidy-binary ./oneflow-clang-tidy-16 -path build -allow-enabling-alpha-checkers -j $(nproc) -p1 -extra-arg="-Xclang" -extra-arg="-analyzer-config" -extra-arg="-Xclang" -extra-arg="aggressive-binary-operation-simplification=true" -checks=-*,maybe-need-error-msg -warnings-as-errors=* -skip-line-filter
    # If analysis fails on an automerge-labelled PR, strip the label and
    # leave a comment explaining why.
    - name: Remove automerge
      if: ${{ !fromJSON(steps.save-cache.outputs.cache-hit) && failure() && cancelled() == false && contains(github.event.pull_request.labels.*.name, 'automerge') }}
      uses: actions/github-script@v4
      with:
        script: |
          github.issues.removeLabel({
            issue_number: context.issue.number,
            owner: context.repo.owner,
            repo: context.repo.repo,
            name: 'automerge'
          })
          github.issues.createComment({
            issue_number: context.issue.number,
            owner: context.repo.owner,
            repo: context.repo.repo,
            body: 'Static analysis with clang failed. PR label automerge has been removed'
          })
================================================
FILE: .gitignore
================================================
/build
/build-*
/docs/build/
/docs/build-cn/
/docs/source/generated
/cmake-build-*
/dist
/third_party/
/examples/**/oneflow
/benchmark/**/oneflow
log/
*plan
core.*
*.pyc
*.ipynb
/.vscode
/.idea
/manylinux*
wheelhouse/
wheelhouse*
.DS_Store
/tmp_wheel
/oneflow/python/__export_symbols__.py
/oneflow/python/compatibility.py
/oneflow/python/framework/sysconfig_gen.py
/oneflow/python/test/ops/localhost_script_*.sh
.clangd
compile_commands.json
.cache
/oneflow-src.zip
/oneflow_temp
/distributed-tmp
/serving-tmp
test_tmp_dir
unittest-log-*
/oneflow/python
/oneflow/compatible_single_client_python
/benchmarks
/oneflow/python/version.py
/data-test
/tmp
/python/oneflow/test/dataloader/data-test/
/target
saved_model
/devcontainer-cache
op_prof.csv
*.lock
================================================
FILE: .lsan-suppressions
================================================
leak:CommandT
================================================
FILE: .mergify.yml
================================================
pull_request_rules:
- name: automatic update for PR with label "automerge"
conditions:
- "#approved-reviews-by>=2"
- -conflict # skip conflicts
- -draft # skip draft PRs
- label="automerge"
actions:
update:
- name: automatic merge
conditions:
- "#approved-reviews-by>=2"
- -conflict # skip conflicts
- -draft # skip draft PRs
- label="automerge"
- "#commits-behind==0"
- -closed
actions:
merge:
method: squash
================================================
FILE: .tsan-suppressions
================================================
# These four group of functions are designed to be thread unsafe,
# it's user's responsibility to use them correctly.
race:ThreadUnsafe
race:thread_unsafe
race:flying_instruction_cnt
race:total_erased_instruction_cnt
race:ToShape
# glog
race:google::
# ~basic_string() in DenseElementsAttrToTensor interferes with
# ~AccessBlobArgCbInstructionPolicy(). Perhaps it's a false
# positive.
race:~basic_string
================================================
FILE: .ubsan-suppressions
================================================
# llvm
vptr:Class.cpp
================================================
FILE: CMakeLists.txt
================================================
# Top-level CMake configuration for OneFlow.
# Minimum CMake required
set(CMAKE_POLICY_DEFAULT_CMP0135 NEW)
cmake_minimum_required(VERSION 3.18.0)
set(CMAKE_INSTALL_MESSAGE LAZY CACHE STRING "")
set(CMAKE_EXPORT_COMPILE_COMMANDS ON CACHE BOOL "")

# --- Top-level build switches -----------------------------------------------
option(THIRD_PARTY "Build third party" ON)
option(ONEFLOW "Build oneflow" ON)
if(NOT THIRD_PARTY AND NOT ONEFLOW)
  message(FATAL_ERROR "at least one of flags THIRD_PARTY and ONEFLOW should be ON")
endif()
option(USE_CLANG_FORMAT "" OFF)
option(USE_CLANG_TIDY "" OFF)
option(BUILD_PYTHON "" ON)
option(BUILD_CPP_API "Option to build OneFlow C++ API (beta)" OFF)
option(BUILD_RDMA "" OFF)
option(BUILD_CUDA "" ON)
option(BUILD_TESTING "" OFF)
option(BUILD_GIT_VERSION "" ON)
option(BUILD_PROFILER "" OFF)
option(BUILD_FOR_CI "" OFF)
option(WITH_COCOAPI "Option to build with COCO API" ON)
option(WITH_ZLIB "" ON)
option(WITH_ONEDNN "" ON)
option(WITH_MLIR "" OFF)
option(WITH_MLIR_CUDA_CODEGEN "" OFF)
option(OF_SOFTMAX_USE_FAST_MATH "" ON)
option(OF_LAYER_NORM_USE_FAST_MATH "" ON)
option(TREAT_WARNINGS_AS_ERRORS "" ON)
option(MAYBE_NEED_ERROR_MSG_CHECK "" OFF)
option(LITE_USE_ASCEND_NPU "" OFF)
# Reference:
# https://medium.com/@alasher/colored-c-compiler-output-with-ninja-clang-gcc-10bfe7f2b949
option(OF_FORCE_COLORED_DIAGNOSTICS "Always produce ANSI-colored diagnostics (GNU/Clang only)." ON)

# --- Version ----------------------------------------------------------------
set(ONEFLOW_CURRENT_VERSION 0.8.1.dev CACHE STRING "")
if(BUILD_FOR_CI)
  set(ONEFLOW_CURRENT_VERSION ci)
endif()

# LLVM can either be built from the in-tree sources or taken from an install.
set(LLVM_PROVIDER "in-tree" CACHE STRING "in-tree, install")
if(NOT WITH_MLIR)
  # FORCE overrides any cached value: building the in-tree LLVM is only
  # worthwhile when MLIR is enabled.
  set(LLVM_PROVIDER "install"
      CACHE STRING "in-tree will build LLVM's ALL, not what we want when not building MLIR" FORCE)
endif(NOT WITH_MLIR)

# --- Misc cached settings ---------------------------------------------------
set(RPC_BACKEND "GRPC,LOCAL" CACHE STRING "")
set(THIRD_PARTY_MIRROR "" CACHE STRING "")
set(PIP_INDEX_MIRROR "" CACHE STRING "")
set(CPU_THREADING_RUNTIMES "TBB;OMP" CACHE STRING "")
# macOS builds are CPU-only and restricted to the local RPC backend.
if(APPLE)
  set(RPC_BACKEND "LOCAL")
  set(BUILD_CUDA OFF)
  set(WITH_COCOAPI OFF)
  set(WITH_ONEDNN OFF)
endif()
set(CUDNN_STATIC OFF CACHE BOOL "")

project(oneflow C CXX)

# --- Build type validation --------------------------------------------------
if(NOT CMAKE_BUILD_TYPE)
  message(STATUS "No build type selected, default to Release")
  set(CMAKE_BUILD_TYPE "Release" CACHE STRING "Build type (default Release)" FORCE)
endif()
if(NOT CMAKE_BUILD_TYPE MATCHES "^(Debug|Release|RelWithDebInfo|MinSizeRel)$")
  message(
    FATAL_ERROR
      "Expected CMAKE_BUILD_TYPE is Debug, Release, RelWithDebInfo or MinSizeRel, got ${CMAKE_BUILD_TYPE}"
  )
endif()
message(STATUS "CMAKE_BUILD_TYPE: ${CMAKE_BUILD_TYPE}")

# --- Compiler version checks ------------------------------------------------
set(COMPILER_VERSION_ERROR_MSG
    "At least gcc 9, clang 5 or Apple clang 12 is supported. Current version ${CMAKE_CXX_COMPILER_VERSION}."
)
if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU")
  if("${CMAKE_CXX_COMPILER_VERSION}" VERSION_LESS 9)
    message(FATAL_ERROR ${COMPILER_VERSION_ERROR_MSG})
  endif()
elseif("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
  if("${CMAKE_CXX_COMPILER_VERSION}" VERSION_LESS 5)
    message(FATAL_ERROR ${COMPILER_VERSION_ERROR_MSG})
  endif()
elseif("${CMAKE_CXX_COMPILER_ID}" STREQUAL "AppleClang")
  if("${CMAKE_CXX_COMPILER_VERSION}" VERSION_LESS 12)
    message(FATAL_ERROR ${COMPILER_VERSION_ERROR_MSG})
  endif()
else()
  message(WARNING "Unknown compiler \"${CMAKE_CXX_COMPILER_ID}\".")
endif()

set(oneflow_cmake_dir ${PROJECT_SOURCE_DIR}/cmake)

# Reject in-source builds (compare real paths so symlinks don't fool the check).
get_filename_component(real_src_dir "${CMAKE_SOURCE_DIR}" REALPATH)
get_filename_component(real_bin_dir "${CMAKE_BINARY_DIR}" REALPATH)
if("${real_src_dir}" STREQUAL "${real_bin_dir}")
  message(FATAL_ERROR "In-source build not allowed")
endif()

# Modules
list(APPEND CMAKE_MODULE_PATH ${oneflow_cmake_dir}/third_party)
list(APPEND CMAKE_MODULE_PATH ${oneflow_cmake_dir})
include(threading)
include(util)
include(proto2cpp)

# --- C++11 ABI handling -----------------------------------------------------
# Auto-detect when the user did not choose; fail fast when the user asked for
# the cxx11 ABI but the toolchain cannot provide it.
if(NOT DEFINED USE_CXX11_ABI)
  check_cxx11_abi(CXX11_ABI_AVAILABLE)
  set(USE_CXX11_ABI ${CXX11_ABI_AVAILABLE})
elseif(USE_CXX11_ABI)
  check_cxx11_abi(CXX11_ABI_AVAILABLE)
  if(NOT CXX11_ABI_AVAILABLE)
    message(FATAL_ERROR "cxx11 abi is not available for current compiler")
  endif()
endif()
message(STATUS "USE_CXX11_ABI: ${USE_CXX11_ABI}")

# --- Translate options into preprocessor definitions ------------------------
if(WITH_MLIR)
  add_definitions(-DWITH_MLIR)
  if(WITH_MLIR_CUDA_CODEGEN)
    add_definitions(-DWITH_MLIR_CUDA_CODEGEN)
  endif()
endif()
if(WITH_COCOAPI)
  add_definitions(-DWITH_COCOAPI)
endif()
if(USE_CXX11_ABI)
  add_definitions(-D_GLIBCXX_USE_CXX11_ABI=1)
else()
  add_definitions(-D_GLIBCXX_USE_CXX11_ABI=0)
endif()
if(BUILD_PROFILER)
  add_definitions(-DOF_ENABLE_PROFILER)
endif()
if(OF_SOFTMAX_USE_FAST_MATH)
  add_definitions(-DOF_SOFTMAX_USE_FAST_MATH)
endif()
if(OF_LAYER_NORM_USE_FAST_MATH)
  add_definitions(-DOF_LAYER_NORM_USE_FAST_MATH)
endif()
if(OF_FORCE_COLORED_DIAGNOSTICS)
  add_compile_options(
    $<$<COMPILE_LANGUAGE:CXX>:$<$<CXX_COMPILER_ID:GNU>:-fdiagnostics-color=always>>
    $<$<COMPILE_LANGUAGE:CXX>:$<$<CXX_COMPILER_ID:Clang>:-fcolor-diagnostics>>
    $<$<COMPILE_LANGUAGE:CUDA>:$<$<CUDA_COMPILER_ID:Clang>:-fcolor-diagnostics>>)
endif()
if(RPC_BACKEND MATCHES "GRPC")
  add_definitions(-DRPC_BACKEND_GRPC)
  message(STATUS "RPC backend enabled: gRPC")
  set(SUPPORTED_RPC_BACKEND_FOUND 1)
endif()
if(WITH_ONEDNN)
  add_definitions(-DWITH_ONEDNN)
endif()
# The local RPC backend is always available.
add_definitions(-DRPC_BACKEND_LOCAL)
message(STATUS "RPC backend enabled: local")

enable_testing()
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(THIRD_PARTY_DIR "${PROJECT_BINARY_DIR}/third_party_install"
    CACHE PATH "Where to install third party headers and libs")
set(ONEFLOW_PYTHON_DIR "${PROJECT_SOURCE_DIR}/python" CACHE PATH "oneflow python src dir")
include(platform)

# --- Sanitizers -------------------------------------------------------------
# ASAN and UBSAN may be combined with each other, but neither combines with TSAN.
if((ENABLE_ASAN OR ENABLE_UBSAN) AND ENABLE_TSAN)
  message(FATAL_ERROR "Only ASAN and UBSAN can be enabled at the same time.")
endif()
if(ENABLE_ASAN)
  add_compile_options(-fsanitize=address -fno-omit-frame-pointer)
  add_link_options(-fsanitize=address -fno-omit-frame-pointer)
endif()
if(ENABLE_UBSAN)
  add_compile_options(-fsanitize=undefined)
  add_link_options(-fsanitize=undefined)
endif()
if(ENABLE_TSAN)
  add_compile_options(-fsanitize=thread)
  add_link_options(-fsanitize=thread)
endif()

if(BUILD_PYTHON)
  set(ONEFLOW_INCLUDE_DIR "${ONEFLOW_PYTHON_DIR}/oneflow/include")
endif(BUILD_PYTHON)

# --- Third-party deps and subdirectories ------------------------------------
# Pinned CUTLASS archive; use_mirror may rewrite the URL to a domestic mirror.
set(CUTLASS_URL
    https://github.com/Oneflow-Inc/cutlass/archive/e6f548d80bfdf1167d66adbbbcfc2ee3394f4777.zip)
use_mirror(VARIABLE CUTLASS_URL URL ${CUTLASS_URL})
set(CUTLASS_MD5 425f8cf064ff47c81124e55490135f5c)
include(cuda)
add_subdirectory(external)
include(third_party)
message(STATUS "CMAKE_CXX_COMPILER_VERSION: " ${CMAKE_CXX_COMPILER_VERSION})
add_custom_target(oneflow_deps ALL DEPENDS prepare_oneflow_third_party)
# skip oneflow cmake to avoid errors caused by the absences of python-dev, proto src
if(ONEFLOW)
  include(oneflow)
endif()
add_subdirectory(ci)
================================================
FILE: LICENSE
================================================
Copyright 2020 The OneFlow Authors. All rights reserved.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: README.md
================================================
# OneFlow
OneFlow is a deep learning framework designed to be **user-friendly, scalable and efficient**. With OneFlow, it is easy to:
- program a model with [**PyTorch-like API**](https://oneflow.readthedocs.io/en/master/)
- scale a model to n-dimensional-parallel execution with the [**Global Tensor**](https://docs.oneflow.org/en/master/cookies/global_tensor.html)
- accelerate/deploy a model with the [**Graph Compiler**](https://oneflow.readthedocs.io/en/master/graph.html).
[](https://github.com/Oneflow-Inc/oneflow/actions/workflows/simple.yml)
[](https://github.com/Oneflow-Inc/docker-images/actions/workflows/oneflow-nightly.yml)
[](https://github.com/Oneflow-Inc/oneflow/actions/workflows/release.yml)
[](https://oneflow.readthedocs.io/en/master/?badge=master)
## Latest News
- Version 1.0.0 is out!
- [Full changelog](https://github.com/Oneflow-Inc/oneflow/releases/tag/v1.0.0)
## Publication
- [OneFlow: Redesign the Distributed Deep Learning Framework from Scratch](https://arxiv.org/abs/2110.15032)
## System Requirements
### General
- Linux
- Python 3.7, 3.8, 3.9, 3.10, 3.11
### CUDA
- CUDA arch 60 or above
- CUDA Toolkit version 10.0 or above
- Nvidia driver version 440.33 or above
OneFlow will work on a minimum supported driver, and any driver beyond. For more information, please refer to [CUDA compatibility documentation](https://docs.nvidia.com/deploy/cuda-compatibility/index.html).
## Install
### Preinstall docker image
```
docker pull oneflowinc/oneflow:nightly-cuda11.8
```
### Pip Install
- (**Highly recommended**) Upgrade pip
```
python3 -m pip install --upgrade pip #--user
```
- To install latest stable release of OneFlow with CUDA support:
```bash
python3 -m pip install oneflow
```
- To install nightly release of OneFlow with CPU-only support:
```bash
python3 -m pip install --pre oneflow -f https://oneflow-staging.oss-cn-beijing.aliyuncs.com/branch/master/cpu
```
- To install nightly release of OneFlow with CUDA support:
```bash
python3 -m pip install --pre oneflow -f https://oneflow-staging.oss-cn-beijing.aliyuncs.com/branch/master/cu118
```
If you are in China, you could run this to have pip download packages from domestic mirror of pypi:
```
python3 -m pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
```
For more information on this, please refer to [pypi 镜像使用帮助](https://mirror.tuna.tsinghua.edu.cn/help/pypi/)
### Install from Source
<details>
<summary>Clone Source Code</summary>
- #### Option 1: Clone source code from GitHub
```bash
git clone https://github.com/Oneflow-Inc/oneflow.git
```
- #### Option 2: Download from Aliyun(Only available in China)
```bash
curl https://oneflow-public.oss-cn-beijing.aliyuncs.com/oneflow-src.zip -o oneflow-src.zip
unzip oneflow-src.zip
```
</details>
<details>
<summary>Build OneFlow</summary>
- Install dependencies
```
apt install -y libopenblas-dev nasm g++ gcc python3-pip cmake autoconf libtool
```
These dependencies are preinstalled in the official conda environment and docker image; you can use the official conda environment [here](https://github.com/Oneflow-Inc/conda-env) or use the docker image by:
```bash
docker pull oneflowinc/manylinux2014_x86_64_cuda11.2
```
- In the root directory of OneFlow source code, run:
```
mkdir build
cd build
```
- Config the project, inside `build` directory:
- If you are in China
config for CPU-only like this:
```
cmake .. -C ../cmake/caches/cn/cpu.cmake
```
config for CUDA like this:
```
cmake .. -C ../cmake/caches/cn/cuda.cmake -DCMAKE_CUDA_ARCHITECTURES=80 -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DCUDNN_ROOT_DIR=/usr/local/cudnn
```
- If you are not in China
config for CPU-only like this:
```
cmake .. -C ../cmake/caches/international/cpu.cmake
```
config for CUDA like this:
```
cmake .. -C ../cmake/caches/international/cuda.cmake -DCMAKE_CUDA_ARCHITECTURES=80 -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DCUDNN_ROOT_DIR=/usr/local/cudnn
```
Here the CMAKE\_CUDA\_ARCHITECTURES variable is used to specify the CUDA architecture, and the CUDA\_TOOLKIT\_ROOT\_DIR and CUDNN\_ROOT\_DIR variables are used to specify the root paths of the CUDA Toolkit and cuDNN (the leading `-D` on the command line merely sets a CMake cache variable).
- Build the project, inside `build` directory, run:
```
make -j$(nproc)
```
- Add oneflow to your PYTHONPATH, inside `build` directory, run:
```
source source.sh
```
Please note that this change is not permanent.
- Simple validation
```
python3 -m oneflow --doctor
```
</details>
### Troubleshooting
Please refer to [troubleshooting](docs/source/troubleshooting.md) for common issues you might encounter when compiling and running OneFlow.
## Getting Started
- Please refer to [QUICKSTART](https://docs.oneflow.org/en/master/basics/01_quickstart.html)
- 中文版请参见 [快速上手](https://docs.oneflow.org/master/basics/01_quickstart.html)
## Documentation
- [API Reference](https://oneflow.readthedocs.io/en/master/)
- [Usage & Design Docs](http://docs.oneflow.org/)
- [System Design](https://docs.oneflow.org/en/v0.4.0/basics_topics/essentials_of_oneflow.html)
## Model Zoo and Benchmark
- [Libai(Toolbox for Parallel Training Large-Scale Transformer Models)](https://github.com/Oneflow-Inc/libai)
- [BERT-large](https://libai.readthedocs.io/en/latest/tutorials/get_started/quick_run.html)
- [GPT](https://libai.readthedocs.io/en/latest/modules/libai.models.html#id5)
- [T5](https://libai.readthedocs.io/en/latest/modules/libai.models.html#id4)
- [VisionTransformer](https://libai.readthedocs.io/en/latest/modules/libai.models.html#id1)
- [SwinTransformer](https://libai.readthedocs.io/en/latest/modules/libai.models.html#id2)
- [FlowVision(Toolbox for Computer Vision Datasets, SOTA Models and Utils)](https://github.com/Oneflow-Inc/vision)
- [OneFlow-Models(Outdated)](https://github.com/Oneflow-Inc/models)
- [ResNet-50](https://github.com/Oneflow-Inc/models/tree/main/Vision/classification/image/resnet50)
- [Wide&Deep](https://github.com/Oneflow-Inc/models/tree/main/RecommenderSystems/wide_and_deep)
- [OneFlow-Benchmark(Outdated)](https://github.com/Oneflow-Inc/OneFlow-Benchmark)
## Communication
- [GitHub issues](https://github.com/Oneflow-Inc/oneflow/issues): any install, bug, feature issues.
- [www.oneflow.org](http://www.oneflow.org): brand related information.
- ### 中文
- QQ 群: 331883
- 微信号(加好友入交流群): OneFlowXZS
- [知乎](https://www.zhihu.com/org/oneflow-17)
- ### International
- [Discord](https://discord.gg/4kpjGA5bZY)
- [Twitter](https://twitter.com/OneFlowNews)
- [LinkedIn](https://www.linkedin.com/company/oneflow-inc)
- [Medium](https://oneflow2020.medium.com)
## The Team
OneFlow was originally developed by [OneFlow Inc](http://www.oneflow.org) and [Zhejiang Lab](http://www.zhejianglab.com/).
## License
[Apache License 2.0](LICENSE)
================================================
FILE: ci/CMakeLists.txt
================================================
add_subdirectory(test)
================================================
FILE: ci/build/ensure_img.py
================================================
import os
import argparse
from pathlib import Path
import re
import json
import subprocess
def check_and_download(tag, url):
    """Ensure docker image ``tag`` exists locally, loading it from ``url`` if absent.

    If ``docker image inspect`` succeeds the image is already present and
    nothing is downloaded. Otherwise the tarball is fetched into ``~/imgs``
    (resumable via ``wget -c``), loaded into docker, and re-tagged under an
    ``ofkeep:`` alias so cleanup jobs keep it around.
    """
    cache_dir = os.path.join(os.path.expanduser("~"), "imgs")
    if not os.path.exists(cache_dir):
        os.makedirs(cache_dir)
    # `docker image inspect` exits 0 iff the image exists locally.
    inspect = subprocess.run(
        f"docker image inspect {tag}",
        shell=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if inspect.returncode == 0:
        print("[OK]", tag)
        return
    tarball = os.path.join(cache_dir, os.path.basename(url))
    subprocess.check_call(f"wget -c {url} -O {tarball}", shell=True)
    subprocess.check_call(f"docker load -i {tarball}", shell=True)
    # Strip two extensions (e.g. ".tar.gz") to derive the keep-alias name.
    stem = os.path.splitext(os.path.basename(tarball))[0]
    stem = os.path.splitext(stem)[0]
    keep_tag = f"ofkeep:{stem}"
    subprocess.check_call(f"docker tag {tag} {keep_tag}", shell=True)
if __name__ == "__main__":
    # NOTE(review): --create_index is parsed but never used; kept for CLI
    # backward compatibility.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--create_index", action="store_true", required=False, default=False
    )
    args = parser.parse_args()
    # (image tag, tarball mirror URL) pairs to make available locally.
    images = [
        (
            "nvidia/cuda:10.0-cudnn7-devel-centos7",
            "https://oneflow-static.oss-cn-beijing.aliyuncs.com/img/nvidiacuda10.0-cudnn7-devel-centos7.tar.gz",
        ),
        (
            "nvidia/cuda:10.1-cudnn7-devel-centos7",
            "https://oneflow-static.oss-cn-beijing.aliyuncs.com/img/nvidiacuda10.1-cudnn7-devel-centos7.tar.gz",
        ),
        (
            "nvidia/cuda:10.2-cudnn7-devel-centos7",
            "https://oneflow-static.oss-cn-beijing.aliyuncs.com/img/nvidiacuda10.2-cudnn7-devel-centos7.tar.gz",
        ),
        (
            "nvidia/cuda:11.0-cudnn8-devel-centos7",
            "https://oneflow-static.oss-cn-beijing.aliyuncs.com/img/nvidiacuda11.0-cudnn8-devel-centos7.tar.gz",
        ),
        (
            "nvidia/cuda:11.1-cudnn8-devel-centos7",
            "https://oneflow-static.oss-cn-beijing.aliyuncs.com/img/nvidiacuda11.1-cudnn8-devel-centos7.tar.gz",
        ),
    ]
    for tag, url in images:
        check_and_download(tag, url)
================================================
FILE: ci/build/make.sh
================================================
set -ex

# Build the OneFlow manylinux wheel inside a docker container.
# All knobs come from ONEFLOW_CI_* environment variables with defaults below.
src_dir=${ONEFLOW_SRC_DIR:-"$PWD"}
tmp_dir=${ONEFLOW_CI_TMP_DIR:-"$HOME/ci-tmp"}
extra_oneflow_cmake_args=${ONEFLOW_CI_EXTRA_ONEFLOW_CMAKE_ARGS:-""}
package_suffix=${ONEFLOW_CI_PACKAGE_SUFFIX:-""}
cuda_version=${ONEFLOW_CI_CUDA_VERSION:-"10.2"}
python_version_args=${ONEFLOW_CI_PYTHON_VERSION_ARGS:-"--python3.6"}
build_wheel_bash_args=${ONEFLOW_CI_BUILD_WHEEL_BASH_ARGS:-"-l"}
mkdir -p $tmp_dir
docker_tag=${ONEFLOW_CI_DOCKER_TAG:-"oneflow:ci-manylinux2014-cuda10.2"}
# Forward HTTP(S) proxy settings into both the image build and the run.
docker_proxy_build_args=""
docker_proxy_build_args+="--build-arg http_proxy=${ONEFLOW_CI_HTTP_PROXY} --build-arg https_proxy=${ONEFLOW_CI_HTTPS_PROXY}"
docker_proxy_run_args=""
docker_proxy_run_args+="--env http_proxy=${ONEFLOW_CI_HTTP_PROXY} --env https_proxy=${ONEFLOW_CI_HTTPS_PROXY}"
# Allocate a TTY only when stdout is a terminal (interactive use).
docker_it=""
if [[ -t 1 ]]; then
  docker_it="-it"
fi
# build manylinux image
cd $src_dir
docker build -f $src_dir/docker/package/manylinux/Dockerfile \
  --build-arg from=nvidia/cuda:${cuda_version}-cudnn7-devel-centos7 \
  $docker_proxy_build_args -t $docker_tag .
cd -
# build function
function build() {
  set -x
  # Clear any wheelhouse left by a previous run.
  # BUGFIX: `-w` takes a working directory inside the container, not a
  # bind-mount spec; it was previously passed `$tmp_dir:/ci-tmp`, which made
  # docker use a bogus workdir path instead of /ci-tmp.
  docker run --rm \
    -v $tmp_dir:/ci-tmp \
    -w /ci-tmp busybox rm -rf /ci-tmp/wheelhouse
  docker run \
    $docker_proxy_run_args \
    --rm $docker_it \
    -v $src_dir:/oneflow-src \
    -v $tmp_dir:/ci-tmp \
    -w /ci-tmp \
    "$docker_tag" \
    bash ${build_wheel_bash_args} /oneflow-src/docker/package/manylinux/build_wheel.sh \
    ${python_version_args} \
    --house-dir /ci-tmp/wheelhouse \
    --package-name oneflow${package_suffix} \
    $extra_oneflow_cmake_args
}
set +e
# reuse cache
build
# clean cache and retry
cached_build_ret=$?
set -e
# In non-interactive (CI) runs only: if the cached build failed, wipe the
# build dir and retry once from scratch.
if [ $cached_build_ret -ne 0 ] && [[ ! -t 1 ]]; then
  echo "retry after cleaning build dir"
  docker run --rm -v $tmp_dir:/ci-tmp busybox sh -c "rm -rf /ci-tmp/*"
  build
fi
================================================
FILE: ci/check/clang_tidy_warnings_as_errors_on_diff
================================================
*,-maybe-glog-fatal,-clang-analyzer-alpha.*,-clang-analyzer-cplusplus.NewDelete,-clang-diagnostic-*
================================================
FILE: ci/check/lintutils.py
================================================
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
import multiprocessing as mp
import os
from fnmatch import fnmatch
from subprocess import Popen
def chunk(seq, n):
    """
    divide a sequence into equal sized chunks
    (the last chunk may be smaller, but won't be empty)
    """
    # Accumulate elements into a buffer, flushing it every n items.
    result = []
    buffer = []
    for item in seq:
        if len(buffer) == n:
            result.append(buffer)
            buffer = []
        buffer.append(item)
    if buffer:
        result.append(buffer)
    return result
def dechunk(chunks):
    "flatten chunks into a single list"
    # One level of flattening via a nested comprehension.
    return [element for group in chunks for element in group]
def run_parallel(cmds, **kwargs):
    """
    Run each of cmds (with shared **kwargs) using subprocess.Popen
    then wait for all of them to complete.
    Runs batches of multiprocessing.cpu_count() * 2 from cmds
    returns a list of tuples containing each process'
    returncode, stdout, stderr
    """
    results = []
    batch_size = mp.cpu_count() * 2
    for batch in chunk(cmds, batch_size):
        # Launch the whole batch first, then reap each process in order.
        running = [Popen(command, **kwargs) for command in batch]
        for proc in running:
            out, err = proc.communicate()
            results.append((proc.returncode, out, err))
    return results
# File extensions treated as C/C++/CUDA sources by the lint helpers.
_source_extensions = """
.h
.cc
.cpp
.cu
.cuh
""".split()


def get_sources(source_dir, exclude_globs=None):
    """Recursively collect absolute paths of source files under source_dir.

    Only files whose extension is in _source_extensions are returned, and
    any path matching one of exclude_globs (fnmatch patterns) is skipped.
    """
    # BUG FIX: the previous mutable default argument (exclude_globs=[]) is a
    # classic Python pitfall; use None and normalize here instead.
    if exclude_globs is None:
        exclude_globs = []
    sources = []
    for directory, subdirs, basenames in os.walk(source_dir):
        for path in [os.path.join(directory, basename) for basename in basenames]:
            # filter out non-source files
            if os.path.splitext(path)[1] not in _source_extensions:
                continue
            path = os.path.abspath(path)
            # filter out files that match the globs in the globs file
            if any(fnmatch(path, glob) for glob in exclude_globs):
                continue
            sources.append(path)
    return sources
def stdout_pathcolonline(completed_process, filenames):
    """
    given a completed process which may have reported some files as problematic
    by printing the path name followed by ':' then a line number, examine
    stdout and return the set of actually reported file names
    """
    _, stdout, _ = completed_process
    # Pre-compute the byte prefixes ("<path>:") we scan each line for.
    prefixes = set()
    for name in filenames:
        prefixes.add(name.encode("utf-8") + b":")
    reported = set()
    for line in stdout.splitlines():
        match = next((p for p in prefixes if line.startswith(p)), None)
        if match is not None:
            # NOTE(review): the reported name keeps its trailing ':' — this
            # quirk is preserved from the original behavior.
            reported.add(match.decode("utf-8"))
            prefixes.remove(match)
    return reported, stdout
================================================
FILE: ci/check/run_clang_format.py
================================================
#!/usr/bin/env python3
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
import asyncio
import argparse
import pathlib
import multiprocessing
import subprocess
import os
import platform
def split_and_print(prefix, text):
    """Print each non-blank line of `text` (bytes) prefixed with `prefix`.

    Used to tag subprocess output with the name of the command it came from.
    """
    lines = text.decode().splitlines(keepends=True)
    for l in lines:
        stripped = l.strip()
        # BUG FIX: the old code accumulated into one growing string across
        # iterations, so every printed line repeated all lines before it.
        if stripped:
            print(f"{prefix} {stripped}", flush=True)
async def handle_stream(stream, cb):
    """Forward every line read from `stream` to `cb` until EOF."""
    while True:
        data = await stream.readline()
        if not data:
            break
        cb(data)
async def run_command(cmd=None, dry=False, name=None):
    """Run `cmd` in a subshell, streaming its output through split_and_print.

    In dry mode just echo the command and report success (returns 0);
    otherwise returns the subprocess' exit code.
    """
    if dry:
        print(f"[dry] {cmd}")
        return 0
    proc = await asyncio.create_subprocess_shell(
        cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE,
    )
    prefix = f"[{name}]" if name else ""
    emit = lambda data: split_and_print(prefix, data)
    await asyncio.gather(
        handle_stream(proc.stdout, emit), handle_stream(proc.stderr, emit),
    )
    await proc.wait()
    return proc.returncode
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    start = 0
    while start < len(lst):
        yield lst[start : start + n]
        start += n
def check_version(bin):
    """Return True iff `bin --version` reports clang-format version 11.0.0.

    Any failure to run the binary (missing, non-zero exit, undecodable
    output) is treated as "wrong version".
    """
    try:
        out = subprocess.check_output(["bash", "-c", f"{bin} --version"]).decode()
        print(out)
        return "version 11.0.0" in out
    # BUG FIX: a bare `except:` also swallowed KeyboardInterrupt/SystemExit;
    # catch only the failures this check actually expects.
    except (subprocess.CalledProcessError, OSError, UnicodeDecodeError):
        return False
def download(dry=False):
    """Fetch the pinned clang-format 11 binary into .cache/bin.

    With dry=True nothing is downloaded: returns the cached path if it
    already exists, else None. Otherwise downloads (GitHub mirror on CI,
    OSS mirror elsewhere), marks it executable, and returns the path.
    Raises ValueError on non-Linux platforms.
    """
    if platform.system() != "Linux":
        raise ValueError("Please install clang format 11.0.0")
    url = "https://oneflow-static.oss-cn-beijing.aliyuncs.com/bin/clang-format/linux-x86/clang-format-11"
    if os.getenv("CI"):
        url = "https://github.com/Oneflow-Inc/oneflow-fmt/raw/master/clang-format/linux-x86/clang-format-11"
    dst_dir = ".cache/bin"
    dst = f"{dst_dir}/clang-format-11"
    if dry:
        if os.path.isfile(dst):
            return dst
        # BUG FIX: this branch was a bare `None` expression statement (a
        # no-op); make the "not cached" result an explicit return.
        return None
    assert subprocess.call(f"mkdir -p {dst_dir}", shell=True) == 0
    assert subprocess.call(f"curl -L {url} -o {dst}", shell=True) == 0
    assert subprocess.call(f"chmod +x {dst}", shell=True) == 0
    return dst
if __name__ == "__main__":
    # CLI driver: check (or, with --fix, enforce) clang-format compliance of
    # all C/C++/CUDA sources under --source_dir, running clang-format in
    # parallel batches sized by the CPU count.
    parser = argparse.ArgumentParser(
        description="Runs clang-format on all of the source "
        "files. If --fix is specified enforce format by "
        "modifying in place, otherwise compare the output "
        "with the existing file and output any necessary "
        "changes as a patch in unified diff format"
    )
    parser.add_argument(
        "--clang_format_binary",
        required=False,
        help="Path to the clang-format binary.",
        default="clang-format",
    )
    parser.add_argument(
        "--source_dir", required=True, help="Root directory of the source code"
    )
    parser.add_argument(
        "--fix",
        default=False,
        action="store_true",
        help="If specified, will re-format the source "
        "code instead of comparing the re-formatted "
        "output, defaults to %(default)s",
    )
    parser.add_argument(
        "--quiet",
        default=False,
        action="store_true",
        help="If specified, only print errors",
    )
    args = parser.parse_args()
    # Collect every file with a C/C++/CUDA extension under source_dir.
    exts = [".h", ".cc", ".cpp", ".cu", ".cuh"]
    files = filter(
        lambda p: p.suffix in exts, pathlib.Path(args.source_dir).rglob("*"),
    )
    loop = asyncio.get_event_loop()
    files = [str(f) for f in files]
    # Without --fix, run clang-format in dry-run mode and treat any needed
    # change as an error; with --fix, edit files in place.
    clang_fmt_args = "-dry-run --Werror"
    if args.fix:
        clang_fmt_args = "-i"
    results = []
    # Ensure a usable clang-format 11.0.0 binary: use the configured one,
    # else a previously cached download, else download a fresh copy.
    if check_version(args.clang_format_binary) == False:
        downloaded = download(dry=True)
        if downloaded:
            assert check_version(downloaded)
            args.clang_format_binary = downloaded
        else:
            args.clang_format_binary = download()
            assert check_version(args.clang_format_binary)
    # Run clang-format over the files in parallel batches of 2x CPU count.
    for chunk in chunks(files, multiprocessing.cpu_count() * 2):
        promises = [
            run_command(f"{args.clang_format_binary} {clang_fmt_args} {f}")
            for f in chunk
        ]
        chunk_results = loop.run_until_complete(asyncio.gather(*promises))
        results.extend(chunk_results)
    print(len(results), "files checked")
    assert len(results) == len(files)
    # Report every file whose clang-format run failed, then fail the whole
    # script if any did (all return codes must be 0 for the sum to be 0).
    for (r, f) in zip(results, files):
        if r != 0:
            print("[fail]", f)
    assert sum(results) == 0
================================================
FILE: ci/check/run_clang_tidy.py
================================================
#!/usr/bin/env python3
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
import asyncio
import argparse
import subprocess
import os
from typing import List, Optional
from pathlib import Path
def split_and_print(prefix, text):
    """Print each non-blank line of `text` (bytes) prefixed with `prefix`.

    Used to tag subprocess output with the name of the command it came from.
    """
    lines = text.decode().splitlines(keepends=True)
    for l in lines:
        stripped = l.strip()
        # BUG FIX: the old code accumulated into one growing string across
        # iterations, so every printed line repeated all lines before it.
        if stripped:
            print(f"{prefix} {stripped}", flush=True)
async def handle_stream(stream, cb):
    """Forward every line read from `stream` to `cb` until EOF."""
    while True:
        data = await stream.readline()
        if not data:
            break
        cb(data)
async def run_command(cmd=None, dry=False, name=None):
    """Run `cmd` in a subshell, streaming its output through split_and_print.

    In dry mode just echo the command and report success (returns 0);
    otherwise returns the subprocess' exit code.
    """
    if dry:
        print(f"[dry] {cmd}")
        return 0
    proc = await asyncio.create_subprocess_shell(
        cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE,
    )
    prefix = f"[{name}]" if name else ""
    emit = lambda data: split_and_print(prefix, data)
    await asyncio.gather(
        handle_stream(proc.stdout, emit), handle_stream(proc.stderr, emit),
    )
    await proc.wait()
    return proc.returncode
def download(build_dir, dry=False) -> Optional[List[str]]:
    """Fetch clang-tidy and clang-tidy-diff.py into <build_dir>/cache/bin.

    With dry=True nothing is downloaded: returns the two cached paths if
    both already exist, else None. Otherwise downloads both tools (GitHub
    release on CI, OSS mirror elsewhere), marks them executable, and
    returns [clang_tidy_path, clang_tidy_diff_path].
    """
    urls = [
        "https://github.com/Oneflow-Inc/llvm-project/releases/download/update-err-msg-checker/clang-tidy-15.AppImage"
        if os.getenv("CI")
        else "https://oneflow-static.oss-cn-beijing.aliyuncs.com/bin/clang-tidy/linux-x86_64/clang-tidy-15.AppImage",
        "https://raw.githubusercontent.com/oneflow-inc/llvm-project/maybe/clang-tools-extra/clang-tidy/tool/clang-tidy-diff.py",
    ]
    dst_dir = f"{build_dir}/cache/bin"
    dst = [f"{dst_dir}/clang-tidy", f"{dst_dir}/clang-tidy-diff.py"]
    if dry:
        if os.path.isfile(dst[0]) and os.path.isfile(dst[1]):
            return dst
        # BUG FIX: this branch was a bare `None` expression statement (a
        # no-op); make the "not cached" result an explicit return.
        return None
    assert subprocess.call(f"mkdir -p {dst_dir}", shell=True) == 0
    for i, _dst in enumerate(dst):
        assert subprocess.call(f"curl -L {urls[i]} -o {_dst}", shell=True) == 0
        assert subprocess.call(f"chmod +x {_dst}", shell=True) == 0
    return dst
if __name__ == "__main__":
    # CLI driver: run clang-tidy only over the lines changed relative to the
    # `master` branch, escalating a configured set of warnings to errors.
    parser = argparse.ArgumentParser(
        description="Runs clang-tidy on all of the source files."
    )
    parser.add_argument(
        "--build_dir", required=True,
    )
    parser.add_argument(
        "--check-error-msg", action="store_true", default=False,
    )
    args = parser.parse_args()
    loop = asyncio.get_event_loop()
    # Reuse cached clang-tidy binaries when present, download otherwise.
    downloaded = download(args.build_dir, dry=True)
    if downloaded is None:
        downloaded = download(args.build_dir)
    assert downloaded is not None
    # Checks listed in this sibling file are treated as errors on the diff.
    warnings_as_errors = (
        (Path(__file__).parent / "clang_tidy_warnings_as_errors_on_diff")
        .read_text()
        .strip()
    )
    # Pipe `git diff` against master into clang-tidy-diff.py so only changed
    # lines are analyzed; the extra args enable alpha checkers and the
    # analyzer's aggressive binary-operation simplification.
    cmd = f"git diff -U0 master | {downloaded[1]} -clang-tidy-binary {downloaded[0]} -path {args.build_dir} -j $(nproc) -p1 -allow-enabling-alpha-checkers -extra-arg=-Xclang -extra-arg=-analyzer-config -extra-arg=-Xclang -extra-arg=aggressive-binary-operation-simplification=true"
    if args.check_error_msg:
        # Second pass additionally enforces the maybe-need-error-msg check
        # over the whole diff (no line filtering).
        command = f" cd .. && {cmd} -warnings-as-errors='{warnings_as_errors}' && {cmd} -checks=-*,maybe-need-error-msg -warnings-as-errors=* -skip-line-filter"
    else:
        command = f"cd .. && {cmd} -warnings-as-errors='{warnings_as_errors}'"
    ret_code = loop.run_until_complete(run_command(command))
    exit(ret_code)
================================================
FILE: ci/check/run_cmake_format.py
================================================
from subprocess import call
from argparse import ArgumentParser
from glob import glob
from pathlib import Path
from multiprocessing.pool import ThreadPool
from multiprocessing import cpu_count

if __name__ == "__main__":
    # CLI driver: run cmake-format over every CMake source file in the
    # repository, either checking formatting (default) or fixing in place
    # (--fix), with one cmake-format process per file on a thread pool.
    parser = ArgumentParser(
        description="Runs cmake-format on all of the cmake source files."
    )
    parser.add_argument(
        "--bin", default="cmake-format", help="Path of cmake-format binary"
    )
    parser.add_argument(
        "--fix", default=False, action="store_true", help="Format all sources in place"
    )
    parser.add_argument(
        "--source_dir", default=".", help="Root directory of the source code"
    )
    parser.add_argument(
        "-j",
        "--jobs",
        type=int,
        default=cpu_count(),
        help="Specifies the number of jobs (commands) to run simultaneously",
    )
    args = parser.parse_args()
    # Glob patterns covering every CMake source that should be formatted.
    patterns = [
        "cmake/**/*.cmake",
        "oneflow/**/*.cmake",
        "oneflow/**/CMakeLists.txt",
        "tools/**/*.cmake",
        "tools/**/CMakeLists.txt",
        "CMakeLists.txt",
    ]
    files = []
    for pattern in patterns:
        files.extend(glob(str(Path(args.source_dir) / pattern), recursive=True))

    def gen_cmd(file):
        # Build the cmake-format command line for one file: -i edits in
        # place, --check only reports a non-zero exit on mismatch.
        cmd = [args.bin, file]
        cmd.append("-i" if args.fix else "--check")
        return cmd

    # Run one subprocess per file, capped at args.jobs concurrent workers.
    tp = ThreadPool(args.jobs)
    res = tp.map_async(call, [gen_cmd(file) for file in files])
    tp.close()
    tp.join()
    # Count files whose cmake-format invocation returned non-zero.
    count = sum(map(lambda x: 0 if x == 0 else 1, res.get()))
    total = len(files)
    if args.fix:
        print(f"cmake-format -i done. {total} total")
    else:
        print(f"cmake-format --check done. {count} failed / {total} total")
    exit(0 if count == 0 else 1)
================================================
FILE: ci/check/run_license_format.py
================================================
import argparse
import os
import glob
from multiprocessing import Pool
# Canonical license header text; CPP_TXT / PY_TXT wrap it in each
# language's block-comment syntax for prepending to source files.
LICENSE_TXT = """Copyright 2020 The OneFlow Authors. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""
# C/C++/CUDA form: /* ... */ block comment.
CPP_TXT = "/*\n{}*/\n".format(LICENSE_TXT)
# Python form: module-level docstring.
PY_TXT = '"""\n{}"""\n'.format(LICENSE_TXT)
def get_txt(path: str):
    """Return the license header appropriate for `path`, or None.

    C/C++/CUDA sources get the /* */ form, Python files get the docstring
    form, anything else is not license-checked.
    """
    cpp_like = (".cpp", ".h", ".hpp", ".cu", ".cuh")
    if path.endswith(cpp_like):
        return CPP_TXT
    if path.endswith(".py"):
        return PY_TXT
    return None
def check_file(path):
    """Classify the license state of the file at `path`.

    Returns a (status, content) tuple where status is one of "ok",
    "license_absent", "license_duplicated", or a doctest-related request
    message.
    """
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()
    txt = get_txt(path)
    # BUG FIX: for extensions get_txt does not know, txt is None and
    # content.startswith(None) raised TypeError; such files are not
    # license-checked, so report them as "ok".
    if txt is None:
        return ("ok", content)
    if (
        "import doctest" in content
        and "raise_on_error=True" not in content
        and "doctest.DebugRunner" not in content
    ):
        return ("please add 'doctest.testmod(raise_on_error=True)'", content)
    elif content.count("The OneFlow Authors. All rights reserved.") > 1:
        return ("license_duplicated", content)
    elif content.startswith(txt) or (not content):
        # Header present (or the file is empty, which needs no header).
        return ("ok", content)
    else:
        return ("license_absent", content)
def format_file(path):
    """Ensure the file at `path` starts with the license header.

    Returns True if the file was already compliant, False if the header was
    just prepended. Raises ValueError for any other check_file status.
    """
    txt = get_txt(path)
    # FIX: check_file already reads the file and returns its content, so
    # the previous extra open/read of the same file here was redundant.
    format_status, content = check_file(path)
    if format_status == "ok":
        return True
    elif format_status == "license_absent":
        with open(path, "w") as w:
            w.write(txt + content)
        return False
    else:
        raise ValueError(f"{format_status} {path}")
def do_check(x):
    # Pool worker: return (path, license status) for one file.
    format_status, _ = check_file(x)
    return (x, format_status)
def do_format(x):
    # Pool worker: return (path, True if the file was already compliant).
    return (x, format_file(x))
def glob_files(path: str = ".", excludes=None):
    """Collect C/C++/CUDA/Python files under `path` (recursively).

    Skips any version.py and any file whose path contains one of the
    `excludes` substrings. Prints and returns the list of matching paths.
    """
    # BUG FIX: the previous defaults (path=None, excludes=None) crashed in
    # os.path.join and in the filter loop below; default to the current
    # directory and "exclude nothing".
    if excludes is None:
        excludes = []
    files = []
    for ext in ("**/*.cpp", "**/*.h", "**/*.hpp", "**/*.cu", "**/*.cuh", "**/*.py"):
        joined = os.path.join(path, ext)
        files.extend(glob.glob(joined, recursive=True))
    files = [
        f
        for f in files
        if "version.py" not in f and all(e not in f for e in excludes)
    ]
    print("[files]", len(files))
    return files
if __name__ == "__main__":
    # CLI driver: check (--check) or add (--fix) license headers over all
    # eligible files under --root_path, using a pool of 10 workers.
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--root_path", type=str, required=True)
    parser.add_argument(
        "-v", "--verbose", default=False, action="store_true", required=False
    )
    parser.add_argument("--silent", default=False, action="store_true", required=False)
    parser.add_argument(
        "-c", "--check", default=False, action="store_true", required=False
    )
    parser.add_argument(
        "-f", "--fix", default=False, action="store_true", required=False
    )
    parser.add_argument("--exclude", action="append", default=[])
    args = parser.parse_args()
    files = glob_files(args.root_path, excludes=args.exclude)
    # Exactly one of --check / --fix must be given.
    assert args.check != args.fix
    # BUG FIX: the loop variables were previously also named `p`, shadowing
    # the Pool object `p` and leaving it rebound to a file path after the
    # loop; use distinct names for the pool and the per-file path.
    with Pool(10) as pool:
        if args.check:
            any_absence = False
            for (path, format_status) in pool.map(do_check, files):
                if format_status != "ok":
                    print(f"{format_status}:", path)
                    any_absence = True
            if any_absence:
                exit(1)
        if args.fix:
            for (path, format_result) in pool.map(do_format, files):
                if format_result == True:
                    if args.verbose:
                        print("license already added:", path)
                else:
                    if args.silent == False:
                        print("license just added:", path)
================================================
FILE: ci/check/run_py_format.py
================================================
import argparse
import sys
import platform
from subprocess import Popen
import os

if __name__ == "__main__":
    # black requires Python >= 3.6; skip gracefully on older interpreters
    # instead of failing the CI step.
    major = platform.sys.version_info.major
    minor = platform.sys.version_info.minor
    if major == 3 and minor < 6:
        print("WARNING: python >= 3.6 required, python source format won't run")
        exit(0)
    parser = argparse.ArgumentParser(
        description="Runs py-format on all of the source files."
        "If --fix is specified enforce format by modifying in place."
    )
    parser.add_argument(
        "--source_dir", required=True, help="Root directory of the source code"
    )
    parser.add_argument(
        "--fix",
        default=False,
        action="store_true",
        help="If specified, will re-format the source",
    )
    arguments = parser.parse_args()
    os.chdir(arguments.source_dir)
    # Require the exact pinned black version so formatting results are
    # reproducible across developer machines and CI.
    version_cmd = sys.executable + " -m {} --version | grep {} > /dev/null"
    BLACK_VER = "19.10b0"
    if os.system(version_cmd.format("black", BLACK_VER)):
        print(
            f"Please install black {BLACK_VER}. For instance, run 'python3 -m pip install black=={BLACK_VER} --user'"
        )
        sys.exit(1)
    # Without --fix, run black in --check mode (non-zero exit on any diff).
    cmd_line = sys.executable + " -m black " + "."
    if arguments.fix == False:
        cmd_line += " --check"
    if os.system(cmd_line):
        sys.exit(1)
================================================
FILE: ci/clang/build-llvm.sh
================================================
set -ex
# CI build of OneFlow with the LLVM 15 toolchain: configure with the given
# CMake init cache, build, and produce a wheel. Configuration comes from
# ONEFLOW_CI_* environment variables.
export PATH=/usr/lib/llvm-15/bin:/usr/lib64/ccache:/root/.local/bin:$PATH
# clean python dir
cd ${ONEFLOW_CI_SRC_DIR}
${ONEFLOW_CI_PYTHON_EXE} -m pip install -i https://mirrors.aliyun.com/pypi/simple --user -r ci/fixed-dev-requirements.txt
cd python
# Mark the checkout as safe for git (needed when the container user does
# not own the mounted source tree).
git config --global --add safe.directory ${ONEFLOW_CI_SRC_DIR}
# Remove ignored files except previously built wheels in dist/
# (dry run first, then for real).
git clean -nXd -e \!dist -e \!dist/**
git clean -fXd -e \!dist -e \!dist/**
# cmake config
mkdir -p ${ONEFLOW_CI_BUILD_DIR}
cd ${ONEFLOW_CI_BUILD_DIR}
# Drop any stale CMake cache so configuration starts fresh.
find ${ONEFLOW_CI_BUILD_DIR} -name CMakeCache.txt
find ${ONEFLOW_CI_BUILD_DIR} -name CMakeCache.txt -delete
if [ ! -f "$ONEFLOW_CI_CMAKE_INIT_CACHE" ]; then
    echo "$ONEFLOW_CI_CMAKE_INIT_CACHE does not exist."
    exit 1
fi
cmake -S ${ONEFLOW_CI_SRC_DIR} -C ${ONEFLOW_CI_CMAKE_INIT_CACHE} -DPython3_EXECUTABLE=${ONEFLOW_CI_PYTHON_EXE}
# cmake build
cd ${ONEFLOW_CI_BUILD_DIR}
cmake --build . -j $(nproc)
# build pip
cd ${ONEFLOW_CI_SRC_DIR}
cd python
${ONEFLOW_CI_PYTHON_EXE} setup.py bdist_wheel
================================================
FILE: ci/conda/build-clang.sh
================================================
set -ex
# Build OneFlow (CPU, clang) inside the pinned conda dev environment and
# produce a wheel; exports the wheel directory for later workflow steps.
conda activate oneflow-dev-clang10-v2
mkdir -p build
cd build
cmake .. -C ../cmake/caches/cn/fast/cpu-clang.cmake
cmake --build . -j $(nproc)
cd -
cd python
python setup.py bdist_wheel
# Publish the dist dir to subsequent GitHub Actions steps via GITHUB_ENV.
echo "wheelhouse_dir=$PWD/dist" >> $GITHUB_ENV
================================================
FILE: ci/conda/tuna.condarc
================================================
channels:
- defaults
show_channel_urls: true
default_channels:
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
================================================
FILE: ci/fixed-dev-requirements.txt
================================================
numpy==1.26.4 ; python_version >= "3.12"
numpy==1.22.1 ; python_version >= "3.10" and python_version < "3.12"
numpy==1.21.6 ; python_version >= "3.7" and python_version < "3.10"
================================================
FILE: ci/manylinux/build-gcc7-xla.sh
================================================
# CI build of OneFlow (XLA variant) with the devtoolset-7 GCC toolchain;
# configure, build, and produce a wheel. Knobs come from ONEFLOW_CI_*.
source scl_source enable devtoolset-7
set -ex
ONEFLOW_CI_BUILD_PARALLEL=${ONEFLOW_CI_BUILD_PARALLEL:-$(nproc)}
gcc --version
ld --version
# clean python dir
cd ${ONEFLOW_CI_SRC_DIR}
${ONEFLOW_CI_PYTHON_EXE} -m pip install -i https://mirrors.aliyun.com/pypi/simple --user -r ci/fixed-dev-requirements.txt
cd python
# CONSISTENCY FIX: mark the checkout as safe for git before `git clean`,
# matching build-gcc9.sh / build.sh (required when the container user does
# not own the mounted source tree).
git config --global --add safe.directory ${ONEFLOW_CI_SRC_DIR}
# Remove ignored files except previously built wheels in dist/
# (dry run first, then for real).
git clean -nXd -e \!dist -e \!dist/**
git clean -fXd -e \!dist -e \!dist/**
# cmake config
mkdir -p ${ONEFLOW_CI_BUILD_DIR}
cd ${ONEFLOW_CI_BUILD_DIR}
# Drop any stale CMake cache so configuration starts fresh.
find ${ONEFLOW_CI_BUILD_DIR} -name CMakeCache.txt
find ${ONEFLOW_CI_BUILD_DIR} -name CMakeCache.txt -delete
if [ ! -f "$ONEFLOW_CI_CMAKE_INIT_CACHE" ]; then
    echo "$ONEFLOW_CI_CMAKE_INIT_CACHE does not exist."
    exit 1
fi
# XLA's build scripts locate Python via PATH / PYTHON_BIN_PATH.
export PATH="${PATH}:$(dirname ${ONEFLOW_CI_PYTHON_EXE})"
export PYTHON_BIN_PATH=${ONEFLOW_CI_PYTHON_EXE}
cmake -S ${ONEFLOW_CI_SRC_DIR} -C ${ONEFLOW_CI_CMAKE_INIT_CACHE} -DPython3_EXECUTABLE=${ONEFLOW_CI_PYTHON_EXE}
# cmake build
cd ${ONEFLOW_CI_BUILD_DIR}
cmake --build . --parallel ${ONEFLOW_CI_BUILD_PARALLEL}
# build pip
cd ${ONEFLOW_CI_SRC_DIR}
cd python
${ONEFLOW_CI_PYTHON_EXE} setup.py bdist_wheel
================================================
FILE: ci/manylinux/build-gcc9.sh
================================================
# CI build of OneFlow with the devtoolset-9 GCC toolchain; configure,
# build, optionally run MLIR lit tests, and produce a wheel.
source scl_source enable devtoolset-9
set -ex
ONEFLOW_CI_BUILD_PARALLEL=${ONEFLOW_CI_BUILD_PARALLEL:-$(nproc)}
gcc --version
ld --version
# clean python dir
cd ${ONEFLOW_CI_SRC_DIR}
${ONEFLOW_CI_PYTHON_EXE} -m pip install -i https://mirrors.aliyun.com/pypi/simple --user -r ci/fixed-dev-requirements.txt
${ONEFLOW_CI_PYTHON_EXE} -m pip install -i https://mirrors.aliyun.com/pypi/simple --user auditwheel setuptools wheel
cd python
function clean_artifacts {
    # Mark the checkout as safe for git (container user may not own it),
    # then remove ignored files except previously built wheels in dist/.
    git config --global --add safe.directory ${ONEFLOW_CI_SRC_DIR}
    git clean -nXd -e \!dist -e \!dist/**
    git clean -fXd -e \!dist -e \!dist/**
}
clean_artifacts
# cmake config
mkdir -p ${ONEFLOW_CI_BUILD_DIR}
cd ${ONEFLOW_CI_BUILD_DIR}
# Drop any stale CMake cache so configuration starts fresh.
find ${ONEFLOW_CI_BUILD_DIR} -name CMakeCache.txt
find ${ONEFLOW_CI_BUILD_DIR} -name CMakeCache.txt -delete
if [ ! -f "$ONEFLOW_CI_CMAKE_INIT_CACHE" ]; then
    echo "$ONEFLOW_CI_CMAKE_INIT_CACHE does not exist."
    exit 1
fi
export PATH="${PATH}:$(dirname ${ONEFLOW_CI_PYTHON_EXE})"
export PYTHON_BIN_PATH=${ONEFLOW_CI_PYTHON_EXE}
cmake -S ${ONEFLOW_CI_SRC_DIR} -C ${ONEFLOW_CI_CMAKE_INIT_CACHE} -DPython3_EXECUTABLE=${ONEFLOW_CI_PYTHON_EXE}
# cmake build
cd ${ONEFLOW_CI_BUILD_DIR}
cmake --build . --parallel ${ONEFLOW_CI_BUILD_PARALLEL}
# Optionally run the lit test target (c1) when requested by the CI job.
if [ ! -z "$ONEFLOW_CI_BUILD_RUN_LIT" ]; then
    ${ONEFLOW_CI_PYTHON_EXE} -m pip install -i https://mirrors.aliyun.com/pypi/simple --user flowvision==0.1.0
    export PATH=$PATH:$(dirname $ONEFLOW_CI_PYTHON_EXE)
    cmake --build . -t c1
fi
# build pip
cd ${ONEFLOW_CI_SRC_DIR}
cd python
${ONEFLOW_CI_PYTHON_EXE} setup.py bdist_wheel
================================================
FILE: ci/manylinux/build.sh
================================================
set -ex
# CI build of OneFlow with the default toolchain; configure, build,
# optionally run MLIR lit tests, and produce a wheel. Knobs come from
# ONEFLOW_CI_* environment variables.
ONEFLOW_CI_BUILD_PARALLEL=${ONEFLOW_CI_BUILD_PARALLEL:-$(nproc)}
gcc --version
ld --version
# clean python dir
cd ${ONEFLOW_CI_SRC_DIR}
${ONEFLOW_CI_PYTHON_EXE} -m pip install -i https://mirrors.aliyun.com/pypi/simple --user -r ci/fixed-dev-requirements.txt
${ONEFLOW_CI_PYTHON_EXE} -m pip install -i https://mirrors.aliyun.com/pypi/simple --user auditwheel setuptools wheel
cd python
function clean_artifacts {
    # Mark the checkout as safe for git (container user may not own it),
    # then remove ignored files except previously built wheels in dist/.
    git config --global --add safe.directory ${ONEFLOW_CI_SRC_DIR}
    git clean -nXd -e \!dist -e \!dist/**
    git clean -fXd -e \!dist -e \!dist/**
}
clean_artifacts
# cmake config
mkdir -p ${ONEFLOW_CI_BUILD_DIR}
cd ${ONEFLOW_CI_BUILD_DIR}
# Drop any stale CMake cache so configuration starts fresh.
find ${ONEFLOW_CI_BUILD_DIR} -name CMakeCache.txt
find ${ONEFLOW_CI_BUILD_DIR} -name CMakeCache.txt -delete
if [ ! -f "$ONEFLOW_CI_CMAKE_INIT_CACHE" ]; then
    echo "$ONEFLOW_CI_CMAKE_INIT_CACHE does not exist."
    exit 1
fi
cmake -S ${ONEFLOW_CI_SRC_DIR} -C ${ONEFLOW_CI_CMAKE_INIT_CACHE} -DPython3_EXECUTABLE=${ONEFLOW_CI_PYTHON_EXE}
# cmake build
cd ${ONEFLOW_CI_BUILD_DIR}
cmake --build . --parallel ${ONEFLOW_CI_BUILD_PARALLEL}
# Optionally run the lit test target (c1) when requested by the CI job.
if [ ! -z "$ONEFLOW_CI_BUILD_RUN_LIT" ]; then
    ${ONEFLOW_CI_PYTHON_EXE} -m pip install -i https://mirrors.aliyun.com/pypi/simple --user flowvision==0.1.0
    export PATH=$PATH:$(dirname $ONEFLOW_CI_PYTHON_EXE)
    cmake --build . -t c1
fi
# build pip
cd ${ONEFLOW_CI_SRC_DIR}
cd python
${ONEFLOW_CI_PYTHON_EXE} setup.py bdist_wheel
================================================
FILE: ci/requirements.txt
================================================
pycocotools
opencv-python==4.3.0.38; sys_platform == 'darwin'
opencv-python==4.2.0.34; sys_platform != 'darwin'
scipy
pillow
tensorflow-addons==0.13.0
tensorflow==2.5.0
================================================
FILE: ci/reset_submodule.sh
================================================
set -x
set -e
# Restore the working tree and fully deinitialize all submodules so they
# can be re-cloned from a different URL (see setup_submodule.sh).
git reset --hard
git submodule deinit -f .
rm -rf .git/modules/*
================================================
FILE: ci/setup_submodule.py
================================================
import configparser
import argparse
import os

# Rewrite each submodule URL in .gitmodules to point at a local checkout
# (file://...) or an alternative remote base URL, so CI can reuse an
# existing clone instead of fetching from GitHub.
parser = argparse.ArgumentParser()
parser.add_argument("-s", "--oneflow_src_local_path", type=str, required=False)
parser.add_argument("-r", "--oneflow_src_remote_url", type=str, required=False)
args = parser.parse_args()
assert (
    args.oneflow_src_local_path or args.oneflow_src_remote_url
), "require one of oneflow_src_local_path or oneflow_src_remote_url"
config = configparser.ConfigParser()
config.read(".gitmodules")
for s in config.sections():
    path = config[s]["path"]
    if args.oneflow_src_local_path:
        # Local mode: the submodule must already be checked out (with a
        # .git entry) under the given source tree.
        src_path = os.path.join(args.oneflow_src_local_path, path)
        assert os.path.exists("{}/.git".format(src_path)), src_path
        config[s]["url"] = "file://{}".format(src_path)
    else:
        # Remote mode: derive the submodule URL from the base remote URL.
        # NOTE(review): os.path.join on a URL assumes a POSIX host (would
        # produce backslashes on Windows) — confirm this only runs on Linux.
        src_path = os.path.join(args.oneflow_src_remote_url, path)
        config[s]["url"] = src_path
with open(".gitmodules", "w") as configfile:
    config.write(configfile)
================================================
FILE: ci/setup_submodule.sh
================================================
set -x
set -e
# Point every submodule at a local OneFlow checkout (defaults to
# $HOME/oneflow) and re-sync/clone them from there.
src_dir=${ONEFLOW_CI_SRC_DIR:-"$HOME/oneflow"}
python3 ci/setup_submodule.py --oneflow_src_local_path=$src_dir
git submodule sync
git submodule update --init --recursive
================================================
FILE: ci/test/1node_benchmark_test.sh
================================================
set -xe
rm -rf /benchmarks
cp -r python/oneflow/compatible/single_client/benchmarks /benchmarks
cd /benchmarks
python3 cnn_benchmark/of_cnn_benchmarks.py \
--gpu_num_per_node=1 \
--model="vgg16" \
--batch_size_per_device=8 \
--iter_num=5 \
--learning_rate=0.01 \
--optimizer="sgd" \
--loss_print_every_n_iter=1 \
--data_dir="/dataset/imagenet_227/train/32"
python3 cnn_benchmark/of_cnn_benchmarks.py \
--gpu_num_per_node=1 \
--model="alexnet" \
--batch_size_per_device=8 \
-
gitextract_pzk3dhhw/
├── .clang-format
├── .clang-tidy
├── .cmake-format.py
├── .devcontainer/
│ ├── Dockerfile
│ └── devcontainer.json
├── .dockerignore
├── .github/
│ ├── CODEOWNERS
│ ├── ISSUE_TEMPLATE/
│ │ ├── blank_issue.yml
│ │ ├── bug_report.md
│ │ ├── documention_issue.yml
│ │ ├── feature_request.yml
│ │ ├── performance_issue.yml
│ │ └── question.yml
│ ├── PULL_REQUEST_TEMPLATE/
│ │ ├── general_template.md
│ │ └── op_template.md
│ ├── actions/
│ │ ├── mac-build/
│ │ │ └── action.yml
│ │ ├── setup/
│ │ │ └── action.yml
│ │ ├── upload_oss/
│ │ │ └── action.yml
│ │ ├── upload_ssh/
│ │ │ └── action.yml
│ │ └── whl/
│ │ └── action.yml
│ ├── scripts/
│ │ ├── requirements.txt
│ │ └── set_initial_variables.py
│ └── workflows/
│ ├── canary.yml
│ ├── community_release.yml
│ ├── on_merge.yml
│ ├── pr.yml
│ ├── priv_release.yml
│ ├── release.yml
│ ├── simple.yml
│ └── test.yml
├── .gitignore
├── .lsan-suppressions
├── .mergify.yml
├── .tsan-suppressions
├── .ubsan-suppressions
├── CMakeLists.txt
├── LICENSE
├── README.md
├── ci/
│ ├── CMakeLists.txt
│ ├── build/
│ │ ├── ensure_img.py
│ │ └── make.sh
│ ├── check/
│ │ ├── clang_tidy_warnings_as_errors_on_diff
│ │ ├── lintutils.py
│ │ ├── run_clang_format.py
│ │ ├── run_clang_tidy.py
│ │ ├── run_cmake_format.py
│ │ ├── run_license_format.py
│ │ └── run_py_format.py
│ ├── clang/
│ │ └── build-llvm.sh
│ ├── conda/
│ │ ├── build-clang.sh
│ │ └── tuna.condarc
│ ├── fixed-dev-requirements.txt
│ ├── manylinux/
│ │ ├── build-gcc7-xla.sh
│ │ ├── build-gcc9.sh
│ │ └── build.sh
│ ├── requirements.txt
│ ├── reset_submodule.sh
│ ├── setup_submodule.py
│ ├── setup_submodule.sh
│ └── test/
│ ├── 1node_benchmark_test.sh
│ ├── 1node_benchmark_test_fp16.sh
│ ├── 1node_custom_op_test.sh
│ ├── 1node_model_eager_test.sh
│ ├── 1node_model_test.sh
│ ├── 1node_op_test.sh
│ ├── 2node_op_test.sh
│ ├── 2node_op_test_multi_client.sh
│ ├── CMakeLists.txt
│ ├── build_docs.sh
│ ├── distributed_run.py
│ ├── doctest.sh
│ ├── excludelist
│ ├── expensive_generic_test_multi_client.sh
│ ├── generic_test.sh
│ ├── generic_test_multi_client.sh
│ ├── ir_tests.sh
│ ├── multi_client_exception_test.sh
│ ├── multi_launch.py
│ ├── parallel_run.py
│ ├── print_stack_from_core.sh
│ ├── print_stack_in_all_dirs.sh
│ ├── resource-spec/
│ │ ├── 1x-gtx-1080.json
│ │ ├── 2x-rtx-2080.json
│ │ └── 4x-rtx-2080ti.json
│ ├── test_mock_function.sh
│ ├── test_mock_script.sh
│ ├── test_resnet50_graph_ddp.sh
│ ├── test_speed_multi_client.sh
│ └── try_install.sh
├── cmake/
│ ├── caches/
│ │ ├── ci/
│ │ │ ├── canary/
│ │ │ │ └── cuda.cmake
│ │ │ ├── cpu-asan-ubsan.cmake
│ │ │ ├── cpu-tsan.cmake
│ │ │ ├── cpu.cmake
│ │ │ ├── cuda-xla.cmake
│ │ │ ├── cuda.cmake
│ │ │ ├── gh-hosted/
│ │ │ │ ├── cpu-clang.cmake
│ │ │ │ └── cpu-gcc.cmake
│ │ │ ├── llvm/
│ │ │ │ └── cuda-75-clang.cmake
│ │ │ ├── profiler/
│ │ │ │ └── cuda.cmake
│ │ │ ├── release/
│ │ │ │ ├── cpu.cmake
│ │ │ │ ├── cu118.cmake
│ │ │ │ └── cuda.cmake
│ │ │ └── serving/
│ │ │ ├── cuda-75.cmake
│ │ │ └── openvino.cmake
│ │ ├── cn/
│ │ │ ├── cpu.cmake
│ │ │ ├── cuda.cmake
│ │ │ └── fast/
│ │ │ ├── cpu-clang.cmake
│ │ │ ├── cpu.cmake
│ │ │ ├── cuda-61-clang.cmake
│ │ │ ├── cuda-61.cmake
│ │ │ ├── cuda-75-clang.cmake
│ │ │ ├── cuda-75.cmake
│ │ │ ├── cuda-86.cmake
│ │ │ ├── mlir-cpu.cmake
│ │ │ ├── mlir-cuda-61.cmake
│ │ │ ├── mlir-cuda-75.cmake
│ │ │ ├── mlir-cuda-80.cmake
│ │ │ └── mlir-cuda-86.cmake
│ │ └── international/
│ │ ├── cpu.cmake
│ │ └── cuda.cmake
│ ├── cuda.cmake
│ ├── functional.cmake
│ ├── git_version.cmake
│ ├── oneflow-config.cmake
│ ├── oneflow.cmake
│ ├── op_schema.cmake
│ ├── platform.cmake
│ ├── proto2cpp.cmake
│ ├── pybind11.cmake
│ ├── python.cmake
│ ├── third_party/
│ │ ├── FindBFD.cmake
│ │ ├── FindBLAS.cmake
│ │ ├── FindCUDNN.cmake
│ │ ├── FindUnwind.cmake
│ │ ├── absl.cmake
│ │ ├── cares.cmake
│ │ ├── cocoapi.cmake
│ │ ├── cub.cmake
│ │ ├── cutlass.cmake
│ │ ├── eigen.cmake
│ │ ├── flash_attention.cmake
│ │ ├── flatbuffers.cmake
│ │ ├── glog.cmake
│ │ ├── googletest.cmake
│ │ ├── grpc.cmake
│ │ ├── half.cmake
│ │ ├── header_index/
│ │ │ ├── cub_headers.txt
│ │ │ ├── grpc_headers.txt
│ │ │ ├── libpng_headers.txt
│ │ │ └── opencv_headers.txt
│ │ ├── hwloc.cmake
│ │ ├── json.cmake
│ │ ├── libjpeg-turbo.cmake
│ │ ├── nccl.cmake
│ │ ├── oneDNN.cmake
│ │ ├── opencv.cmake
│ │ ├── openssl.cmake
│ │ ├── patches/
│ │ │ └── tensorflow-logging.patch
│ │ ├── protobuf.cmake
│ │ ├── re2.cmake
│ │ ├── trt_flash_attention.cmake
│ │ └── zlib.cmake
│ ├── third_party.cmake
│ ├── threading.cmake
│ └── util.cmake
├── dev-requirements.txt
├── docker/
│ ├── build/
│ │ ├── Dockerfile
│ │ ├── build-ubuntu.sh
│ │ ├── build.sh
│ │ ├── build.ubuntu.dockerfile
│ │ ├── launch.sh
│ │ └── test.sh
│ ├── ci/
│ │ ├── base/
│ │ │ └── Dockerfile
│ │ ├── fmt/
│ │ │ ├── Dockerfile
│ │ │ └── build.sh
│ │ ├── make/
│ │ │ └── Dockerfile
│ │ ├── test/
│ │ │ ├── Dockerfile
│ │ │ ├── build.sh
│ │ │ ├── launch.sh
│ │ │ └── requirements.txt
│ │ ├── test-v2/
│ │ │ ├── Dockerfile
│ │ │ ├── build.sh
│ │ │ ├── requirements.txt
│ │ │ └── sources.list
│ │ └── third_party/
│ │ └── Dockerfile
│ └── package/
│ └── manylinux/
│ ├── CentOS-Base.repo
│ ├── CentOS7-Base-163.repo
│ ├── Dockerfile
│ ├── README.md
│ ├── build_wheel.py
│ └── launch.sh
├── docs/
│ ├── Makefile
│ ├── requirements.txt
│ └── source/
│ ├── _static/
│ │ └── .gitkeep
│ ├── auto_parallel.rst
│ ├── autograd.rst
│ ├── cn/
│ │ ├── __init__.py
│ │ ├── activation.py
│ │ └── math_ops.py
│ ├── conf.py
│ ├── cuda.rst
│ ├── distributed.rst
│ ├── distributions.rst
│ ├── environment_variables.rst
│ ├── graph.rst
│ ├── hub.rst
│ ├── image.rst
│ ├── index.rst
│ ├── linalg.rst
│ ├── nn.functional.rst
│ ├── nn.init.rst
│ ├── nn.rst
│ ├── one_embedding.rst
│ ├── oneflow.rst
│ ├── optim.rst
│ ├── special.rst
│ ├── tensor.rst
│ ├── tensor_attributes.rst
│ ├── troubleshooting.md
│ ├── type_info.rst
│ ├── utils.data.rst
│ ├── utils.global_view.rst
│ └── utils.tensor.rst
├── external/
│ ├── CMakeLists.txt
│ ├── fmt/
│ │ └── CMakeLists.txt
│ ├── kineto/
│ │ └── CMakeLists.txt
│ ├── onetbb/
│ │ └── CMakeLists.txt
│ └── robin-hood-hashing/
│ └── CMakeLists.txt
├── oneflow/
│ ├── api/
│ │ ├── common/
│ │ │ ├── ir_pass.cpp
│ │ │ ├── job_build_and_infer_ctx.h
│ │ │ ├── sbp.h
│ │ │ └── variable_tensor_mgr.h
│ │ ├── cpp/
│ │ │ ├── api.h
│ │ │ ├── embedding/
│ │ │ │ ├── embedding.cpp
│ │ │ │ └── embedding.h
│ │ │ ├── env.cpp
│ │ │ ├── env.h
│ │ │ ├── env_impl.cpp
│ │ │ ├── env_impl.h
│ │ │ ├── framework/
│ │ │ │ ├── device.cpp
│ │ │ │ ├── device.h
│ │ │ │ ├── dtype.cpp
│ │ │ │ ├── dtype.h
│ │ │ │ ├── graph.cpp
│ │ │ │ ├── graph.h
│ │ │ │ ├── ivalue.cpp
│ │ │ │ ├── ivalue.h
│ │ │ │ ├── shape.cpp
│ │ │ │ ├── shape.h
│ │ │ │ ├── tensor.cpp
│ │ │ │ └── tensor.h
│ │ │ ├── framework.h
│ │ │ ├── nn/
│ │ │ │ └── functional/
│ │ │ │ ├── activation.cpp
│ │ │ │ └── activation.h
│ │ │ ├── nn.h
│ │ │ └── tests/
│ │ │ ├── api_test.cpp
│ │ │ ├── api_test.h
│ │ │ ├── graph_test.cpp
│ │ │ ├── graph_test_model/
│ │ │ │ ├── affine_no_parameter/
│ │ │ │ │ └── model.mlir
│ │ │ │ └── affine_with_parameter/
│ │ │ │ ├── model.a/
│ │ │ │ │ ├── meta
│ │ │ │ │ └── out
│ │ │ │ ├── model.b/
│ │ │ │ │ ├── meta
│ │ │ │ │ └── out
│ │ │ │ └── model.mlir
│ │ │ ├── ivalue_test.cpp
│ │ │ ├── nn_test.cpp
│ │ │ ├── one_embedding_test.cpp
│ │ │ └── tensor_test.cpp
│ │ └── python/
│ │ ├── autograd/
│ │ │ ├── autograd.cpp
│ │ │ ├── autograd_engine.cpp
│ │ │ ├── autograd_function.cpp
│ │ │ ├── autograd_function_state.cpp
│ │ │ ├── autograd_function_state.h
│ │ │ ├── autograd_mode.cpp
│ │ │ └── function_node.cpp
│ │ ├── caster/
│ │ │ ├── autograd_function_state.h
│ │ │ ├── common.h
│ │ │ ├── maybe.h
│ │ │ ├── optional.h
│ │ │ ├── size.h
│ │ │ ├── tensor.h
│ │ │ └── test.cpp
│ │ ├── deprecated.cpp
│ │ ├── dlpack/
│ │ │ ├── converter.cpp
│ │ │ ├── converter.h
│ │ │ └── dlpack.h
│ │ ├── eager/
│ │ │ └── eager.cpp
│ │ ├── env/
│ │ │ ├── env.cpp
│ │ │ └── env.h
│ │ ├── ep/
│ │ │ └── cuda_matmul_mode.cpp
│ │ ├── exception/
│ │ │ ├── exception.cpp
│ │ │ └── exception.h
│ │ ├── flags.cpp
│ │ ├── framework/
│ │ │ ├── autocast.cpp
│ │ │ ├── device.cpp
│ │ │ ├── doc.cpp
│ │ │ ├── dtype.cpp
│ │ │ ├── framework.cpp
│ │ │ ├── framework.h
│ │ │ ├── global_mode.cpp
│ │ │ ├── id_state.cpp
│ │ │ ├── id_util.cpp
│ │ │ ├── instructions_builder.cpp
│ │ │ ├── layout.cpp
│ │ │ ├── memory_format.cpp
│ │ │ ├── memory_format.h
│ │ │ ├── nn_graph.cpp
│ │ │ ├── one_embedding.cpp
│ │ │ ├── op_builder.cpp
│ │ │ ├── op_expr.cpp
│ │ │ ├── parallel_conf_util.cpp
│ │ │ ├── py_kernel_registry.cpp
│ │ │ ├── random_generator.cpp
│ │ │ ├── scope_util.cpp
│ │ │ ├── session_util.cpp
│ │ │ ├── shut_down_util.cpp
│ │ │ ├── size.cpp
│ │ │ ├── size.h
│ │ │ ├── stream.cpp
│ │ │ ├── tensor.cpp
│ │ │ ├── tensor.h
│ │ │ ├── tensor_functions.cpp
│ │ │ ├── tensor_functions_util.h
│ │ │ ├── tensor_tuple.cpp
│ │ │ ├── tensortype.cpp
│ │ │ ├── tensortype.h
│ │ │ ├── thread.cpp
│ │ │ ├── thread.h
│ │ │ ├── typeinfo.cpp
│ │ │ ├── typeinfo.h
│ │ │ └── variable_tensor_mgr.cpp
│ │ ├── functional/
│ │ │ ├── common.cpp
│ │ │ ├── common.h
│ │ │ ├── dispatch_stateful_ops.cpp
│ │ │ ├── dispatch_stateful_ops.yaml
│ │ │ ├── function_def.h
│ │ │ ├── indexing.cpp
│ │ │ ├── indexing.h
│ │ │ ├── python_arg.cpp
│ │ │ ├── python_arg.h
│ │ │ ├── python_arg_parser.cpp
│ │ │ ├── python_arg_parser.h
│ │ │ ├── python_return_types.h
│ │ │ ├── tensor_api.cpp
│ │ │ ├── tensor_api.yaml
│ │ │ ├── value_types.cpp
│ │ │ └── value_types.h
│ │ ├── gil_foreign_lock_helper.cpp
│ │ ├── init.cpp
│ │ ├── ir.cpp
│ │ ├── job_build/
│ │ │ ├── job_build_and_infer.cpp
│ │ │ ├── job_build_and_infer.h
│ │ │ └── lazy_mode.cpp
│ │ ├── multiprocessing/
│ │ │ ├── init.cpp
│ │ │ ├── object_ptr.cpp
│ │ │ ├── object_ptr.h
│ │ │ └── shared_memory.cpp
│ │ ├── numpy/
│ │ │ └── init_numpy_c_api.cpp
│ │ ├── of_api_registry.cpp
│ │ ├── of_api_registry.h
│ │ ├── profiler.cpp
│ │ ├── registry/
│ │ │ └── registry.cpp
│ │ ├── remat/
│ │ │ └── remat.cpp
│ │ ├── rpc/
│ │ │ ├── ccl.cpp
│ │ │ └── rank_group.cpp
│ │ ├── session/
│ │ │ └── session.cpp
│ │ ├── stack_getter.cpp
│ │ ├── symbol/
│ │ │ ├── job_conf_symbol.cpp
│ │ │ ├── op_conf_symbol.cpp
│ │ │ ├── placement_symbol.cpp
│ │ │ ├── sbp_symbol.cpp
│ │ │ └── scope_symbol.cpp
│ │ └── utils/
│ │ ├── dataloader.cpp
│ │ ├── tensor_utils.cpp
│ │ └── tensor_utils.h
│ ├── core/
│ │ ├── auto_parallel/
│ │ │ ├── algorithm_util.cpp
│ │ │ ├── algorithm_util.h
│ │ │ ├── auto_memory.cpp
│ │ │ ├── auto_memory.h
│ │ │ ├── binary_set.cpp
│ │ │ ├── binary_set.h
│ │ │ ├── boxing_collector.cpp
│ │ │ ├── boxing_collector.h
│ │ │ ├── sbp_collector.cpp
│ │ │ ├── sbp_collector.h
│ │ │ ├── sbp_constructor.cpp
│ │ │ ├── sbp_constructor.h
│ │ │ ├── sbp_edge.cpp
│ │ │ ├── sbp_edge.h
│ │ │ ├── sbp_graph.cpp
│ │ │ ├── sbp_graph.h
│ │ │ ├── sbp_node.cpp
│ │ │ ├── sbp_node.h
│ │ │ ├── sbp_util.cpp
│ │ │ └── sbp_util.h
│ │ ├── autograd/
│ │ │ ├── autograd_captured_tensor.h
│ │ │ ├── autograd_engine.cpp
│ │ │ ├── autograd_engine.h
│ │ │ ├── autograd_function.cpp
│ │ │ ├── autograd_function.h
│ │ │ ├── autograd_meta.cpp
│ │ │ ├── autograd_meta.h
│ │ │ ├── autograd_mode.cpp
│ │ │ ├── autograd_mode.h
│ │ │ ├── gradient_funcs/
│ │ │ │ ├── activation.cpp
│ │ │ │ ├── adaptive_avg_pool.cpp
│ │ │ │ ├── adaptive_max_pool.cpp
│ │ │ │ ├── add_n.cpp
│ │ │ │ ├── affine_grid.cpp
│ │ │ │ ├── amp_white_identity.cpp
│ │ │ │ ├── as_strided.cpp
│ │ │ │ ├── avg_pool.cpp
│ │ │ │ ├── batch_gather.cpp
│ │ │ │ ├── bias_add.cpp
│ │ │ │ ├── binary_cross_entropy.cpp
│ │ │ │ ├── binary_cross_entropy_with_logits.cpp
│ │ │ │ ├── binary_cross_entropy_with_logits_reduce_mean.cpp
│ │ │ │ ├── broadcast_binary_ops.cpp
│ │ │ │ ├── broadcast_like.cpp
│ │ │ │ ├── cast.cpp
│ │ │ │ ├── clip_by_scalar.cpp
│ │ │ │ ├── clip_by_scalar_max.cpp
│ │ │ │ ├── clip_by_scalar_min.cpp
│ │ │ │ ├── combined_margin_loss.cpp
│ │ │ │ ├── complex.cpp
│ │ │ │ ├── concat.cpp
│ │ │ │ ├── conv.cpp
│ │ │ │ ├── copy.cpp
│ │ │ │ ├── ctc_loss.cpp
│ │ │ │ ├── cublas_fused_mlp.cpp
│ │ │ │ ├── cum_ops.cpp
│ │ │ │ ├── deconv.cpp
│ │ │ │ ├── deform_conv.cpp
│ │ │ │ ├── depand.cpp
│ │ │ │ ├── det.cpp
│ │ │ │ ├── diag.cpp
│ │ │ │ ├── diagonal.cpp
│ │ │ │ ├── dim_gather.cpp
│ │ │ │ ├── dim_scatter.cpp
│ │ │ │ ├── dot.cpp
│ │ │ │ ├── dropout.cpp
│ │ │ │ ├── eager_ccl_broadcast.cpp
│ │ │ │ ├── elementwise_minimum_maximum.cpp
│ │ │ │ ├── embedding.cpp
│ │ │ │ ├── expand.cpp
│ │ │ │ ├── fake_quantization.cpp
│ │ │ │ ├── fft.cpp
│ │ │ │ ├── fill.cpp
│ │ │ │ ├── flatten.cpp
│ │ │ │ ├── flip.cpp
│ │ │ │ ├── fold.cpp
│ │ │ │ ├── fused_bias_add_dropout.cpp
│ │ │ │ ├── fused_bias_add_gelu.cpp
│ │ │ │ ├── fused_bias_add_scale_mask_softmax_dropout.cpp
│ │ │ │ ├── fused_center.cpp
│ │ │ │ ├── fused_cross_interaction.cpp
│ │ │ │ ├── fused_dot_feature_interaction.cpp
│ │ │ │ ├── fused_fast_gelu_mul.cpp
│ │ │ │ ├── fused_get_boundding_boxes_coord.cpp
│ │ │ │ ├── fused_get_ciou_diagonal_angle.cpp
│ │ │ │ ├── fused_get_ciou_result.cpp
│ │ │ │ ├── fused_get_convex_diagonal_squared.cpp
│ │ │ │ ├── fused_get_intersection_area.cpp
│ │ │ │ ├── fused_get_iou.cpp
│ │ │ │ ├── fused_glu.cpp
│ │ │ │ ├── fused_gru_cell.cpp
│ │ │ │ ├── fused_lstm_cell.cpp
│ │ │ │ ├── fused_matmul_bias.cpp
│ │ │ │ ├── fused_matmul_bias_add_relu_dropout.cpp
│ │ │ │ ├── fused_scale_mask_bias_softmax.cpp
│ │ │ │ ├── fused_scale_mask_softmax.cpp
│ │ │ │ ├── fused_scale_mask_softmax_dropout.cpp
│ │ │ │ ├── fused_scale_tril.cpp
│ │ │ │ ├── fused_scale_tril_softmax_mask_scale.cpp
│ │ │ │ ├── fused_self_attention.cpp
│ │ │ │ ├── fused_weighted_sum.cpp
│ │ │ │ ├── gather.cpp
│ │ │ │ ├── gather_nd.cpp
│ │ │ │ ├── global_cast.cpp
│ │ │ │ ├── global_to_global.cpp
│ │ │ │ ├── gradient_accumulation.cpp
│ │ │ │ ├── graph_feed_and_fetch.cpp
│ │ │ │ ├── grid_sample.cpp
│ │ │ │ ├── group_norm.cpp
│ │ │ │ ├── identity.cpp
│ │ │ │ ├── inv.cpp
│ │ │ │ ├── kl_div.cpp
│ │ │ │ ├── l2_normalize.cpp
│ │ │ │ ├── layer_norm.cpp
│ │ │ │ ├── lerp.cpp
│ │ │ │ ├── linalg_cross.cpp
│ │ │ │ ├── log_softmax.cpp
│ │ │ │ ├── masked_fill.cpp
│ │ │ │ ├── math_binary_op.cpp
│ │ │ │ ├── math_unary_op.cpp
│ │ │ │ ├── matmul.cpp
│ │ │ │ ├── matrix_vector_product.cpp
│ │ │ │ ├── max_pool.cpp
│ │ │ │ ├── max_unpool.cpp
│ │ │ │ ├── median.cpp
│ │ │ │ ├── mode.cpp
│ │ │ │ ├── narrow.cpp
│ │ │ │ ├── nll.cpp
│ │ │ │ ├── noncontiguous_binary_op.cpp
│ │ │ │ ├── normalization.cpp
│ │ │ │ ├── normalization_add_relu.cpp
│ │ │ │ ├── one_embedding_fused_lookup.cpp
│ │ │ │ ├── padding.cpp
│ │ │ │ ├── partial_fc_sample.cpp
│ │ │ │ ├── reduce_ops.cpp
│ │ │ │ ├── reduce_sum_like.cpp
│ │ │ │ ├── reshape.cpp
│ │ │ │ ├── rms_norm.cpp
│ │ │ │ ├── roi_align.cpp
│ │ │ │ ├── roll.cpp
│ │ │ │ ├── rrelu.cpp
│ │ │ │ ├── scalar_add.cpp
│ │ │ │ ├── scalar_div.cpp
│ │ │ │ ├── scalar_floordiv.cpp
│ │ │ │ ├── scalar_fmod.cpp
│ │ │ │ ├── scalar_mul.cpp
│ │ │ │ ├── scalar_pow.cpp
│ │ │ │ ├── scalar_truncdiv.cpp
│ │ │ │ ├── scaled_dot_product_attention.cpp
│ │ │ │ ├── scatter_nd.cpp
│ │ │ │ ├── select_top_n.cpp
│ │ │ │ ├── slice.cpp
│ │ │ │ ├── smooth_l1_loss.cpp
│ │ │ │ ├── softmax.cpp
│ │ │ │ ├── softmax_cross_entropy.cpp
│ │ │ │ ├── sparse_cross_entropy.cpp
│ │ │ │ ├── sparse_softmax_cross_entropy.cpp
│ │ │ │ ├── sparse_softmax_cross_entropy_ms.cpp
│ │ │ │ ├── split_like.cpp
│ │ │ │ ├── squeeze.cpp
│ │ │ │ ├── stack.cpp
│ │ │ │ ├── tensor_scalar_binary.cpp
│ │ │ │ ├── tensor_scatter_nd_update.cpp
│ │ │ │ ├── tf_pool.cpp
│ │ │ │ ├── to_contiguous.cpp
│ │ │ │ ├── transpose.cpp
│ │ │ │ ├── tril.cpp
│ │ │ │ ├── triu.cpp
│ │ │ │ ├── trunc.cpp
│ │ │ │ ├── two_stage_reduce.cpp
│ │ │ │ ├── unfold.cpp
│ │ │ │ ├── unfold_tensor.cpp
│ │ │ │ ├── unsqueeze.cpp
│ │ │ │ ├── upsample.cpp
│ │ │ │ ├── variance.cpp
│ │ │ │ ├── vector_matrix_product.cpp
│ │ │ │ └── where.cpp
│ │ │ └── higher_order_gradient_funcs/
│ │ │ ├── activation.cpp
│ │ │ ├── avg_pool.cpp
│ │ │ ├── binary_cross_entropy_loss.cpp
│ │ │ ├── binary_cross_entropy_with_logits.cpp
│ │ │ ├── binary_cross_entropy_with_logits_reduce_mean.cpp
│ │ │ ├── conv.cpp
│ │ │ ├── div.cpp
│ │ │ ├── kl_div_loss.cpp
│ │ │ ├── log_softmax.cpp
│ │ │ ├── math_unary_op.cpp
│ │ │ ├── matmul.cpp
│ │ │ ├── max_pool.cpp
│ │ │ ├── nll_loss.cpp
│ │ │ ├── pow.cpp
│ │ │ ├── scalar_pow.cpp
│ │ │ ├── slice.cpp
│ │ │ ├── smooth_l1_loss.cpp
│ │ │ └── softmax.cpp
│ │ ├── boxing/
│ │ │ ├── asymmetric_broadcast.cpp
│ │ │ ├── boxing_dividor.h
│ │ │ ├── boxing_dividor_util.cpp
│ │ │ ├── boxing_dividor_util.h
│ │ │ ├── boxing_interpreter_status.cpp
│ │ │ ├── boxing_interpreter_status.h
│ │ │ ├── ccl_boxing_function.cpp
│ │ │ ├── cuda_copy_boxing_interpreter.cpp
│ │ │ ├── eager_boxing_interpreter.cpp
│ │ │ ├── eager_boxing_interpreter.h
│ │ │ ├── eager_boxing_interpreter_mgr.cpp
│ │ │ ├── eager_boxing_interpreter_mgr.h
│ │ │ ├── eager_boxing_logger.cpp
│ │ │ ├── eager_boxing_logger.h
│ │ │ ├── flatten_hierarchy.cpp
│ │ │ ├── generic_symmetric_nd_sbp_boxing.cpp
│ │ │ ├── identity_boxing_interpreter.cpp
│ │ │ ├── naive_1_to_p_boxing.cpp
│ │ │ ├── naive_b_to_1_boxing.cpp
│ │ │ ├── naive_b_to_s_boxing.cpp
│ │ │ ├── naive_p_to_b_boxing.cpp
│ │ │ ├── naive_p_to_s_boxing.cpp
│ │ │ ├── naive_s_to_b_boxing.cpp
│ │ │ ├── naive_s_to_p_boxing.cpp
│ │ │ ├── naive_s_to_s_boxing.cpp
│ │ │ ├── nd_sbp_dim_reduce_boxing.cpp
│ │ │ ├── one_to_one_boxing.cpp
│ │ │ ├── slice_boxing_util.cpp
│ │ │ ├── slice_boxing_util.h
│ │ │ ├── symmetric_acyclic_nd_sbp_boxing.cpp
│ │ │ ├── symmetric_b_to_p_boxing.cpp
│ │ │ ├── symmetric_b_to_s_boxing.cpp
│ │ │ ├── symmetric_s_to_p_boxing.cpp
│ │ │ └── unflatten_hierarchy.cpp
│ │ ├── ccl/
│ │ │ ├── ccl.cpp
│ │ │ └── ccl.h
│ │ ├── comm_network/
│ │ │ ├── comm_network.cpp
│ │ │ ├── comm_network.h
│ │ │ ├── epoll/
│ │ │ │ ├── epoll_comm_network.cpp
│ │ │ │ ├── epoll_comm_network.h
│ │ │ │ ├── io_event_poller.cpp
│ │ │ │ ├── io_event_poller.h
│ │ │ │ ├── socket_helper.cpp
│ │ │ │ ├── socket_helper.h
│ │ │ │ ├── socket_memory_desc.h
│ │ │ │ ├── socket_message.h
│ │ │ │ ├── socket_read_helper.cpp
│ │ │ │ ├── socket_read_helper.h
│ │ │ │ ├── socket_write_helper.cpp
│ │ │ │ └── socket_write_helper.h
│ │ │ └── ibverbs/
│ │ │ ├── ibverbs.proto
│ │ │ ├── ibverbs_comm_network.cpp
│ │ │ ├── ibverbs_comm_network.h
│ │ │ ├── ibverbs_memory_desc.cpp
│ │ │ ├── ibverbs_memory_desc.h
│ │ │ ├── ibverbs_qp.cpp
│ │ │ └── ibverbs_qp.h
│ │ ├── common/
│ │ │ ├── array_ref.h
│ │ │ ├── auto_registration_factory.h
│ │ │ ├── balanced_splitter.cpp
│ │ │ ├── balanced_splitter.h
│ │ │ ├── balanced_splitter_test.cpp
│ │ │ ├── bfloat16.h
│ │ │ ├── bfloat16_math.h
│ │ │ ├── bfloat16_test.cpp
│ │ │ ├── blas.h
│ │ │ ├── blocking_counter.cpp
│ │ │ ├── blocking_counter.h
│ │ │ ├── blocking_then_busy.h
│ │ │ ├── buffer.h
│ │ │ ├── buffer_manager.h
│ │ │ ├── cached_caller.cpp
│ │ │ ├── cached_caller.h
│ │ │ ├── cblas.h
│ │ │ ├── channel.h
│ │ │ ├── channel_test.cpp
│ │ │ ├── check.cpp
│ │ │ ├── check.h
│ │ │ ├── check_level.cpp
│ │ │ ├── check_level.h
│ │ │ ├── constant.h
│ │ │ ├── container_util.h
│ │ │ ├── container_util_test.cpp
│ │ │ ├── cost_util.h
│ │ │ ├── cpp_attribute.h
│ │ │ ├── data_type.cpp
│ │ │ ├── data_type.h
│ │ │ ├── data_type.proto
│ │ │ ├── data_type_converter.h
│ │ │ ├── data_type_converter_test.cpp
│ │ │ ├── data_type_converter_test_static.h
│ │ │ ├── data_type_seq.h
│ │ │ ├── decorator.h
│ │ │ ├── decorator_test.cpp
│ │ │ ├── device.proto
│ │ │ ├── device_type.cpp
│ │ │ ├── device_type.h
│ │ │ ├── device_type.proto
│ │ │ ├── dtype_signature.h
│ │ │ ├── dtype_signature.proto
│ │ │ ├── eigen_util.h
│ │ │ ├── either_ptr.h
│ │ │ ├── env_var/
│ │ │ │ ├── bootstrap.h
│ │ │ │ ├── debug_mode.h
│ │ │ │ ├── eager.h
│ │ │ │ ├── env_var.h
│ │ │ │ ├── remat.h
│ │ │ │ ├── stream.h
│ │ │ │ └── vm.h
│ │ │ ├── error.cpp
│ │ │ ├── error.h
│ │ │ ├── error.proto
│ │ │ ├── error_util.cpp
│ │ │ ├── error_util.h
│ │ │ ├── exception.h
│ │ │ ├── flat_shape.cpp
│ │ │ ├── flat_shape.h
│ │ │ ├── foreign_lock_helper.cpp
│ │ │ ├── foreign_lock_helper.h
│ │ │ ├── function_traits.h
│ │ │ ├── hash.h
│ │ │ ├── hash_container.h
│ │ │ ├── hash_eq_trait_ptr.h
│ │ │ ├── high_order_bool.h
│ │ │ ├── just.h
│ │ │ ├── layout_standardize.h
│ │ │ ├── math_util.cpp
│ │ │ ├── math_util.h
│ │ │ ├── maybe.h
│ │ │ ├── maybe_test.cpp
│ │ │ ├── mem_util.cpp
│ │ │ ├── mem_util.h
│ │ │ ├── memory_format.proto
│ │ │ ├── meta_util.hpp
│ │ │ ├── nd_index.cpp
│ │ │ ├── nd_index.h
│ │ │ ├── nd_index_offset_helper.h
│ │ │ ├── nd_index_offset_helper_test.cpp
│ │ │ ├── not_equal_to_previous_adjacent_iterator.h
│ │ │ ├── notifier.cpp
│ │ │ ├── notifier.h
│ │ │ ├── of_unused.h
│ │ │ ├── op_args_reserved_size.h
│ │ │ ├── op_args_vector.h
│ │ │ ├── optional.h
│ │ │ ├── optional_test.cpp
│ │ │ ├── pcheck.h
│ │ │ ├── permutation_iterator.h
│ │ │ ├── platform.h
│ │ │ ├── preprocessor.h
│ │ │ ├── preprocessor_internal.h
│ │ │ ├── preprocessor_test.cpp
│ │ │ ├── process_state.h
│ │ │ ├── protobuf.cpp
│ │ │ ├── protobuf.h
│ │ │ ├── range.cpp
│ │ │ ├── range.h
│ │ │ ├── range.proto
│ │ │ ├── registry_error.cpp
│ │ │ ├── registry_error.h
│ │ │ ├── scalar.cpp
│ │ │ ├── scalar.h
│ │ │ ├── sequential.proto
│ │ │ ├── shape.cpp
│ │ │ ├── shape.h
│ │ │ ├── shape.proto
│ │ │ ├── shape_test.cpp
│ │ │ ├── shape_vec.h
│ │ │ ├── shape_view.cpp
│ │ │ ├── shape_view.h
│ │ │ ├── shared_or_scalar.h
│ │ │ ├── single_thread_obj_pool.h
│ │ │ ├── single_thread_obj_pool_test.cpp
│ │ │ ├── singleton.h
│ │ │ ├── sized_buffer_view.h
│ │ │ ├── small_vector.h
│ │ │ ├── spin_counter.cpp
│ │ │ ├── spin_counter.h
│ │ │ ├── static_check.h
│ │ │ ├── static_global.h
│ │ │ ├── steady_vector.h
│ │ │ ├── steady_vector_test.cpp
│ │ │ ├── str_util.cpp
│ │ │ ├── str_util.h
│ │ │ ├── stream_type.h
│ │ │ ├── stride.cpp
│ │ │ ├── stride.h
│ │ │ ├── switch_func.h
│ │ │ ├── symbol.h
│ │ │ ├── symbol_test.cpp
│ │ │ ├── tensor_buffer.cpp
│ │ │ ├── tensor_buffer.h
│ │ │ ├── tensor_desc.cpp
│ │ │ ├── tensor_desc.h
│ │ │ ├── tensor_meta.cpp
│ │ │ ├── tensor_meta.h
│ │ │ ├── test_util.h
│ │ │ ├── thread_local_guard.h
│ │ │ ├── thread_local_guard_test.cpp
│ │ │ ├── throw.h
│ │ │ ├── to_string.h
│ │ │ ├── tuple_hash.h
│ │ │ ├── type_traits.h
│ │ │ ├── util.cpp
│ │ │ ├── util.h
│ │ │ ├── wrap_dim_utils.h
│ │ │ └── zero_only_zip.h
│ │ ├── control/
│ │ │ ├── bootstrap_client.h
│ │ │ ├── bootstrap_server.h
│ │ │ ├── control.proto
│ │ │ ├── ctrl_bootstrap.cpp
│ │ │ ├── ctrl_bootstrap.h
│ │ │ ├── ctrl_bootstrap.proto
│ │ │ ├── ctrl_call.h
│ │ │ ├── ctrl_client.cpp
│ │ │ ├── ctrl_client.h
│ │ │ ├── ctrl_server.cpp
│ │ │ ├── ctrl_server.h
│ │ │ ├── ctrl_service.cpp
│ │ │ ├── ctrl_service.h
│ │ │ ├── ctrl_test.cpp
│ │ │ ├── ctrl_util.cpp
│ │ │ ├── ctrl_util.h
│ │ │ ├── global_process_ctx.h
│ │ │ ├── host_list_bootstrap_client.cpp
│ │ │ ├── host_list_bootstrap_client.h
│ │ │ ├── host_list_bootstrap_server.cpp
│ │ │ ├── host_list_bootstrap_server.h
│ │ │ ├── rank_info_bootstrap_client.cpp
│ │ │ ├── rank_info_bootstrap_client.h
│ │ │ ├── rank_info_bootstrap_server.cpp
│ │ │ ├── rank_info_bootstrap_server.h
│ │ │ ├── rpc_client.cpp
│ │ │ ├── rpc_client.h
│ │ │ ├── rpc_server.cpp
│ │ │ ├── rpc_server.h
│ │ │ └── worker_process_info.proto
│ │ ├── cuda/
│ │ │ ├── atomic.cuh
│ │ │ ├── elementwise.cuh
│ │ │ ├── layer_norm.cuh
│ │ │ ├── rms_norm.cuh
│ │ │ ├── softmax.cuh
│ │ │ └── unique.cuh
│ │ ├── device/
│ │ │ ├── cuda_pseudo_bfloat16.h
│ │ │ ├── cuda_pseudo_half.h
│ │ │ ├── cuda_util.cpp
│ │ │ ├── cuda_util.h
│ │ │ ├── cudnn_conv_util.cpp
│ │ │ ├── cudnn_conv_util.h
│ │ │ ├── cudnn_util.cpp
│ │ │ ├── cudnn_util.h
│ │ │ ├── device_id.cpp
│ │ │ ├── device_id.h
│ │ │ ├── ep_based_event_record.h
│ │ │ ├── event_record.h
│ │ │ ├── nccl_util.cpp
│ │ │ └── nccl_util.h
│ │ ├── eager/
│ │ │ ├── call_context.cpp
│ │ │ ├── call_context.h
│ │ │ ├── dev_vm_dep_object_consume_mode.h
│ │ │ ├── eager_blob_object.cpp
│ │ │ ├── eager_blob_object.h
│ │ │ ├── local_dep_object.cpp
│ │ │ ├── local_dep_object.h
│ │ │ ├── tensor_storage.cpp
│ │ │ └── tensor_storage.h
│ │ ├── embedding/
│ │ │ ├── cache.cpp
│ │ │ ├── cache.h
│ │ │ ├── cache_test.cpp
│ │ │ ├── cached_key_value_store.cu
│ │ │ ├── cached_key_value_store.h
│ │ │ ├── embedding_manager.cpp
│ │ │ ├── embedding_manager.h
│ │ │ ├── full_cache.cu
│ │ │ ├── full_cache.h
│ │ │ ├── hash_functions.cuh
│ │ │ ├── key_value_store.h
│ │ │ ├── key_value_store_options.h
│ │ │ ├── key_value_store_test.cpp
│ │ │ ├── kv_iterator.h
│ │ │ ├── lru_cache.cu
│ │ │ ├── lru_cache.h
│ │ │ ├── mock_key_value_store.cu
│ │ │ ├── mock_key_value_store.h
│ │ │ ├── persistent_table.cpp
│ │ │ ├── persistent_table.h
│ │ │ ├── persistent_table_key_value_store.cu
│ │ │ ├── persistent_table_key_value_store.h
│ │ │ └── posix_file.h
│ │ ├── ep/
│ │ │ ├── common/
│ │ │ │ ├── active_device_guard.cpp
│ │ │ │ ├── device.cpp
│ │ │ │ ├── device_manager_registry.cpp
│ │ │ │ ├── onednn.h
│ │ │ │ └── primitive/
│ │ │ │ ├── add.cpp
│ │ │ │ ├── batch_matmul.cpp
│ │ │ │ ├── binary_functor.h
│ │ │ │ ├── broadcast_elementwise_binary.h
│ │ │ │ ├── broadcast_elementwise_unary.h
│ │ │ │ ├── broadcast_matmul.h
│ │ │ │ ├── broadcast_simplify_dims_test.cpp
│ │ │ │ ├── constant_pad.h
│ │ │ │ ├── copy_nd.h
│ │ │ │ ├── elementwise_unary.h
│ │ │ │ ├── matmul.cpp
│ │ │ │ ├── permute.h
│ │ │ │ ├── permute_impl.h
│ │ │ │ ├── permute_test.cpp
│ │ │ │ ├── unary_functor.h
│ │ │ │ ├── util.h
│ │ │ │ └── where.h
│ │ │ ├── cpu/
│ │ │ │ ├── cpu_device.cpp
│ │ │ │ ├── cpu_device.h
│ │ │ │ ├── cpu_device_manager.cpp
│ │ │ │ ├── cpu_device_manager.h
│ │ │ │ ├── cpu_device_manager_factory.cpp
│ │ │ │ ├── cpu_event.cpp
│ │ │ │ ├── cpu_event.h
│ │ │ │ ├── cpu_random_generator.cpp
│ │ │ │ ├── cpu_random_generator.h
│ │ │ │ ├── cpu_stream.cpp
│ │ │ │ ├── cpu_stream.h
│ │ │ │ └── primitive/
│ │ │ │ ├── add.cpp
│ │ │ │ ├── binary_functor.h
│ │ │ │ ├── broadcast_elementwise_binary.cpp
│ │ │ │ ├── broadcast_elementwise_unary.cpp
│ │ │ │ ├── broadcast_matmul.cpp
│ │ │ │ ├── cast.cpp
│ │ │ │ ├── constant_pad.cpp
│ │ │ │ ├── copy_nd.cpp
│ │ │ │ ├── elementwise_unary.cpp
│ │ │ │ ├── fill.cpp
│ │ │ │ ├── memcpy.cpp
│ │ │ │ ├── memset.cpp
│ │ │ │ ├── permute.cpp
│ │ │ │ ├── softmax.cpp
│ │ │ │ ├── softmax_backward.cpp
│ │ │ │ ├── tensor_fill.cpp
│ │ │ │ ├── type_seq.h
│ │ │ │ ├── unary_functor.h
│ │ │ │ └── where.cpp
│ │ │ ├── cuda/
│ │ │ │ ├── cuda_device.cpp
│ │ │ │ ├── cuda_device.h
│ │ │ │ ├── cuda_device_manager.cpp
│ │ │ │ ├── cuda_device_manager.h
│ │ │ │ ├── cuda_device_manager_factory.cpp
│ │ │ │ ├── cuda_event.cpp
│ │ │ │ ├── cuda_event.h
│ │ │ │ ├── cuda_matmul_mode.cpp
│ │ │ │ ├── cuda_matmul_mode.h
│ │ │ │ ├── cuda_random_generator.cpp
│ │ │ │ ├── cuda_random_generator.h
│ │ │ │ ├── cuda_stream.cpp
│ │ │ │ ├── cuda_stream.h
│ │ │ │ └── primitive/
│ │ │ │ ├── add.cu
│ │ │ │ ├── binary_functor.cuh
│ │ │ │ ├── broadcast_elementwise_binary.cu
│ │ │ │ ├── broadcast_elementwise_binary.cuh
│ │ │ │ ├── broadcast_elementwise_binary_activation_grad_0.cu
│ │ │ │ ├── broadcast_elementwise_binary_activation_grad_1.cu
│ │ │ │ ├── broadcast_elementwise_binary_activation_grad_2.cu
│ │ │ │ ├── broadcast_elementwise_binary_bitwise.cu
│ │ │ │ ├── broadcast_elementwise_binary_comparision_0.cu
│ │ │ │ ├── broadcast_elementwise_binary_comparision_1.cu
│ │ │ │ ├── broadcast_elementwise_binary_comparision_complex.cu
│ │ │ │ ├── broadcast_elementwise_binary_logical.cu
│ │ │ │ ├── broadcast_elementwise_binary_math_0.cu
│ │ │ │ ├── broadcast_elementwise_binary_math_1.cu
│ │ │ │ ├── broadcast_elementwise_binary_math_2.cu
│ │ │ │ ├── broadcast_elementwise_binary_math_complex.cu
│ │ │ │ ├── broadcast_elementwise_unary.cu
│ │ │ │ ├── broadcast_matmul.cpp
│ │ │ │ ├── cast.cu
│ │ │ │ ├── constant_pad.cu
│ │ │ │ ├── copy_nd.cu
│ │ │ │ ├── elementwise_unary.cu
│ │ │ │ ├── fill.cu
│ │ │ │ ├── math_elementwise_unary_math_grad_0.cu
│ │ │ │ ├── math_elementwise_unary_math_grad_1.cu
│ │ │ │ ├── math_elementwise_unary_math_grad_2.cu
│ │ │ │ ├── math_elementwise_unary_math_grad_3.cu
│ │ │ │ ├── math_elementwise_unary_math_grad_complex.cu
│ │ │ │ ├── memcpy.cpp
│ │ │ │ ├── memset.cpp
│ │ │ │ ├── permute.cu
│ │ │ │ ├── softmax.cu
│ │ │ │ ├── softmax_backward.cu
│ │ │ │ ├── tensor_fill.cu
│ │ │ │ ├── type_seq.h
│ │ │ │ ├── unary_functor.cuh
│ │ │ │ └── where.cu
│ │ │ ├── include/
│ │ │ │ ├── active_device_guard.h
│ │ │ │ ├── allocation_options.h
│ │ │ │ ├── device.h
│ │ │ │ ├── device_manager.h
│ │ │ │ ├── device_manager_factory.h
│ │ │ │ ├── device_manager_registry.h
│ │ │ │ ├── event.h
│ │ │ │ ├── primitive/
│ │ │ │ │ ├── add.h
│ │ │ │ │ ├── batch_matmul.h
│ │ │ │ │ ├── binary_op.h
│ │ │ │ │ ├── blas.h
│ │ │ │ │ ├── broadcast_elementwise_binary.h
│ │ │ │ │ ├── broadcast_elementwise_unary.h
│ │ │ │ │ ├── broadcast_matmul.h
│ │ │ │ │ ├── cast.h
│ │ │ │ │ ├── constant_pad.h
│ │ │ │ │ ├── copy_nd.h
│ │ │ │ │ ├── elementwise_unary.h
│ │ │ │ │ ├── fast_integer_math.h
│ │ │ │ │ ├── fill.h
│ │ │ │ │ ├── log_softmax.h
│ │ │ │ │ ├── log_softmax_backward.h
│ │ │ │ │ ├── matmul.h
│ │ │ │ │ ├── memcpy.h
│ │ │ │ │ ├── memset.h
│ │ │ │ │ ├── one_hot.h
│ │ │ │ │ ├── permute.h
│ │ │ │ │ ├── primitive.h
│ │ │ │ │ ├── softmax.h
│ │ │ │ │ ├── softmax_backward.h
│ │ │ │ │ ├── tensor_fill.h
│ │ │ │ │ ├── unary_op.h
│ │ │ │ │ └── where.h
│ │ │ │ ├── random_generator.h
│ │ │ │ └── stream.h
│ │ │ └── test/
│ │ │ ├── primitive/
│ │ │ │ ├── add_test.cpp
│ │ │ │ ├── batch_matmul_test.cpp
│ │ │ │ ├── binary_test.cpp
│ │ │ │ ├── broadcast_matmul_test.cpp
│ │ │ │ ├── cast_test.cpp
│ │ │ │ ├── constant_pad_test.cpp
│ │ │ │ ├── copy_nd_test.cpp
│ │ │ │ ├── elementwise_unary_test.cpp
│ │ │ │ ├── fill_test.cpp
│ │ │ │ ├── matmul_test.cpp
│ │ │ │ ├── memcpy_test.cpp
│ │ │ │ ├── memset_test.cpp
│ │ │ │ ├── permute_test.cpp
│ │ │ │ ├── primitive_test.h
│ │ │ │ ├── softmax_backward_test.cpp
│ │ │ │ ├── softmax_test.cpp
│ │ │ │ ├── unary_test.cpp
│ │ │ │ └── where_test.cpp
│ │ │ └── test_util.h
│ │ ├── framework/
│ │ │ ├── arg_tuple.cpp
│ │ │ ├── arg_tuple.h
│ │ │ ├── attr_map.cpp
│ │ │ ├── attr_map.h
│ │ │ ├── attr_map_test.cpp
│ │ │ ├── attr_value.cpp
│ │ │ ├── attr_value.h
│ │ │ ├── attr_value_accessor.cpp
│ │ │ ├── attr_value_accessor.h
│ │ │ ├── auto_random_generator.cpp
│ │ │ ├── auto_random_generator.h
│ │ │ ├── autocast.cpp
│ │ │ ├── autocast.h
│ │ │ ├── compute_complexity_fn_context.h
│ │ │ ├── config_def.cpp
│ │ │ ├── config_def.h
│ │ │ ├── config_def.proto
│ │ │ ├── consistency_check.cpp
│ │ │ ├── consistency_check.h
│ │ │ ├── device.cpp
│ │ │ ├── device.h
│ │ │ ├── dtype.cpp
│ │ │ ├── dtype.h
│ │ │ ├── eager_util.h
│ │ │ ├── framework.h
│ │ │ ├── get_nd_sbp_signature_list_context.h
│ │ │ ├── global_param_grad_sync_mode.cpp
│ │ │ ├── global_param_grad_sync_mode.h
│ │ │ ├── global_tensor_infer_cache.cpp
│ │ │ ├── global_tensor_infer_cache.h
│ │ │ ├── id_util.cpp
│ │ │ ├── id_util.h
│ │ │ ├── infer_nd_sbp_fn_context.h
│ │ │ ├── infer_output_blob_time_shape_fn_context.h
│ │ │ ├── infer_util.cpp
│ │ │ ├── infer_util.h
│ │ │ ├── instructions_builder.cpp
│ │ │ ├── instructions_builder.h
│ │ │ ├── layout.cpp
│ │ │ ├── layout.h
│ │ │ ├── load_library.cpp
│ │ │ ├── load_library.h
│ │ │ ├── local_tensor_infer_cache.cpp
│ │ │ ├── local_tensor_infer_cache.h
│ │ │ ├── multi_client_session_context.cpp
│ │ │ ├── multi_client_session_context.h
│ │ │ ├── multi_thread.cpp
│ │ │ ├── multi_thread.h
│ │ │ ├── mutable_attr_map.h
│ │ │ ├── nd_sbp.cpp
│ │ │ ├── nd_sbp.h
│ │ │ ├── nn_graph.cpp
│ │ │ ├── nn_graph.h
│ │ │ ├── nn_graph_if.h
│ │ │ ├── op_builder.cpp
│ │ │ ├── op_builder.h
│ │ │ ├── op_definition.h
│ │ │ ├── op_expr.cpp
│ │ │ ├── op_expr.h
│ │ │ ├── op_expr_grad_function.cpp
│ │ │ ├── op_expr_grad_function.h
│ │ │ ├── op_interpreter/
│ │ │ │ ├── dispatch_frame.cpp
│ │ │ │ ├── dispatch_frame.h
│ │ │ │ ├── eager_global_op_interpreter.cpp
│ │ │ │ ├── eager_local_op_interpreter.cpp
│ │ │ │ ├── eager_local_op_interpreter.h
│ │ │ │ ├── lazy_op_interpreter.cpp
│ │ │ │ ├── lazy_op_interpreter.h
│ │ │ │ ├── op_interpreter.cpp
│ │ │ │ ├── op_interpreter_util.cpp
│ │ │ │ └── op_interpreter_util.h
│ │ │ ├── op_interpreter.h
│ │ │ ├── op_kernel.cpp
│ │ │ ├── op_kernel.h
│ │ │ ├── op_kernel_infer_cache.cpp
│ │ │ ├── op_kernel_infer_cache.h
│ │ │ ├── ordered_string_list.h
│ │ │ ├── parallel_conf_util.cpp
│ │ │ ├── parallel_conf_util.h
│ │ │ ├── parallel_conf_util_test.cpp
│ │ │ ├── placed_nd_sbp.cpp
│ │ │ ├── placed_nd_sbp.h
│ │ │ ├── placement_sbp_util.cpp
│ │ │ ├── placement_sbp_util.h
│ │ │ ├── placement_sbp_util_test.cpp
│ │ │ ├── placement_utils.cpp
│ │ │ ├── placement_utils.h
│ │ │ ├── random_generator.cpp
│ │ │ ├── random_generator.h
│ │ │ ├── rank_group_rpc_util.cpp
│ │ │ ├── rank_group_rpc_util.h
│ │ │ ├── saved_tensor_hooks.h
│ │ │ ├── sbp_context.cpp
│ │ │ ├── sbp_context.h
│ │ │ ├── sbp_infer_util.cpp
│ │ │ ├── sbp_infer_util.h
│ │ │ ├── sbp_infer_util_test.cpp
│ │ │ ├── scope_util.cpp
│ │ │ ├── scope_util.h
│ │ │ ├── session_util.cpp
│ │ │ ├── session_util.h
│ │ │ ├── shut_down_util.cpp
│ │ │ ├── shut_down_util.h
│ │ │ ├── stream.cpp
│ │ │ ├── stream.h
│ │ │ ├── stream_allocator_is_pinned.h
│ │ │ ├── stream_get_stream_type_name.h
│ │ │ ├── stream_guard.cpp
│ │ │ ├── stream_guard.h
│ │ │ ├── stream_is_comm_net_stream.h
│ │ │ ├── stream_mgr.cpp
│ │ │ ├── stream_mgr.h
│ │ │ ├── stream_need_soft_sync.h
│ │ │ ├── stream_on_independent_thread.h
│ │ │ ├── stream_set.cpp
│ │ │ ├── stream_set.h
│ │ │ ├── stream_support_stream_wait.h
│ │ │ ├── symbol_storage_util.cpp
│ │ │ ├── symbol_storage_util.h
│ │ │ ├── sync_symbol_global_tensor_meta.cpp
│ │ │ ├── sync_symbol_global_tensor_meta.h
│ │ │ ├── sync_symbol_nd_sbp.cpp
│ │ │ ├── sync_symbol_nd_sbp.h
│ │ │ ├── sync_symbol_parallel_desc.cpp
│ │ │ ├── sync_symbol_parallel_desc.h
│ │ │ ├── synced_symbol_map.cpp
│ │ │ ├── synced_symbol_map.h
│ │ │ ├── tensor.cpp
│ │ │ ├── tensor.h
│ │ │ ├── tensor_arg.cpp
│ │ │ ├── tensor_arg.h
│ │ │ ├── tensor_global_id.cpp
│ │ │ ├── tensor_global_id.h
│ │ │ ├── tensor_impl.cpp
│ │ │ ├── tensor_impl.h
│ │ │ ├── tensor_methods.cpp
│ │ │ ├── tensor_methods.h
│ │ │ ├── tensor_name_scope.cpp
│ │ │ ├── tensor_name_scope.h
│ │ │ ├── tensor_rpc_util.cpp
│ │ │ ├── tensor_rpc_util.h
│ │ │ ├── tensor_storage.cpp
│ │ │ ├── tensor_storage.h
│ │ │ ├── tensor_tuple.cpp
│ │ │ ├── tensor_tuple.h
│ │ │ ├── tensor_util.cpp
│ │ │ ├── tensor_util.h
│ │ │ ├── to_string.cpp
│ │ │ ├── to_string.h
│ │ │ ├── transport_token.cpp
│ │ │ ├── transport_token.h
│ │ │ ├── transport_util.cpp
│ │ │ ├── transport_util.h
│ │ │ ├── user_op_attr.proto
│ │ │ ├── user_op_conf.cpp
│ │ │ ├── user_op_conf.h
│ │ │ ├── user_op_conf.proto
│ │ │ ├── user_op_def.cpp
│ │ │ ├── user_op_def.h
│ │ │ ├── user_op_def.proto
│ │ │ ├── user_op_hob.h
│ │ │ ├── user_op_kernel_registry.cpp
│ │ │ ├── user_op_kernel_registry.h
│ │ │ ├── user_op_registry.cpp
│ │ │ ├── user_op_registry.h
│ │ │ ├── user_op_registry_manager.cpp
│ │ │ ├── user_op_registry_manager.h
│ │ │ ├── user_op_tensor.h
│ │ │ ├── util.h
│ │ │ ├── variable_meta_info.proto
│ │ │ ├── variable_tensor_mgr.cpp
│ │ │ └── variable_tensor_mgr.h
│ │ ├── functional/
│ │ │ ├── function_library.h
│ │ │ ├── functional.h
│ │ │ ├── functional_api.yaml
│ │ │ ├── impl/
│ │ │ │ ├── activation_functor.cpp
│ │ │ │ ├── array_functor.cpp
│ │ │ │ ├── binary_functor.cpp
│ │ │ │ ├── binary_functor.h
│ │ │ │ ├── binary_grad_functor.cpp
│ │ │ │ ├── comm_functor.cpp
│ │ │ │ ├── common.cpp
│ │ │ │ ├── common.h
│ │ │ │ ├── dataset_functor.cpp
│ │ │ │ ├── eye_functor.cpp
│ │ │ │ ├── fused_attention_functor.cpp
│ │ │ │ ├── global_cast.cpp
│ │ │ │ ├── gradient_accumulation_functor.cpp
│ │ │ │ ├── higher_derivative_functor.cpp
│ │ │ │ ├── linalg_functor.cpp
│ │ │ │ ├── math_functor.cpp
│ │ │ │ ├── nn_functor.cpp
│ │ │ │ ├── nn_grad_functor.cpp
│ │ │ │ ├── quantization.cpp
│ │ │ │ ├── random_functor.cpp
│ │ │ │ ├── rnn_functor.cpp
│ │ │ │ ├── slice_boxing_functor.cpp
│ │ │ │ ├── test_functor.cpp
│ │ │ │ ├── unary_functor.cpp
│ │ │ │ ├── unary_functor.h
│ │ │ │ └── util_ops_functor.cpp
│ │ │ ├── packed_functor.h
│ │ │ ├── sequence_function.h
│ │ │ ├── tensor_index.cpp
│ │ │ ├── tensor_index.h
│ │ │ ├── tensor_processor.cpp
│ │ │ └── tensor_processor.h
│ │ ├── graph/
│ │ │ ├── boxing/
│ │ │ │ ├── b21_sub_task_graph_builder.cpp
│ │ │ │ ├── b21_sub_task_graph_builder.h
│ │ │ │ ├── boxing_logger.cpp
│ │ │ │ ├── boxing_logger.h
│ │ │ │ ├── ccl_sub_task_graph_builder.cpp
│ │ │ │ ├── ccl_sub_task_graph_builder.h
│ │ │ │ ├── chain_sub_task_graph_builder.cpp
│ │ │ │ ├── chain_sub_task_graph_builder.h
│ │ │ │ ├── collective_boxing.proto
│ │ │ │ ├── collective_boxing_sub_task_graph_builder.cpp
│ │ │ │ ├── collective_boxing_sub_task_graph_builder.h
│ │ │ │ ├── collective_boxing_util.cpp
│ │ │ │ ├── collective_boxing_util.h
│ │ │ │ ├── fallback_to_cpu_slice_boxing_sub_task_graph_builder.cpp
│ │ │ │ ├── fallback_to_cpu_slice_boxing_sub_task_graph_builder.h
│ │ │ │ ├── hierarchical_sub_task_graph_builder.h
│ │ │ │ ├── hierarchical_sub_task_graph_builder_impl.cpp
│ │ │ │ ├── hierarchical_sub_task_graph_builder_impl.h
│ │ │ │ ├── hierarchical_sub_task_graph_builder_util.cpp
│ │ │ │ ├── hierarchical_sub_task_graph_builder_util.h
│ │ │ │ ├── naive_b2b_sub_task_graph_builder.cpp
│ │ │ │ ├── naive_b2b_sub_task_graph_builder.h
│ │ │ │ ├── naive_b2p_sub_task_graph_builder.cpp
│ │ │ │ ├── naive_b2p_sub_task_graph_builder.h
│ │ │ │ ├── one_to_one_sub_task_graph_builder.cpp
│ │ │ │ ├── one_to_one_sub_task_graph_builder.h
│ │ │ │ ├── slice_boxing_sub_task_graph_builder.cpp
│ │ │ │ ├── slice_boxing_sub_task_graph_builder.h
│ │ │ │ ├── sub_task_graph_builder.h
│ │ │ │ ├── sub_task_graph_builder_context.cpp
│ │ │ │ ├── sub_task_graph_builder_context.h
│ │ │ │ ├── sub_task_graph_builder_status_util.cpp
│ │ │ │ ├── sub_task_graph_builder_status_util.h
│ │ │ │ ├── sub_task_graph_builder_util.cpp
│ │ │ │ └── sub_task_graph_builder_util.h
│ │ │ ├── boxing_identity_task_node.cpp
│ │ │ ├── boxing_identity_task_node.h
│ │ │ ├── boxing_task_graph.proto
│ │ │ ├── boxing_zeros_task_node.cpp
│ │ │ ├── boxing_zeros_task_node.h
│ │ │ ├── collective_boxing_pack_task_node.cpp
│ │ │ ├── collective_boxing_pack_task_node.h
│ │ │ ├── collective_boxing_task_node.cpp
│ │ │ ├── collective_boxing_task_node.h
│ │ │ ├── collective_boxing_unpack_task_node.cpp
│ │ │ ├── collective_boxing_unpack_task_node.h
│ │ │ ├── compute_task_node.cpp
│ │ │ ├── compute_task_node.h
│ │ │ ├── copy_task_node.cpp
│ │ │ ├── copy_task_node.h
│ │ │ ├── exec_graph.cpp
│ │ │ ├── exec_graph.h
│ │ │ ├── exec_sequence.proto
│ │ │ ├── fake_consumed_regst_provider.h
│ │ │ ├── graph.h
│ │ │ ├── inplace_lbi_graph.cpp
│ │ │ ├── inplace_lbi_graph.h
│ │ │ ├── inplace_regst_graph.cpp
│ │ │ ├── inplace_regst_graph.h
│ │ │ ├── nccl_send_recv_boxing_task_node.cpp
│ │ │ ├── nccl_send_recv_boxing_task_node.h
│ │ │ ├── node.cpp
│ │ │ ├── node.h
│ │ │ ├── normal_forward_compute_task_node.h
│ │ │ ├── op_graph.cpp
│ │ │ ├── op_graph.h
│ │ │ ├── plan_task_graph.cpp
│ │ │ ├── plan_task_graph.h
│ │ │ ├── slice_boxing_task_node.cpp
│ │ │ ├── slice_boxing_task_node.h
│ │ │ ├── straighten_nodes.cpp
│ │ │ ├── straighten_nodes.h
│ │ │ ├── stream_id.cpp
│ │ │ ├── stream_id.h
│ │ │ ├── stream_index_generator.cpp
│ │ │ ├── stream_index_generator.h
│ │ │ ├── task_edge.proto
│ │ │ ├── task_graph.cpp
│ │ │ ├── task_graph.h
│ │ │ ├── task_graph_rebuild_ctx.cpp
│ │ │ ├── task_graph_rebuild_ctx.h
│ │ │ ├── task_id.cpp
│ │ │ ├── task_id.h
│ │ │ ├── task_id_generator.cpp
│ │ │ ├── task_id_generator.h
│ │ │ ├── task_node.cpp
│ │ │ ├── task_node.h
│ │ │ ├── task_stream_id.h
│ │ │ ├── task_stream_index_manager.cpp
│ │ │ ├── task_stream_index_manager.h
│ │ │ ├── task_type_visitor.h
│ │ │ ├── transport_task_node.cpp
│ │ │ └── transport_task_node.h
│ │ ├── graph_impl/
│ │ │ ├── acc_compute_task_node.cpp
│ │ │ ├── acc_ctrl_tick_compute_task_node.cpp
│ │ │ ├── acc_tick_compute_task_node.cpp
│ │ │ ├── callback_notify_compute_task_node.cpp
│ │ │ ├── case_compute_task_node.cpp
│ │ │ ├── critical_section_wait_compute_task_node.cpp
│ │ │ ├── decode_h2d_compute_task_node.cpp
│ │ │ ├── device_tick_compute_task_node.cpp
│ │ │ ├── distribute_concat_compute_task_node.cpp
│ │ │ ├── distribute_split_compute_task_node.cpp
│ │ │ ├── dst_subset_tick_compute_task_node.cpp
│ │ │ ├── esac_compute_task_node.cpp
│ │ │ ├── normal_forward_compute_task_node.cpp
│ │ │ ├── pack_compute_task_node.cpp
│ │ │ ├── reentrant_lock_compute_task_node.cpp
│ │ │ ├── repeat_compute_task_node.cpp
│ │ │ ├── source_tick_compute_task_node.cpp
│ │ │ ├── src_subset_tick_compute_task_node.cpp
│ │ │ ├── ssp_variable_proxy_task_node.cpp
│ │ │ ├── tick_compute_task_node.cpp
│ │ │ ├── unpack_compute_task_node.cpp
│ │ │ └── wait_and_send_ids_compute_task_node.cpp
│ │ ├── hardware/
│ │ │ ├── basic_device_descriptor_list.cpp
│ │ │ ├── basic_device_descriptor_list.h
│ │ │ ├── cuda_device_descriptor.cpp
│ │ │ ├── cuda_device_descriptor.h
│ │ │ ├── cuda_device_descriptor_class.cpp
│ │ │ ├── device_descriptor.h
│ │ │ ├── device_descriptor_class.cpp
│ │ │ ├── device_descriptor_class.h
│ │ │ ├── device_descriptor_list.h
│ │ │ ├── net_ib_device_descriptor.cpp
│ │ │ ├── net_ib_device_descriptor.h
│ │ │ ├── net_ib_device_descriptor_class.cpp
│ │ │ ├── net_socket_device_descriptor.cpp
│ │ │ ├── net_socket_device_descriptor.h
│ │ │ ├── net_socket_device_descriptor_class.cpp
│ │ │ ├── node_device_descriptor.cpp
│ │ │ ├── node_device_descriptor.h
│ │ │ ├── node_device_descriptor_manager.cpp
│ │ │ ├── node_device_descriptor_manager.h
│ │ │ ├── topology_descriptor.cpp
│ │ │ └── topology_descriptor.h
│ │ ├── intrusive/
│ │ │ ├── README.md
│ │ │ ├── base.h
│ │ │ ├── cpp_attribute.h
│ │ │ ├── dss.h
│ │ │ ├── dss_test.cpp
│ │ │ ├── flat_msg.h
│ │ │ ├── flat_msg_test.cpp
│ │ │ ├── flat_msg_view.h
│ │ │ ├── flat_msg_view_test.cpp
│ │ │ ├── for_each.h
│ │ │ ├── force_standard_layout.h
│ │ │ ├── force_standard_layout_test.cpp
│ │ │ ├── head_free_list.h
│ │ │ ├── head_free_list_test.cpp
│ │ │ ├── intrusive.h
│ │ │ ├── intrusive_core_test.cpp
│ │ │ ├── list.h
│ │ │ ├── list_hook.h
│ │ │ ├── list_hook_test.cpp
│ │ │ ├── list_test.cpp
│ │ │ ├── mutexed_list.h
│ │ │ ├── object_pool.h
│ │ │ ├── object_pool_test.cpp
│ │ │ ├── ref.h
│ │ │ ├── reflective.h
│ │ │ ├── shared_ptr.h
│ │ │ ├── skiplist.h
│ │ │ ├── skiplist_hook.h
│ │ │ ├── skiplist_hook_test.cpp
│ │ │ ├── skiplist_test.cpp
│ │ │ ├── static_counter.h
│ │ │ ├── static_counter_test.cpp
│ │ │ ├── struct_traits.h
│ │ │ └── struct_traits_test.cpp
│ │ ├── ipc/
│ │ │ ├── shared_memory.cpp
│ │ │ └── shared_memory.h
│ │ ├── job/
│ │ │ ├── blob_lifetime_signature.proto
│ │ │ ├── checkpointing_config_def.cpp
│ │ │ ├── cluster_instruction.cpp
│ │ │ ├── cluster_instruction.h
│ │ │ ├── cluster_instruction.proto
│ │ │ ├── collective_boxing/
│ │ │ │ ├── coordinator.h
│ │ │ │ ├── executor.cpp
│ │ │ │ ├── executor.h
│ │ │ │ ├── executor_backend.h
│ │ │ │ ├── executor_backend_manager.cpp
│ │ │ │ ├── executor_backend_manager.h
│ │ │ │ ├── nccl_executor_backend.cu
│ │ │ │ ├── request_store.cpp
│ │ │ │ ├── request_store.h
│ │ │ │ ├── runtime_request_info.h
│ │ │ │ ├── scheduler.cpp
│ │ │ │ ├── scheduler.h
│ │ │ │ ├── static_group_coordinator.cpp
│ │ │ │ └── static_group_coordinator.h
│ │ │ ├── compile_mode.cpp
│ │ │ ├── compile_mode.h
│ │ │ ├── compiler.cpp
│ │ │ ├── compiler.h
│ │ │ ├── critical_section.proto
│ │ │ ├── critical_section_desc.cpp
│ │ │ ├── critical_section_desc.h
│ │ │ ├── critical_section_instance.h
│ │ │ ├── distribute_hirarchy.proto
│ │ │ ├── dlnet_conf.proto
│ │ │ ├── eager_ccl_comm_manager.cpp
│ │ │ ├── eager_ccl_comm_manager.h
│ │ │ ├── eager_nccl_comm_manager.cpp
│ │ │ ├── eager_nccl_comm_manager.h
│ │ │ ├── env.proto
│ │ │ ├── env_desc.cpp
│ │ │ ├── env_desc.h
│ │ │ ├── env_global_objects_scope.cpp
│ │ │ ├── env_global_objects_scope.h
│ │ │ ├── function_config_def.cpp
│ │ │ ├── global_for.cpp
│ │ │ ├── global_for.h
│ │ │ ├── global_mode.cpp
│ │ │ ├── global_mode.h
│ │ │ ├── graph_scope_vars.cpp
│ │ │ ├── graph_scope_vars.h
│ │ │ ├── id_manager.cpp
│ │ │ ├── id_manager.h
│ │ │ ├── id_manager_test.cpp
│ │ │ ├── id_state.h
│ │ │ ├── initializer_conf.proto
│ │ │ ├── inter_job_mem_sharing_util.cpp
│ │ │ ├── inter_job_mem_sharing_util.h
│ │ │ ├── inter_user_job_info.proto
│ │ │ ├── intra_job_mem_sharing_util.cpp
│ │ │ ├── intra_job_mem_sharing_util.h
│ │ │ ├── job.proto
│ │ │ ├── job_build_and_infer_ctx.cpp
│ │ │ ├── job_build_and_infer_ctx.h
│ │ │ ├── job_build_and_infer_ctx_mgr.cpp
│ │ │ ├── job_build_and_infer_ctx_mgr.h
│ │ │ ├── job_builder.cpp
│ │ │ ├── job_builder.h
│ │ │ ├── job_conf.proto
│ │ │ ├── job_desc.cpp
│ │ │ ├── job_desc.h
│ │ │ ├── job_instance.h
│ │ │ ├── job_interpreter.cpp
│ │ │ ├── job_interpreter.h
│ │ │ ├── job_ir.cpp
│ │ │ ├── job_ir.h
│ │ │ ├── job_set.proto
│ │ │ ├── job_set_compile_ctx.h
│ │ │ ├── job_set_compile_ctx.proto
│ │ │ ├── lazy_mode.cpp
│ │ │ ├── lazy_mode.h
│ │ │ ├── learning_rate_schedule_conf.proto
│ │ │ ├── local_parallel.proto
│ │ │ ├── local_sig_infer_hint.h
│ │ │ ├── memory_share_strategy.cpp
│ │ │ ├── memory_share_strategy.h
│ │ │ ├── module_conf.proto
│ │ │ ├── nd_sbp_infer_hint.h
│ │ │ ├── nd_sbp_util.cpp
│ │ │ ├── nd_sbp_util.h
│ │ │ ├── oneflow.cpp
│ │ │ ├── oneflow.h
│ │ │ ├── parallel_conf_signature.proto
│ │ │ ├── parallel_desc.cpp
│ │ │ ├── parallel_desc.h
│ │ │ ├── parallel_desc_test.cpp
│ │ │ ├── parallel_signature.proto
│ │ │ ├── pipeline_config_def.cpp
│ │ │ ├── placement.proto
│ │ │ ├── placement_scope.cpp
│ │ │ ├── placement_scope.h
│ │ │ ├── plan.proto
│ │ │ ├── plan_util.cpp
│ │ │ ├── plan_util.h
│ │ │ ├── qat_config_def.cpp
│ │ │ ├── rank_compiler.cpp
│ │ │ ├── rank_compiler.h
│ │ │ ├── rank_group.cpp
│ │ │ ├── rank_group.h
│ │ │ ├── rank_group_scope.cpp
│ │ │ ├── rank_group_scope.h
│ │ │ ├── rank_group_test.cpp
│ │ │ ├── regularizer_conf.proto
│ │ │ ├── resource.proto
│ │ │ ├── resource_desc.cpp
│ │ │ ├── resource_desc.h
│ │ │ ├── runtime.cpp
│ │ │ ├── runtime.h
│ │ │ ├── runtime_buffer_managers_scope.cpp
│ │ │ ├── runtime_buffer_managers_scope.h
│ │ │ ├── runtime_buffers_scope.cpp
│ │ │ ├── runtime_buffers_scope.h
│ │ │ ├── runtime_context.cpp
│ │ │ ├── runtime_context.h
│ │ │ ├── runtime_job_descs.cpp
│ │ │ ├── runtime_job_descs.h
│ │ │ ├── sbp_infer_hint.h
│ │ │ ├── sbp_parallel.cpp
│ │ │ ├── sbp_parallel.h
│ │ │ ├── sbp_parallel.proto
│ │ │ ├── sbp_signature_builder.cpp
│ │ │ ├── sbp_signature_builder.h
│ │ │ ├── scope.cpp
│ │ │ ├── scope.h
│ │ │ ├── scope.proto
│ │ │ ├── session.cpp
│ │ │ ├── session.h
│ │ │ ├── ssp_config_def.cpp
│ │ │ ├── sub_plan.proto
│ │ │ ├── task.proto
│ │ │ ├── utils/
│ │ │ │ ├── progress_bar.cpp
│ │ │ │ └── progress_bar.h
│ │ │ ├── version.cpp
│ │ │ └── version.h
│ │ ├── job_rewriter/
│ │ │ ├── adadelta_optim.cpp
│ │ │ ├── adagrad_optm.cpp
│ │ │ ├── adam_optm.cpp
│ │ │ ├── add_ssp_variable_proxy.cpp
│ │ │ ├── auto_learning_rate.cpp
│ │ │ ├── auto_mixed_precision.cpp
│ │ │ ├── auto_mixed_precision.h
│ │ │ ├── auto_mixed_precision_lists.cpp
│ │ │ ├── auto_mixed_precision_lists.h
│ │ │ ├── auto_parallel.cpp
│ │ │ ├── auto_train_step.cpp
│ │ │ ├── autograd.cpp
│ │ │ ├── autograd.h
│ │ │ ├── autotick.cpp
│ │ │ ├── autotick.h
│ │ │ ├── boxing_with_middle_nodes.cpp
│ │ │ ├── boxing_with_middle_nodes.h
│ │ │ ├── calculation_pass.cpp
│ │ │ ├── calculation_pass.h
│ │ │ ├── checkpointing_pass.cpp
│ │ │ ├── clip_by_global_norm_job_pass_state.h
│ │ │ ├── clone_grad.cpp
│ │ │ ├── clone_grad.h
│ │ │ ├── cudnn_fused_normalization_add_relu_pass.cpp
│ │ │ ├── cutlass_conv_tuning_warmup_pass.cpp
│ │ │ ├── delay_variable_op_execution_pass.cpp
│ │ │ ├── device_tick_autotick.cpp
│ │ │ ├── do_parallel_cast_before_widening_type_cast_pass.cpp
│ │ │ ├── dump_blob_parallel_conf_pass.cpp
│ │ │ ├── dump_variable_info_pass.cpp
│ │ │ ├── dynamic_loss_scale_job_pass_state.h
│ │ │ ├── dynamic_loss_scale_schedule_pass.cpp
│ │ │ ├── eliminate_dead_nodes_pass.cpp
│ │ │ ├── fix_pipeline_stage_id_pass.cpp
│ │ │ ├── ftrl_optm.cpp
│ │ │ ├── fuse_add_to_output_pass.cpp
│ │ │ ├── fuse_bce_reduce_mean_fw_bw_pass.cpp
│ │ │ ├── fuse_cast_scale_pass.cpp
│ │ │ ├── fuse_consecutive_add_pass.cpp
│ │ │ ├── fuse_embedding_interaction_pass.cpp
│ │ │ ├── fuse_model_update_cast_pass.cpp
│ │ │ ├── fuse_update_ops_pass.cpp
│ │ │ ├── generate_optimizer_op_confs.cpp
│ │ │ ├── group_boxing_by_dst_parallel.cpp
│ │ │ ├── group_boxing_by_dst_parallel.h
│ │ │ ├── indexed_slices_optimizer_rewrite_pass.cpp
│ │ │ ├── input_autotick.cpp
│ │ │ ├── insert_nccl_logical_op_pass.cpp
│ │ │ ├── insert_pinned_identity_op_pass.cpp
│ │ │ ├── job_completer.cpp
│ │ │ ├── job_completer.h
│ │ │ ├── job_pass.cpp
│ │ │ ├── job_pass.h
│ │ │ ├── lamb_optm.cpp
│ │ │ ├── lars_optm.cpp
│ │ │ ├── logical_chain_pass.cpp
│ │ │ ├── momentum_optm.cpp
│ │ │ ├── multi_tensor_model_update.cpp
│ │ │ ├── nccl_logical_chain_strict_order_pass.cpp
│ │ │ ├── nccl_logical_op_fusion_pass.cpp
│ │ │ ├── normalization_exponential_average_auto_tick_rewrite_pass.cpp
│ │ │ ├── optimizer.cpp
│ │ │ ├── optimizer.h
│ │ │ ├── optimizer_placement_optimization_pass.cpp
│ │ │ ├── pass_util.cpp
│ │ │ ├── pass_util.h
│ │ │ ├── pipeline_buffer_pass.cpp
│ │ │ ├── prune_amp_white_identity_op_pass.cpp
│ │ │ ├── prune_cast_to_static_shape_op_pass.cpp
│ │ │ ├── prune_depend_op_pass.cpp
│ │ │ ├── prune_parallel_cast_op_pass.cpp
│ │ │ ├── prune_pinned_identity_op_pass.cpp
│ │ │ ├── quantization_aware_training.cpp
│ │ │ ├── replace_embedding_ops_pass.cpp
│ │ │ ├── rmsprop_optm.cpp
│ │ │ ├── sequential_one_embedding_shuffle_ops_pass.cpp
│ │ │ ├── sgd_optm.cpp
│ │ │ ├── source_user_op_auto_tick.cpp
│ │ │ ├── split_sparse_softmax_cross_entropy_op_pass.cpp
│ │ │ ├── system_op_fill_job_name_pass.cpp
│ │ │ ├── tick_autotick.cpp
│ │ │ └── variable_autotick.cpp
│ │ ├── kernel/
│ │ │ ├── assign_kernel.cpp
│ │ │ ├── blob_access_checker_kernel_observer.cpp
│ │ │ ├── blob_access_checker_kernel_observer.h
│ │ │ ├── blob_tensor_view.cpp
│ │ │ ├── blob_tensor_view.h
│ │ │ ├── boxing_kernel.cpp
│ │ │ ├── boxing_zeros_kernel.cpp
│ │ │ ├── broadcast_to_compatible_with_kernel.cpp
│ │ │ ├── callback_notify_kernel.cpp
│ │ │ ├── case_kernel.cpp
│ │ │ ├── case_kernel.h
│ │ │ ├── chain_kernel_observer.cpp
│ │ │ ├── chain_kernel_observer.h
│ │ │ ├── collective_boxing_kernels.cpp
│ │ │ ├── collective_boxing_pack_kernel.cpp
│ │ │ ├── collective_boxing_unpack_kernel.cpp
│ │ │ ├── constant_like_kernel.cpp
│ │ │ ├── cpu_check_numerics_kernel_observer.h
│ │ │ ├── cpu_numerics_kernel_observer.cpp
│ │ │ ├── critical_section_callback_tick_kernel.cpp
│ │ │ ├── critical_section_wait_tick_kernel.cpp
│ │ │ ├── cuda_check_numerics_kernel_observer.cu
│ │ │ ├── cuda_check_numerics_kernel_observer.h
│ │ │ ├── cuda_graph_support.h
│ │ │ ├── distribute_kernels.cpp
│ │ │ ├── dynamic_reshape_kernel.cpp
│ │ │ ├── dynamic_reshape_like_kernel.cpp
│ │ │ ├── esac_kernel.cpp
│ │ │ ├── esac_kernel.h
│ │ │ ├── identity_kernel.cpp
│ │ │ ├── image_decoder_random_crop_resize_kernel.cpp
│ │ │ ├── input_kernel.cpp
│ │ │ ├── kernel.cpp
│ │ │ ├── kernel.h
│ │ │ ├── kernel.proto
│ │ │ ├── kernel_context.h
│ │ │ ├── kernel_observer.h
│ │ │ ├── kernel_registration.cpp
│ │ │ ├── kernel_registration.h
│ │ │ ├── kernel_util.cpp
│ │ │ ├── kernel_util.cuh
│ │ │ ├── kernel_util.h
│ │ │ ├── learning_rate_schedule_kernel.cpp
│ │ │ ├── nccl_send_recv_boxing_kernel.cpp
│ │ │ ├── new_kernel_util.h
│ │ │ ├── nop_kernel.cpp
│ │ │ ├── output_kernel.cpp
│ │ │ ├── profiler_kernel_observer.cpp
│ │ │ ├── profiler_kernel_observer.h
│ │ │ ├── random_generator.cpp
│ │ │ ├── random_generator.cu
│ │ │ ├── random_generator.h
│ │ │ ├── reentrant_lock_kernel.cpp
│ │ │ ├── reentrant_lock_kernel.h
│ │ │ ├── return_kernel.cpp
│ │ │ ├── runtime_blob_shape_infer_helper.cpp
│ │ │ ├── runtime_blob_shape_infer_helper.h
│ │ │ ├── shape_elem_cnt_kernel.cpp
│ │ │ ├── slice_boxing_kernel.cpp
│ │ │ ├── sync_check_kernel_observer.cpp
│ │ │ ├── sync_check_kernel_observer.h
│ │ │ ├── sync_dynamic_resize_kernel.cpp
│ │ │ ├── total_loss_instance_num_kernel.cpp
│ │ │ ├── user_kernel.cpp
│ │ │ ├── user_kernel.h
│ │ │ ├── util/
│ │ │ │ ├── cuda_half_util.h
│ │ │ │ ├── numeric_limits.cuh
│ │ │ │ └── numerics.cuh
│ │ │ ├── wait_and_send_ids_kernel.cpp
│ │ │ └── wait_and_send_ids_kernel.h
│ │ ├── lazy/
│ │ │ ├── actor/
│ │ │ │ ├── acc_actor.cpp
│ │ │ │ ├── acc_ctrl_tick_actor.cpp
│ │ │ │ ├── acc_tick_actor.cpp
│ │ │ │ ├── actor.cpp
│ │ │ │ ├── actor.h
│ │ │ │ ├── actor_base.cpp
│ │ │ │ ├── actor_base.h
│ │ │ │ ├── actor_context.cpp
│ │ │ │ ├── actor_context.h
│ │ │ │ ├── actor_message.cpp
│ │ │ │ ├── actor_message.h
│ │ │ │ ├── actor_message_bus.cpp
│ │ │ │ ├── actor_message_bus.h
│ │ │ │ ├── boxing_zeros_actor.cpp
│ │ │ │ ├── callback_notify_actor.cpp
│ │ │ │ ├── case_actor.cpp
│ │ │ │ ├── collective_boxing_actor_context.cpp
│ │ │ │ ├── collective_boxing_actor_context.h
│ │ │ │ ├── copy_comm_net_actor.cpp
│ │ │ │ ├── esac_actor.cpp
│ │ │ │ ├── generic_actor_context.cpp
│ │ │ │ ├── generic_actor_context.h
│ │ │ │ ├── input_wise_actor.cpp
│ │ │ │ ├── input_wise_actor.h
│ │ │ │ ├── light_actor.cpp
│ │ │ │ ├── light_actor.h
│ │ │ │ ├── naive_actor.cpp
│ │ │ │ ├── naive_actor.h
│ │ │ │ ├── pack_actor.cpp
│ │ │ │ ├── reentrant_lock_actor.cpp
│ │ │ │ ├── register_slot.cpp
│ │ │ │ ├── register_slot.h
│ │ │ │ ├── repeat_actor.cpp
│ │ │ │ ├── sink_actor.cpp
│ │ │ │ ├── sink_actor.h
│ │ │ │ ├── source_tick_actor.cpp
│ │ │ │ ├── ssp_variable_proxy_actor.cpp
│ │ │ │ ├── tick_actor.cpp
│ │ │ │ ├── unpack_actor.cpp
│ │ │ │ └── wait_and_send_ids_actor.cpp
│ │ │ └── stream_context/
│ │ │ ├── common/
│ │ │ │ └── generic_stream_context.cpp
│ │ │ ├── cpu/
│ │ │ │ └── cpu_stream_context.cpp
│ │ │ ├── cuda/
│ │ │ │ └── cuda_stream_context.cpp
│ │ │ └── include/
│ │ │ ├── generic_stream_context.h
│ │ │ └── stream_context.h
│ │ ├── memory/
│ │ │ ├── chunk_manager.cpp
│ │ │ ├── chunk_manager.h
│ │ │ ├── memory_allocator.cpp
│ │ │ ├── memory_allocator.h
│ │ │ ├── memory_block.proto
│ │ │ ├── memory_case.proto
│ │ │ ├── memory_case_util.cpp
│ │ │ ├── memory_case_util.h
│ │ │ ├── memory_zone.cpp
│ │ │ └── memory_zone.h
│ │ ├── ndarray/
│ │ │ ├── binary_func.h
│ │ │ ├── cpu_concat_var_ndarray.h
│ │ │ ├── cpu_concat_var_ndarray_test.cpp
│ │ │ ├── cpu_ndarray.h
│ │ │ ├── cpu_ndarray_builder.h
│ │ │ ├── cpu_ndarray_copy.h
│ │ │ ├── cpu_slice_var_ndarray.h
│ │ │ ├── cpu_slice_var_ndarray_test.cpp
│ │ │ ├── cpu_var_ndarray.h
│ │ │ ├── cpu_var_ndarray_test.cpp
│ │ │ ├── ndarray_apply_binary.h
│ │ │ ├── ndarray_apply_binary_core.cpp
│ │ │ ├── ndarray_apply_binary_core.cu
│ │ │ ├── ndarray_apply_binary_core.h
│ │ │ ├── ndarray_apply_broadcast_binary.h
│ │ │ ├── ndarray_apply_broadcast_binary_core.cpp
│ │ │ ├── ndarray_apply_broadcast_binary_core.cu
│ │ │ ├── ndarray_apply_broadcast_binary_core.h
│ │ │ ├── ndarray_apply_broadcast_unary.h
│ │ │ ├── ndarray_apply_broadcast_unary_core.cpp
│ │ │ ├── ndarray_apply_broadcast_unary_core.cu
│ │ │ ├── ndarray_apply_broadcast_unary_core.h
│ │ │ ├── ndarray_apply_unary.h
│ │ │ ├── ndarray_apply_unary_core.cpp
│ │ │ ├── ndarray_apply_unary_core.cu
│ │ │ ├── ndarray_apply_unary_core.h
│ │ │ ├── ndarray_assign_core.cpp
│ │ │ ├── ndarray_assign_core.cu
│ │ │ ├── ndarray_assign_core.h
│ │ │ ├── ndarray_reduce.h
│ │ │ ├── ndarray_reduce_impl.cpp
│ │ │ ├── ndarray_reduce_impl.cu
│ │ │ ├── ndarray_reduce_impl.h
│ │ │ ├── ndarray_util.h
│ │ │ ├── slice.cpp
│ │ │ ├── slice.h
│ │ │ ├── slice_test.cpp
│ │ │ ├── unary_func.h
│ │ │ ├── xpu_binary_func_ndarray.h
│ │ │ ├── xpu_broadcast_ndarray.h
│ │ │ ├── xpu_ndarray_assign.cu
│ │ │ ├── xpu_ndarray_assign.h
│ │ │ ├── xpu_ndarray_base.h
│ │ │ ├── xpu_reduced_ndarray.h
│ │ │ ├── xpu_reshape_ndarray.h
│ │ │ ├── xpu_shape.cpp
│ │ │ ├── xpu_shape.h
│ │ │ ├── xpu_transpose_ndarray.h
│ │ │ ├── xpu_unary_func_ndarray.h
│ │ │ ├── xpu_util.h
│ │ │ ├── xpu_var_ndarray.h
│ │ │ └── xpu_var_ndarray_builder.h
│ │ ├── operator/
│ │ │ ├── acc_tick_op.cpp
│ │ │ ├── acc_tick_op.h
│ │ │ ├── arg_modifier_signature.proto
│ │ │ ├── assign_op.cpp
│ │ │ ├── boxing_identity_op.cpp
│ │ │ ├── boxing_op.cpp
│ │ │ ├── boxing_op.h
│ │ │ ├── boxing_zeros_op.cpp
│ │ │ ├── broadcast_to_compatible_with_op.cpp
│ │ │ ├── callback_notify_op.cpp
│ │ │ ├── callback_notify_op.h
│ │ │ ├── case_op.cpp
│ │ │ ├── case_op.h
│ │ │ ├── collective_boxing_ops.cpp
│ │ │ ├── collective_boxing_pack_op.cpp
│ │ │ ├── collective_boxing_unpack_op.cpp
│ │ │ ├── constant_like_op.cpp
│ │ │ ├── copy_comm_net_op.cpp
│ │ │ ├── copy_comm_net_op.h
│ │ │ ├── critical_section_callback_tick_op.cpp
│ │ │ ├── critical_section_wait_tick_op.cpp
│ │ │ ├── cwise_op.cpp
│ │ │ ├── cwise_op.h
│ │ │ ├── decode_random_op.h
│ │ │ ├── device_tick_op.cpp
│ │ │ ├── device_tick_op.h
│ │ │ ├── distribute_add_op.cpp
│ │ │ ├── distribute_clone_op.cpp
│ │ │ ├── distribute_concat_op.cpp
│ │ │ ├── distribute_split_op.cpp
│ │ │ ├── dst_subset_tick_op.cpp
│ │ │ ├── dynamic_reshape_op.cpp
│ │ │ ├── esac_op.cpp
│ │ │ ├── esac_op.h
│ │ │ ├── identity_op.cpp
│ │ │ ├── image_decoder_random_crop_resize_op.cpp
│ │ │ ├── input_op.cpp
│ │ │ ├── input_op.h
│ │ │ ├── interface_blob_conf.proto
│ │ │ ├── interface_op_util.cpp
│ │ │ ├── interface_op_util.h
│ │ │ ├── learning_rate_schedule_op.cpp
│ │ │ ├── nccl_send_recv_boxing_op.cpp
│ │ │ ├── nccl_send_recv_boxing_op_util.cpp
│ │ │ ├── nccl_send_recv_boxing_op_util.h
│ │ │ ├── op_attribute.proto
│ │ │ ├── op_conf.proto
│ │ │ ├── op_conf_symbol.cpp
│ │ │ ├── op_conf_symbol.h
│ │ │ ├── op_conf_util.h
│ │ │ ├── op_infer_cache.h
│ │ │ ├── op_node_signature.proto
│ │ │ ├── operator.cpp
│ │ │ ├── operator.h
│ │ │ ├── operator_util.cpp
│ │ │ ├── operator_util.h
│ │ │ ├── output_op.cpp
│ │ │ ├── output_op.h
│ │ │ ├── reduce_sbp_util.cpp
│ │ │ ├── reduce_sbp_util.h
│ │ │ ├── reentrant_lock_op.cpp
│ │ │ ├── reentrant_lock_op.h
│ │ │ ├── return_op.cpp
│ │ │ ├── return_op.h
│ │ │ ├── scalar_op_base.cpp
│ │ │ ├── scalar_op_base.h
│ │ │ ├── shape_elem_cnt_op.cpp
│ │ │ ├── shape_elem_cnt_op.h
│ │ │ ├── sink_tick_op.cpp
│ │ │ ├── sink_tick_op.h
│ │ │ ├── slice_boxing_op.cpp
│ │ │ ├── source_tick_op.cpp
│ │ │ ├── source_tick_op.h
│ │ │ ├── src_subset_tick_op.cpp
│ │ │ ├── sync_dynamic_resize_op.cpp
│ │ │ ├── tick_op.cpp
│ │ │ ├── tick_op.h
│ │ │ ├── total_loss_instance_num_op.cpp
│ │ │ ├── total_loss_instance_num_op.h
│ │ │ ├── user_op.cpp
│ │ │ ├── user_op.h
│ │ │ ├── variable_op.cpp
│ │ │ ├── variable_op.h
│ │ │ ├── wait_and_send_ids_op.cpp
│ │ │ └── wait_and_send_ids_op.h
│ │ ├── persistence/
│ │ │ ├── binary_in_stream.h
│ │ │ ├── binary_in_stream_with_local_copy.cpp
│ │ │ ├── binary_in_stream_with_local_copy.h
│ │ │ ├── binary_in_stream_without_local_copy.cpp
│ │ │ ├── binary_in_stream_without_local_copy.h
│ │ │ ├── file_system.cpp
│ │ │ ├── file_system.h
│ │ │ ├── file_system_test.cpp
│ │ │ ├── hadoop/
│ │ │ │ ├── hadoop_file_system.cpp
│ │ │ │ ├── hadoop_file_system.h
│ │ │ │ └── hdfs.h
│ │ │ ├── persistent_in_stream.cpp
│ │ │ ├── persistent_in_stream.h
│ │ │ ├── persistent_out_stream.cpp
│ │ │ ├── persistent_out_stream.h
│ │ │ ├── posix/
│ │ │ │ ├── posix_file_system.cpp
│ │ │ │ └── posix_file_system.h
│ │ │ ├── stream_scanner.cpp
│ │ │ ├── stream_scanner.h
│ │ │ ├── tee_persistent_log_stream.cpp
│ │ │ └── tee_persistent_log_stream.h
│ │ ├── platform/
│ │ │ ├── include/
│ │ │ │ ├── ibv.h
│ │ │ │ ├── pthread_fork.h
│ │ │ │ └── wrapper.h
│ │ │ └── lib/
│ │ │ ├── ibv_wrapper.cpp
│ │ │ ├── pthread_fork.cpp
│ │ │ └── wrapper.cpp
│ │ ├── profiler/
│ │ │ ├── event.cpp
│ │ │ ├── event.h
│ │ │ ├── event_recorder.cpp
│ │ │ ├── event_recorder.h
│ │ │ ├── kernel.cpp
│ │ │ ├── kernel.h
│ │ │ ├── kineto_shim.cpp
│ │ │ ├── kineto_shim.h
│ │ │ ├── profile_manager.cpp
│ │ │ ├── profile_manager.h
│ │ │ ├── profiler.cpp
│ │ │ ├── profiler.h
│ │ │ └── util.h
│ │ ├── record/
│ │ │ ├── coco.proto
│ │ │ └── record.proto
│ │ ├── register/
│ │ │ ├── blob.cpp
│ │ │ ├── blob.h
│ │ │ ├── blob_desc.cpp
│ │ │ ├── blob_desc.h
│ │ │ ├── blob_desc.proto
│ │ │ ├── logical_blob_id.proto
│ │ │ ├── op_blob_arg.proto
│ │ │ ├── op_blob_arg_info.h
│ │ │ ├── register.cpp
│ │ │ ├── register.h
│ │ │ ├── register_desc.cpp
│ │ │ ├── register_desc.h
│ │ │ ├── register_desc.proto
│ │ │ ├── register_manager.cpp
│ │ │ ├── register_manager.h
│ │ │ ├── runtime_register_desc.cpp
│ │ │ ├── runtime_register_desc.h
│ │ │ ├── tensor_slice_copier.cpp
│ │ │ ├── tensor_slice_copier.h
│ │ │ ├── tensor_slice_view.cpp
│ │ │ ├── tensor_slice_view.h
│ │ │ └── tensor_slice_view.proto
│ │ ├── rpc/
│ │ │ ├── include/
│ │ │ │ ├── base.h
│ │ │ │ ├── ctrl.h
│ │ │ │ ├── global_process_ctx.h
│ │ │ │ ├── grpc.h
│ │ │ │ ├── local.h
│ │ │ │ └── manager.h
│ │ │ └── lib/
│ │ │ ├── global_process_ctx.cpp
│ │ │ ├── grpc.cpp
│ │ │ └── local.cpp
│ │ ├── summary/
│ │ │ ├── event.proto
│ │ │ ├── graph.proto
│ │ │ ├── plugin_data.proto
│ │ │ ├── projector.proto
│ │ │ ├── summary.proto
│ │ │ └── tensor.proto
│ │ ├── thread/
│ │ │ ├── is_main_thread_test.cpp
│ │ │ ├── thread.cpp
│ │ │ ├── thread.h
│ │ │ ├── thread_global_id.cpp
│ │ │ ├── thread_global_id.h
│ │ │ ├── thread_manager.cpp
│ │ │ ├── thread_manager.h
│ │ │ ├── thread_pool.cpp
│ │ │ ├── thread_pool.h
│ │ │ ├── thread_runtime.h
│ │ │ ├── thread_runtime_factory.cpp
│ │ │ └── thread_runtime_factory.h
│ │ ├── transport/
│ │ │ ├── transport.cpp
│ │ │ ├── transport.h
│ │ │ └── transport_message.h
│ │ └── vm/
│ │ ├── access_blob_arg_cb_instruction_policy.h
│ │ ├── allocate_tensor_instruction_policy.cpp
│ │ ├── allocate_tensor_instruction_policy.h
│ │ ├── allocator.h
│ │ ├── barrier_instruction_policy.h
│ │ ├── bin_allocator.h
│ │ ├── bin_allocator_test.cpp
│ │ ├── caching_allocator.h
│ │ ├── control_stream_policy.h
│ │ ├── critical_section_instruction_policy.cpp
│ │ ├── critical_section_instruction_policy.h
│ │ ├── critical_section_status_querier.h
│ │ ├── critical_section_stream_policy.cpp
│ │ ├── critical_section_stream_policy.h
│ │ ├── ep_backend_allocator.cpp
│ │ ├── ep_backend_allocator.h
│ │ ├── ep_backend_host_allocator.cpp
│ │ ├── ep_backend_host_allocator.h
│ │ ├── ep_d2h_stream_policy.cpp
│ │ ├── ep_d2h_stream_policy.h
│ │ ├── ep_event.cpp
│ │ ├── ep_event.h
│ │ ├── ep_optional_event_record_status_querier.cpp
│ │ ├── ep_optional_event_record_status_querier.h
│ │ ├── ep_record_event_instruction_policy.h
│ │ ├── ep_stream_policy.cpp
│ │ ├── ep_stream_policy.h
│ │ ├── ep_stream_policy_base.cpp
│ │ ├── ep_stream_policy_base.h
│ │ ├── event_recorded_ep_stream_policy.cpp
│ │ ├── event_recorded_ep_stream_policy.h
│ │ ├── fuse_instruction_policy.h
│ │ ├── global_sync_instruction_policy.h
│ │ ├── instruction.cpp
│ │ ├── instruction.h
│ │ ├── instruction_fuse_type.h
│ │ ├── instruction_policy.cpp
│ │ ├── instruction_policy.h
│ │ ├── instruction_policy_util.h
│ │ ├── lazy_job_instruction_policy.h
│ │ ├── lazy_job_stream_policy.cpp
│ │ ├── lazy_job_stream_policy.h
│ │ ├── naive_instruction_status_querier.h
│ │ ├── op_call_instruction_policy.cpp
│ │ ├── op_call_instruction_policy.h
│ │ ├── pinned_ep_stream_policy.cpp
│ │ ├── pinned_ep_stream_policy.h
│ │ ├── probe.h
│ │ ├── ref_cnt_instruction_status_querier.h
│ │ ├── release_tensor_instruction_policy.h
│ │ ├── remat/
│ │ │ ├── allocator.cpp
│ │ │ ├── allocator.h
│ │ │ ├── disjoint_set.cpp
│ │ │ ├── disjoint_set.h
│ │ │ ├── env.cpp
│ │ │ ├── env.h
│ │ │ ├── util.cpp
│ │ │ └── util.h
│ │ ├── stream.cpp
│ │ ├── stream.h
│ │ ├── stream_create_stream_policy.h
│ │ ├── stream_get_allocator_stream_type.h
│ │ ├── stream_policy.cpp
│ │ ├── stream_policy.h
│ │ ├── stream_record_event_instruction_policy.cpp
│ │ ├── stream_record_event_instruction_policy.h
│ │ ├── stream_wait_event_instruction_policy.cpp
│ │ ├── stream_wait_event_instruction_policy.h
│ │ ├── stream_wait_instruction_policy.cpp
│ │ ├── stream_wait_instruction_policy.h
│ │ ├── symbol_storage.cpp
│ │ ├── symbol_storage.h
│ │ ├── sync_access_instruction_policy.cpp
│ │ ├── sync_access_instruction_policy.h
│ │ ├── sync_vm_mode_guard.h
│ │ ├── thread_ctx.cpp
│ │ ├── thread_ctx.h
│ │ ├── thread_safe_guard.h
│ │ ├── touch_tensors_instruction_policy.h
│ │ ├── virtual_machine.cpp
│ │ ├── virtual_machine.h
│ │ ├── virtual_machine_engine.cpp
│ │ ├── virtual_machine_engine.h
│ │ ├── virtual_machine_scope.cpp
│ │ ├── virtual_machine_scope.h
│ │ ├── vm_object.cpp
│ │ ├── vm_object.h
│ │ ├── vm_sync.h
│ │ ├── vm_util.cpp
│ │ └── vm_util.h
│ ├── extension/
│ │ ├── python/
│ │ │ ├── numpy.cpp
│ │ │ ├── numpy.h
│ │ │ ├── numpy_internal.h
│ │ │ ├── py_compute.cpp
│ │ │ ├── py_compute.h
│ │ │ ├── py_kernel_caller.cpp
│ │ │ ├── py_kernel_caller.h
│ │ │ ├── py_kernel_registry.cpp
│ │ │ └── py_kernel_registry.h
│ │ └── stack/
│ │ ├── foreign_stack_getter.h
│ │ ├── python/
│ │ │ ├── custom_eval_frame.c
│ │ │ ├── custom_eval_frame.h
│ │ │ ├── stack_getter.cpp
│ │ │ └── stack_getter.h
│ │ └── stacktrace.h
│ ├── ir/
│ │ ├── .gitignore
│ │ ├── CMakeLists.txt
│ │ ├── README.md
│ │ ├── include/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── OneFlow/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ ├── Conversion/
│ │ │ │ │ ├── NVVMToCubin.h
│ │ │ │ │ └── OneFlowToTosa.h
│ │ │ │ ├── Extension.h
│ │ │ │ ├── OKL/
│ │ │ │ │ ├── Conversion/
│ │ │ │ │ │ ├── Conversion.h
│ │ │ │ │ │ └── OKLToLLVM.h
│ │ │ │ │ ├── Kernel/
│ │ │ │ │ │ ├── ComputeContext.h
│ │ │ │ │ │ ├── InferContext.h
│ │ │ │ │ │ ├── InitContext.h
│ │ │ │ │ │ ├── JITEngine.h
│ │ │ │ │ │ ├── JITOpInfer.h
│ │ │ │ │ │ ├── LauncherContext.h
│ │ │ │ │ │ ├── LauncherState.h
│ │ │ │ │ │ ├── README.md
│ │ │ │ │ │ ├── RegContext.h
│ │ │ │ │ │ ├── TmpBufferManager.h
│ │ │ │ │ │ └── WrapperContext.h
│ │ │ │ │ ├── OKLAttributes.h
│ │ │ │ │ ├── OKLAttributes.td
│ │ │ │ │ ├── OKLBase.td
│ │ │ │ │ ├── OKLDialect.h
│ │ │ │ │ ├── OKLDialect.td
│ │ │ │ │ ├── OKLOps.h
│ │ │ │ │ ├── OKLOps.td
│ │ │ │ │ ├── OKLTypes.h
│ │ │ │ │ ├── OKLTypes.td
│ │ │ │ │ └── passes.h
│ │ │ │ ├── OKM/
│ │ │ │ │ ├── Conversion/
│ │ │ │ │ │ └── Conversion.h
│ │ │ │ │ ├── OKMAttributes.h
│ │ │ │ │ ├── OKMAttributes.td
│ │ │ │ │ ├── OKMBase.td
│ │ │ │ │ ├── OKMDialect.h
│ │ │ │ │ ├── OKMDialect.td
│ │ │ │ │ ├── OKMOps.h
│ │ │ │ │ ├── OKMOps.td
│ │ │ │ │ ├── OKMPasses.td
│ │ │ │ │ └── passes.h
│ │ │ │ ├── OneFlowBase.td
│ │ │ │ ├── OneFlowDataTypeConversion.h
│ │ │ │ ├── OneFlowDialect.h
│ │ │ │ ├── OneFlowDialect.td
│ │ │ │ ├── OneFlowEnums.td
│ │ │ │ ├── OneFlowInterfaces.td
│ │ │ │ ├── OneFlowOpGetGen.td
│ │ │ │ ├── OneFlowOpTraits.h
│ │ │ │ ├── OneFlowOps.h
│ │ │ │ ├── OneFlowOps.td
│ │ │ │ ├── OneFlowPDLLPatterns.h
│ │ │ │ ├── OneFlowPasses.td
│ │ │ │ ├── OneFlowPatternUtils.h
│ │ │ │ ├── OneFlowPatterns.td
│ │ │ │ ├── OneFlowSupport.h
│ │ │ │ ├── OneFlowTypes.h
│ │ │ │ ├── OneFlowUserOps.td
│ │ │ │ ├── OneFlowUtils.h
│ │ │ │ ├── Passes.h
│ │ │ │ ├── SBP/
│ │ │ │ │ ├── SBPAttributes.h
│ │ │ │ │ ├── SBPBase.td
│ │ │ │ │ ├── SBPDialect.h
│ │ │ │ │ ├── SBPDialect.td
│ │ │ │ │ ├── SBPImporter.h
│ │ │ │ │ └── SBPOps.td
│ │ │ │ ├── Transform/
│ │ │ │ │ ├── AggregateOps.h
│ │ │ │ │ ├── AutoNhwc.h
│ │ │ │ │ ├── BufferHostRegister.h
│ │ │ │ │ ├── CSEWithAttributesIgnored.h
│ │ │ │ │ ├── ConvertInferenceOp.h
│ │ │ │ │ ├── EliminateAllocOps.h
│ │ │ │ │ ├── FuncOps.h
│ │ │ │ │ ├── OneFlow MLIR CodeGen ABI.md
│ │ │ │ │ ├── OneFlowMemPool.h
│ │ │ │ │ ├── OneFlowStream.h
│ │ │ │ │ ├── OutlineAndFuse.h
│ │ │ │ │ ├── TraitFolder.h
│ │ │ │ │ └── TransposeHelpers.h
│ │ │ │ ├── UserOpConversion.h
│ │ │ │ └── UserOpReflection.h
│ │ │ └── Transform/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── TransformDialectExtension.h
│ │ │ ├── TransformDialectExtension.td
│ │ │ └── TransformStateExtension.h
│ │ ├── install-llvm.cmake
│ │ ├── lib/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── OneFlow/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ ├── Conversion/
│ │ │ │ │ ├── NVVMToCubin.cpp
│ │ │ │ │ ├── OneFlowToLinalg.cpp
│ │ │ │ │ └── OneFlowToTosa.cpp
│ │ │ │ ├── OKL/
│ │ │ │ │ ├── Conversion/
│ │ │ │ │ │ ├── Conversion.cpp
│ │ │ │ │ │ ├── CudaGraphSupport.cpp
│ │ │ │ │ │ └── OKLToLLVM.cpp
│ │ │ │ │ ├── Kernel/
│ │ │ │ │ │ ├── ComputeContext.cpp
│ │ │ │ │ │ ├── InferContext.cpp
│ │ │ │ │ │ ├── JITEngine.cpp
│ │ │ │ │ │ ├── JITOpInfer.cpp
│ │ │ │ │ │ ├── KernelLaunchOp.cpp
│ │ │ │ │ │ ├── LauncherContext.cpp
│ │ │ │ │ │ ├── LauncherState.cpp
│ │ │ │ │ │ ├── RegContext.cpp
│ │ │ │ │ │ └── TmpBufferManager.cpp
│ │ │ │ │ ├── OKLDialect.cpp
│ │ │ │ │ ├── OKLOps.cpp
│ │ │ │ │ ├── OKLTypes.cpp
│ │ │ │ │ └── README-OriginVersion.md
│ │ │ │ ├── OKM/
│ │ │ │ │ ├── Conversion/
│ │ │ │ │ │ └── Conversion.cpp
│ │ │ │ │ ├── OKMDialect.cpp
│ │ │ │ │ └── passes.cpp
│ │ │ │ ├── OneFlowCanonicalizers.cpp
│ │ │ │ ├── OneFlowDataTypeConversion.cpp
│ │ │ │ ├── OneFlowDialect.cpp
│ │ │ │ ├── OneFlowInferReturnTypes.cpp
│ │ │ │ ├── OneFlowOpFolders.cpp
│ │ │ │ ├── OneFlowOpGetGen.cpp.in
│ │ │ │ ├── OneFlowOpTraits.cpp
│ │ │ │ ├── OneFlowOps.cpp
│ │ │ │ ├── OneFlowRewrites.cpp
│ │ │ │ ├── OneFlowSupport.cpp
│ │ │ │ ├── OneFlowTypes.cpp
│ │ │ │ ├── OneFlowUtils.cpp
│ │ │ │ ├── PDLL/
│ │ │ │ │ ├── AllocEliminationPatterns.cpp
│ │ │ │ │ ├── AllocEliminationPatterns.pdll
│ │ │ │ │ ├── CMakeLists.txt
│ │ │ │ │ ├── ForwardOpPatterns.cpp
│ │ │ │ │ ├── ForwardOpPatterns.pdll
│ │ │ │ │ ├── FuseConv2DBatchNormPattern.cpp
│ │ │ │ │ ├── FuseConv2DBatchNormPattern.pdll
│ │ │ │ │ ├── FuseOpsWithBackwardImplPattern.cpp
│ │ │ │ │ ├── FuseOpsWithBackwardImplPattern.pdll
│ │ │ │ │ ├── NormalizationPatterns.cpp
│ │ │ │ │ ├── NormalizationPatterns.pdll
│ │ │ │ │ └── OneFlowPDLLUtils.pdll
│ │ │ │ ├── Passes.cpp
│ │ │ │ ├── SBP/
│ │ │ │ │ ├── SBPAttributes.cpp
│ │ │ │ │ ├── SBPDialect.cpp
│ │ │ │ │ └── SBPImporter.cpp
│ │ │ │ ├── Transform/
│ │ │ │ │ ├── AggregateOps.cpp
│ │ │ │ │ ├── AutoNHWCOps.cpp
│ │ │ │ │ ├── AutoNhwc.cpp
│ │ │ │ │ ├── BufferHostRegister.cpp
│ │ │ │ │ ├── CSEWithAttributesIgnored.cpp
│ │ │ │ │ ├── ConvertInferenceOp.cpp
│ │ │ │ │ ├── EliminateAllocOps.cpp
│ │ │ │ │ ├── FuncOps.cpp
│ │ │ │ │ ├── GroupMatMulOps.cpp
│ │ │ │ │ ├── JITPasses.cpp
│ │ │ │ │ ├── OneFlowMemPool.cpp
│ │ │ │ │ ├── OneFlowStream.cpp
│ │ │ │ │ ├── OutlineAndFuse.cpp
│ │ │ │ │ └── TraitFolder.cpp
│ │ │ │ ├── TransposeHelpers.cpp
│ │ │ │ ├── UserOpConversion.cpp
│ │ │ │ └── UserOpReflection.cpp
│ │ │ └── Transform/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── TransformDialectExtension.cpp
│ │ │ ├── TransformDialectInterpreter.cpp
│ │ │ └── TransformStateExtension.cpp
│ │ ├── llvm-in-tree.cmake
│ │ ├── oneflow-extension/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── README.md
│ │ │ ├── include/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ ├── OneFlow/
│ │ │ │ │ ├── CMakeLists.txt
│ │ │ │ │ ├── JITOpInfer.h
│ │ │ │ │ ├── OneFlowLRJITRegistry.h
│ │ │ │ │ └── OneFlowRoundTrip.h
│ │ │ │ └── PyAst/
│ │ │ │ ├── Ast.h
│ │ │ │ └── AstMlirGen.h
│ │ │ ├── ir_pass.cpp
│ │ │ ├── lr_jit.cpp
│ │ │ ├── mlir_gen.cpp
│ │ │ ├── mlir_jit_op.cpp
│ │ │ └── mlir_jit_op_kernel.cpp
│ │ ├── oneflow-lite/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── OneFlowLiteCompileMain.cpp
│ │ │ ├── include/
│ │ │ │ └── OneFlow/
│ │ │ │ ├── ConvertToLiteExecutable.h
│ │ │ │ ├── FlatbufferUtils.h
│ │ │ │ ├── OneFlowLiteUtils.h
│ │ │ │ └── Transform/
│ │ │ │ ├── FoldVariable.h
│ │ │ │ ├── InferPlacement.h
│ │ │ │ ├── InsertTransferOp.h
│ │ │ │ ├── Lowering/
│ │ │ │ │ ├── LoweringAscend.h
│ │ │ │ │ └── LoweringAscendUtils.h
│ │ │ │ ├── LoweringLaunchJob.h
│ │ │ │ ├── MemoryPlanning.h
│ │ │ │ └── PartitionLaunchJob.h
│ │ │ ├── lib/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ └── OneFlow/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ ├── ConvertToLiteExecutable.cpp
│ │ │ │ ├── FlatbufferUtils.cpp
│ │ │ │ ├── OneFlowLiteUtils.cpp
│ │ │ │ ├── Transform/
│ │ │ │ │ ├── FoldVariable.cpp
│ │ │ │ │ ├── InferPlacement.cpp
│ │ │ │ │ ├── InsertTransferOp.cpp
│ │ │ │ │ ├── Lowering/
│ │ │ │ │ │ └── LoweringAscend.cpp
│ │ │ │ │ ├── LoweringLaunchJob.cpp
│ │ │ │ │ ├── MemoryPlanning.cpp
│ │ │ │ │ └── PartitionLaunchJob.cpp
│ │ │ │ └── cmake/
│ │ │ │ └── FindAscendSdk.cmake
│ │ │ └── schemas/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── attributes/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ ├── bool.fbs
│ │ │ │ ├── f32.fbs
│ │ │ │ ├── f32s.fbs
│ │ │ │ ├── f64.fbs
│ │ │ │ ├── i32.fbs
│ │ │ │ ├── i32s.fbs
│ │ │ │ ├── i64.fbs
│ │ │ │ ├── i64s.fbs
│ │ │ │ ├── shape.fbs
│ │ │ │ ├── shapes.fbs
│ │ │ │ ├── str.fbs
│ │ │ │ └── strs.fbs
│ │ │ ├── executable.fbs
│ │ │ └── install_flatcc.cmake
│ │ ├── oneflow-opt/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── README.md
│ │ │ └── oneflow-opt.cpp
│ │ ├── oneflow-runner/
│ │ │ ├── CMakeLists.txt
│ │ │ └── oneflow-runner.cpp
│ │ ├── oneflow-runtime/
│ │ │ ├── CMakeLists.txt
│ │ │ └── lib/
│ │ │ ├── CMakeLists.txt
│ │ │ └── Runtime.cpp
│ │ ├── oneflow-translate/
│ │ │ ├── CMakeLists.txt
│ │ │ ├── README.md
│ │ │ ├── include/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ └── OneFlow/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ └── MLIROneFlowTranslation.h
│ │ │ ├── lib/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ └── OneFlow/
│ │ │ │ ├── CMakeLists.txt
│ │ │ │ ├── Importer.cpp
│ │ │ │ └── MLIROneFlowTranslation.cpp
│ │ │ └── oneflow-translate.cpp
│ │ └── test/
│ │ ├── CMakeLists.txt
│ │ ├── Frontend/
│ │ │ ├── lit.local.cfg
│ │ │ ├── oneflow_to_iree.mlir
│ │ │ └── tosa_to_elf.mlir
│ │ ├── GPU/
│ │ │ ├── lit.local.cfg
│ │ │ └── nvvm_to_cubin.mlir
│ │ ├── OneFlow/
│ │ │ ├── auto_nhwc/
│ │ │ │ ├── lit.local.cfg
│ │ │ │ ├── test_nhwc_batchnorm_relu.py
│ │ │ │ ├── test_nhwc_bias_add.py
│ │ │ │ ├── test_nhwc_conv.py
│ │ │ │ ├── test_nhwc_conv2d_maxpool2d.py
│ │ │ │ ├── test_nhwc_conv_relu_add.py
│ │ │ │ ├── test_nhwc_lenet.py
│ │ │ │ ├── test_nhwc_maxpool_2d.py
│ │ │ │ ├── test_nhwc_resnet.py
│ │ │ │ ├── test_nhwc_transpose_eliminate.py
│ │ │ │ └── test_resnet101_benchmark.py
│ │ │ ├── conversion/
│ │ │ │ ├── lower_to_tosa.mlir
│ │ │ │ ├── lower_to_tosa_signed.mlir
│ │ │ │ └── oneflow_to_tosa.mlir
│ │ │ ├── cse.mlir
│ │ │ ├── cuda_code_gen/
│ │ │ │ ├── gpu_copy_arg.mlir
│ │ │ │ ├── lit.local.cfg
│ │ │ │ ├── test_append_oneflow_stream.mlir
│ │ │ │ ├── test_cast_ops_to_signless.mlir
│ │ │ │ ├── test_fold_alloc_to_subview.mlir
│ │ │ │ ├── test_fuser_cast_scale.py
│ │ │ │ ├── test_gpu_all_reduce.mlir
│ │ │ │ ├── test_insert_ofmempool.mlir
│ │ │ │ ├── test_matmul.py
│ │ │ │ ├── test_mgpu_to_oneflow_stream.mlir
│ │ │ │ └── tosa_to_linalg.mlir
│ │ │ ├── folding/
│ │ │ │ ├── test_conv_bn.py
│ │ │ │ └── test_simple_multiply.py
│ │ │ ├── fuse/
│ │ │ │ ├── fuse_forward_ops.mlir
│ │ │ │ ├── test_cast_optimal_pass.py
│ │ │ │ └── test_fuse_pad_conv.py
│ │ │ ├── group_matmul.mlir
│ │ │ ├── jit_outline_func.mlir
│ │ │ ├── kernel_launch/
│ │ │ │ ├── OKLPass/
│ │ │ │ │ ├── lower_launcher_to_llvm_ptr.mlir
│ │ │ │ │ ├── lower_okl_to_llvm_call.mlir
│ │ │ │ │ └── tag_cuda_graph_support.mlir
│ │ │ │ ├── OKMPass/
│ │ │ │ │ ├── extract_okm_tensor.mlir
│ │ │ │ │ ├── okm_to_okl.mlir
│ │ │ │ │ ├── opt_okm_memref.mlir
│ │ │ │ │ └── wrap_okm_kernel.mlir
│ │ │ │ ├── OneFlowPass/
│ │ │ │ │ ├── aggregate_compute_ops.mlir
│ │ │ │ │ └── wrap_ops_to_kernel_launch/
│ │ │ │ │ ├── cuda_graph.mlir
│ │ │ │ │ ├── lit.local.cfg
│ │ │ │ │ └── simple.mlir
│ │ │ │ └── test_resnet.py
│ │ │ ├── networks/
│ │ │ │ ├── __init__.py
│ │ │ │ └── resnet50.py
│ │ │ ├── oneflow-opt.mlir
│ │ │ ├── oneflow-translate.mlir
│ │ │ ├── psig/
│ │ │ │ ├── error_parse.mlir
│ │ │ │ ├── sbp_parse.mlir
│ │ │ │ ├── test_2nd_basic_parse.py
│ │ │ │ └── test_basic_parse.py
│ │ │ ├── traits.mlir
│ │ │ └── with_cuda/
│ │ │ ├── lit.local.cfg
│ │ │ ├── test_conv_bn_auto_nhwc.py
│ │ │ ├── test_fuse_bias_add_dropout.py
│ │ │ ├── test_fuse_bias_add_gelu.py
│ │ │ ├── test_fuse_bn_add_relu.py
│ │ │ ├── test_fuse_gelu.py
│ │ │ ├── test_fuse_scale_tril.py
│ │ │ ├── test_fused_matmul_bias.py
│ │ │ ├── test_fused_multi_head_attention_inference.py
│ │ │ └── test_graph_save_and_load.py
│ │ ├── Transform/
│ │ │ ├── lit.local.cfg
│ │ │ ├── matmul.mlir
│ │ │ ├── softmax.mlir
│ │ │ ├── softmax_codegen_spec.mlir
│ │ │ ├── softmax_codegen_spec_no_vectorize.mlir
│ │ │ └── test_dialect.mlir
│ │ ├── lit.cfg.py
│ │ └── lit.site.cfg.py.in
│ ├── maybe/
│ │ ├── config.h
│ │ ├── error.h
│ │ ├── error_test.cpp
│ │ ├── just.h
│ │ ├── just_test.cpp
│ │ ├── maybe.h
│ │ ├── maybe_test.cpp
│ │ ├── optional.h
│ │ ├── optional_test.cpp
│ │ ├── type_traits.h
│ │ ├── type_traits_test.cpp
│ │ ├── utility.h
│ │ ├── utility_test.cpp
│ │ ├── variant.h
│ │ └── variant_test.cpp
│ └── user/
│ ├── data/
│ │ ├── batch_dataset.h
│ │ ├── batch_random_shuffle_dataset.h
│ │ ├── coco_data_reader.cpp
│ │ ├── coco_data_reader.h
│ │ ├── coco_dataset.cpp
│ │ ├── coco_dataset.h
│ │ ├── coco_parser.cpp
│ │ ├── coco_parser.h
│ │ ├── data_reader.h
│ │ ├── dataset.h
│ │ ├── distributed_training_dataset.h
│ │ ├── distributed_util.h
│ │ ├── gpt_dataset.cpp
│ │ ├── gpt_dataset.h
│ │ ├── group_batch_dataset.h
│ │ ├── ofrecord_data_reader.h
│ │ ├── ofrecord_dataset.h
│ │ ├── ofrecord_image_classification_data_reader.h
│ │ ├── ofrecord_image_classification_dataset.cpp
│ │ ├── ofrecord_image_classification_dataset.h
│ │ ├── ofrecord_image_classification_parser.h
│ │ ├── ofrecord_parser.h
│ │ ├── parser.h
│ │ └── random_shuffle_dataset.h
│ ├── image/
│ │ ├── crop_window.h
│ │ ├── image_util.cpp
│ │ ├── image_util.h
│ │ ├── jpeg_decoder.cpp
│ │ ├── jpeg_decoder.h
│ │ ├── jpeg_decoder_test.cpp
│ │ ├── random_crop_generator.cpp
│ │ └── random_crop_generator.h
│ ├── kernels/
│ │ ├── acc_kernel.cpp
│ │ ├── activation_kernels.cpp
│ │ ├── adaptive_avg_pool_cpu_kernel.cpp
│ │ ├── adaptive_avg_pool_gpu_kernel.cu
│ │ ├── adaptive_max_pool_cpu_kernel.cpp
│ │ ├── adaptive_max_pool_gpu_kernel.cu
│ │ ├── adaptive_pool_kernel_util.h
│ │ ├── add_n_kernel.cpp
│ │ ├── affine_grid_kernel.cpp
│ │ ├── affine_grid_kernel.cu
│ │ ├── affine_grid_kernel.h
│ │ ├── arange_kernel.cpp
│ │ ├── arange_kernel_util.cpp
│ │ ├── arange_kernel_util.cu
│ │ ├── arange_kernel_util.h
│ │ ├── arg_sort_kernel.cpp
│ │ ├── arg_sort_kernel.cu
│ │ ├── arg_where_kernel.cpp
│ │ ├── arg_where_kernel_util.cpp
│ │ ├── arg_where_kernel_util.cu
│ │ ├── arg_where_kernel_util.h
│ │ ├── argmax_kernel.cpp
│ │ ├── argmax_kernel.cu
│ │ ├── as_strided_kernel.cpp
│ │ ├── as_strided_kernel.cu
│ │ ├── assign_if_kernel.cpp
│ │ ├── assign_if_kernel.cu
│ │ ├── assign_kernel.cpp
│ │ ├── avg_pool_kernel.cpp
│ │ ├── avg_pool_kernel.cu
│ │ ├── avg_pool_kernel_util.cpp
│ │ ├── avg_pool_kernel_util.h
│ │ ├── batch_gather_kernel.cpp
│ │ ├── batch_gather_kernel_util.cpp
│ │ ├── batch_gather_kernel_util.cu
│ │ ├── batch_gather_kernel_util.h
│ │ ├── batch_norm_backward_elemt_kernel.cu
│ │ ├── batch_norm_backward_reduce_kernel.cu
│ │ ├── batch_norm_elemt_kernel.cu
│ │ ├── batch_norm_gather_stats_with_counts_kernel.cu
│ │ ├── batch_norm_kernel_utils.h
│ │ ├── batch_norm_stats_kernel.cu
│ │ ├── bernoulli_kernel.cpp
│ │ ├── bias_add_kernel.cpp
│ │ ├── binary_concat_kernel.cu
│ │ ├── binary_cross_entropy_kernel.cpp
│ │ ├── binary_cross_entropy_kernel.cu
│ │ ├── binary_cross_entropy_with_logits_kernel.cpp
│ │ ├── binary_cross_entropy_with_logits_kernel.cu
│ │ ├── binary_cross_entropy_with_logits_mean_kernel.cu
│ │ ├── binary_cross_entropy_with_logits_mean_kernel_util.h
│ │ ├── binary_cross_entropy_with_logits_reduce_mean.cpp
│ │ ├── bincount_kernel.cpp
│ │ ├── bincount_kernel.cu
│ │ ├── broadcast_div_grad_kernel.cpp
│ │ ├── broadcast_like_kernel.cpp
│ │ ├── cast_kernel.cpp
│ │ ├── cast_to_static_shape_kernel.cpp
│ │ ├── categorical_ordinal_encode_kernel.cpp
│ │ ├── categorical_ordinal_encode_kernel_util.cpp
│ │ ├── categorical_ordinal_encode_kernel_util.cu
│ │ ├── categorical_ordinal_encode_kernel_util.h
│ │ ├── clip_by_value_kernel.cpp
│ │ ├── clip_by_value_kernel.cu
│ │ ├── clip_by_value_kernel.h
│ │ ├── coco_reader_kernel.cpp
│ │ ├── collective_communication/
│ │ │ ├── cpu/
│ │ │ │ ├── cpu_all_gather.cpp
│ │ │ │ ├── cpu_all_reduce.cpp
│ │ │ │ ├── cpu_broadcast.cpp
│ │ │ │ ├── cpu_collective_communication_util.h
│ │ │ │ ├── cpu_communication_context.cpp
│ │ │ │ ├── cpu_communication_context.h
│ │ │ │ ├── cpu_recv.cpp
│ │ │ │ ├── cpu_reduce.cpp
│ │ │ │ ├── cpu_reduce_scatter.cpp
│ │ │ │ └── cpu_send.cpp
│ │ │ ├── cuda/
│ │ │ │ ├── cuda_all_gather.cpp
│ │ │ │ ├── cuda_all_reduce.cpp
│ │ │ │ ├── cuda_all_to_all.cpp
│ │ │ │ ├── cuda_broadcast.cpp
│ │ │ │ ├── cuda_communication_context.cpp
│ │ │ │ ├── cuda_communication_context.h
│ │ │ │ ├── cuda_recv.cpp
│ │ │ │ ├── cuda_reduce.cpp
│ │ │ │ ├── cuda_reduce_scatter.cpp
│ │ │ │ ├── cuda_send.cpp
│ │ │ │ ├── cuda_send_recv_util.cpp
│ │ │ │ └── cuda_send_recv_util.h
│ │ │ └── include/
│ │ │ ├── all_gather.h
│ │ │ ├── all_reduce.h
│ │ │ ├── all_to_all.h
│ │ │ ├── broadcast.h
│ │ │ ├── collective_communication.h
│ │ │ ├── communication_context.h
│ │ │ ├── recv.h
│ │ │ ├── reduce.h
│ │ │ ├── reduce_scatter.h
│ │ │ └── send.h
│ │ ├── combined_margin_loss_kernel.cpp
│ │ ├── combined_margin_loss_kernel.cu
│ │ ├── communicate_util.cpp
│ │ ├── communicate_util.h
│ │ ├── complex_kernels.cpp
│ │ ├── concat_kernel.cpp
│ │ ├── constant_kernel.cpp
│ │ ├── conv_cudnn_kernels.cpp
│ │ ├── conv_cutlass_kernels.cu
│ │ ├── conv_kernels.cpp
│ │ ├── convert_memory_format_kernel.cpp
│ │ ├── convert_memory_format_util.cpp
│ │ ├── convert_memory_format_util.h
│ │ ├── copy_data_content_kernel.cpp
│ │ ├── copy_hd_kernel.cpp
│ │ ├── copy_kernel.cpp
│ │ ├── count_not_finite_kernel.cpp
│ │ ├── count_not_finite_kernel.cu
│ │ ├── ctc_greedy_decoder.cpp
│ │ ├── ctc_greedy_decoder.cu
│ │ ├── ctc_greedy_decoder.h
│ │ ├── ctc_loss_kernel.cpp
│ │ ├── ctc_loss_kernel_util.cpp
│ │ ├── ctc_loss_kernel_util.cu
│ │ ├── ctc_loss_kernel_util.h
│ │ ├── cublas_bias_add_relu_matmul_grad_kernel.cu
│ │ ├── cublas_fused_matmul_bias_add_grad.cu
│ │ ├── cublas_fused_mlp_grad_kernel.cu
│ │ ├── cublas_fused_mlp_kernel.cu
│ │ ├── cublas_fused_mlp_util.cuh
│ │ ├── cufft_plan_cache.h
│ │ ├── cum_backward_kernel.cpp
│ │ ├── cum_backward_kernel.cu
│ │ ├── cum_forward_kernel.cpp
│ │ ├── cum_forward_kernel.cu
│ │ ├── cutlass_conv_tuner.cpp
│ │ ├── cutlass_conv_tuner.h
│ │ ├── data_shuffle_kernel.cu
│ │ ├── deconv_cpu_kernel.cpp
│ │ ├── deconv_cudnn_kernel.cpp
│ │ ├── deform_conv_kernel.cpp
│ │ ├── deform_conv_kernel.cu
│ │ ├── det_kernel.cpp
│ │ ├── diag_kernel.cpp
│ │ ├── diag_kernel.cu
│ │ ├── diag_kernel.h
│ │ ├── diagonal_kernel.cpp
│ │ ├── diagonal_kernel.cu
│ │ ├── dim_gather_kernel_util.cpp
│ │ ├── dim_gather_kernel_util.cu
│ │ ├── dim_gather_kernel_util.h
│ │ ├── dim_gather_kernels.cpp
│ │ ├── dim_scatter_kernel_util.cpp
│ │ ├── dim_scatter_kernel_util.cu
│ │ ├── dim_scatter_kernel_util.h
│ │ ├── dim_scatter_kernels.cpp
│ │ ├── dim_scatter_scalar_kernel_util.cpp
│ │ ├── dim_scatter_scalar_kernel_util.cu
│ │ ├── dim_scatter_scalar_kernel_util.h
│ │ ├── dim_scatter_scalar_kernels.cpp
│ │ ├── distributions/
│ │ │ ├── common.h
│ │ │ ├── distribution_template_util.cuh
│ │ │ ├── exponential_distribution.cpp
│ │ │ ├── exponential_distribution.cu
│ │ │ ├── exponential_distribution.h
│ │ │ ├── exponential_kernel.cpp
│ │ │ ├── exponential_kernel.h
│ │ │ ├── multinomial_with_replacement_kernel.cpp
│ │ │ ├── multinomial_with_replacement_kernel.cu
│ │ │ ├── normal_distribution.cpp
│ │ │ ├── normal_distribution.cu
│ │ │ ├── normal_distribution.h
│ │ │ ├── normal_kernel.cpp
│ │ │ ├── normal_kernel.h
│ │ │ ├── uniform_distribution.cpp
│ │ │ ├── uniform_distribution.cu
│ │ │ ├── uniform_distribution.h
│ │ │ ├── uniform_int_distribution.cpp
│ │ │ ├── uniform_int_distribution.cu
│ │ │ ├── uniform_int_distribution.h
│ │ │ ├── uniform_int_kernel.cpp
│ │ │ ├── uniform_int_kernel.h
│ │ │ ├── uniform_kernel.cpp
│ │ │ └── uniform_kernel.h
│ │ ├── dot_kernel.cpp
│ │ ├── dropout_kernel.cpp
│ │ ├── dropout_kernel.cu
│ │ ├── dropout_kernel.h
│ │ ├── dynamic_loss_scale_schedule_kernel.cpp
│ │ ├── dynamic_loss_scale_schedule_kernel.cu
│ │ ├── eager_b_to_s_kernel.cpp
│ │ ├── eager_ccl_kernel.cpp
│ │ ├── eager_nccl_s2s_kernel.cu
│ │ ├── eager_p_to_b_kernel.cpp
│ │ ├── eager_p_to_s_kernel.cpp
│ │ ├── eager_s_to_b_kernel.cpp
│ │ ├── eager_s_to_p_kernel.cpp
│ │ ├── eager_s_to_s_kernel.cpp
│ │ ├── eager_symmetric_s_to_p_kernel.cpp
│ │ ├── elementwise_maximum_minimum_kernel.cpp
│ │ ├── elementwise_maximum_minimum_kernel.cu
│ │ ├── elementwise_maximum_minimum_kernel.h
│ │ ├── elementwise_primitive_kernel.h
│ │ ├── embedding_kernel.cpp
│ │ ├── embedding_kernel.cu
│ │ ├── embedding_kernel_util.cpp
│ │ ├── embedding_kernel_util.cu
│ │ ├── embedding_kernel_util.h
│ │ ├── empty_kernel.cpp
│ │ ├── erfinv_kernel.cpp
│ │ ├── erfinv_kernel.cu
│ │ ├── expand_kernel.cpp
│ │ ├── eye_kernel.cpp
│ │ ├── eye_kernel_util.cpp
│ │ ├── eye_kernel_util.cu
│ │ ├── eye_kernel_util.h
│ │ ├── fake_quantization_kernel.cpp
│ │ ├── fake_quantization_kernel.cu
│ │ ├── fft_kernel_util.cpp
│ │ ├── fft_kernel_util.cu
│ │ ├── fft_kernel_util.h
│ │ ├── fft_kernels.cpp
│ │ ├── fill_kernel.cpp
│ │ ├── fill_kernel.cu
│ │ ├── flip_kernel.cpp
│ │ ├── flip_kernel.cu
│ │ ├── fold_kernel.cpp
│ │ ├── fold_kernel_util.cpp
│ │ ├── fold_kernel_util.cu
│ │ ├── fold_kernel_util.h
│ │ ├── frac_kernel.cpp
│ │ ├── frac_kernel.cu
│ │ ├── fused_attention_kernels.cu
│ │ ├── fused_bias_add_kernel.cu
│ │ ├── fused_bias_add_scale_mask_softmax_dropout.cu
│ │ ├── fused_cast_scale_kernel.cpp
│ │ ├── fused_cast_scale_kernel.cu
│ │ ├── fused_center_kernel.cu
│ │ ├── fused_clip_grad.cu
│ │ ├── fused_clip_grad.h
│ │ ├── fused_clip_grad_util.h
│ │ ├── fused_codegeex_qkv_reshape_kernel.cu
│ │ ├── fused_cross_feature_interaction.cu
│ │ ├── fused_cross_feature_interaction_grad.cu
│ │ ├── fused_dot_feature_interaction_kernel.cu
│ │ ├── fused_gelu_mul_kernel.cu
│ │ ├── fused_get_bounding_boxes_coord_kernel.cu
│ │ ├── fused_get_ciou_diagonal_angle_kernel.cu
│ │ ├── fused_get_ciou_result_kernel.cu
│ │ ├── fused_get_convex_diagonal_squared_kernel.cu
│ │ ├── fused_get_intersection_area_kernel.cu
│ │ ├── fused_get_iou_kernel.cu
│ │ ├── fused_glu_kernel.cu
│ │ ├── fused_glu_without_linear_grad_kernel.cu
│ │ ├── fused_gru_cell_kernel.cu
│ │ ├── fused_lstm_cell_kernel.cu
│ │ ├── fused_matmul_bias_add_relu_dropout.cu
│ │ ├── fused_matmul_bias_kernel.cu
│ │ ├── fused_relu_dropout_grad_kernel.cu
│ │ ├── fused_rnn_cell_kernel_util.h
│ │ ├── fused_scale_mask_bias_softmax.cu
│ │ ├── fused_scale_mask_softmax.cu
│ │ ├── fused_scale_mask_softmax_dropout.cu
│ │ ├── fused_self_attention_query_mul_key_and_value_kernel.cu
│ │ ├── fused_softmax.cuh
│ │ ├── fused_tril_scale_softmax_mask_scale_kernel.cu
│ │ ├── fused_weighted_sum_kernel.cpp
│ │ ├── fused_weighted_sum_kernel.cu
│ │ ├── gather_kernel.cpp
│ │ ├── gather_kernel_util.cpp
│ │ ├── gather_kernel_util.cu
│ │ ├── gather_kernel_util.h
│ │ ├── generate_random_batch_permutation_indices_kernel.cpp
│ │ ├── generate_random_batch_permutation_indices_kernel.cu
│ │ ├── gpt_data_loader_kernel.cpp
│ │ ├── greater_inplace_kernel.cpp
│ │ ├── greater_inplace_kernel_util.cpp
│ │ ├── greater_inplace_kernel_util.cu
│ │ ├── greater_inplace_kernel_util.h
│ │ ├── grid_sample_kernel.cpp
│ │ ├── grid_sample_kernel_util.cpp
│ │ ├── grid_sample_kernel_util.cu
│ │ ├── grid_sample_kernel_util.h
│ │ ├── group_conv_kernel.cpp
│ │ ├── group_deconv_kernel.cpp
│ │ ├── group_norm_kernel.cu
│ │ ├── grouped_matmul_bias.cu
│ │ ├── groupwise_quantization_kernels.cu
│ │ ├── host_scalar_add_by_tensor_kernel.cu
│ │ ├── image_batch_align_kernel.cpp
│ │ ├── image_decode_kernel.cpp
│ │ ├── image_object_preprocess_kernels.cpp
│ │ ├── image_preprocess_kernels.cpp
│ │ ├── image_preprocess_kernels.cu
│ │ ├── image_resize_kernels.cpp
│ │ ├── image_target_resize_kernel.cpp
│ │ ├── in_top_k_kernel.cpp
│ │ ├── in_top_k_kernel_util.cpp
│ │ ├── in_top_k_kernel_util.cu
│ │ ├── in_top_k_kernel_util.h
│ │ ├── index_add_kernel.cpp
│ │ ├── index_add_kernel.cu
│ │ ├── indexed_slices_reduce_sum_kernel.cpp
│ │ ├── indexed_slices_reduce_sum_kernel_util.cpp
│ │ ├── indexed_slices_reduce_sum_kernel_util.h
│ │ ├── inv_kernels.cpp
│ │ ├── inv_kernels.cu
│ │ ├── kl_div_kernel.cpp
│ │ ├── kl_div_kernel.cu
│ │ ├── l1_l2_regularize_gradient_kernel.cpp
│ │ ├── l1_l2_regularize_gradient_kernel_util.cpp
│ │ ├── l1_l2_regularize_gradient_kernel_util.cu
│ │ ├── l1_l2_regularize_gradient_kernel_util.h
│ │ ├── l2_normalize_kernel.cpp
│ │ ├── l2_normalize_kernel.cu
│ │ ├── layer_norm_cpu_kernel.cpp
│ │ ├── layer_norm_gpu_kernel.cu
│ │ ├── lerp_kernel.cpp
│ │ ├── lerp_kernel_util.cpp
│ │ ├── lerp_kernel_util.cu
│ │ ├── lerp_kernel_util.h
│ │ ├── linalg_cross_kernel.cpp
│ │ ├── linalg_cross_kernel.cu
│ │ ├── log_softmax_kernel.cpp
│ │ ├── logical_not_kernel.cpp
│ │ ├── loss_kernel_util.h
│ │ ├── lu_decomposition_kernel.cu
│ │ ├── masked_fill_kernel.cpp
│ │ ├── math_binary_broadcast_kernels.cpp
│ │ ├── math_binary_elementwise_func.h
│ │ ├── math_binary_elementwise_kernel.cpp
│ │ ├── math_binary_elementwise_kernel.cu
│ │ ├── math_unary_elementwise_func.h
│ │ ├── math_unary_elementwise_primitive_kernel.cpp
│ │ ├── matmul_kernels.cpp
│ │ ├── matrix_vector_product_kernel.cpp
│ │ ├── max_pool_kernel.cpp
│ │ ├── max_pool_kernel.cu
│ │ ├── max_pool_kernel_util.cpp
│ │ ├── max_pool_kernel_util.h
│ │ ├── max_unpool_kernel.cpp
│ │ ├── max_unpool_kernel.cu
│ │ ├── max_unpool_kernel_util.cpp
│ │ ├── max_unpool_kernel_util.h
│ │ ├── median_kernel.cpp
│ │ ├── median_kernel.cu
│ │ ├── median_with_indices_kernel.cpp
│ │ ├── median_with_indices_kernel.cu
│ │ ├── min_max_observer_kernel.cpp
│ │ ├── min_max_observer_kernel.cu
│ │ ├── mode_kernel.cpp
│ │ ├── model_update_kernel_util.cpp
│ │ ├── model_update_kernel_util.cu
│ │ ├── model_update_kernel_util.h
│ │ ├── model_update_kernels.cpp
│ │ ├── moving_average_min_max_observer_kernel.cpp
│ │ ├── moving_average_min_max_observer_kernel.cu
│ │ ├── multi_reduce_kernel_util.h
│ │ ├── multi_reduce_kernels.cpp
│ │ ├── multi_reduce_kernels.cu
│ │ ├── multi_reduce_kernels.h
│ │ ├── multi_tensor_model_update_kernel.cpp
│ │ ├── multi_tensor_model_update_kernel_util.cu
│ │ ├── multi_tensor_model_update_kernel_util.h
│ │ ├── mutable_cast_once_kernel.cpp
│ │ ├── narrow_kernel.cpp
│ │ ├── nccl_logical_2d_sbp_kernels.cpp
│ │ ├── nccl_logical_fusion_kernel.cpp
│ │ ├── nccl_logical_kernels.cpp
│ │ ├── nccl_logical_send_recv_kernel.cpp
│ │ ├── nd_index_slice_kernels.cpp
│ │ ├── nd_index_slice_kernels.cu
│ │ ├── nd_index_slice_kernels.h
│ │ ├── nd_index_slice_util.h
│ │ ├── nll_kernel.cpp
│ │ ├── nll_kernel_util.cpp
│ │ ├── nll_kernel_util.cu
│ │ ├── nll_kernel_util.h
│ │ ├── nms_kernel.cpp
│ │ ├── nms_kernel.cu
│ │ ├── noncontiguous_binary_op.cu
│ │ ├── nop_kernel.cpp
│ │ ├── normalization_kernel.cpp
│ │ ├── normalization_kernel.cu
│ │ ├── nvtx_range_kernel.cu
│ │ ├── ofrecord_decoder_kernels.cpp
│ │ ├── ofrecord_image_classification_reader_kernel.cpp
│ │ ├── ofrecord_reader_kernel.cpp
│ │ ├── one_embedding_data_shuffle.cuh
│ │ ├── one_embedding_embedding_gradient_shuffle_p2p_kernel.cu
│ │ ├── one_embedding_embedding_shuffle_p2p_kernel.cu
│ │ ├── one_embedding_id_shuffle_p2p_kernel.cu
│ │ ├── one_embedding_kernels.cu
│ │ ├── one_embedding_update_kernels.cu
│ │ ├── one_hot_kernel.cpp
│ │ ├── one_hot_kernel.cu
│ │ ├── ones_like_kernel.cpp
│ │ ├── op_kernel_wrapper.h
│ │ ├── p2p_comm_kernel.cpp
│ │ ├── pack_kernel.cpp
│ │ ├── pad_kernel.cpp
│ │ ├── partial_fc_sample_kernel.cu
│ │ ├── pocketfft_hdronly.h
│ │ ├── pocketfftplan.h
│ │ ├── prelu_kernel.cpp
│ │ ├── prelu_kernel.cu
│ │ ├── quantization_kernel.cpp
│ │ ├── quantization_kernel.cu
│ │ ├── radix_sort.cuh
│ │ ├── random_crop_kernel_state.cpp
│ │ ├── random_crop_kernel_state.h
│ │ ├── random_mask_generator.cpp
│ │ ├── random_mask_generator.cu
│ │ ├── random_mask_generator.h
│ │ ├── random_mask_like_kernel.cpp
│ │ ├── random_mask_like_kernel.h
│ │ ├── random_seed_util.cpp
│ │ ├── random_seed_util.h
│ │ ├── randperm_kernel.cpp
│ │ ├── randperm_kernel.cu
│ │ ├── raw_reader_kernel.cpp
│ │ ├── reduce_kernel.cpp
│ │ ├── reduce_like_kernels.cpp
│ │ ├── reflection_pad_kernels.cpp
│ │ ├── reflection_pad_kernels_util.cpp
│ │ ├── reflection_pad_kernels_util.cu
│ │ ├── reflection_pad_kernels_util.h
│ │ ├── repeat_interleave_kernel.cpp
│ │ ├── repeat_interleave_kernel.cu
│ │ ├── replication_pad_kernels.cpp
│ │ ├── replication_pad_kernels_util.cpp
│ │ ├── replication_pad_kernels_util.cu
│ │ ├── replication_pad_kernels_util.h
│ │ ├── rms_norm_gpu_kernel.cu
│ │ ├── roc_auc_score_kernel.cpp
│ │ ├── roi_align_kernel.cu
│ │ ├── roll_kernel.cpp
│ │ ├── roll_kernel.cu
│ │ ├── roll_kernel_utils.h
│ │ ├── rrelu_kernel.cpp
│ │ ├── rrelu_kernel.cu
│ │ ├── same_padding_kernel.cpp
│ │ ├── scalar_bitwise_kernels.cpp
│ │ ├── scalar_by_tensor_kernel.cpp
│ │ ├── scalar_logical_kernels.cpp
│ │ ├── scalar_math_kernels.cpp
│ │ ├── scaled_dot_product_attention_grad_kernel.cu
│ │ ├── scaled_dot_product_attention_kernel.cu
│ │ ├── scaled_dot_product_attention_kernel.h
│ │ ├── scaled_dot_product_attention_util.h
│ │ ├── search_sorted_kernel.cpp
│ │ ├── search_sorted_kernel.cu
│ │ ├── search_sorted_kernel_util.h
│ │ ├── sigmoid_cross_entropy_kernel.cpp
│ │ ├── sigmoid_cross_entropy_kernel.cu
│ │ ├── sigmoid_cross_entropy_kernel.h
│ │ ├── skip_layer_norm_kernel.cu
│ │ ├── skip_rms_norm_kernel.cu
│ │ ├── slice_kernel.cpp
│ │ ├── slice_util.cpp
│ │ ├── slice_util.cu
│ │ ├── slice_util.h
│ │ ├── smooth_l1_loss_kernel.cpp
│ │ ├── smooth_l1_loss_kernel.cu
│ │ ├── softmax_cross_entropy_kernel.cpp
│ │ ├── softmax_cross_entropy_kernel.cu
│ │ ├── softmax_cross_entropy_kernel.h
│ │ ├── softmax_kernel.cpp
│ │ ├── sort_kernel.cpp
│ │ ├── sort_kernel.cu
│ │ ├── sparse_cross_entropy_kernel.cpp
│ │ ├── sparse_cross_entropy_kernel_util.cpp
│ │ ├── sparse_cross_entropy_kernel_util.cu
│ │ ├── sparse_cross_entropy_kernel_util.h
│ │ ├── sparse_softmax_cross_entropy_kernel.cpp
│ │ ├── sparse_softmax_cross_entropy_kernel.cu
│ │ ├── sparse_softmax_cross_entropy_kernel_util.cpp
│ │ ├── sparse_softmax_cross_entropy_kernel_util.cu
│ │ ├── sparse_softmax_cross_entropy_kernel_util.h
│ │ ├── split_like_kernel.cpp
│ │ ├── sqrt_square_sum_kernel.cpp
│ │ ├── sqrt_square_sum_kernel_util.cpp
│ │ ├── sqrt_square_sum_kernel_util.cu
│ │ ├── sqrt_square_sum_kernel_util.h
│ │ ├── square_sum_kernel.cpp
│ │ ├── square_sum_kernel_util.cpp
│ │ ├── square_sum_kernel_util.cu
│ │ ├── square_sum_kernel_util.h
│ │ ├── ssp_variable_proxy_kernel.cpp
│ │ ├── stack_kernel.cpp
│ │ ├── stateful_opkernel.cpp
│ │ ├── stateful_opkernel.h
│ │ ├── summary_kernels.cpp
│ │ ├── tensor_buffer_kernels.cpp
│ │ ├── tensor_constant_kernel.cpp
│ │ ├── tf_pool_cpu_kernel.cpp
│ │ ├── tf_pool_gpu_kernel.cpp
│ │ ├── tf_prelu_kernel.cpp
│ │ ├── tf_prelu_kernel.cu
│ │ ├── throw_error_kernel.cpp
│ │ ├── to_contiguous_kernel.cpp
│ │ ├── to_contiguous_kernel.cu
│ │ ├── to_contiguous_kernel.h
│ │ ├── top_k_kernel.cpp
│ │ ├── top_k_kernel.cu
│ │ ├── transpose_kernel.cpp
│ │ ├── tril_kernel.cpp
│ │ ├── tril_kernel.cu
│ │ ├── triu_kernel.cpp
│ │ ├── triu_kernel.cu
│ │ ├── tuple_identity_kernel.cpp
│ │ ├── two_stage_reduce_kernel.cpp
│ │ ├── two_stage_reduce_kernel_util.cpp
│ │ ├── two_stage_reduce_kernel_util.cu
│ │ ├── two_stage_reduce_kernel_util.h
│ │ ├── unfold_kernel.cpp
│ │ ├── unfold_kernel_util.cpp
│ │ ├── unfold_kernel_util.cu
│ │ ├── unfold_kernel_util.h
│ │ ├── unfold_tensor_kernel.cpp
│ │ ├── unfold_tensor_kernel.cu
│ │ ├── unfold_tensor_kernel_utils.h
│ │ ├── unique_kernel.cpp
│ │ ├── unique_kernel_util.cpp
│ │ ├── unique_kernel_util.cu
│ │ ├── unique_kernel_util.h
│ │ ├── unique_with_counts_kernel.cpp
│ │ ├── unpack_kernel.cpp
│ │ ├── unsorted_batch_segment_sum_kernel.cpp
│ │ ├── unsorted_segment_sum_kernel.cpp
│ │ ├── unsorted_segment_sum_kernel_util.cpp
│ │ ├── unsorted_segment_sum_kernel_util.cu
│ │ ├── unsorted_segment_sum_kernel_util.h
│ │ ├── upsample_bicubic_2d_kernel.cpp
│ │ ├── upsample_bicubic_2d_kernel.cu
│ │ ├── upsample_bilinear_2d_kernel.cpp
│ │ ├── upsample_bilinear_2d_kernel.cu
│ │ ├── upsample_kernel.h
│ │ ├── upsample_linear_1d_kernel.cpp
│ │ ├── upsample_linear_1d_kernel.cu
│ │ ├── upsample_nearest_kernel.cpp
│ │ ├── upsample_nearest_kernel.cu
│ │ ├── upsample_trilinear_3d_kernel.cpp
│ │ ├── upsample_trilinear_3d_kernel.cu
│ │ ├── util_ops_kernels.cpp
│ │ ├── variance_kernel.cpp
│ │ ├── variance_kernel_util.cpp
│ │ ├── variance_kernel_util.cu
│ │ ├── variance_kernel_util.h
│ │ ├── vector_matrix_product_kernel.cpp
│ │ ├── where_kernel.cpp
│ │ ├── where_kernel_util.cpp
│ │ ├── where_kernel_util.cu
│ │ ├── where_kernel_util.h
│ │ └── zero_like_kernel.cpp
│ ├── ops/
│ │ ├── acc_ctrl_tick_op.cpp
│ │ ├── acc_op.cpp
│ │ ├── adaptive_max_pool_op.cpp
│ │ ├── adaptive_pool_op.cpp
│ │ ├── add_n_op.cpp
│ │ ├── affine_grid_op.cpp
│ │ ├── amp_white_identity_op.cpp
│ │ ├── arange_op.cpp
│ │ ├── arg_sort_op.cpp
│ │ ├── arg_where_op.cpp
│ │ ├── argmax_op.cpp
│ │ ├── as_strided_op.cpp
│ │ ├── assign_op.cpp
│ │ ├── avg_pool_op.cpp
│ │ ├── batch_gather_op.cpp
│ │ ├── batch_norm_backward_elemt_op.cpp
│ │ ├── batch_norm_backward_reduce_op.cpp
│ │ ├── batch_norm_elemt_op.cpp
│ │ ├── batch_norm_gather_stats_with_counts_op.cpp
│ │ ├── batch_norm_stats_op.cpp
│ │ ├── bernoulli_op.cpp
│ │ ├── bias_add_op.cpp
│ │ ├── binary_cross_entropy_op.cpp
│ │ ├── binary_cross_entropy_with_logits_op.cpp
│ │ ├── binary_cross_entropy_with_logits_reduce_mean_op.cpp
│ │ ├── bincount_op.cpp
│ │ ├── broadcast_div_grad_op.cpp
│ │ ├── broadcast_like_op.cpp
│ │ ├── buffer_op.cpp
│ │ ├── cast_like_op.cpp
│ │ ├── cast_op.cpp
│ │ ├── cast_to_static_shape_op.cpp
│ │ ├── cast_to_tick_op.cpp
│ │ ├── categorical_ordinal_encode_op.cpp
│ │ ├── celu_op.cpp
│ │ ├── clip_by_value_op.cpp
│ │ ├── coco_reader_op.cpp
│ │ ├── combined_margin_loss_op.cpp
│ │ ├── comm_net_device_infer_util.cpp
│ │ ├── comm_net_device_infer_util.h
│ │ ├── complex_ops.cpp
│ │ ├── concat_op.cpp
│ │ ├── constant_op.cpp
│ │ ├── conv_op.cpp
│ │ ├── convert_memory_format_op.cpp
│ │ ├── convert_memory_format_op.h
│ │ ├── copy_hd_op.cpp
│ │ ├── copy_op.cpp
│ │ ├── count_not_finite_op.cpp
│ │ ├── ctc_loss_op.cpp
│ │ ├── cublas_bias_add_relu_matmul_grad_op.cpp
│ │ ├── cublas_fused_matmul_bias_add_grad_op.cpp
│ │ ├── cublas_fused_mlp_grad_op.cpp
│ │ ├── cublas_fused_mlp_op.cpp
│ │ ├── cum_ops.cpp
│ │ ├── data_shuffle_op.cpp
│ │ ├── deconv_op.cpp
│ │ ├── deform_conv_op.cpp
│ │ ├── depend_op.cpp
│ │ ├── det_op.cpp
│ │ ├── diag_op.cpp
│ │ ├── diagonal_op.cpp
│ │ ├── dim_gather_op.cpp
│ │ ├── dim_scatter_ops.cpp
│ │ ├── distributions/
│ │ │ ├── exponential_op.cpp
│ │ │ ├── multinomial_with_replacement_op.cpp
│ │ │ ├── normal_op.cpp
│ │ │ ├── uniform_int_op.cpp
│ │ │ └── uniform_op.cpp
│ │ ├── dot_op.cpp
│ │ ├── dropout_op.cpp
│ │ ├── dynamic_loss_scale_schedule_op.cpp
│ │ ├── eager_b_to_s_op.cpp
│ │ ├── eager_ccl_ops.cpp
│ │ ├── eager_p_to_b_op.cpp
│ │ ├── eager_p_to_s_op.cpp
│ │ ├── eager_s_to_b_op.cpp
│ │ ├── eager_s_to_p_op.cpp
│ │ ├── eager_s_to_s_op.cpp
│ │ ├── eager_symmetric_s_to_p_op.cpp
│ │ ├── elementwise_maximum_minimum_ops.cpp
│ │ ├── elu_op.cpp
│ │ ├── embedding_op.cpp
│ │ ├── empty_op.cpp
│ │ ├── erfinv_op.cpp
│ │ ├── expand_dims_op.cpp
│ │ ├── expand_op.cpp
│ │ ├── eye_op.cpp
│ │ ├── fake_quantization_op.cpp
│ │ ├── fft_ops.cpp
│ │ ├── fill_op.cpp
│ │ ├── flip_op.cpp
│ │ ├── frac_op.cpp
│ │ ├── fused_attention_ops.cpp
│ │ ├── fused_bias_add_op.cpp
│ │ ├── fused_bias_add_scale_mask_softmax_dropout_op.cpp
│ │ ├── fused_cast_scale_op.cpp
│ │ ├── fused_center_op.cpp
│ │ ├── fused_clip_grad_ops.cpp
│ │ ├── fused_codegeex_qkv_reshape.cpp
│ │ ├── fused_cross_feature_interaction_op.cpp
│ │ ├── fused_dot_feature_interaction_op.cpp
│ │ ├── fused_get_boundding_boxes_coord_op.cpp
│ │ ├── fused_get_ciou_diagonal_angle_op.cpp
│ │ ├── fused_get_ciou_result_op.cpp
│ │ ├── fused_get_convex_diagonal_squared_op.cpp
│ │ ├── fused_get_intersection_area_op.cpp
│ │ ├── fused_get_iou_op.cpp
│ │ ├── fused_glu_op.cpp
│ │ ├── fused_glu_without_linear_grad_op.cpp
│ │ ├── fused_gru_cell_op.cpp
│ │ ├── fused_linear_with_groupwise_quantized_weight_op.cpp
│ │ ├── fused_lstm_cell_op.cpp
│ │ ├── fused_matmul_bias_add_relu_dropout_op.cpp
│ │ ├── fused_matmul_bias_op.cpp
│ │ ├── fused_relu_dropout_grad_op.cpp
│ │ ├── fused_scale_mask_bias_softmax_op.cpp
│ │ ├── fused_scale_mask_softmax_dropout_op.cpp
│ │ ├── fused_scale_mask_softmax_op.cpp
│ │ ├── fused_scale_tril_softmax_mask_scale_op.cpp
│ │ ├── fused_self_attention_query_mul_key_and_value_ops.cpp
│ │ ├── fused_weighted_sum_op.cpp
│ │ ├── gather_op.cpp
│ │ ├── gelu_op.cpp
│ │ ├── generate_random_batch_permutation_indices_op.cpp
│ │ ├── gpt_data_loader_op.cpp
│ │ ├── greater_inplace_op.cpp
│ │ ├── grid_sample_op.cpp
│ │ ├── group_norm_op.cpp
│ │ ├── grouped_matmul_bias_op.cpp
│ │ ├── groupwise_dequantize_op.cpp
│ │ ├── hardshrink_op.cpp
│ │ ├── hardsigmoid_op.cpp
│ │ ├── hardswish_op.cpp
│ │ ├── hardtanh_op.cpp
│ │ ├── hierarchical_parallel_cast_op.cpp
│ │ ├── identity_op.cpp
│ │ ├── image_batch_align_op.cpp
│ │ ├── image_decode_op.cpp
│ │ ├── image_object_preprocess_ops.cpp
│ │ ├── image_preprocess_ops.cpp
│ │ ├── image_resize_ops.cpp
│ │ ├── image_target_resize_op.cpp
│ │ ├── in_top_k_op.cpp
│ │ ├── index_add_op.cpp
│ │ ├── indexed_slices_reduce_sum_op.cpp
│ │ ├── inv_op.cpp
│ │ ├── kl_div_op.cpp
│ │ ├── l1_l2_regularize_gradient_op.cpp
│ │ ├── l2_normalize_op.cpp
│ │ ├── layer_norm_op.cpp
│ │ ├── leaky_relu_op.cpp
│ │ ├── lerp_op.cpp
│ │ ├── linalg_cross_op.cpp
│ │ ├── log_softmax_op.cpp
│ │ ├── logical_not_op.cpp
│ │ ├── loss_op_util.cpp
│ │ ├── loss_op_util.h
│ │ ├── lu_composition_op.cpp
│ │ ├── masked_fill_op.cpp
│ │ ├── math_binary_broadcast_ops.cpp
│ │ ├── math_binary_broadcast_seq.h
│ │ ├── math_binary_elementwise_ops.cpp
│ │ ├── math_binary_elementwise_seq.h
│ │ ├── math_unary_elementwise_op.cpp
│ │ ├── math_unary_elementwise_seq.h
│ │ ├── matmul_op.cpp
│ │ ├── matrix_vector_product_op.cpp
│ │ ├── max_pool_op.cpp
│ │ ├── max_unpool_op.cpp
│ │ ├── median_op.cpp
│ │ ├── median_with_indices_op.cpp
│ │ ├── min_max_observer_op.cpp
│ │ ├── mish_op.cpp
│ │ ├── mode_op.cpp
│ │ ├── model_update_ops.cpp
│ │ ├── moving_average_min_max_observer_op.cpp
│ │ ├── multi_reduce_ops.cpp
│ │ ├── multi_tensor_model_update_ops.cpp
│ │ ├── mutable_cast_once_op.cpp
│ │ ├── narrow_op.cpp
│ │ ├── nccl_logical_2d_sbp_ops.cpp
│ │ ├── nccl_logical_fusion_op.cpp
│ │ ├── nccl_logical_ops.cpp
│ │ ├── nccl_logical_util.cpp
│ │ ├── nccl_logical_util.h
│ │ ├── nd_index_slice_ops.cpp
│ │ ├── nll_op.cpp
│ │ ├── nms_op.cpp
│ │ ├── nn_util.cpp
│ │ ├── nn_util.h
│ │ ├── noncontiguous_binary_op.cpp
│ │ ├── normalization_op.cpp
│ │ ├── nvtx_range_op.cpp
│ │ ├── ofrecord_decoder_ops.cpp
│ │ ├── ofrecord_image_classification_reader_op.cpp
│ │ ├── ofrecord_reader_op.cpp
│ │ ├── one_embedding_ops.cpp
│ │ ├── one_hot_op.cpp
│ │ ├── ones_like_op.cpp
│ │ ├── p2p_comm_op.cpp
│ │ ├── pack_op.cpp
│ │ ├── pad_op.cpp
│ │ ├── parallel_cast_op.cpp
│ │ ├── partial_fc_sample_op.cpp
│ │ ├── pinned_identity_op.cpp
│ │ ├── prelu_op.cpp
│ │ ├── quantization_op.cpp
│ │ ├── quick_gelu_op.cpp
│ │ ├── randperm_op.cpp
│ │ ├── raw_reader_op.cpp
│ │ ├── reduce_like_ops.cpp
│ │ ├── reduce_ops.cpp
│ │ ├── reflection_pad_op.cpp
│ │ ├── relu_op.cpp
│ │ ├── repeat_interleave_op.cpp
│ │ ├── repeat_op.cpp
│ │ ├── replication_pad_op.cpp
│ │ ├── reshape_like_op.cpp
│ │ ├── reshape_op.cpp
│ │ ├── reshape_user_op_util.cpp
│ │ ├── reshape_user_op_util.h
│ │ ├── reshape_user_op_util_test.cpp
│ │ ├── rms_norm_op.cpp
│ │ ├── roc_auc_score_op.cpp
│ │ ├── roi_align_op.cpp
│ │ ├── roll_op.cpp
│ │ ├── rrelu_op.cpp
│ │ ├── same_padding_op.cpp
│ │ ├── scalar_bitwise_op.cpp
│ │ ├── scalar_by_tensor_op.cpp
│ │ ├── scalar_logical_op.cpp
│ │ ├── scalar_math_op.cpp
│ │ ├── scaled_dot_product_flash_attention_op.cpp
│ │ ├── search_sorted_op.cpp
│ │ ├── selu_op.cpp
│ │ ├── sigmoid_cross_entropy_op.cpp
│ │ ├── silu_op.cpp
│ │ ├── skip_layer_norm_op.cpp
│ │ ├── skip_rms_norm_op.cpp
│ │ ├── slice_op.cpp
│ │ ├── smooth_l1_loss_op.cpp
│ │ ├── softmax_cross_entropy_op.cpp
│ │ ├── softmax_op.cpp
│ │ ├── softplus_op.cpp
│ │ ├── softshrink_op.cpp
│ │ ├── softsign_op.cpp
│ │ ├── sort_op.cpp
│ │ ├── sparse_cross_entropy_op.cpp
│ │ ├── sparse_softmax_cross_entropy_op.cpp
│ │ ├── split_like_op.cpp
│ │ ├── sqrt_square_sum_op.cpp
│ │ ├── square_relu_op.cpp
│ │ ├── square_sum_op.cpp
│ │ ├── squeeze_op.cpp
│ │ ├── ssp_variable_proxy_op.cpp
│ │ ├── stack_op.cpp
│ │ ├── stft_op.cpp
│ │ ├── summary_ops.cpp
│ │ ├── tanh_op.cpp
│ │ ├── tensor_buffer_ops.cpp
│ │ ├── tensor_constant_op.cpp
│ │ ├── tf_pool_op.cpp
│ │ ├── tf_prelu_op.cpp
│ │ ├── threshold_op.cpp
│ │ ├── throw_error_op.cpp
│ │ ├── to_contiguous_op.cpp
│ │ ├── top_k_op.cpp
│ │ ├── transpose_ops.cpp
│ │ ├── tril_op.cpp
│ │ ├── triu_op.cpp
│ │ ├── trunc_op.cpp
│ │ ├── tuple_identity_op.cpp
│ │ ├── two_stage_reduce_ops.cpp
│ │ ├── unfold_fold_op.cpp
│ │ ├── unfold_tensor_op.cpp
│ │ ├── unique_op.cpp
│ │ ├── unique_with_counts_op.cpp
│ │ ├── unpack_op.cpp
│ │ ├── unsorted_batch_segment_sum_op.cpp
│ │ ├── unsorted_segment_sum_op.cpp
│ │ ├── upsample_op.cpp
│ │ ├── util_ops.cpp
│ │ ├── variance_op.cpp
│ │ ├── vector_matrix_product_op.cpp
│ │ ├── where_op.cpp
│ │ └── zero_like_op.cpp
│ ├── summary/
│ │ ├── crc32c.h
│ │ ├── env_time.h
│ │ ├── event_writer_helper.cpp
│ │ ├── event_writer_helper.h
│ │ ├── events_writer.cpp
│ │ ├── events_writer.h
│ │ ├── histogram.cpp
│ │ ├── histogram.h
│ │ ├── plan_to_physical_graph.cpp
│ │ ├── plan_to_physical_graph.h
│ │ └── summary_converter.h
│ └── utils/
│ ├── pool_util.cpp
│ └── pool_util.h
├── python/
│ ├── .gitignore
│ ├── oneflow/
│ │ ├── _C/
│ │ │ ├── __init__.py
│ │ │ └── _nn.py
│ │ ├── __init__.py
│ │ ├── __main__.py
│ │ ├── _dynamo/
│ │ │ └── __init__.py
│ │ ├── _utils.py
│ │ ├── amp/
│ │ │ ├── __init__.py
│ │ │ ├── autocast_mode.py
│ │ │ └── grad_scaler.py
│ │ ├── ao/
│ │ │ └── quantization.py
│ │ ├── asyncs/
│ │ │ ├── __init__.py
│ │ │ └── thread.py
│ │ ├── autograd/
│ │ │ ├── __init__.py
│ │ │ ├── autograd.py
│ │ │ ├── autograd_function.py
│ │ │ ├── autograd_mode.py
│ │ │ ├── functional.py
│ │ │ ├── graph.py
│ │ │ └── profiler.py
│ │ ├── autoprof/
│ │ │ ├── __init__.py
│ │ │ ├── __main__.py
│ │ │ └── util.py
│ │ ├── backends/
│ │ │ ├── __init__.py
│ │ │ ├── cuda/
│ │ │ │ └── __init__.py
│ │ │ ├── cudnn/
│ │ │ │ └── __init__.py
│ │ │ └── mps/
│ │ │ └── __init__.py
│ │ ├── boxing/
│ │ │ ├── __init__.py
│ │ │ └── nccl/
│ │ │ └── __init__.py
│ │ ├── comm/
│ │ │ ├── __init__.py
│ │ │ └── comm_ops.py
│ │ ├── cuda/
│ │ │ ├── __init__.py
│ │ │ ├── _utils.py
│ │ │ ├── amp/
│ │ │ │ ├── __init__.py
│ │ │ │ └── autocast_mode.py
│ │ │ ├── random.py
│ │ │ └── type_tensor.py
│ │ ├── data.py
│ │ ├── distributed/
│ │ │ ├── __init__.py
│ │ │ ├── constants.py
│ │ │ └── launch.py
│ │ ├── distributions/
│ │ │ ├── __init__.py
│ │ │ ├── categorical.py
│ │ │ ├── distribution.py
│ │ │ └── utils.py
│ │ ├── env.py
│ │ ├── experimental/
│ │ │ └── load_mnist.py
│ │ ├── fft/
│ │ │ └── __init__.py
│ │ ├── framework/
│ │ │ ├── __init__.py
│ │ │ ├── args_tree.py
│ │ │ ├── attr_util.py
│ │ │ ├── balanced_splitter.py
│ │ │ ├── c_api_util.py
│ │ │ ├── check_point_v2.py
│ │ │ ├── config_util.py
│ │ │ ├── distribute.py
│ │ │ ├── docstr/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── activation.py
│ │ │ │ ├── addcdiv.py
│ │ │ │ ├── amax.py
│ │ │ │ ├── amin.py
│ │ │ │ ├── arange.py
│ │ │ │ ├── argsort.py
│ │ │ │ ├── array_ops.py
│ │ │ │ ├── as_tensor.py
│ │ │ │ ├── autograd.py
│ │ │ │ ├── baddbmm.py
│ │ │ │ ├── bitwise_ops.py
│ │ │ │ ├── bmm.py
│ │ │ │ ├── broadcast_like.py
│ │ │ │ ├── cast.py
│ │ │ │ ├── chunk.py
│ │ │ │ ├── clamp.py
│ │ │ │ ├── comm.py
│ │ │ │ ├── comparison.py
│ │ │ │ ├── constant.py
│ │ │ │ ├── conv.py
│ │ │ │ ├── convolution.py
│ │ │ │ ├── ctc_decode.py
│ │ │ │ ├── dataset.py
│ │ │ │ ├── deconv.py
│ │ │ │ ├── depend.py
│ │ │ │ ├── distance.py
│ │ │ │ ├── dropout.py
│ │ │ │ ├── einsum.py
│ │ │ │ ├── erfinv.py
│ │ │ │ ├── expand.py
│ │ │ │ ├── flatten.py
│ │ │ │ ├── flip.py
│ │ │ │ ├── hann_window.py
│ │ │ │ ├── in_top_k.py
│ │ │ │ ├── index_add.py
│ │ │ │ ├── index_select.py
│ │ │ │ ├── inv.py
│ │ │ │ ├── is_floating_point.py
│ │ │ │ ├── lerp.py
│ │ │ │ ├── linalg.py
│ │ │ │ ├── logaddexp.py
│ │ │ │ ├── logical_ops.py
│ │ │ │ ├── loss.py
│ │ │ │ ├── masked_fill.py
│ │ │ │ ├── math_ops.py
│ │ │ │ ├── meshgrid.py
│ │ │ │ ├── module.py
│ │ │ │ ├── nms.py
│ │ │ │ ├── nonzero.py
│ │ │ │ ├── norm.py
│ │ │ │ ├── normalization.py
│ │ │ │ ├── oneflow.py
│ │ │ │ ├── onehot.py
│ │ │ │ ├── pooling.py
│ │ │ │ ├── quantile.py
│ │ │ │ ├── random.py
│ │ │ │ ├── reduce_ops.py
│ │ │ │ ├── repeat.py
│ │ │ │ ├── repeat_interleave.py
│ │ │ │ ├── roc_auc_score.py
│ │ │ │ ├── searchsorted.py
│ │ │ │ ├── sort.py
│ │ │ │ ├── special_ops.py
│ │ │ │ ├── split.py
│ │ │ │ ├── swapaxes.py
│ │ │ │ ├── swapdims.py
│ │ │ │ ├── tensor.py
│ │ │ │ ├── tensor_attributes.py
│ │ │ │ ├── tensor_ops.py
│ │ │ │ ├── tensor_t.py
│ │ │ │ ├── tensordot.py
│ │ │ │ ├── tile.py
│ │ │ │ ├── topk.py
│ │ │ │ ├── trigonometric_ops.py
│ │ │ │ ├── unbind.py
│ │ │ │ ├── util_ops.py
│ │ │ │ ├── utils.py
│ │ │ │ ├── vision.py
│ │ │ │ └── where.py
│ │ │ ├── dtype.py
│ │ │ ├── env_util.py
│ │ │ ├── function_desc.py
│ │ │ ├── function_util.py
│ │ │ ├── generator.py
│ │ │ ├── graph_build_util.py
│ │ │ ├── hob.py
│ │ │ ├── id_util.py
│ │ │ ├── infer_compiler/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── import_tools/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── format_utils.py
│ │ │ │ │ └── importer.py
│ │ │ │ ├── transform/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── builtin_transform.py
│ │ │ │ │ ├── custom_transform.py
│ │ │ │ │ └── manager.py
│ │ │ │ ├── utils/
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── args_tree_util.py
│ │ │ │ │ ├── cost_util.py
│ │ │ │ │ ├── log_utils.py
│ │ │ │ │ ├── oneflow_exec_mode.py
│ │ │ │ │ ├── param_utils.py
│ │ │ │ │ ├── patch_for_compiler.py
│ │ │ │ │ └── patch_for_diffusers.py
│ │ │ │ ├── with_fx_graph.py
│ │ │ │ ├── with_fx_interpreter.py
│ │ │ │ ├── with_oneflow_backend.py
│ │ │ │ └── with_oneflow_compile.py
│ │ │ ├── job_set_util.py
│ │ │ ├── model.py
│ │ │ ├── multi_client_session.py
│ │ │ ├── register_class_method_util.py
│ │ │ ├── scope_util.py
│ │ │ ├── session_context.py
│ │ │ ├── sysconfig.py
│ │ │ ├── tensor.py
│ │ │ ├── tensor_str.py
│ │ │ ├── tensor_str_util.py
│ │ │ ├── tensor_tuple_util.py
│ │ │ ├── type_tensor.py
│ │ │ └── unittest.py
│ │ ├── fx/
│ │ │ └── __init__.py
│ │ ├── hub.py
│ │ ├── ir/
│ │ │ ├── __main__.py
│ │ │ ├── ast_gen_transformer.py
│ │ │ ├── bisect_transformer.py
│ │ │ ├── lr_jit.py
│ │ │ ├── math_params_transformer.py
│ │ │ └── self_params_transformer.py
│ │ ├── jit/
│ │ │ ├── __init__.py
│ │ │ └── annotations.py
│ │ ├── library.py
│ │ ├── linalg.py
│ │ ├── mock_torch/
│ │ │ ├── __init__.py
│ │ │ ├── __main__.py
│ │ │ ├── dyn_mock_mod.py
│ │ │ ├── mock_importer.py
│ │ │ ├── mock_modules.py
│ │ │ ├── mock_utils.py
│ │ │ └── torch/
│ │ │ └── __init__.py
│ │ ├── model.py
│ │ ├── multiprocessing/
│ │ │ ├── __init__.py
│ │ │ ├── _atfork.py
│ │ │ ├── pool.py
│ │ │ ├── queue.py
│ │ │ ├── reductions.py
│ │ │ ├── shared_memory/
│ │ │ │ └── __init__.py
│ │ │ └── spawn.py
│ │ ├── nn/
│ │ │ ├── __init__.py
│ │ │ ├── common_types.py
│ │ │ ├── functional/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── batch_norm.py
│ │ │ │ ├── ctc_loss.py
│ │ │ │ ├── deform_conv.py
│ │ │ │ ├── depend.py
│ │ │ │ ├── maxpool.py
│ │ │ │ ├── pad.py
│ │ │ │ └── softmax.py
│ │ │ ├── graph/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── cache.py
│ │ │ │ ├── graph.py
│ │ │ │ ├── graph_block.py
│ │ │ │ ├── graph_config.py
│ │ │ │ ├── optimizer.py
│ │ │ │ ├── proxy.py
│ │ │ │ └── util.py
│ │ │ ├── image.py
│ │ │ ├── init.py
│ │ │ ├── modules/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── _functions.py
│ │ │ │ ├── activation.py
│ │ │ │ ├── affine_grid.py
│ │ │ │ ├── all_reduce.py
│ │ │ │ ├── arange.py
│ │ │ │ ├── argsort.py
│ │ │ │ ├── argwhere.py
│ │ │ │ ├── as_tensor.py
│ │ │ │ ├── batchnorm.py
│ │ │ │ ├── batchnorm_fused.py
│ │ │ │ ├── broadcast_ops.py
│ │ │ │ ├── constant.py
│ │ │ │ ├── container.py
│ │ │ │ ├── conv.py
│ │ │ │ ├── dataset.py
│ │ │ │ ├── distance.py
│ │ │ │ ├── distributed_partial_fc_sample.py
│ │ │ │ ├── dropout.py
│ │ │ │ ├── einsum.py
│ │ │ │ ├── empty.py
│ │ │ │ ├── expand.py
│ │ │ │ ├── fake_quantization.py
│ │ │ │ ├── flatten.py
│ │ │ │ ├── fold.py
│ │ │ │ ├── fused_mlp.py
│ │ │ │ ├── global_cast.py
│ │ │ │ ├── grid_sample.py
│ │ │ │ ├── instancenorm.py
│ │ │ │ ├── interpolate.py
│ │ │ │ ├── is_tensor.py
│ │ │ │ ├── linear.py
│ │ │ │ ├── linspace.py
│ │ │ │ ├── logspace.py
│ │ │ │ ├── loss.py
│ │ │ │ ├── masked_select.py
│ │ │ │ ├── math_ops.py
│ │ │ │ ├── meshgrid.py
│ │ │ │ ├── min_max_observer.py
│ │ │ │ ├── module.py
│ │ │ │ ├── moving_average_min_max_observer.py
│ │ │ │ ├── nms.py
│ │ │ │ ├── nonzero.py
│ │ │ │ ├── norm.py
│ │ │ │ ├── normalization.py
│ │ │ │ ├── numel.py
│ │ │ │ ├── padding.py
│ │ │ │ ├── pixelshuffle.py
│ │ │ │ ├── pooling.py
│ │ │ │ ├── quantization.py
│ │ │ │ ├── reshape.py
│ │ │ │ ├── rnn.py
│ │ │ │ ├── roll.py
│ │ │ │ ├── scatter.py
│ │ │ │ ├── slice.py
│ │ │ │ ├── sparse.py
│ │ │ │ ├── sparse_softmax_cross_entropy.py
│ │ │ │ ├── tensor_buffer.py
│ │ │ │ ├── tensordot.py
│ │ │ │ ├── trigonometric_ops.py
│ │ │ │ ├── unique.py
│ │ │ │ ├── upsampling.py
│ │ │ │ ├── utils.py
│ │ │ │ └── where.py
│ │ │ ├── optimizer/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── adadelta.py
│ │ │ │ ├── adagrad.py
│ │ │ │ ├── adam.py
│ │ │ │ ├── adamw.py
│ │ │ │ ├── chained_scheduler.py
│ │ │ │ ├── constant_lr.py
│ │ │ │ ├── cosine_annealing_lr.py
│ │ │ │ ├── cosine_annealing_warm_restarts.py
│ │ │ │ ├── cosine_decay_lr.py
│ │ │ │ ├── exponential_lr.py
│ │ │ │ ├── lamb.py
│ │ │ │ ├── lambda_lr.py
│ │ │ │ ├── lbfgs.py
│ │ │ │ ├── linear_lr.py
│ │ │ │ ├── lr_scheduler.py
│ │ │ │ ├── multiplicative_lr.py
│ │ │ │ ├── multistep_lr.py
│ │ │ │ ├── polynomial_lr.py
│ │ │ │ ├── reduce_lr_on_plateau.py
│ │ │ │ ├── rmsprop.py
│ │ │ │ ├── sequential_lr.py
│ │ │ │ ├── sgd.py
│ │ │ │ ├── step_lr.py
│ │ │ │ ├── swa_utils.py
│ │ │ │ └── warmup_lr.py
│ │ │ ├── parallel/
│ │ │ │ ├── __init__.py
│ │ │ │ └── distributed.py
│ │ │ ├── parameter.py
│ │ │ ├── qat/
│ │ │ │ ├── __init__.py
│ │ │ │ └── conv.py
│ │ │ └── utils/
│ │ │ ├── __init__.py
│ │ │ ├── clip_grad.py
│ │ │ ├── container.py
│ │ │ ├── convert_parameters.py
│ │ │ ├── parameters_grouping.py
│ │ │ ├── prune.py
│ │ │ ├── rnn.py
│ │ │ ├── skip_init.py
│ │ │ └── weight_norm.py
│ │ ├── one_embedding.py
│ │ ├── onnx/
│ │ │ ├── __init__.py
│ │ │ └── symbolic_helper.py
│ │ ├── ops/
│ │ │ ├── __init__.py
│ │ │ ├── array_ops.py
│ │ │ ├── stateful_ops.py
│ │ │ ├── transpose_util.py
│ │ │ └── util/
│ │ │ ├── __init__.py
│ │ │ └── initializer_util.py
│ │ ├── optim/
│ │ │ ├── __init__.py
│ │ │ ├── lr_scheduler.py
│ │ │ ├── optimizer.py
│ │ │ └── swa_utils.py
│ │ ├── profiler/
│ │ │ ├── __init__.py
│ │ │ ├── events.py
│ │ │ ├── profiler.py
│ │ │ └── util.py
│ │ ├── remat/
│ │ │ └── __init__.py
│ │ ├── sbp.py
│ │ ├── special/
│ │ │ ├── __init__.py
│ │ │ └── special_ops.py
│ │ ├── support/
│ │ │ ├── __init__.py
│ │ │ ├── async_util.py
│ │ │ ├── box.py
│ │ │ ├── enable_if.py
│ │ │ ├── env_var_util.py
│ │ │ ├── func_inspect_util.py
│ │ │ ├── high_order_bool.py
│ │ │ ├── lazy.py
│ │ │ ├── pb_util.py
│ │ │ ├── scope_stack.py
│ │ │ └── traceinfo.py
│ │ ├── sysconfig.py
│ │ ├── test/
│ │ │ ├── README.md
│ │ │ ├── dataloader/
│ │ │ │ ├── data_utils.py
│ │ │ │ ├── test_cifar_dataset_multiprocess.py
│ │ │ │ ├── test_cifar_dataset_singleprocess.py
│ │ │ │ ├── test_fashion_mnist_dataset.py
│ │ │ │ ├── test_lenet.py
│ │ │ │ ├── test_mnist_dataset.py
│ │ │ │ ├── test_numpy_dataset.py
│ │ │ │ ├── test_tensor_dataset.py
│ │ │ │ └── test_transforms.py
│ │ │ ├── exceptions/
│ │ │ │ ├── test_activation.py
│ │ │ │ ├── test_add_n_op.py
│ │ │ │ ├── test_arg_sort_op.py
│ │ │ │ ├── test_array_functor.py
│ │ │ │ ├── test_autograd.py
│ │ │ │ ├── test_batch_gather_op.py
│ │ │ │ ├── test_bias_add_op.py
│ │ │ │ ├── test_binary_functor_exception.py
│ │ │ │ ├── test_bmm.py
│ │ │ │ ├── test_broadcast_ops.py
│ │ │ │ ├── test_chunk.py
│ │ │ │ ├── test_cosine_similarity.py
│ │ │ │ ├── test_deform_conv2d_op.py
│ │ │ │ ├── test_device.py
│ │ │ │ ├── test_dot.py
│ │ │ │ ├── test_error_reported_in_thread.py
│ │ │ │ ├── test_gird_sample_op.py
│ │ │ │ ├── test_global_branch_error_local_to_global_with_broadcast_sbp_1n2d.py
│ │ │ │ ├── test_global_branch_error_local_to_global_with_broadcast_sbp_1n4d.py
│ │ │ │ ├── test_global_branch_error_local_to_global_with_split_sbp.py
│ │ │ │ ├── test_global_branch_error_with_global_mean.py
│ │ │ │ ├── test_hann_window.py
│ │ │ │ ├── test_in_top_k.py
│ │ │ │ ├── test_inv.py
│ │ │ │ ├── test_layernorm.py
│ │ │ │ ├── test_linalg.py
│ │ │ │ ├── test_local_global_convert_error.py
│ │ │ │ ├── test_median.py
│ │ │ │ ├── test_mm.py
│ │ │ │ ├── test_mode.py
│ │ │ │ ├── test_multi_input_with_diff_device_or_placement.py
│ │ │ │ ├── test_mv.py
│ │ │ │ ├── test_nn_functor.py
│ │ │ │ ├── test_optim_add_param_group.py
│ │ │ │ ├── test_pad.py
│ │ │ │ ├── test_placement.py
│ │ │ │ ├── test_randperm_op.py
│ │ │ │ ├── test_reduce_like_ops.py
│ │ │ │ ├── test_reduce_ops.py
│ │ │ │ ├── test_repeat_interleave.py
│ │ │ │ ├── test_reshape.py
│ │ │ │ ├── test_reshape_like_op.py
│ │ │ │ ├── test_roi_align_op.py
│ │ │ │ ├── test_save_load.py
│ │ │ │ ├── test_saved_tensor_hooks.py
│ │ │ │ ├── test_slice_op.py
│ │ │ │ ├── test_smooth_l1_loss_op.py
│ │ │ │ ├── test_softmax_cross_entropy_op.py
│ │ │ │ ├── test_sparse_cross_entropy_op.py
│ │ │ │ ├── test_sparse_softmax_cross_entropy_op.py
│ │ │ │ ├── test_split_like_op.py
│ │ │ │ ├── test_stft_op.py
│ │ │ │ ├── test_tensor_index.py
│ │ │ │ ├── test_tensordot.py
│ │ │ │ ├── test_to_global_error.py
│ │ │ │ ├── test_view.py
│ │ │ │ └── throw_error.py
│ │ │ ├── expensive/
│ │ │ │ ├── README.md
│ │ │ │ ├── _internally_replaced_utils.py
│ │ │ │ ├── _test_remat.py
│ │ │ │ ├── pytorch_alexnet.py
│ │ │ │ ├── pytorch_convmixer.py
│ │ │ │ ├── pytorch_convnext.py
│ │ │ │ ├── pytorch_crossformer.py
│ │ │ │ ├── pytorch_densenet.py
│ │ │ │ ├── pytorch_efficientnet.py
│ │ │ │ ├── pytorch_ghostnet.py
│ │ │ │ ├── pytorch_googlenet.py
│ │ │ │ ├── pytorch_inception_v3.py
│ │ │ │ ├── pytorch_levit.py
│ │ │ │ ├── pytorch_mnasnet.py
│ │ │ │ ├── pytorch_poolformer.py
│ │ │ │ ├── pytorch_pvt.py
│ │ │ │ ├── pytorch_res2net.py
│ │ │ │ ├── pytorch_resmlp.py
│ │ │ │ ├── pytorch_resnet.py
│ │ │ │ ├── pytorch_rexnet.py
│ │ │ │ ├── pytorch_rexnetv1_lite.py
│ │ │ │ ├── pytorch_senet.py
│ │ │ │ ├── pytorch_shufflenetv2.py
│ │ │ │ ├── pytorch_squeezenet.py
│ │ │ │ ├── pytorch_swin_transformer.py
│ │ │ │ ├── pytorch_uniformer.py
│ │ │ │ ├── pytroch_mlp_mixer.py
│ │ │ │ ├── resnet50_model.py
│ │ │ │ ├── test_compatibility.py
│ │ │ │ ├── test_conv3d.py
│ │ │ │ ├── test_convtranspose.py
│ │ │ │ ├── test_dynamic_allocation_gradient_shuffle.py
│ │ │ │ ├── test_einsum.py
│ │ │ │ ├── test_global_tensor_offload.py
│ │ │ │ ├── test_graph_multi_graph_v2.py
│ │ │ │ ├── test_id_shuffle.py
│ │ │ │ ├── test_id_shuffle_global.py
│ │ │ │ ├── test_layernorm.py
│ │ │ │ ├── test_oneembedding.py
│ │ │ │ ├── test_oneembedding_padding_idx.py
│ │ │ │ ├── test_permute.py
│ │ │ │ ├── test_remat.py
│ │ │ │ ├── test_resnet50_with_bn.py
│ │ │ │ ├── test_resnet50_without_bn.py
│ │ │ │ ├── test_rnn.py
│ │ │ │ ├── test_rnn_cell.py
│ │ │ │ ├── test_rnn_pack_sequence.py
│ │ │ │ ├── test_rnn_utils.py
│ │ │ │ ├── test_sqrt_square_sum.py
│ │ │ │ ├── test_tensor_offload.py
│ │ │ │ ├── test_tensor_str.py
│ │ │ │ └── test_util.py
│ │ │ ├── gen_ops_process.py
│ │ │ ├── graph/
│ │ │ │ ├── alexnet_model.py
│ │ │ │ ├── ofrecord_data_utils.py
│ │ │ │ ├── optimizer_test_util.py
│ │ │ │ ├── test_alexnet_auto_parallel.py
│ │ │ │ ├── test_alexnet_graph.py
│ │ │ │ ├── test_comb1to2d.py
│ │ │ │ ├── test_comb2d.py
│ │ │ │ ├── test_forward_graph.py
│ │ │ │ ├── test_free_tensor_not_in_job.py
│ │ │ │ ├── test_fx_fuse.py
│ │ │ │ ├── test_fx_replace_ops.py
│ │ │ │ ├── test_fx_symbolic_trace_module.py
│ │ │ │ ├── test_gbc1to2d.py
│ │ │ │ ├── test_gbc2d.py
│ │ │ │ ├── test_gbc2to1d.py
│ │ │ │ ├── test_gbc2to2d.py
│ │ │ │ ├── test_graph.py
│ │ │ │ ├── test_graph_activation_checkpoint.py
│ │ │ │ ├── test_graph_arange.py
│ │ │ │ ├── test_graph_asymmetric_io.py
│ │ │ │ ├── test_graph_block.py
│ │ │ │ ├── test_graph_buffer_limit.py
│ │ │ │ ├── test_graph_clip_grad_norm.py
│ │ │ │ ├── test_graph_copy.py
│ │ │ │ ├── test_graph_debug.py
│ │ │ │ ├── test_graph_depend.py
│ │ │ │ ├── test_graph_eye.py
│ │ │ │ ├── test_graph_free_eager_tensor.py
│ │ │ │ ├── test_graph_grad_acc.py
│ │ │ │ ├── test_graph_image_gpu_decoder.py
│ │ │ │ ├── test_graph_inplace_add.py
│ │ │ │ ├── test_graph_io_check.py
│ │ │ │ ├── test_graph_linear.py
│ │ │ │ ├── test_graph_linear_train.py
│ │ │ │ ├── test_graph_loss.py
│ │ │ │ ├── test_graph_lr_scale.py
│ │ │ │ ├── test_graph_lr_scheduler.py
│ │ │ │ ├── test_graph_lr_with_warmup.py
│ │ │ │ ├── test_graph_lrs.py
│ │ │ │ ├── test_graph_masked_fill.py
│ │ │ │ ├── test_graph_nccl_logical_fusion.py
│ │ │ │ ├── test_graph_non_contiguous_tensors.py
│ │ │ │ ├── test_graph_normal_inplace.py
│ │ │ │ ├── test_graph_ofrecord_reader.py
│ │ │ │ ├── test_graph_optim_adadelta.py
│ │ │ │ ├── test_graph_optim_adagrad.py
│ │ │ │ ├── test_graph_optim_adam.py
│ │ │ │ ├── test_graph_optim_adamw.py
│ │ │ │ ├── test_graph_optim_ftrl.py
│ │ │ │ ├── test_graph_optim_lamb.py
│ │ │ │ ├── test_graph_optim_rmsprop.py
│ │ │ │ ├── test_graph_optim_sgd.py
│ │ │ │ ├── test_graph_optimizer.py
│ │ │ │ ├── test_graph_pipeline.py
│ │ │ │ ├── test_graph_pipeline_delay.py
│ │ │ │ ├── test_graph_random_seed.py
│ │ │ │ ├── test_graph_relu.py
│ │ │ │ ├── test_graph_reshape_acc.py
│ │ │ │ ├── test_graph_reuse_var.py
│ │ │ │ ├── test_graph_save_load.py
│ │ │ │ ├── test_graph_save_load_global_b_s.py
│ │ │ │ ├── test_graph_scalar.py
│ │ │ │ ├── test_graph_separate_compile.py
│ │ │ │ ├── test_graph_session_env_destruct.py
│ │ │ │ ├── test_graph_session_env_destruct1.py
│ │ │ │ ├── test_graph_sparse_optimizer.py
│ │ │ │ ├── test_graph_sparse_softmax_cross_entropy.py
│ │ │ │ ├── test_graph_tensor_clone.py
│ │ │ │ ├── test_graph_tensor_detach.py
│ │ │ │ ├── test_graph_with_global.py
│ │ │ │ ├── test_graph_zero.py
│ │ │ │ ├── test_input_op_expr.py
│ │ │ │ ├── test_long_add_n_pass.py
│ │ │ │ ├── test_modify_module_forward.py
│ │ │ │ ├── test_multi_client_session.py
│ │ │ │ ├── test_multi_graph.py
│ │ │ │ ├── test_multi_tensor_adam_update_with_cast.py
│ │ │ │ ├── test_multi_tensor_sgd_update_with_cast.py
│ │ │ │ ├── test_nccl_logical_send_recv.py
│ │ │ │ ├── test_neq_device_process_num.py
│ │ │ │ ├── test_oneflow_compiler.py
│ │ │ │ ├── test_optimization_conf.py
│ │ │ │ ├── test_output_op_expr.py
│ │ │ │ ├── test_run_global_graph_by_vm.py
│ │ │ │ ├── test_run_graph_by_vm.py
│ │ │ │ ├── test_to_global.py
│ │ │ │ ├── test_tvm_frontend_dependency_on_graph.py
│ │ │ │ ├── test_user_op_expr.py
│ │ │ │ ├── test_util.py
│ │ │ │ └── test_variable_op_expr.py
│ │ │ ├── misc/
│ │ │ │ ├── mock_example.py
│ │ │ │ ├── test_autograd_functional.py
│ │ │ │ ├── test_distributed_env_vars.py
│ │ │ │ ├── test_empty_cache.py
│ │ │ │ ├── test_env_cuda.py
│ │ │ │ ├── test_manual_seed_api.py
│ │ │ │ ├── test_mock_diffusers.py
│ │ │ │ ├── test_mock_scope.py
│ │ │ │ ├── test_np_dtype_converter.py
│ │ │ │ ├── test_placement.py
│ │ │ │ └── test_pybind11_caster.py
│ │ │ ├── modules/
│ │ │ │ ├── image_test_util.py
│ │ │ │ ├── optimizer_test_util.py
│ │ │ │ ├── save_load_test_data/
│ │ │ │ │ ├── 3x3_i3o3_conv2d/
│ │ │ │ │ │ ├── pickled_data
│ │ │ │ │ │ ├── tensor_3/
│ │ │ │ │ │ │ ├── meta
│ │ │ │ │ │ │ └── out
│ │ │ │ │ │ └── tensor_4/
│ │ │ │ │ │ ├── meta
│ │ │ │ │ │ └── out
│ │ │ │ │ └── 3x3_i3o3_conv2d_params/
│ │ │ │ │ ├── pickled_data
│ │ │ │ │ ├── tensor_5/
│ │ │ │ │ │ ├── meta
│ │ │ │ │ │ └── out
│ │ │ │ │ └── tensor_6/
│ │ │ │ │ ├── meta
│ │ │ │ │ └── out
│ │ │ │ ├── sync_batchnorm_test_util.py
│ │ │ │ ├── test_0_dim_tensor.py
│ │ │ │ ├── test_TripletMarginLoss.py
│ │ │ │ ├── test_abs.py
│ │ │ │ ├── test_activation.py
│ │ │ │ ├── test_adaptive_max_pool.py
│ │ │ │ ├── test_adaptive_pool.py
│ │ │ │ ├── test_adaptive_pool_fp16.py
│ │ │ │ ├── test_add.py
│ │ │ │ ├── test_addcdiv.py
│ │ │ │ ├── test_addcmul.py
│ │ │ │ ├── test_addmm.py
│ │ │ │ ├── test_affine_grid.py
│ │ │ │ ├── test_allclose.py
│ │ │ │ ├── test_allreduce.py
│ │ │ │ ├── test_amax.py
│ │ │ │ ├── test_amin.py
│ │ │ │ ├── test_arange.py
│ │ │ │ ├── test_argmax.py
│ │ │ │ ├── test_argmin.py
│ │ │ │ ├── test_argsort.py
│ │ │ │ ├── test_argwhere.py
│ │ │ │ ├── test_as_strided.py
│ │ │ │ ├── test_as_tensor.py
│ │ │ │ ├── test_asyncs_thread.py
│ │ │ │ ├── test_atleast.py
│ │ │ │ ├── test_auto_to_global.py
│ │ │ │ ├── test_autograd.py
│ │ │ │ ├── test_autograd_function.py
│ │ │ │ ├── test_autograd_mode.py
│ │ │ │ ├── test_avgpool.py
│ │ │ │ ├── test_baddbmm.py
│ │ │ │ ├── test_batch_gather.py
│ │ │ │ ├── test_batchnorm.py
│ │ │ │ ├── test_batchnorm_add_relu.py
│ │ │ │ ├── test_bernoulli.py
│ │ │ │ ├── test_binary_math_ops_dtype.py
│ │ │ │ ├── test_bincount.py
│ │ │ │ ├── test_bitwise.py
│ │ │ │ ├── test_bmm.py
│ │ │ │ ├── test_broadcast_like.py
│ │ │ │ ├── test_broadcast_ops.py
│ │ │ │ ├── test_cast.py
│ │ │ │ ├── test_ceil.py
│ │ │ │ ├── test_check_meta_consistency.py
│ │ │ │ ├── test_checkpointing.py
│ │ │ │ ├── test_chunk.py
│ │ │ │ ├── test_clamp.py
│ │ │ │ ├── test_clip_grad.py
│ │ │ │ ├── test_clone.py
│ │ │ │ ├── test_coco_reader.py
│ │ │ │ ├── test_coin_flip.py
│ │ │ │ ├── test_comb2to2d.py
│ │ │ │ ├── test_combined_margin_loss.py
│ │ │ │ ├── test_comm.py
│ │ │ │ ├── test_comm_ops.py
│ │ │ │ ├── test_concat.py
│ │ │ │ ├── test_constant.py
│ │ │ │ ├── test_constant_pad.py
│ │ │ │ ├── test_contiguous.py
│ │ │ │ ├── test_conv1d.py
│ │ │ │ ├── test_conv2d.py
│ │ │ │ ├── test_copy.py
│ │ │ │ ├── test_cosine_similarity.py
│ │ │ │ ├── test_ctc_greedy_decoder.py
│ │ │ │ ├── test_ctc_loss.py
│ │ │ │ ├── test_cublas_fused_mlp.py
│ │ │ │ ├── test_cum_ops.py
│ │ │ │ ├── test_dataset.py
│ │ │ │ ├── test_ddp.py
│ │ │ │ ├── test_ddp_multi_outputs.py
│ │ │ │ ├── test_deconv2d.py
│ │ │ │ ├── test_default_dtype.py
│ │ │ │ ├── test_deform_conv2d.py
│ │ │ │ ├── test_det.py
│ │ │ │ ├── test_diag.py
│ │ │ │ ├── test_diagonal.py
│ │ │ │ ├── test_div.py
│ │ │ │ ├── test_dlpack.py
│ │ │ │ ├── test_dot.py
│ │ │ │ ├── test_dropout.py
│ │ │ │ ├── test_dynamic_allocation_gradient_shuffle_shuffle_global.py
│ │ │ │ ├── test_eager_boxing.py
│ │ │ │ ├── test_eager_boxing_exhaustive.py
│ │ │ │ ├── test_empty.py
│ │ │ │ ├── test_eq.py
│ │ │ │ ├── test_equal.py
│ │ │ │ ├── test_erf.py
│ │ │ │ ├── test_erfc.py
│ │ │ │ ├── test_erfinv.py
│ │ │ │ ├── test_expand.py
│ │ │ │ ├── test_expand_stride.py
│ │ │ │ ├── test_expm1.py
│ │ │ │ ├── test_eye.py
│ │ │ │ ├── test_fake_quantization.py
│ │ │ │ ├── test_fft.py
│ │ │ │ ├── test_flatten.py
│ │ │ │ ├── test_flip.py
│ │ │ │ ├── test_floor.py
│ │ │ │ ├── test_fmod.py
│ │ │ │ ├── test_fold.py
│ │ │ │ ├── test_fork_sub_process.py
│ │ │ │ ├── test_frac.py
│ │ │ │ ├── test_from_numpy.py
│ │ │ │ ├── test_from_torch.py
│ │ │ │ ├── test_functional_docstr.py
│ │ │ │ ├── test_functional_scalar_tensor_param.py
│ │ │ │ ├── test_fused_attention_ops.py
│ │ │ │ ├── test_fused_bias_add_dropout.py
│ │ │ │ ├── test_fused_bias_add_gelu.py
│ │ │ │ ├── test_fused_bias_add_scale_mask_softmax_dropout.py
│ │ │ │ ├── test_fused_center.py
│ │ │ │ ├── test_fused_codegeex_qkv_reshape.py
│ │ │ │ ├── test_fused_cross_interaction.py
│ │ │ │ ├── test_fused_dot_feature_interaction.py
│ │ │ │ ├── test_fused_gelu_mul.py
│ │ │ │ ├── test_fused_get_boundding_boxes_coord.py
│ │ │ │ ├── test_fused_get_ciou_diagonal_angle.py
│ │ │ │ ├── test_fused_get_ciou_result.py
│ │ │ │ ├── test_fused_get_convex_diagonal_squared.py
│ │ │ │ ├── test_fused_get_intersection_area.py
│ │ │ │ ├── test_fused_get_iou.py
│ │ │ │ ├── test_fused_glu.py
│ │ │ │ ├── test_fused_matmul_bias.py
│ │ │ │ ├── test_fused_matmul_bias_add_relu_dropout.py
│ │ │ │ ├── test_fused_rotary_embedding.py
│ │ │ │ ├── test_fused_scale_mask_bias_softmax.py
│ │ │ │ ├── test_fused_scale_mask_softmax.py
│ │ │ │ ├── test_fused_scale_mask_softmax_dropout.py
│ │ │ │ ├── test_fused_scale_tril.py
│ │ │ │ ├── test_fused_self_attention.py
│ │ │ │ ├── test_fused_tril_softmax_mask_scale.py
│ │ │ │ ├── test_fused_weighted_sum.py
│ │ │ │ ├── test_gather.py
│ │ │ │ ├── test_gather_nd.py
│ │ │ │ ├── test_gelu_approximate.py
│ │ │ │ ├── test_generator.py
│ │ │ │ ├── test_global_0_dim_tensor.py
│ │ │ │ ├── test_global_TripletMarginLoss.py
│ │ │ │ ├── test_global_abs.py
│ │ │ │ ├── test_global_activation.py
│ │ │ │ ├── test_global_adaptive_pool.py
│ │ │ │ ├── test_global_add.py
│ │ │ │ ├── test_global_addcdiv.py
│ │ │ │ ├── test_global_addcmul.py
│ │ │ │ ├── test_global_addmm.py
│ │ │ │ ├── test_global_affine_grid.py
│ │ │ │ ├── test_global_argmax.py
│ │ │ │ ├── test_global_argmin.py
│ │ │ │ ├── test_global_argsort.py
│ │ │ │ ├── test_global_argwhere.py
│ │ │ │ ├── test_global_atleast.py
│ │ │ │ ├── test_global_avgpool.py
│ │ │ │ ├── test_global_batch_gather.py
│ │ │ │ ├── test_global_bincount.py
│ │ │ │ ├── test_global_bitwise.py
│ │ │ │ ├── test_global_broadcase_like.py
│ │ │ │ ├── test_global_broadcast_matmul.py
│ │ │ │ ├── test_global_broadcast_ops.py
│ │ │ │ ├── test_global_cast.py
│ │ │ │ ├── test_global_chunk.py
│ │ │ │ ├── test_global_clone.py
│ │ │ │ ├── test_global_coin_flip.py
│ │ │ │ ├── test_global_concat.py
│ │ │ │ ├── test_global_constant.py
│ │ │ │ ├── test_global_ctc_loss.py
│ │ │ │ ├── test_global_cumprod.py
│ │ │ │ ├── test_global_cumsum.py
│ │ │ │ ├── test_global_deconv2d.py
│ │ │ │ ├── test_global_deform_conv2d.py
│ │ │ │ ├── test_global_det.py
│ │ │ │ ├── test_global_diag.py
│ │ │ │ ├── test_global_diagonal.py
│ │ │ │ ├── test_global_div.py
│ │ │ │ ├── test_global_dot.py
│ │ │ │ ├── test_global_dropout.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase1.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase10.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase11.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase2.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase3.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase4.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase5.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase6.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase7.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase8.py
│ │ │ │ ├── test_global_einsum_alphaflod_usecase9.py
│ │ │ │ ├── test_global_einsum_attention.py
│ │ │ │ ├── test_global_einsum_batch_matmul.py
│ │ │ │ ├── test_global_einsum_batch_matmul2.py
│ │ │ │ ├── test_global_einsum_batch_matmul3.py
│ │ │ │ ├── test_global_einsum_batch_matmul4.py
│ │ │ │ ├── test_global_einsum_batch_matrix_vector_multiply.py
│ │ │ │ ├── test_global_einsum_batch_permute.py
│ │ │ │ ├── test_global_einsum_bilinear_transformation.py
│ │ │ │ ├── test_global_einsum_eltwise_mul_sum_row.py
│ │ │ │ ├── test_global_einsum_eltwise_mul_then_reduce_sum.py
│ │ │ │ ├── test_global_einsum_eltwise_multiply.py
│ │ │ │ ├── test_global_einsum_get_diagonal.py
│ │ │ │ ├── test_global_einsum_matmul.py
│ │ │ │ ├── test_global_einsum_matmul2.py
│ │ │ │ ├── test_global_einsum_matrix_column_sum.py
│ │ │ │ ├── test_global_einsum_matrix_transpose.py
│ │ │ │ ├── test_global_einsum_matrix_vector_multiply.py
│ │ │ │ ├── test_global_einsum_reduce_sum.py
│ │ │ │ ├── test_global_einsum_tensor_contraction.py
│ │ │ │ ├── test_global_einsum_tensor_contraction2.py
│ │ │ │ ├── test_global_einsum_vector_inner_product.py
│ │ │ │ ├── test_global_einsum_vector_outer_product.py
│ │ │ │ ├── test_global_empty.py
│ │ │ │ ├── test_global_eq.py
│ │ │ │ ├── test_global_erf.py
│ │ │ │ ├── test_global_erfc.py
│ │ │ │ ├── test_global_expand_op.py
│ │ │ │ ├── test_global_expm1.py
│ │ │ │ ├── test_global_eye.py
│ │ │ │ ├── test_global_fill.py
│ │ │ │ ├── test_global_flatten.py
│ │ │ │ ├── test_global_flip.py
│ │ │ │ ├── test_global_floor.py
│ │ │ │ ├── test_global_fmod.py
│ │ │ │ ├── test_global_fold.py
│ │ │ │ ├── test_global_frac.py
│ │ │ │ ├── test_global_full.py
│ │ │ │ ├── test_global_full_like.py
│ │ │ │ ├── test_global_greater.py
│ │ │ │ ├── test_global_greater_equal.py
│ │ │ │ ├── test_global_grid_sample.py
│ │ │ │ ├── test_global_groupnorm.py
│ │ │ │ ├── test_global_gru_cell.py
│ │ │ │ ├── test_global_hann_window.py
│ │ │ │ ├── test_global_higher_derivative_activation.py
│ │ │ │ ├── test_global_higher_derivative_conv.py
│ │ │ │ ├── test_global_higher_derivative_div.py
│ │ │ │ ├── test_global_higher_derivative_loss.py
│ │ │ │ ├── test_global_higher_derivative_matmul.py
│ │ │ │ ├── test_global_higher_derivative_neg.py
│ │ │ │ ├── test_global_higher_derivative_pool.py
│ │ │ │ ├── test_global_higher_derivative_pow.py
│ │ │ │ ├── test_global_higher_derivative_scalar_pow.py
│ │ │ │ ├── test_global_higher_derivative_slice.py
│ │ │ │ ├── test_global_higher_derivative_softmax.py
│ │ │ │ ├── test_global_inv.py
│ │ │ │ ├── test_global_lerp.py
│ │ │ │ ├── test_global_linalg_cross.py
│ │ │ │ ├── test_global_linear.py
│ │ │ │ ├── test_global_linspace.py
│ │ │ │ ├── test_global_logspace.py
│ │ │ │ ├── test_global_lstm_cell.py
│ │ │ │ ├── test_global_masked_fill.py
│ │ │ │ ├── test_global_masked_select.py
│ │ │ │ ├── test_global_math_op_higher_derivative.py
│ │ │ │ ├── test_global_math_ops.py
│ │ │ │ ├── test_global_matmul.py
│ │ │ │ ├── test_global_max.py
│ │ │ │ ├── test_global_maximum_minimum.py
│ │ │ │ ├── test_global_maxpool.py
│ │ │ │ ├── test_global_maxunpool.py
│ │ │ │ ├── test_global_mean.py
│ │ │ │ ├── test_global_median.py
│ │ │ │ ├── test_global_meshgrid.py
│ │ │ │ ├── test_global_min.py
│ │ │ │ ├── test_global_min_max_observer.py
│ │ │ │ ├── test_global_movedim.py
│ │ │ │ ├── test_global_moving_average_max_min_observer.py
│ │ │ │ ├── test_global_mul.py
│ │ │ │ ├── test_global_mv.py
│ │ │ │ ├── test_global_nansum.py
│ │ │ │ ├── test_global_narrow.py
│ │ │ │ ├── test_global_ne.py
│ │ │ │ ├── test_global_negative.py
│ │ │ │ ├── test_global_nms.py
│ │ │ │ ├── test_global_normal.py
│ │ │ │ ├── test_global_normalize.py
│ │ │ │ ├── test_global_nozero.py
│ │ │ │ ├── test_global_ones_like.py
│ │ │ │ ├── test_global_pad.py
│ │ │ │ ├── test_global_partical_fc.py
│ │ │ │ ├── test_global_permute.py
│ │ │ │ ├── test_global_rand.py
│ │ │ │ ├── test_global_randint.py
│ │ │ │ ├── test_global_randint_like.py
│ │ │ │ ├── test_global_randn.py
│ │ │ │ ├── test_global_random_op_data.py
│ │ │ │ ├── test_global_randperm.py
│ │ │ │ ├── test_global_reciprocal.py
│ │ │ │ ├── test_global_reflection_pad2d.py
│ │ │ │ ├── test_global_repeat.py
│ │ │ │ ├── test_global_replication_pad2d.py
│ │ │ │ ├── test_global_reshape.py
│ │ │ │ ├── test_global_rnn.py
│ │ │ │ ├── test_global_rnn_cell.py
│ │ │ │ ├── test_global_roi_align.py
│ │ │ │ ├── test_global_roll.py
│ │ │ │ ├── test_global_round.py
│ │ │ │ ├── test_global_scatter_nd.py
│ │ │ │ ├── test_global_scatter_ops.py
│ │ │ │ ├── test_global_searchsorted.py
│ │ │ │ ├── test_global_sign.py
│ │ │ │ ├── test_global_slice.py
│ │ │ │ ├── test_global_slice_update.py
│ │ │ │ ├── test_global_sort.py
│ │ │ │ ├── test_global_sparse.py
│ │ │ │ ├── test_global_sparse_softmax_cross_entropy.py
│ │ │ │ ├── test_global_split.py
│ │ │ │ ├── test_global_sqrt_square_sum.py
│ │ │ │ ├── test_global_squeeze.py
│ │ │ │ ├── test_global_stack.py
│ │ │ │ ├── test_global_stateful_kernel_with_cache.py
│ │ │ │ ├── test_global_std.py
│ │ │ │ ├── test_global_sub.py
│ │ │ │ ├── test_global_sum.py
│ │ │ │ ├── test_global_tensor_new.py
│ │ │ │ ├── test_global_tensor_ops.py
│ │ │ │ ├── test_global_tensor_scatter_nd_update.py
│ │ │ │ ├── test_global_tensordot.py
│ │ │ │ ├── test_global_tile.py
│ │ │ │ ├── test_global_transpose.py
│ │ │ │ ├── test_global_tril.py
│ │ │ │ ├── test_global_triu.py
│ │ │ │ ├── test_global_unbind.py
│ │ │ │ ├── test_global_unfold.py
│ │ │ │ ├── test_global_unfold_tensor.py
│ │ │ │ ├── test_global_unique.py
│ │ │ │ ├── test_global_unsqueeze.py
│ │ │ │ ├── test_global_upsample.py
│ │ │ │ ├── test_global_var.py
│ │ │ │ ├── test_global_vector_matrix_product.py
│ │ │ │ ├── test_global_view.py
│ │ │ │ ├── test_global_weight_norm.py
│ │ │ │ ├── test_global_where.py
│ │ │ │ ├── test_global_zeropad2d.py
│ │ │ │ ├── test_global_zeros_like.py
│ │ │ │ ├── test_glu.py
│ │ │ │ ├── test_gpt_data_loader.py
│ │ │ │ ├── test_greater.py
│ │ │ │ ├── test_greater_equal.py
│ │ │ │ ├── test_grid_sample.py
│ │ │ │ ├── test_grouped_matmul_bias.py
│ │ │ │ ├── test_groupnorm.py
│ │ │ │ ├── test_groupwise_quantization.py
│ │ │ │ ├── test_gumbel_softmax.py
│ │ │ │ ├── test_hann_window.py
│ │ │ │ ├── test_higher_derivative_activation.py
│ │ │ │ ├── test_higher_derivative_conv.py
│ │ │ │ ├── test_higher_derivative_div.py
│ │ │ │ ├── test_higher_derivative_loss.py
│ │ │ │ ├── test_higher_derivative_matmul.py
│ │ │ │ ├── test_higher_derivative_neg.py
│ │ │ │ ├── test_higher_derivative_pool.py
│ │ │ │ ├── test_higher_derivative_pow.py
│ │ │ │ ├── test_higher_derivative_scalar_pow.py
│ │ │ │ ├── test_higher_derivative_slice.py
│ │ │ │ ├── test_higher_derivative_softmax.py
│ │ │ │ ├── test_host_memory_input.py
│ │ │ │ ├── test_hsplit.py
│ │ │ │ ├── test_hub.py
│ │ │ │ ├── test_image_batch_align.py
│ │ │ │ ├── test_image_decode.py
│ │ │ │ ├── test_image_flip.py
│ │ │ │ ├── test_image_normalize.py
│ │ │ │ ├── test_image_resize.py
│ │ │ │ ├── test_in_top_k.py
│ │ │ │ ├── test_index_add.py
│ │ │ │ ├── test_index_select.py
│ │ │ │ ├── test_info.py
│ │ │ │ ├── test_initializer.py
│ │ │ │ ├── test_instancenorm.py
│ │ │ │ ├── test_interpolate.py
│ │ │ │ ├── test_inv.py
│ │ │ │ ├── test_isclose.py
│ │ │ │ ├── test_jit_script_api.py
│ │ │ │ ├── test_layer_norm.py
│ │ │ │ ├── test_lerp.py
│ │ │ │ ├── test_less.py
│ │ │ │ ├── test_less_equal.py
│ │ │ │ ├── test_linalg_cross.py
│ │ │ │ ├── test_linear.py
│ │ │ │ ├── test_linspace.py
│ │ │ │ ├── test_log1p.py
│ │ │ │ ├── test_logaddexp.py
│ │ │ │ ├── test_logical_and.py
│ │ │ │ ├── test_logical_not.py
│ │ │ │ ├── test_logical_or.py
│ │ │ │ ├── test_logical_reduce.py
│ │ │ │ ├── test_logical_xor.py
│ │ │ │ ├── test_logspace.py
│ │ │ │ ├── test_logsumexp.py
│ │ │ │ ├── test_loss.py
│ │ │ │ ├── test_loss_global.py
│ │ │ │ ├── test_lr_scheduler.py
│ │ │ │ ├── test_masked_fill.py
│ │ │ │ ├── test_masked_select.py
│ │ │ │ ├── test_math_op_higher_derivative.py
│ │ │ │ ├── test_math_ops.py
│ │ │ │ ├── test_matmul.py
│ │ │ │ ├── test_max.py
│ │ │ │ ├── test_maxpool.py
│ │ │ │ ├── test_maxunpool.py
│ │ │ │ ├── test_mean.py
│ │ │ │ ├── test_median.py
│ │ │ │ ├── test_meshgrid.py
│ │ │ │ ├── test_min.py
│ │ │ │ ├── test_min_max_observer.py
│ │ │ │ ├── test_mock.py
│ │ │ │ ├── test_mode.py
│ │ │ │ ├── test_module.py
│ │ │ │ ├── test_module_to.py
│ │ │ │ ├── test_module_to_global_or_local.py
│ │ │ │ ├── test_module_to_half.py
│ │ │ │ ├── test_movedim.py
│ │ │ │ ├── test_moving_average_min_max_observer.py
│ │ │ │ ├── test_mul.py
│ │ │ │ ├── test_multi_tensor_yolov5_weight_update.py
│ │ │ │ ├── test_multinomial.py
│ │ │ │ ├── test_nansum.py
│ │ │ │ ├── test_narrow.py
│ │ │ │ ├── test_ne.py
│ │ │ │ ├── test_negative.py
│ │ │ │ ├── test_nll_loss.py
│ │ │ │ ├── test_nms.py
│ │ │ │ ├── test_noncontiguous_binary_op.py
│ │ │ │ ├── test_nonzero.py
│ │ │ │ ├── test_norm.py
│ │ │ │ ├── test_normalize.py
│ │ │ │ ├── test_ofrecord_reader.py
│ │ │ │ ├── test_one_embedding_adagrad.py
│ │ │ │ ├── test_one_embedding_adam.py
│ │ │ │ ├── test_one_embedding_ftrl.py
│ │ │ │ ├── test_one_embedding_sgd.py
│ │ │ │ ├── test_one_hot.py
│ │ │ │ ├── test_ones_like.py
│ │ │ │ ├── test_optim_adadelta.py
│ │ │ │ ├── test_optim_adagrad.py
│ │ │ │ ├── test_optim_adam.py
│ │ │ │ ├── test_optim_adamw.py
│ │ │ │ ├── test_optim_add_param_group.py
│ │ │ │ ├── test_optim_ftrl.py
│ │ │ │ ├── test_optim_lamb.py
│ │ │ │ ├── test_optim_lbfgs.py
│ │ │ │ ├── test_optim_rmsprop.py
│ │ │ │ ├── test_optim_sgd.py
│ │ │ │ ├── test_pairwise_distance.py
│ │ │ │ ├── test_param_group.py
│ │ │ │ ├── test_parameters_grouping.py
│ │ │ │ ├── test_parital_fc.py
│ │ │ │ ├── test_pixel_shuffle.py
│ │ │ │ ├── test_prelu.py
│ │ │ │ ├── test_prod.py
│ │ │ │ ├── test_pruning.py
│ │ │ │ ├── test_qat_conv_modules.py
│ │ │ │ ├── test_quantile.py
│ │ │ │ ├── test_quantization.py
│ │ │ │ ├── test_quick_gelu.py
│ │ │ │ ├── test_rand.py
│ │ │ │ ├── test_randint.py
│ │ │ │ ├── test_randint_like.py
│ │ │ │ ├── test_randn.py
│ │ │ │ ├── test_randn_like.py
│ │ │ │ ├── test_random_generator_and_seed.py
│ │ │ │ ├── test_randperm.py
│ │ │ │ ├── test_reciprocal.py
│ │ │ │ ├── test_reduce.py
│ │ │ │ ├── test_reduce_sum_like.py
│ │ │ │ ├── test_reflection_pad.py
│ │ │ │ ├── test_repeat.py
│ │ │ │ ├── test_repeat_interleave.py
│ │ │ │ ├── test_replication_pad.py
│ │ │ │ ├── test_reshape.py
│ │ │ │ ├── test_reshape_sbp.py
│ │ │ │ ├── test_resnet_load_torch_weight_compatibile.py
│ │ │ │ ├── test_rmsnorm.py
│ │ │ │ ├── test_roc_auc_score.py
│ │ │ │ ├── test_roi_align.py
│ │ │ │ ├── test_roll.py
│ │ │ │ ├── test_round.py
│ │ │ │ ├── test_rrelu.py
│ │ │ │ ├── test_save_load.py
│ │ │ │ ├── test_saved_tensor_hooks.py
│ │ │ │ ├── test_sbp_symbol.py
│ │ │ │ ├── test_scatter_nd.py
│ │ │ │ ├── test_scatter_ops.py
│ │ │ │ ├── test_searchsorted.py
│ │ │ │ ├── test_select.py
│ │ │ │ ├── test_shutting_down.py
│ │ │ │ ├── test_sign.py
│ │ │ │ ├── test_single_threaded_vm.py
│ │ │ │ ├── test_skip_layer_norm.py
│ │ │ │ ├── test_skip_rms_norm.py
│ │ │ │ ├── test_slice.py
│ │ │ │ ├── test_softmax.py
│ │ │ │ ├── test_softplus.py
│ │ │ │ ├── test_sort.py
│ │ │ │ ├── test_sparse.py
│ │ │ │ ├── test_sparse_softmax_cross_entropy.py
│ │ │ │ ├── test_special_ops.py
│ │ │ │ ├── test_split.py
│ │ │ │ ├── test_square_relu.py
│ │ │ │ ├── test_squeeze.py
│ │ │ │ ├── test_stack.py
│ │ │ │ ├── test_stateful_kernel_with_cache.py
│ │ │ │ ├── test_stateful_local_opkernel.py
│ │ │ │ ├── test_std.py
│ │ │ │ ├── test_stft.py
│ │ │ │ ├── test_sub.py
│ │ │ │ ├── test_sum.py
│ │ │ │ ├── test_swapaxes.py
│ │ │ │ ├── test_swapdims.py
│ │ │ │ ├── test_swautils.py
│ │ │ │ ├── test_sync_and_async_allreduce.py
│ │ │ │ ├── test_sync_batchnorm.py
│ │ │ │ ├── test_t.py
│ │ │ │ ├── test_t5_layernorm.py
│ │ │ │ ├── test_tensor_buffer.py
│ │ │ │ ├── test_tensor_ops.py
│ │ │ │ ├── test_tensor_scatter_nd_update.py
│ │ │ │ ├── test_tensor_split.py
│ │ │ │ ├── test_tensor_to.py
│ │ │ │ ├── test_tensordot.py
│ │ │ │ ├── test_tile.py
│ │ │ │ ├── test_to_torch.py
│ │ │ │ ├── test_topk.py
│ │ │ │ ├── test_transpose.py
│ │ │ │ ├── test_tril.py
│ │ │ │ ├── test_triu.py
│ │ │ │ ├── test_trunc.py
│ │ │ │ ├── test_trunc_divide.py
│ │ │ │ ├── test_type_tensor.py
│ │ │ │ ├── test_unbind.py
│ │ │ │ ├── test_unfold.py
│ │ │ │ ├── test_unfold_tensor.py
│ │ │ │ ├── test_unique.py
│ │ │ │ ├── test_unsqueeze.py
│ │ │ │ ├── test_upsample.py
│ │ │ │ ├── test_util_ops.py
│ │ │ │ ├── test_utils.py
│ │ │ │ ├── test_var.py
│ │ │ │ ├── test_view.py
│ │ │ │ ├── test_vsplit.py
│ │ │ │ ├── test_weight_norm.py
│ │ │ │ ├── test_where.py
│ │ │ │ └── test_zeropad2d.py
│ │ │ ├── profiler/
│ │ │ │ ├── test_events.py
│ │ │ │ └── test_profile_lenet.py
│ │ │ └── tensor/
│ │ │ ├── test_autocast.py
│ │ │ ├── test_bfloat16_activation.py
│ │ │ ├── test_complex.py
│ │ │ ├── test_data_ptr.py
│ │ │ ├── test_global_tensor.py
│ │ │ ├── test_global_tensor_and_ndarray_compatibility.py
│ │ │ ├── test_global_tensor_indexing.py
│ │ │ ├── test_lazy_tensor_indexing.py
│ │ │ ├── test_meta_tensor.py
│ │ │ ├── test_new_tensor.py
│ │ │ ├── test_parameter.py
│ │ │ ├── test_safetensors.py
│ │ │ ├── test_tensor_and_ndarray_compatibility.py
│ │ │ ├── test_tensor_exponential.py
│ │ │ ├── test_tensor_indexing.py
│ │ │ ├── test_tensor_indexing2.py
│ │ │ ├── test_tensor_is_view.py
│ │ │ ├── test_tensor_part_1.py
│ │ │ ├── test_tensor_part_2.py
│ │ │ ├── test_tensor_part_3.py
│ │ │ ├── test_tensor_pin_memory.py
│ │ │ └── test_tensor_to_memory_format.py
│ │ ├── test_utils/
│ │ │ ├── __init__.py
│ │ │ ├── automated_test_util/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── generators.py
│ │ │ │ ├── global_scope.py
│ │ │ │ ├── profiler.py
│ │ │ │ ├── torch_flow_dual_object.py
│ │ │ │ └── util.py
│ │ │ ├── oneflow_pytorch_compatibility/
│ │ │ │ ├── __init__.py
│ │ │ │ └── oneflow_pytorch_compatiblity_test.py
│ │ │ ├── test_util.py
│ │ │ └── throttle.py
│ │ ├── unittest/
│ │ │ ├── __init__.py
│ │ │ ├── dataset.py
│ │ │ ├── env.py
│ │ │ └── mlir.py
│ │ └── utils/
│ │ ├── __init__.py
│ │ ├── checkpoint.py
│ │ ├── data/
│ │ │ ├── __init__.py
│ │ │ ├── _utils/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── collate.py
│ │ │ │ ├── fetch.py
│ │ │ │ ├── pin_memory.py
│ │ │ │ ├── signal_handling.py
│ │ │ │ └── worker.py
│ │ │ ├── dataloader.py
│ │ │ ├── dataset.py
│ │ │ ├── decorator.py
│ │ │ ├── distributed.py
│ │ │ └── sampler.py
│ │ ├── global_view/
│ │ │ ├── __init__.py
│ │ │ ├── global_mode.py
│ │ │ ├── global_utils.py
│ │ │ ├── to_global.py
│ │ │ └── to_local.py
│ │ ├── hooks.py
│ │ ├── insight/
│ │ │ ├── README.md
│ │ │ ├── requirements.txt
│ │ │ └── sqlite_to_google_trace_event.py
│ │ ├── model_zoo.py
│ │ └── tensor/
│ │ ├── __init__.py
│ │ └── from_or_to_torch_tensor.py
│ └── setup.py
└── tools/
├── check_src.py
├── clean_generated_api.py
├── create_pip_index.py
├── flags_from_git_diff.py
├── functional/
│ ├── generate_dispatch_stateful_ops.py
│ ├── generate_functional_api.py
│ ├── generate_tensor_api.py
│ └── generator.py
├── generate_header_list.py
├── generate_pip_version.py
├── oneflow-tblgen/
│ ├── CMakeLists.txt
│ ├── backends.h
│ ├── example/
│ │ └── constant.td
│ ├── op_schema_emitter.cpp
│ ├── op_schema_header.inc
│ ├── op_schema_source.inc
│ ├── op_schema_types.inc
│ └── tablegen.cpp
├── oss_file_exist.py
└── package_mirror.py
Showing preview only (2,441K chars total). Download the full file or copy to clipboard to get everything.
SYMBOL INDEX (25463 symbols across 3642 files)
FILE: .github/scripts/set_initial_variables.py
function create_one (line 4) | def create_one(name=None, allow_fail=None):
function create_conda (line 15) | def create_conda(name=None):
function print_github_action_output (line 19) | def print_github_action_output(name=None, value=None):
function print_result (line 23) | def print_result(build_matrix=None, test_matrix=None, out=None):
function check_include (line 42) | def check_include(include_key=None, matrix: dict = None):
FILE: ci/build/ensure_img.py
function check_and_download (line 9) | def check_and_download(tag, url):
FILE: ci/check/lintutils.py
function chunk (line 24) | def chunk(seq, n):
function dechunk (line 41) | def dechunk(chunks):
function run_parallel (line 49) | def run_parallel(cmds, **kwargs):
function get_sources (line 75) | def get_sources(source_dir, exclude_globs=[]):
function stdout_pathcolonline (line 93) | def stdout_pathcolonline(completed_process, filenames):
FILE: ci/check/run_clang_format.py
function split_and_print (line 28) | def split_and_print(prefix, text):
function handle_stream (line 37) | async def handle_stream(stream, cb):
function run_command (line 46) | async def run_command(cmd=None, dry=False, name=None):
function chunks (line 62) | def chunks(lst, n):
function check_version (line 68) | def check_version(bin):
function download (line 77) | def download(dry=False):
FILE: ci/check/run_clang_tidy.py
function split_and_print (line 27) | def split_and_print(prefix, text):
function handle_stream (line 36) | async def handle_stream(stream, cb):
function run_command (line 45) | async def run_command(cmd=None, dry=False, name=None):
function download (line 60) | def download(build_dir, dry=False) -> Optional[List[str]]:
FILE: ci/check/run_cmake_format.py
function gen_cmd (line 45) | def gen_cmd(file):
FILE: ci/check/run_license_format.py
function get_txt (line 25) | def get_txt(path: str):
function check_file (line 34) | def check_file(path):
function format_file (line 52) | def format_file(path):
function do_check (line 68) | def do_check(x):
function do_format (line 73) | def do_format(x):
function glob_files (line 77) | def glob_files(path: str = None, excludes=None):
FILE: ci/test/distributed_run.py
function is_img_existing (line 26) | def is_img_existing(tag):
function get_affiliations (line 40) | def get_affiliations(host):
function resolve_hostname_hardcoded (line 48) | def resolve_hostname_hardcoded(host: str):
function find_free_port (line 56) | def find_free_port():
function spawn_shell (line 63) | async def spawn_shell(cmd: str = None):
function spawn_shell_ignoring_failure (line 69) | async def spawn_shell_ignoring_failure(cmd: str = None):
function build_docker_img (line 74) | async def build_docker_img(remote_host=None, workspace_dir=None):
function create_remote_workspace_dir (line 92) | async def create_remote_workspace_dir(
function get_docker_cache_args (line 108) | def get_docker_cache_args():
function launch_remote_container (line 117) | async def launch_remote_container(
function handle_cast (line 165) | def handle_cast(conn=None, cmd=None):
function handle_call (line 173) | def handle_call(conn=None, cmd=None, response=None):
class DockerAgent (line 183) | class DockerAgent:
method __init__ (line 184) | def __init__(
method __enter__ (line 219) | def __enter__(self):
method run_bash_script_async (line 222) | def run_bash_script_async(self, bash_script=None, cmd=None):
method __exit__ (line 268) | def __exit__(self, exc_type, exc_val, exc_tb):
function fix_and_sync_libs (line 272) | async def fix_and_sync_libs(oneflow_internal_path=None, remote_hosts=None):
function remove_containers_by_name (line 342) | async def remove_containers_by_name(remote_hosts=None, container_name=No...
function get_remote_hosts (line 355) | def get_remote_hosts(args):
function exit_handler (line 555) | def exit_handler():
FILE: ci/test/multi_launch.py
function parse_args (line 38) | def parse_args():
function run_and_capture (line 103) | async def run_and_capture(cmd=None, prefix=None, **kwargs):
function launch_multiple (line 120) | async def launch_multiple(
function main (line 142) | def main():
FILE: ci/test/parallel_run.py
function gen_cmds (line 13) | def gen_cmds(cmd=None, dir=None, doctest=False):
function find_free_port (line 36) | def find_free_port():
function split_and_print (line 43) | def split_and_print(prefix, text):
function everyN (line 51) | def everyN(l: list, n: int):
function contains_oom_info (line 56) | def contains_oom_info(txt: str):
function should_retry (line 60) | def should_retry(txt: str):
function print_out (line 64) | def print_out(prefix: str = "", content: str = ""):
function spawn_shell_and_check (line 69) | async def spawn_shell_and_check(cmd: str = None, gpu_id: int = -1, check...
function run_cmds (line 92) | async def run_cmds(
FILE: docker/package/manylinux/build_wheel.py
function get_arg_env (line 9) | def get_arg_env(env_var_name: str, mode="run"):
function get_proxy_build_args (line 20) | def get_proxy_build_args():
function get_proxy_env_args (line 31) | def get_proxy_env_args():
function build_img (line 42) | def build_img(
function common_cmake_args (line 75) | def common_cmake_args(cache_dir=None, extra_oneflow_cmake_args=None):
function get_build_dir_arg (line 90) | def get_build_dir_arg(cache_dir, oneflow_src_dir):
function force_rm_dir (line 97) | def force_rm_dir(dir_to_clean):
function create_tmp_bash_and_run (line 104) | def create_tmp_bash_and_run(docker_cmd, img, bash_cmd, bash_args, bash_w...
function get_common_docker_args (line 136) | def get_common_docker_args(
function get_python_dir (line 168) | def get_python_dir(inplace=True, oneflow_src_dir=None, cache_dir=None):
function build_third_party (line 177) | def build_third_party(
function get_python_bin (line 243) | def get_python_bin(version):
function build_oneflow (line 254) | def build_oneflow(
function is_img_existing (line 350) | def is_img_existing(tag):
function build (line 464) | def build():
FILE: docs/source/conf.py
function should_skip_member (line 203) | def should_skip_member(app, what, name, obj, skip, options):
function setup (line 213) | def setup(app):
FILE: oneflow/api/common/ir_pass.cpp
type oneflow (line 23) | namespace oneflow {
FILE: oneflow/api/common/job_build_and_infer_ctx.h
function namespace (line 23) | namespace oneflow {
FILE: oneflow/api/common/sbp.h
function namespace (line 25) | namespace oneflow {
FILE: oneflow/api/common/variable_tensor_mgr.h
function namespace (line 23) | namespace oneflow {
FILE: oneflow/api/cpp/embedding/embedding.cpp
type oneflow_api (line 19) | namespace oneflow_api {
type embedding (line 20) | namespace embedding {
function CreateKeyValueStore (line 22) | std::string CreateKeyValueStore(const std::string& key_value_store_o...
function LoadSnapshot (line 35) | void LoadSnapshot(const std::string& snapshot_name, const std::strin...
FILE: oneflow/api/cpp/embedding/embedding.h
function namespace (line 21) | namespace oneflow_api {
FILE: oneflow/api/cpp/env.cpp
type oneflow_api (line 23) | namespace oneflow_api {
function initialize (line 24) | void initialize() {
function release (line 29) | void release() {
FILE: oneflow/api/cpp/env.h
function namespace (line 19) | namespace oneflow_api {
FILE: oneflow/api/cpp/env_impl.cpp
type oneflow_api (line 39) | namespace oneflow_api {
function IsEnvInited (line 45) | inline bool IsEnvInited() { return of::Singleton<of::EnvGlobalObjectsS...
function HasEnvVar (line 47) | bool HasEnvVar(const std::string& key) {
function GetEnvVar (line 52) | std::string GetEnvVar(const std::string& key, const std::string& defau...
function GetEnvVar (line 58) | int64_t GetEnvVar(const std::string& key, int64_t default_value) {
function FindFreePort (line 64) | int32_t FindFreePort(const std::string& addr) {
function CompleteEnvProto (line 93) | void CompleteEnvProto(of::EnvProto& env_proto) {
FILE: oneflow/api/cpp/env_impl.h
function namespace (line 23) | namespace oneflow_api {
FILE: oneflow/api/cpp/framework/device.cpp
type oneflow_api (line 22) | namespace oneflow_api {
FILE: oneflow/api/cpp/framework/device.h
function namespace (line 22) | namespace oneflow {
function namespace (line 31) | namespace oneflow_api {
FILE: oneflow/api/cpp/framework/dtype.cpp
type oneflow_api (line 20) | namespace oneflow_api {
function GetDTypeSize (line 32) | int32_t GetDTypeSize(DType dtype) { return DTypeSize[dtype]; }
FILE: oneflow/api/cpp/framework/dtype.h
function DType (line 23) | enum class DType {
FILE: oneflow/api/cpp/framework/graph.cpp
type oneflow_api (line 64) | namespace oneflow_api {
class CompileScope (line 70) | class CompileScope {
method CompileScope (line 72) | CompileScope(const of::JobConfigProto& job_config, const of::Device&...
function ConvertToTensorTuple (line 90) | std::shared_ptr<of::one::TensorTuple> ConvertToTensorTuple(
function GetDeviceTag (line 97) | std::string GetDeviceTag(const Device& device) { return device.type(); }
function Unzip (line 100) | const std::pair<std::vector<T1>, std::vector<T2>> Unzip(const of::Hash...
function Shape (line 110) | Shape OfShapeToOfApiShape(const of::Shape& of_shape) {
function LoadOneEmbedding (line 117) | void LoadOneEmbedding(const std::string& model_path, const Device& dev...
class Graph::GraphImpl (line 141) | class Graph::GraphImpl final {
method GraphImpl (line 145) | GraphImpl(const GraphImpl& graph) = delete;
method GraphImpl (line 146) | GraphImpl(GraphImpl&& graph) = default;
method GraphImpl (line 150) | GraphImpl& operator=(const GraphImpl& graph) = delete;
method GraphImpl (line 151) | GraphImpl& operator=(GraphImpl&& graph) = default;
method set_batch_size (line 156) | void set_batch_size(int batch_size) { batch_size_ = batch_size; }
function Graph (line 194) | Graph& Graph::operator=(Graph&& graph) noexcept {
function InputOutputInfos (line 200) | InputOutputInfos Graph::GetInputInfos() { return graph_->GetInputInfos...
function InputOutputInfos (line 202) | InputOutputInfos Graph::GetOutputInfos() { return graph_->GetOutputInf...
function IValue (line 208) | IValue Graph::Forward(const IValue& inputs) {
function Graph (line 232) | Graph Graph::Load(const std::string& model_path, const Device& device) {
function InputOutputInfos (line 249) | InputOutputInfos Graph::GraphImpl::GetInputInfos() { return input_info...
function InputOutputInfos (line 251) | InputOutputInfos Graph::GraphImpl::GetOutputInfos() { return output_in...
FILE: oneflow/api/cpp/framework/graph.h
function namespace (line 30) | namespace oneflow {
function namespace (line 36) | namespace oneflow_api {
FILE: oneflow/api/cpp/framework/ivalue.cpp
type oneflow_api (line 19) | namespace oneflow_api {
function Tensor (line 43) | const Tensor& IValue::ToTensor() const {
FILE: oneflow/api/cpp/framework/ivalue.h
function namespace (line 24) | namespace oneflow_api {
FILE: oneflow/api/cpp/framework/shape.cpp
type oneflow_api (line 20) | namespace oneflow_api {
function ToOneflowDimVcetor (line 25) | of::DimVector ToOneflowDimVcetor(const std::vector<int64_t>& dim_vec) {
function Shape (line 39) | Shape& Shape::operator=(const Shape& shape) {
FILE: oneflow/api/cpp/framework/shape.h
function namespace (line 22) | namespace oneflow {
function namespace (line 28) | namespace oneflow_api {
FILE: oneflow/api/cpp/framework/tensor.cpp
type oneflow_api (line 29) | namespace oneflow_api {
function Tensor (line 46) | Tensor& Tensor::operator=(const Tensor& tensor) {
function Tensor (line 51) | Tensor& Tensor::operator=(Tensor&& tensor) noexcept {
function Shape (line 57) | Shape Tensor::shape() const {
function Device (line 62) | Device Tensor::device() const {
function DType (line 67) | DType Tensor::dtype() const { return static_cast<DType>(tensor_->dtype...
function Tensor (line 84) | Tensor Tensor::from_buffer(const void* buffer, const Shape& shape, con...
FILE: oneflow/api/cpp/framework/tensor.h
function namespace (line 24) | namespace oneflow {
function namespace (line 33) | namespace oneflow_api {
FILE: oneflow/api/cpp/nn/functional/activation.cpp
type oneflow_api (line 19) | namespace oneflow_api {
type nn (line 20) | namespace nn {
function Tensor (line 25) | Tensor relu(const Tensor& tensor) {
FILE: oneflow/api/cpp/nn/functional/activation.h
function namespace (line 21) | namespace oneflow_api {
FILE: oneflow/api/cpp/tests/api_test.cpp
type oneflow_api (line 31) | namespace oneflow_api {
function Shape (line 33) | Shape RandomShape() {
function RandomData (line 42) | std::vector<T> RandomData(size_t size) {
function GetExeDir (line 57) | std::string GetExeDir() {
FILE: oneflow/api/cpp/tests/api_test.h
function namespace (line 22) | namespace oneflow_api {
FILE: oneflow/api/cpp/tests/graph_test.cpp
type oneflow_api (line 31) | namespace oneflow_api {
function Graph (line 35) | inline Graph LoadGraph(const Device& device) {
function Forward (line 41) | inline void Forward(Graph& graph, const Device& device, int expected_b...
function TEST (line 60) | TEST(Api, graph_cpu_test) {
function TEST (line 68) | TEST(Api, graph_gpu_test) {
function TEST (line 75) | TEST(Api, graph_multi_gpu_test) {
function TEST (line 87) | TEST(Api, graph_cpu_batching_test) {
function TEST (line 96) | TEST(Api, graph_gpu_batching_test) {
function TEST (line 104) | TEST(Api, graph_multi_device_test) {
function TEST (line 119) | TEST(Api, graph_unload_test) {
function TEST (line 148) | TEST(Api, graph_thread_test) {
function TEST (line 162) | TEST(Api, graph_input_order_test) {
function TEST (line 191) | TEST(Api, graph_input_output_infos_test) {
FILE: oneflow/api/cpp/tests/ivalue_test.cpp
type oneflow_api (line 23) | namespace oneflow_api {
function TEST (line 31) | TEST(Api, ivalue) {
function TEST (line 51) | TEST(Api, ivalue_tensor) {
function TEST (line 66) | TEST(Api, ivalue_tensor_vector) {
function TEST (line 85) | TEST(Api, ivalue_copy) {
function TEST (line 107) | TEST(Api, ivalue_move) {
FILE: oneflow/api/cpp/tests/nn_test.cpp
type oneflow_api (line 22) | namespace oneflow_api {
function Relu (line 29) | std::vector<T> Relu(const std::vector<T>& data) {
function TestRelu (line 40) | void TestRelu() {
function TEST (line 54) | TEST(Api, nn_relu) {
function TEST (line 60) | TEST(Api, nn_relu_multithreading) {
FILE: oneflow/api/cpp/tests/one_embedding_test.cpp
type oneflow_api (line 19) | namespace oneflow_api {
function TEST (line 22) | TEST(Api, embedding_test) {
FILE: oneflow/api/cpp/tests/tensor_test.cpp
type oneflow_api (line 20) | namespace oneflow_api {
function TEST (line 22) | TEST(Api, device) {
function TEST (line 39) | TEST(Api, tensor) {
function TEST (line 58) | TEST(Api, tensor_from_buffer_and_copy_to) {
function TEST (line 78) | TEST(Api, tensor_zeros) {
FILE: oneflow/api/python/autograd/autograd.cpp
type oneflow (line 36) | namespace oneflow {
type autograd (line 37) | namespace autograd {
function IsScalarTensor (line 41) | bool IsScalarTensor(const one::Tensor& tensor) {
function CheckAndInitOutGrads (line 50) | Maybe<one::TensorTuple> CheckAndInitOutGrads(const one::TensorTuple&...
function Backward (line 105) | Maybe<one::TensorTuple> Backward(const one::TensorTuple& outputs, co...
function Grad (line 117) | Maybe<one::TensorTuple> Grad(const one::TensorTuple& outputs, const ...
class PySavedTensorHook (line 136) | class PySavedTensorHook final : public one::SavedTensorHook {
method PySavedTensorHook (line 138) | PySavedTensorHook(const py::function& pack_hook, const py::functio...
method pack (line 141) | void pack(const std::shared_ptr<one::Tensor>& tensor) {
method unpack (line 146) | std::shared_ptr<one::Tensor> unpack() {
class PySavedTensorHookCreator (line 166) | class PySavedTensorHookCreator final : public one::SavedTensorHookCr...
method new_saved_tensor_hook (line 168) | std::unique_ptr<one::SavedTensorHook> new_saved_tensor_hook() cons...
method append_new_hooks (line 172) | void append_new_hooks(const py::function& pack_hook, const py::fun...
method pop_hooks (line 175) | void pop_hooks() {
FILE: oneflow/api/python/autograd/autograd_engine.cpp
type oneflow (line 25) | namespace oneflow {
FILE: oneflow/api/python/autograd/autograd_function.cpp
type oneflow (line 30) | namespace oneflow {
function UnpackTensorTuple (line 35) | Maybe<one::TensorTuple> UnpackTensorTuple(const py::object& input) {
function PackTensorTuple (line 62) | py::object PackTensorTuple(const one::TensorTuple& tp) {
function PackPyFunctionToFType (line 73) | one::AutogradFunctionBase::FType PackPyFunctionToFType(const py::funct...
type one (line 84) | namespace one {
FILE: oneflow/api/python/autograd/autograd_function_state.cpp
type oneflow (line 24) | namespace oneflow {
type one (line 25) | namespace one {
function FunctionAutoGradCaptureState (line 27) | inline FunctionAutoGradCaptureState* CheckAndGetStateData(PyAutograd...
function PyObject (line 45) | static PyObject* PyAutogradFunctionState_new(PyTypeObject* type, PyO...
function PyAutogradFunctionState_dealloc (line 57) | static void PyAutogradFunctionState_dealloc(PyAutogradFunctionState*...
function PyObject (line 63) | static PyObject* PyAutogradFunctionState_save_for_backward(PyObject*...
function PyObject (line 78) | static PyObject* PyAutogradFunctionState_mark_non_differentiable(PyO...
function PyObject (line 93) | static PyObject* PyAutogradFunctionState_is_data_valid(PyObject* sel...
function PyObject (line 109) | static PyObject* PyAutogradFunctionState_saved_tensors(PyObject* sel...
function PyObject (line 115) | static PyObject* PyAutogradFunctionState_get_dict(PyObject* self, Py...
function PyObject (line 131) | PyObject* PyAutogradFunctionState_getattro(PyObject* self, PyObject*...
function PyAutogradFunctionState_setattro (line 144) | int PyAutogradFunctionState_setattro(PyObject* self, PyObject* attr,...
function PyObject (line 190) | PyObject* PyAutogradFunctionState_NewFromPtr(
FILE: oneflow/api/python/autograd/autograd_function_state.h
function namespace (line 25) | namespace oneflow {
FILE: oneflow/api/python/autograd/autograd_mode.cpp
type oneflow (line 24) | namespace oneflow {
type autograd (line 26) | namespace autograd {
FILE: oneflow/api/python/autograd/function_node.cpp
type oneflow (line 26) | namespace oneflow {
type FunctionNodeUtil (line 30) | struct FunctionNodeUtil final {
method ToString (line 31) | static std::string ToString(const one::FunctionNode& func_node) {
FILE: oneflow/api/python/caster/autograd_function_state.h
function namespace (line 26) | namespace pybind11 {
FILE: oneflow/api/python/caster/common.h
function namespace (line 22) | namespace pybind11 {
FILE: oneflow/api/python/caster/maybe.h
function namespace (line 23) | namespace pybind11 {
FILE: oneflow/api/python/caster/optional.h
function namespace (line 24) | namespace pybind11 {
FILE: oneflow/api/python/caster/size.h
function shape_type_caster (line 48) | PYBIND11_NAMESPACE_BEGIN(detail)
function handle (line 67) | handle cast(U* src, return_value_policy policy, handle parent) {
function operator (line 72) | operator T*() { return value_.get(); }
FILE: oneflow/api/python/caster/tensor.h
function namespace (line 24) | namespace pybind11 {
FILE: oneflow/api/python/caster/test.cpp
type oneflow (line 21) | namespace oneflow {
class A (line 23) | class A {
method inc_x (line 25) | void inc_x() { x++; }
method get_x (line 26) | int get_x() { return x; }
function get_singleton_a (line 32) | std::shared_ptr<A> get_singleton_a() {
FILE: oneflow/api/python/deprecated.cpp
type oneflow (line 24) | namespace oneflow {
FILE: oneflow/api/python/dlpack/converter.cpp
type oneflow (line 26) | namespace oneflow {
function ToOneFlowDevice (line 28) | Maybe<Symbol<Device>> ToOneFlowDevice(const DLDevice& ctx) {
function ToOneFlowDataType (line 38) | Maybe<DataType> ToOneFlowDataType(const DLDataType& dtype) {
function fromDLPack (line 83) | Maybe<one::Tensor> fromDLPack(const DLManagedTensor* src) {
function ToDLDevice (line 132) | Maybe<DLDevice> ToDLDevice(Symbol<Device> ofdevice) {
function ToDLDataType (line 145) | Maybe<DLDataType> ToDLDataType(DataType ofdtype) {
type ATenDLMTensor (line 165) | struct ATenDLMTensor {
function deleter (line 170) | void deleter(DLManagedTensor* arg) { delete static_cast<ATenDLMTensor*...
function toDLPack (line 172) | Maybe<DLManagedTensor*> toDLPack(const std::shared_ptr<one::Tensor>& s...
function DLPack_Capsule_Destructor (line 209) | void DLPack_Capsule_Destructor(PyObject* data) {
FILE: oneflow/api/python/dlpack/converter.h
function namespace (line 19) | namespace oneflow {
FILE: oneflow/api/python/dlpack/dlpack.h
type DLDeviceType (line 62) | typedef enum {
type DLDevice (line 112) | typedef struct {
type DLDataTypeCode (line 125) | typedef enum {
type DLDataType (line 157) | typedef struct {
type DLTensor (line 175) | typedef struct {
type DLManagedTensor (line 227) | typedef struct DLManagedTensor {
FILE: oneflow/api/python/env/env.cpp
type oneflow (line 34) | namespace oneflow {
function RegisterCudaDeviceProperties (line 38) | void RegisterCudaDeviceProperties(py::module& m) {
function SwitchToShuttingDownPhase (line 60) | Maybe<void> SwitchToShuttingDownPhase(EnvGlobalObjectsScope* env, bool...
FILE: oneflow/api/python/env/env.h
function namespace (line 34) | namespace oneflow {
FILE: oneflow/api/python/ep/cuda_matmul_mode.cpp
type oneflow (line 24) | namespace oneflow {
type ep (line 26) | namespace ep {
FILE: oneflow/api/python/exception/exception.cpp
type oneflow (line 23) | namespace oneflow {
FILE: oneflow/api/python/flags.cpp
type oneflow (line 21) | namespace oneflow {
FILE: oneflow/api/python/framework/autocast.cpp
type oneflow (line 24) | namespace oneflow {
function is_nested_count_zero (line 31) | bool is_nested_count_zero() { return (*nested_count()) == 0; }
function increase_nested_count (line 32) | void increase_nested_count() { (*nested_count())++; }
function decrease_nested_count (line 33) | void decrease_nested_count() { (*nested_count())--; }
class AutoCastMode (line 35) | class AutoCastMode {
method AutoCastMode (line 39) | AutoCastMode(const std::string& device_type, Symbol<DType> dtype, bo...
FILE: oneflow/api/python/framework/device.cpp
type oneflow (line 27) | namespace oneflow {
FILE: oneflow/api/python/framework/doc.cpp
type oneflow (line 23) | namespace oneflow {
function AddFunctionDoc (line 25) | py::object AddFunctionDoc(py::object f, const std::string& doc_string) {
function ReplaceDoc (line 78) | py::object ReplaceDoc(py::object f, const std::string& doc_string) {
FILE: oneflow/api/python/framework/dtype.cpp
type oneflow (line 25) | namespace oneflow {
FILE: oneflow/api/python/framework/framework.cpp
type oneflow (line 25) | namespace oneflow {
FILE: oneflow/api/python/framework/framework.h
function namespace (line 34) | namespace oneflow {
FILE: oneflow/api/python/framework/global_mode.cpp
type oneflow (line 27) | namespace oneflow {
FILE: oneflow/api/python/framework/id_util.cpp
type oneflow (line 22) | namespace oneflow {
FILE: oneflow/api/python/framework/instructions_builder.cpp
type oneflow (line 27) | namespace oneflow {
function DeprecatedPhysicalRun (line 31) | Maybe<void> DeprecatedPhysicalRun(const std::function<void(Instruction...
FILE: oneflow/api/python/framework/layout.cpp
type oneflow (line 26) | namespace oneflow {
FILE: oneflow/api/python/framework/memory_format.cpp
type oneflow (line 24) | namespace oneflow {
function PyObject (line 26) | static PyObject* PyMemoryFormat_repr(PyMemoryFormatObject* self) {
function PyMemoryFormat_Check (line 62) | bool PyMemoryFormat_Check(PyObject* self) { return self && self->ob_ty...
function PyObject (line 64) | PyObject* PyMemoryFormat_New(MemoryFormat memory_format) {
FILE: oneflow/api/python/framework/memory_format.h
function namespace (line 25) | namespace oneflow {
FILE: oneflow/api/python/framework/nn_graph.cpp
type oneflow (line 33) | namespace oneflow {
function APINNGraphAdditionalVarNames (line 35) | Maybe<py::object> APINNGraphAdditionalVarNames(const std::shared_ptr<N...
function APINNGraphAdditionalVarTensors (line 40) | Maybe<py::object> APINNGraphAdditionalVarTensors(const std::shared_ptr...
function APINNGraphGetCurrentSerializedJob (line 46) | Maybe<py::bytes> APINNGraphGetCurrentSerializedJob(const std::shared_p...
FILE: oneflow/api/python/framework/one_embedding.cpp
type oneflow (line 28) | namespace oneflow {
class OneEmbeddingHandler (line 30) | class OneEmbeddingHandler final {
method OneEmbeddingHandler (line 32) | OneEmbeddingHandler(const std::string& key_value_store_option_string...
method LoadSnapshot (line 40) | void LoadSnapshot(const std::string& snapshot_name) {
method SaveSnapshot (line 49) | void SaveSnapshot(const std::string& snapshot_name) {
method CreateKeyValueStore (line 59) | void CreateKeyValueStore(const embedding::KeyValueStoreOptions& key_...
type embedding (line 74) | namespace embedding {
class PersistentTableWriter (line 76) | class PersistentTableWriter {
method PersistentTableWriter (line 79) | PersistentTableWriter() = default;
class PersistentTableWriterImpl (line 87) | class PersistentTableWriterImpl : public PersistentTableWriter {
method PersistentTableWriterImpl (line 90) | PersistentTableWriterImpl(const std::vector<std::string>& paths, c...
method Write (line 107) | void Write(const py::array& keys, const py::array& values) override {
method Close (line 136) | void Close() override { CloseImpl(); }
method CloseImpl (line 139) | void CloseImpl() {
function NewPersistentTableWriter (line 156) | std::shared_ptr<PersistentTableWriter> NewPersistentTableWriter(
function NewPersistentTableWriter (line 168) | std::shared_ptr<PersistentTableWriter> NewPersistentTableWriter(
class PersistentTableReader (line 194) | class PersistentTableReader {
method PersistentTableReader (line 197) | PersistentTableReader() = default;
class PersistentTableReaderImpl (line 205) | class PersistentTableReaderImpl : public PersistentTableReader {
method PersistentTableReaderImpl (line 209) | PersistentTableReaderImpl(const std::vector<std::string>& paths, c...
method Next (line 235) | std::tuple<py::object, py::object> Next() override {
method Close (line 255) | void Close() override { CloseImpl(); }
method CloseImpl (line 258) | void CloseImpl() {
function NewPersistentTableReader (line 276) | std::shared_ptr<PersistentTableReader> NewPersistentTableReader(
function NewPersistentTableReader (line 288) | std::shared_ptr<PersistentTableReader> NewPersistentTableReader(
FILE: oneflow/api/python/framework/op_builder.cpp
type oneflow (line 27) | namespace oneflow {
type one (line 29) | namespace one {
FILE: oneflow/api/python/framework/op_expr.cpp
type oneflow (line 30) | namespace oneflow {
function PybindExportOpExpr (line 36) | py::class_<OpT, one::BuiltinOpExpr, std::shared_ptr<OpT>> PybindExport...
FILE: oneflow/api/python/framework/parallel_conf_util.cpp
type oneflow (line 23) | namespace oneflow {
FILE: oneflow/api/python/framework/random_generator.cpp
type oneflow (line 27) | namespace oneflow {
function CreateGenerator (line 29) | Maybe<one::Generator> CreateGenerator(const std::string& device_str) {
function GetCudaDefaultGenerators (line 34) | py::tuple GetCudaDefaultGenerators() {
FILE: oneflow/api/python/framework/scope_util.cpp
type oneflow (line 22) | namespace oneflow {
FILE: oneflow/api/python/framework/session_util.cpp
type oneflow (line 21) | namespace oneflow {
FILE: oneflow/api/python/framework/shut_down_util.cpp
type oneflow (line 22) | namespace oneflow {
FILE: oneflow/api/python/framework/size.cpp
type oneflow (line 24) | namespace oneflow {
function PyObject (line 28) | static PyObject* TensorSize_repr(TensorSize* self) {
function PyObject (line 42) | static PyObject* TensorSize_new(PyTypeObject* type, PyObject* args, Py...
function Py_ssize_t (line 57) | static Py_ssize_t TensorSize_length(TensorSize* self) {
function PyObject (line 61) | static PyObject* TensorSize_concat(TensorSize* self, PyObject* other) {
function PyObject (line 71) | static PyObject* TensorSize_repeat(TensorSize* self, Py_ssize_t n) {
function PyObject (line 81) | static PyObject* TensorSize_item(TensorSize* self, Py_ssize_t i) {
function TensorSize_contains (line 85) | static int TensorSize_contains(TensorSize* self, PyObject* el) {
function PyObject (line 100) | static PyObject* TensorSize_subscript(TensorSize* self, PyObject* item) {
function PyObject (line 116) | static PyObject* TensorSize_numel(PyObject* self, PyObject* args) {
function TensorSize_Check (line 168) | int TensorSize_Check(PyObject* p) { return p && p->ob_type == &TensorS...
function PyObject (line 170) | PyObject* TensorSize_New(Py_ssize_t len) { return TensorSize_Type.tp_a...
function PyObject (line 172) | PyObject* TensorSize_NewFromShape(const Shape& size) {
function Shape (line 182) | Shape TensorSize_AsShape(PyObject* self) {
FILE: oneflow/api/python/framework/size.h
function namespace (line 24) | namespace oneflow {
FILE: oneflow/api/python/framework/tensor.cpp
type oneflow (line 43) | namespace oneflow {
type one (line 44) | namespace one {
type AllocType (line 62) | struct AllocType {}
function PyObject (line 75) | PyObject* PyTensor_wrap(const std::shared_ptr<T>& data, PyTensorObje...
function PyTensor_tryResurrect (line 108) | bool PyTensor_tryResurrect(PyObject* py_tensor) {
function PyTensorObject_init (line 125) | static int PyTensorObject_init(PyObject* self, PyObject* args, PyObj...
function PyTensorObject_dealloc (line 134) | static void PyTensorObject_dealloc(PyObject* self) {
function PyParameterObject_init (line 145) | static int PyParameterObject_init(PyObject* self, PyObject* args, Py...
function Py_ssize_t (line 163) | static Py_ssize_t PyTensorObject_length(PyTensorObject* self) {
function PyObject (line 168) | static PyObject* PyTensorObject_getitem(PyObject* self, Py_ssize_t i...
function PyObject (line 176) | static PyObject* PyTensorObject_subscript(PyObject* self, PyObject* ...
function PyObject (line 197) | static PyObject* PyTensorObject_storage_offset(PyObject* self, PyObj...
function PyObject (line 203) | static PyObject* PyTensorObject_stride(PyObject* self, PyObject* unu...
function PyObject (line 214) | static PyObject* PyTensorObject_is_contiguous(PyObject* self, PyObje...
function PyObject (line 220) | static PyObject* PyTensorObject_is_view(PyObject* self, PyObject* un...
function PyObject (line 230) | static PyObject* PyTensorObject_contiguous(PyObject* self, PyObject* u...
function PyObject (line 236) | static PyObject* PyTensorObject_contiguous_(PyObject* self, PyObject* ...
function PyObject (line 243) | static PyObject* PyTensorObject_pin_memory(PyObject* self, PyObject* u...
function PyObject (line 249) | static PyObject* PyTensorObject_is_pinned(PyObject* self, PyObject* un...
function PyObject (line 255) | static PyObject* PyTensorObject_offload(PyObject* self, PyObject* unus...
function PyObject (line 263) | static PyObject* PyTensorObject_load(PyObject* self, PyObject* unused) {
function PyObject (line 271) | static PyObject* PyTensorObject_is_offloaded(PyObject* self, PyObject*...
function PyObject (line 277) | static PyObject* PyTensorObject_is_floating_point(PyObject* self, PyOb...
function PyObject (line 287) | static PyObject* PyTensorObject_requires_grad_(PyObject* self, PyObject*...
function PyObject (line 301) | static PyObject* PyTensorObject_retain_grad(PyObject* self, PyObject* un...
function PyObject (line 309) | static PyObject* PyTensorObject_detach(PyObject* self, PyObject* unused) {
function PyObject (line 315) | static PyObject* PyTensorObject_clone(PyObject* self, PyObject* unused) {
function PyObject (line 321) | static PyObject* PyTensorObject_zero_(PyObject* self, PyObject* unused) {
function RawSbpBToP (line 329) | std::vector<Symbol<SbpParallel>> RawSbpBToP(Symbol<NdSbp> nd_sbp) {
function PyObject (line 341) | static PyObject* PyTensorObject_zero_grad(PyObject* self, PyObject* args...
function PyObject (line 370) | static PyObject* PyTensorObject_register_hook(PyObject* self, PyObject* ...
function PyObject (line 378) | static PyObject* PyTensorObject__register_post_grad_accumulation_hook(Py...
function PyObject (line 387) | static PyObject* PyTensorObject_global_id(PyObject* self, PyObject* unus...
function PyObject (line 394) | static PyObject* PyTensorObject_check_meta_consistency(PyObject* self, P...
function PyObject (line 401) | static PyObject* PyTensorObject_data_ptr(PyObject* self, PyObject* unuse...
function PyObject (line 411) | static PyObject* PyTensorObject_to_numpy(PyObject* self, PyObject* unuse...
function PyObject (line 430) | static PyObject* PyTensorObject_item(PyObject* self, PyObject* unused) {
function PyObject (line 448) | static PyObject* PyTensorObject_type(PyObject* self, PyObject* args, PyO...
function CopyFromNumpyArray (line 492) | void CopyFromNumpyArray(ep::Stream* stream,
function CopyToNumpyArray (line 500) | void CopyToNumpyArray(ep::Stream* stream,
function PyObject (line 509) | static PyObject* PyTensorObject__copy_to_numpy(PyObject* self, PyObject*...
function PyObject (line 516) | static PyObject* PyTensorObject__copy_from_numpy(PyObject* self, PyObjec...
function PyObject (line 526) | static PyObject* PyTensorObject__register_storage_delete_hook(PyObject* ...
function concat_method_def (line 534) | static std::vector<PyMethodDef> concat_method_def(PyMethodDef methods[],
function PyObject (line 583) | static PyObject* PyTensorObject_ndim(PyObject* self, void* unused) {
function PyObject (line 587) | static PyObject* PyTensorObject_shape(PyObject* self, void* unused) {
function PyObject (line 591) | static PyObject* PyTensorObject_dtype(PyObject* self, void* unused) {
function PyObject (line 598) | static PyObject* PyTensorObject_is_cpu(PyObject* self, void* unused) {
function PyObject (line 602) | static PyObject* PyTensorObject_is_cuda(PyObject* self, void* unused) {
function PyObject (line 606) | static PyObject* PyTensorObject_grad(PyObject* self, void* unused) {
function PyTensorObject_set_grad (line 612) | static int PyTensorObject_set_grad(PyObject* self, PyObject* grad, void*...
function PyObject (line 625) | static PyObject* PyTensorObject_data(PyObject* self, void* unused) {
function PyTensorObject_set_data (line 631) | static int PyTensorObject_set_data(PyObject* self, PyObject* data, void*...
function PyObject (line 642) | static PyObject* PyTensorObject_ref_tensor(PyObject* self, void* unused) {
function PyTensorObject_set_ref_tensor (line 648) | static int PyTensorObject_set_ref_tensor(PyObject* self, PyObject* ref, ...
function PyObject (line 661) | static PyObject* PyTensorObject_ref_index(PyObject* self, void* unused) {
function PyTensorObject_set_ref_index (line 665) | static int PyTensorObject_set_ref_index(PyObject* self, PyObject* index,...
function PyObject (line 674) | static PyObject* PyTensorObject_grad_fn(PyObject* self, void* unused) {
function PyObject (line 678) | static PyObject* PyTensorObject_is_leaf(PyObject* self, void* unused) {
function PyObject (line 682) | static PyObject* PyTensorObject_requires_grad(PyObject* self, void* unus...
function PyTensorObject_set_requires_grad (line 686) | static int PyTensorObject_set_requires_grad(PyObject* self, PyObject* re...
function PyObject (line 696) | static PyObject* PyTensorObject_is_lazy(PyObject* self, void* unused) {
function PyObject (line 700) | static PyObject* PyTensorObject_is_eager(PyObject* self, void* unused) {
function PyObject (line 704) | static PyObject* PyTensorObject_is_global(PyObject* self, void* unused) {
function PyObject (line 708) | static PyObject* PyTensorObject_is_local(PyObject* self, void* unused) {
function PyObject (line 712) | static PyObject* PyTensorObject__tensor_buffer_shapes_and_dtypes(PyObjec...
function PyObject (line 718) | static PyObject* PyTensorObject_device(PyObject* self, void* unused) {
function PyObject (line 724) | static PyObject* PyTensorObject_placement(PyObject* self, void* unused) {
function PyObject (line 730) | static PyObject* PyTensorObject_sbp(PyObject* self, void* unused) {
function PyObject (line 767) | static PyObject* TensorMetaCls_call(PyObject* type, PyObject* args, PyOb...
function TensorMetaCls_dealloc (line 771) | static void TensorMetaCls_dealloc(PyObject* type) { PyType_Type.tp_deall...
function PyHeapTypeObject (line 773) | static PyHeapTypeObject* MakeTensorMetaclass() {
function PyTypeObject (line 799) | static PyTypeObject* MakeTensorType() {
function PyTypeObject (line 832) | static PyTypeObject* MakeParameterType() {
function PyObject (line 855) | PyObject* PyTensor_New(const std::shared_ptr<Tensor>& data) {
function PyObject (line 859) | PyObject* PyParameter_New(const std::shared_ptr<Parameter>& data) {
function PyObject (line 863) | PyObject* PyParameter_New(const std::shared_ptr<Tensor>& data, bool requ...
FILE: oneflow/api/python/framework/tensor.h
function namespace (line 24) | namespace oneflow {
FILE: oneflow/api/python/framework/tensor_functions.cpp
type oneflow (line 34) | namespace oneflow {
type one (line 35) | namespace one {
function PyObject (line 42) | PyObject* concat_self(PyObject* self, PyObject* args) {
function PyObject (line 49) | PyObject* ndarray_judgment_and_compatibility(PyObject* self, PyObjec...
function PyObject (line 109) | static PyObject* PyTensorObject_nb_pow(PyObject* a, PyObject* b, PyO...
function PyObject (line 139) | PyObject* PyTensorObject_nb_inplace_pow(PyObject* a, PyObject* b, Py...
function PyObject (line 358) | static PyObject* PyTensorObject_byte(PyObject* self, PyObject* unuse...
function PyObject (line 364) | static PyObject* PyTensorObject_dim(PyObject* self, PyObject* unused) {
function PyObject (line 370) | static PyObject* PyTensorObject_nelement(PyObject* self, PyObject* u...
function PyObject (line 376) | static PyObject* PyTensorObject_element_size(PyObject* self, PyObjec...
function PyObject (line 382) | static PyObject* PyTensorObject_get_device(PyObject* self, PyObject*...
function PyObject (line 391) | static PyObject* PyTensorObject_size(PyObject* self, PyObject* args,...
function PyObject (line 409) | static PyObject* PyTensorObject_cast(PyObject* self, PyObject* args,...
function PyObject (line 427) | static PyObject* PyTensorObject_diag(PyObject* self, PyObject* args,...
function PyObject (line 439) | static PyObject* PyTensorObject_diagonal(PyObject* self, PyObject* a...
function PyObject (line 453) | static PyObject* PyTensorObject_matmul(PyObject* self, PyObject* arg...
function PyObject (line 468) | static PyObject* PyTensorObject_reshape(PyObject* self, PyObject* ar...
function PyObject (line 478) | static PyObject* PyTensorObject_reshape_as(PyObject* self, PyObject*...
function PyObject (line 491) | static PyObject* PyTensorObject_cpu(PyObject* self, PyObject* unused) {
function PyObject (line 498) | static PyObject* PyTensorObject_cuda(PyObject* self, PyObject* args,...
function PyObject (line 521) | static PyObject* PyTensorObject_var(PyObject* self, PyObject* args, ...
function PyObject (line 552) | static PyObject* PyTensorObject_std(PyObject* self, PyObject* args, ...
function PyObject (line 584) | static PyObject* PyTensorObject_softplus(PyObject* self, PyObject* a...
function PyObject (line 597) | static PyObject* PyTensorObject_relu(PyObject* self, PyObject* unuse...
function PyObject (line 603) | static PyObject* PyTensorObject_relu_(PyObject* self, PyObject* unus...
function PyObject (line 643) | static PyObject* PyTensorObject_view(PyObject* self, PyObject* args,...
function PyObject (line 653) | static PyObject* PyTensorObject_view_as(PyObject* self, PyObject* ar...
function PyObject (line 666) | static PyObject* PyTensorObject_permute(PyObject* self, PyObject* ar...
function PyObject (line 677) | static PyObject* PyTensorObject_transpose(PyObject* self, PyObject* ...
function PyObject (line 691) | static PyObject* PyTensorObject_local_to_global(PyObject* self, PyOb...
function PyObject (line 729) | static PyObject* PyTensorObject_global_to_global(PyObject* self, PyO...
function PyObject (line 789) | static PyObject* PyTensorObject_to_global(PyObject* self, PyObject* ...
function PyObject (line 804) | static PyObject* PyTensorObject_to_local(PyObject* self, PyObject* u...
function PyObject (line 819) | static PyObject* PyTensorObject_type_as(PyObject* self, PyObject* ar...
function PyObject (line 854) | static PyObject* PyTensorObject_new(PyObject* self, PyObject* args, ...
function PyTensorObject_setitem (line 910) | int PyTensorObject_setitem(PyObject* self, PyObject* item, PyObject*...
function PyObject (line 1179) | PyObject* PyTensorObject_richcompare(PyObject* self, PyObject* other...
FILE: oneflow/api/python/framework/tensor_functions_util.h
function namespace (line 24) | namespace oneflow {
FILE: oneflow/api/python/framework/tensor_tuple.cpp
type oneflow (line 26) | namespace oneflow {
type one (line 27) | namespace one {
type TensorTupleUtil (line 31) | struct TensorTupleUtil final {
method ToString (line 32) | static std::string ToString(const TensorTuple& tensor_tuple) {
method MergeFrom (line 44) | static void MergeFrom(std::shared_ptr<TensorTuple>& tensor_tuple, ...
method AppendTensor (line 48) | static void AppendTensor(std::shared_ptr<TensorTuple>& tensor_tuple,
FILE: oneflow/api/python/framework/tensortype.cpp
type oneflow (line 30) | namespace oneflow {
type one (line 31) | namespace one {
function get_dtype_string (line 66) | static const std::string get_dtype_string(PyTensorType* tensortype) {
function PyObject (line 75) | static PyObject* PyTensorTypeMetaCls_call(PyObject* self, PyObject* ...
FILE: oneflow/api/python/framework/tensortype.h
function namespace (line 24) | namespace oneflow {
FILE: oneflow/api/python/framework/thread.cpp
type oneflow (line 23) | namespace oneflow {
class UsingThreadUidSet (line 27) | class UsingThreadUidSet final {
method UsingThreadUidSet (line 29) | UsingThreadUidSet()
method Get (line 35) | Maybe<int64_t> Get() {
method Put (line 50) | Maybe<void> Put(int64_t thread_uid) {
function UsingThreadUidSet (line 65) | UsingThreadUidSet* MutUsingThreadUidSet() {
method UsingThreadUidSet (line 29) | UsingThreadUidSet()
method Get (line 35) | Maybe<int64_t> Get() {
method Put (line 50) | Maybe<void> Put(int64_t thread_uid) {
FILE: oneflow/api/python/framework/thread.h
function namespace (line 22) | namespace oneflow {
FILE: oneflow/api/python/framework/typeinfo.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type is_floating_point_with_half (line 39) | struct is_floating_point_with_half : public std::false_type {}
function PyGetVal (line 49) | typename std::enable_if<is_floating_point_with_half<T>::value, PyObj...
function PyGetVal (line 54) | typename std::enable_if<std::is_integral<T>::value, PyObject*>::type...
function PyObject (line 58) | PyObject* PyGetMaxVal(DataType datatype) {
function PyObject (line 69) | PyObject* PyGetMinVal(DataType datatype) {
function PyObject (line 104) | static PyObject* PyIInfo_new(PyTypeObject* self, PyObject* args, PyO...
function PyObject (line 127) | static PyObject* PyFInfo_new(PyTypeObject* self, PyObject* args, PyO...
function PyObject (line 150) | static PyObject* PyDInfo_bits(PyObject* self, void*) {
function PyObject (line 157) | static PyObject* PyDInfo_min(PyObject* self, void*) {
function PyObject (line 169) | static PyObject* PyDInfo_max(PyObject* self, void*) {
function PyObject (line 181) | static PyObject* PyFInfo_resolution(PyObject* self, void*) {
function PyObject (line 194) | static PyObject* PyFInfo_eps(PyObject* self, void*) {
function PyObject (line 207) | static PyObject* PyFInfo_tiny(PyObject* self, void*) {
function PyObject (line 220) | static PyObject* PyDInfo_dtype(PyObject* self, void*) {
function PyObject (line 228) | static PyObject* PyIInfo_str(PyObject* self) {
function PyObject (line 239) | static PyObject* PyFInfo_str(PyObject* self) {
type PyGetSetDef (line 254) | struct PyGetSetDef
type PyGetSetDef (line 262) | struct PyGetSetDef
function init_info_type (line 273) | static void init_info_type() {
FILE: oneflow/api/python/framework/typeinfo.h
function namespace (line 25) | namespace oneflow {
FILE: oneflow/api/python/framework/variable_tensor_mgr.cpp
type oneflow (line 24) | namespace oneflow {
FILE: oneflow/api/python/functional/common.cpp
type oneflow (line 39) | namespace oneflow {
type one (line 40) | namespace one {
type functional (line 41) | namespace functional {
type detail (line 43) | namespace detail {
function GetItemInPyScalarTensor (line 48) | Maybe<T> GetItemInPyScalarTensor(PyObject* obj) {
function isinstance_fast (line 55) | bool isinstance_fast(PyObject* obj) {
function T (line 66) | const T& cast_fast(PyObject* obj) {
function T (line 79) | const T& cast_fast(PyObject* obj) {
function PySequenceCheck (line 89) | bool PySequenceCheck(PyObject* obj, const std::function<bool(PyObj...
function PyLongSequenceCheck (line 98) | bool PyLongSequenceCheck(PyObject* obj) {
function PyFloatSequenceCheck (line 103) | bool PyFloatSequenceCheck(PyObject* obj) {
function PyStringCheck (line 110) | bool PyStringCheck(PyObject* obj) { return PyBytes_Check(obj) || P...
function PyStringSequenceCheck (line 112) | bool PyStringSequenceCheck(PyObject* obj) {
function PyStringAsString (line 116) | std::string PyStringAsString(PyObject* obj) {
function PyObjectToReprStr (line 123) | std::string PyObjectToReprStr(PyObject* obj) {
function PyTensorSequenceCheck (line 131) | bool PyTensorSequenceCheck(PyObject* obj) {
function PyUnpackTensorSequence (line 134) | std::vector<std::shared_ptr<Tensor>> PyUnpackTensorSequence(PyObje...
function PyTensorTupleCheck (line 140) | bool PyTensorTupleCheck(PyObject* obj) { return detail::isinstance...
function PyUnpackTensorTuple (line 142) | std::shared_ptr<TensorTuple> PyUnpackTensorTuple(PyObject* obj) {
function PyScalarCheck (line 147) | bool PyScalarCheck(PyObject* obj) {
function Scalar (line 151) | Scalar PyUnpackScalar(PyObject* obj) {
function PyScalarTensorCheck (line 174) | bool PyScalarTensorCheck(PyObject* obj) {
function Scalar (line 183) | Scalar PyUnpackScalarTensor(PyObject* obj) {
function PyDTypeCheck (line 230) | bool PyDTypeCheck(PyObject* obj) { return detail::isinstance_fast<...
function PyUnpackDType (line 231) | Symbol<DType> PyUnpackDType(PyObject* obj) { return *detail::cast_...
function PyLayoutCheck (line 234) | bool PyLayoutCheck(PyObject* obj) { return detail::isinstance_fast...
function PyUnpackLayout (line 235) | Symbol<Layout> PyUnpackLayout(PyObject* obj) { return *detail::cas...
function PyMemoryFormatCheck (line 238) | bool PyMemoryFormatCheck(PyObject* obj) { return PyMemoryFormat_Ch...
function MemoryFormat (line 239) | MemoryFormat PyUnpackMemoryFormat(PyObject* obj) { return PyMemory...
function PyDTypeSequenceCheck (line 242) | bool PyDTypeSequenceCheck(PyObject* obj) {
function PyUnpackDTypeSequence (line 245) | std::vector<Symbol<DType>> PyUnpackDTypeSequence(PyObject* obj) {
function PyShapeCheck (line 250) | bool PyShapeCheck(PyObject* obj) { return PyLongSequenceCheck(obj); }
function Shape (line 252) | Shape PyUnpackShape(PyObject* obj) {
function PyShapeSequenceCheck (line 266) | bool PyShapeSequenceCheck(PyObject* obj) {
function PyUnpackShapeSequence (line 269) | std::vector<Shape> PyUnpackShapeSequence(PyObject* obj) {
function PyGeneratorCheck (line 274) | bool PyGeneratorCheck(PyObject* obj) { return detail::isinstance_f...
function PyUnpackGenerator (line 275) | std::shared_ptr<Generator> PyUnpackGenerator(PyObject* obj) {
function PyDeviceCheck (line 280) | bool PyDeviceCheck(PyObject* obj) { return detail::isinstance_fast...
function PyUnpackDevice (line 281) | Symbol<Device> PyUnpackDevice(PyObject* obj) {
function PyParallelDescCheck (line 286) | bool PyParallelDescCheck(PyObject* obj) {
function PyUnpackParallelDesc (line 289) | Symbol<ParallelDesc> PyUnpackParallelDesc(PyObject* obj) {
function PySbpParallelCheck (line 294) | bool PySbpParallelCheck(PyObject* obj) { return detail::isinstance...
function PyUnpackSbpParallel (line 295) | Symbol<SbpParallel> PyUnpackSbpParallel(PyObject* obj) {
function PySbpParallelSequenceCheck (line 300) | bool PySbpParallelSequenceCheck(PyObject* obj) {
function PyUnpackSbpParallelSequence (line 303) | std::vector<Symbol<SbpParallel>> PyUnpackSbpParallelSequence(PyObj...
function PyTensorIndexCheck (line 309) | bool PyTensorIndexCheck(PyObject* obj) {
function TensorIndex (line 314) | TensorIndex PyUnpackTensorIndex(PyObject* obj) {
function PyOpExprCheck (line 403) | bool PyOpExprCheck(PyObject* obj) { return detail::isinstance_fast...
function PyUnpackOpExpr (line 405) | std::shared_ptr<OpExpr> PyUnpackOpExpr(PyObject* obj) {
function PyUnpackLong (line 410) | Maybe<int64_t> PyUnpackLong(PyObject* py_obj) {
FILE: oneflow/api/python/functional/common.h
function namespace (line 43) | namespace oneflow {
FILE: oneflow/api/python/functional/dispatch_stateful_ops.cpp
type oneflow (line 28) | namespace oneflow {
type one (line 29) | namespace one {
type functional (line 30) | namespace functional {
type impl (line 32) | namespace impl {
function ONEFLOW_FUNCTION_LIBRARY (line 34) | ONEFLOW_FUNCTION_LIBRARY(m) {
FILE: oneflow/api/python/functional/function_def.h
function namespace (line 26) | namespace oneflow {
FILE: oneflow/api/python/functional/indexing.cpp
type oneflow (line 29) | namespace oneflow {
type one (line 30) | namespace one {
type functional (line 31) | namespace functional {
type detail (line 33) | namespace detail {
function PySliceUnpack (line 35) | void PySliceUnpack(PyObject* object, Py_ssize_t* start, Py_ssize...
function DataType (line 59) | DataType InferScalarType(PyObject* object) {
function ParseScalar (line 88) | void ParseScalar(PyObject* object, char* data, const DataType& d...
function RecursiveParseAndAssign (line 113) | void RecursiveParseAndAssign(PyObject* object, char* data, const...
function ParseArrayToTensor (line 128) | void ParseArrayToTensor(PyObject* object,
function Shape (line 142) | Shape InferArraySizes(PyObject* object) {
function ConvertToIndexingTensor (line 158) | Maybe<Tensor> ConvertToIndexingTensor(PyObject* object) {
function IndexItem (line 196) | IndexItem UnpackIndexItem(PyObject* object) {
FILE: oneflow/api/python/functional/indexing.h
function namespace (line 27) | namespace oneflow {
FILE: oneflow/api/python/functional/python_arg.cpp
type oneflow (line 35) | namespace oneflow {
type one (line 36) | namespace one {
type functional (line 37) | namespace functional {
function Scalar (line 98) | Scalar PythonArg::ObjectAs<Scalar>() const {
function Shape (line 141) | Shape PythonArg::ObjectAs<Shape>() const {
function TensorIndex (line 186) | TensorIndex PythonArg::ObjectAs<TensorIndex>() const {
function PyObject (line 197) | PyObject* PythonArg::ObjectAs<PyObject*>() const {
function MemoryFormat (line 210) | MemoryFormat PythonArg::ObjectAs<MemoryFormat>() const {
FILE: oneflow/api/python/functional/python_arg.h
function namespace (line 29) | namespace oneflow {
FILE: oneflow/api/python/functional/python_arg_parser.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type functional (line 22) | namespace functional {
FILE: oneflow/api/python/functional/python_arg_parser.h
function namespace (line 27) | namespace oneflow {
FILE: oneflow/api/python/functional/python_return_types.h
function namespace (line 32) | namespace oneflow {
FILE: oneflow/api/python/functional/tensor_api.cpp
type oneflow (line 42) | namespace oneflow {
type one (line 43) | namespace one {
type functional (line 44) | namespace functional {
type impl (line 46) | namespace impl {
class TensorWithDataFunctor (line 48) | class TensorWithDataFunctor {
class GlobalTensorWithDataFunctor (line 83) | class GlobalTensorWithDataFunctor {
class TensorEmptyGenericCtorFunctor (line 111) | class TensorEmptyGenericCtorFunctor {
class GlobalTensorEmptyGenericCtorFunctor (line 120) | class GlobalTensorEmptyGenericCtorFunctor {
class TensorWithOtherGenericCtorFunctor (line 130) | class TensorWithOtherGenericCtorFunctor {
class TensorWithDataGenericCtorFunctor (line 142) | class TensorWithDataGenericCtorFunctor {
class GlobalTensorWithDataGenericCtorFunctor (line 172) | class GlobalTensorWithDataGenericCtorFunctor {
class TensorWithShapeGenericCtorFunctor (line 202) | class TensorWithShapeGenericCtorFunctor {
class GlobalTensorWithShapeGenericCtorFunctor (line 218) | class GlobalTensorWithShapeGenericCtorFunctor {
class AssignLocalTensorFunctor (line 230) | class AssignLocalTensorFunctor {
method AssignLocalTensorFunctor (line 232) | AssignLocalTensorFunctor() {
function get_shape_or_stride_from_numpy (line 253) | static std::vector<int64_t> get_shape_or_stride_from_numpy(size_...
class LocalTensorSharedDlPackDataFunctor (line 259) | class LocalTensorSharedDlPackDataFunctor {
method LocalTensorSharedDlPackDataFunctor (line 261) | LocalTensorSharedDlPackDataFunctor() {}
class LocalTensorSharedNumpyDataFunctor (line 282) | class LocalTensorSharedNumpyDataFunctor {
method LocalTensorSharedNumpyDataFunctor (line 284) | LocalTensorSharedNumpyDataFunctor() {}
function ONEFLOW_FUNCTION_LIBRARY (line 366) | ONEFLOW_FUNCTION_LIBRARY(m) {
FILE: oneflow/api/python/functional/value_types.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type functional (line 23) | namespace functional {
function IsIntegralType (line 86) | bool IsIntegralType(ValueType type) { return type >= kINT32 && typ...
function IsIntegralListType (line 87) | bool IsIntegralListType(ValueType type) {
function IsFloatingType (line 90) | bool IsFloatingType(ValueType type) { return type >= kFLOAT && typ...
function IsFloatingListType (line 91) | bool IsFloatingListType(ValueType type) {
FILE: oneflow/api/python/functional/value_types.h
function namespace (line 32) | namespace oneflow {
function namespace (line 207) | namespace std {
FILE: oneflow/api/python/gil_foreign_lock_helper.cpp
type oneflow (line 24) | namespace oneflow {
class GILForeignLockHelper (line 25) | class GILForeignLockHelper final : public ForeignLockHelper {
method WithScopedRelease (line 26) | Maybe<void> WithScopedRelease(const std::function<Maybe<void>()>& Ca...
method WithScopedAcquire (line 36) | Maybe<void> WithScopedAcquire(const std::function<Maybe<void>()>& Ca...
FILE: oneflow/api/python/init.cpp
type oneflow (line 29) | namespace oneflow {
function Int2IntListMapContaining (line 36) | bool Int2IntListMapContaining(const Int2IntListMap& bigger, const Int2...
function PYBIND11_MODULE (line 51) | PYBIND11_MODULE(_oneflow_internal, m) {
FILE: oneflow/api/python/ir.cpp
type oneflow (line 41) | namespace oneflow {
FILE: oneflow/api/python/job_build/job_build_and_infer.cpp
type oneflow (line 23) | namespace oneflow {
function MarkVariableGradients (line 25) | Maybe<void> MarkVariableGradients(const one::TensorTuple& variables,
function MarkOutputGradients (line 42) | Maybe<void> MarkOutputGradients(const one::TensorTuple& outputs,
FILE: oneflow/api/python/job_build/job_build_and_infer.h
function namespace (line 30) | namespace oneflow {
FILE: oneflow/api/python/job_build/lazy_mode.cpp
type oneflow (line 23) | namespace oneflow {
FILE: oneflow/api/python/multiprocessing/init.cpp
type oneflow (line 34) | namespace oneflow {
type multiprocessing (line 35) | namespace multiprocessing {
function multiprocessing_init (line 39) | void multiprocessing_init() {
function set_num_threads (line 57) | void set_num_threads(int num) {
FILE: oneflow/api/python/multiprocessing/object_ptr.cpp
class OFPointer<PyObject> (line 23) | class OFPointer<PyObject>
FILE: oneflow/api/python/multiprocessing/object_ptr.h
function explicit (line 27) | explicit OFPointer(T* ptr) noexcept : ptr(ptr){}
function T (line 35) | T* get() { return ptr; }
function T (line 36) | const T* get() const { return ptr; }
function T (line 37) | T* release() {
function operator (line 42) | operator T*() { return ptr; }
FILE: oneflow/api/python/multiprocessing/shared_memory.cpp
type oneflow (line 21) | namespace oneflow {
FILE: oneflow/api/python/of_api_registry.cpp
type oneflow (line 18) | namespace oneflow {
function SubModuleMap (line 26) | SubModuleMap* GetSubModuleMap() {
FILE: oneflow/api/python/of_api_registry.h
function namespace (line 29) | namespace oneflow {
FILE: oneflow/api/python/profiler.cpp
type oneflow (line 22) | namespace oneflow {
FILE: oneflow/api/python/registry/registry.cpp
type oneflow (line 23) | namespace oneflow {
FILE: oneflow/api/python/remat/remat.cpp
type oneflow (line 27) | namespace oneflow {
function rematable_storage (line 30) | Maybe<vm::RematableTensorStorage> rematable_storage(const std::shared_...
FILE: oneflow/api/python/rpc/ccl.cpp
type oneflow (line 26) | namespace oneflow {
function CpuBroadcast (line 29) | Maybe<py::bytes> CpuBroadcast(py::bytes* in, int64_t root) {
FILE: oneflow/api/python/rpc/rank_group.cpp
type oneflow (line 27) | namespace oneflow {
function CheckCurrentRankGroupConsistency (line 31) | Maybe<void> CheckCurrentRankGroupConsistency() {
FILE: oneflow/api/python/session/session.cpp
type oneflow (line 26) | namespace oneflow {
FILE: oneflow/api/python/stack_getter.cpp
type oneflow (line 29) | namespace oneflow {
FILE: oneflow/api/python/symbol/job_conf_symbol.cpp
type oneflow (line 26) | namespace oneflow {
function CreateJobConfSymbol (line 28) | Maybe<JobDesc> CreateJobConfSymbol(int64_t symbol_id, const std::strin...
FILE: oneflow/api/python/symbol/op_conf_symbol.cpp
type oneflow (line 25) | namespace oneflow {
FILE: oneflow/api/python/symbol/placement_symbol.cpp
type oneflow (line 36) | namespace oneflow {
function GetDeviceCount (line 40) | int64_t GetDeviceCount(const std::string& device_name) {
type PlacementSymbolExportUtil (line 44) | struct PlacementSymbolExportUtil {
method CheckDeviceTag (line 45) | static Maybe<void> CheckDeviceTag(const std::string& type) {
method CreateParallelDesc (line 53) | static Maybe<ParallelDesc> CreateParallelDesc(
method CreateParallelDesc (line 67) | static Maybe<ParallelDesc> CreateParallelDesc(const std::string& pro...
method ParseAndFormatRanks (line 80) | static Maybe<std::vector<std::string>> ParseAndFormatRanks(const py:...
method GetRanksShape (line 107) | static Maybe<Shape> GetRanksShape(PyArrayObject* ranks) {
method ParseAndFormatRanks (line 113) | static Maybe<std::vector<std::string>> ParseAndFormatRanks(PyArrayOb...
method CreateParallelDescSymbol (line 135) | static Maybe<Symbol<ParallelDesc>> CreateParallelDescSymbol(
method CreateParallelDescSymbol (line 143) | static Maybe<Symbol<ParallelDesc>> CreateParallelDescSymbol(const st...
method CreateParallelDescSymbol (line 154) | static Maybe<Symbol<ParallelDesc>> CreateParallelDescSymbol(const st...
method AllDevicePlacement (line 158) | static Maybe<Symbol<ParallelDesc>> AllDevicePlacement(const std::str...
method GetPlacementRanks (line 185) | static Maybe<py::array> GetPlacementRanks(const Symbol<ParallelDesc>...
FILE: oneflow/api/python/symbol/sbp_symbol.cpp
type oneflow (line 30) | namespace oneflow {
function MakeSplitSbpParallelList (line 34) | Maybe<std::vector<Symbol<SbpParallel>>> MakeSplitSbpParallelList(int m...
function GetSplitSbpParallel (line 41) | Maybe<Symbol<SbpParallel>> GetSplitSbpParallel(int axis) {
function GetBroadcastSbpParallel (line 52) | Maybe<Symbol<SbpParallel>> GetBroadcastSbpParallel() {
function GetPartialSumSbpParallel (line 57) | Maybe<Symbol<SbpParallel>> GetPartialSumSbpParallel() {
function SbpGetState (line 62) | Maybe<std::pair<std::string, int>> SbpGetState(const Symbol<SbpParalle...
function GetSbpFromState (line 74) | Maybe<Symbol<SbpParallel>> GetSbpFromState(const std::pair<std::string...
FILE: oneflow/api/python/symbol/scope_symbol.cpp
type oneflow (line 25) | namespace oneflow {
function CreateScopeSymbol (line 27) | Maybe<Scope> CreateScopeSymbol(int64_t symbol_id, const std::string& s...
FILE: oneflow/api/python/utils/dataloader.cpp
type oneflow (line 30) | namespace oneflow {
function setSignalHandler (line 58) | static inline void setSignalHandler(int signal, void (*handler)(int, s...
function handler_SIGTERM (line 86) | static void handler_SIGTERM(int sig, siginfo_t* info, void* ctx) {
function set_worker_signal_handlers (line 98) | static void set_worker_signal_handlers() {
function error_if_any_worker_fails (line 108) | static void error_if_any_worker_fails() {
function utils_unpackLong (line 155) | inline int64_t utils_unpackLong(PyObject* obj) {
function set_worker_pids (line 166) | static void set_worker_pids(py::args py_args) {
function remove_worker_pids (line 193) | static void remove_worker_pids(py::args py_args) {
function PyObject (line 209) | static PyObject* set_worker_signal_handlers(PyObject* module, PyObject...
function PyObject (line 213) | static PyObject* set_worker_pids(PyObject* module, PyObject* _ignored)...
function PyObject (line 215) | static PyObject* remove_worker_pids(PyObject* module, PyObject* _ignor...
function PyObject (line 217) | static PyObject* error_if_any_worker_fails(PyObject* module, PyObject*...
FILE: oneflow/api/python/utils/tensor_utils.cpp
type oneflow (line 34) | namespace oneflow {
type one (line 35) | namespace one {
function EagerLocalTensorZeros (line 37) | Maybe<void> EagerLocalTensorZeros(const std::shared_ptr<Tensor>& t) {
function CopyFromNumpyArray (line 60) | void CopyFromNumpyArray(ep::Stream* stream,
function CopyLocalTensorFromUntypedArray (line 69) | Maybe<void> CopyLocalTensorFromUntypedArray(const std::shared_ptr<Te...
function RegisterTensorHook (line 104) | Maybe<void> RegisterTensorHook(const std::shared_ptr<Tensor>& self,
function RegisterTensorPostGradAccumulationHook (line 113) | Maybe<void> RegisterTensorPostGradAccumulationHook(const std::shared...
function TensorGetPyTupleOfSbp (line 120) | Maybe<py::tuple> TensorGetPyTupleOfSbp(const Tensor& tensor) {
function MakeLocalTensorFromData (line 129) | Maybe<Tensor> MakeLocalTensorFromData(PyObject* data, const Optional...
function GetAllBroadcastNdSbp (line 198) | Maybe<Symbol<NdSbp>> GetAllBroadcastNdSbp(size_t ndim) {
function MakeGlobalTensorFromData (line 210) | Maybe<Tensor> MakeGlobalTensorFromData(PyObject* data, const Optiona...
function MakeTensorFromOtherTensor (line 274) | Maybe<Tensor> MakeTensorFromOtherTensor(const std::shared_ptr<Tensor...
function MakeTensorFromOtherTensor (line 289) | Maybe<Tensor> MakeTensorFromOtherTensor(const std::shared_ptr<Tensor...
function MakeTensorFromOtherTensor (line 314) | Maybe<Tensor> MakeTensorFromOtherTensor(const std::shared_ptr<Tensor...
FILE: oneflow/api/python/utils/tensor_utils.h
function namespace (line 43) | namespace pybind11 {
function namespace (line 59) | namespace oneflow {
FILE: oneflow/core/auto_parallel/algorithm_util.cpp
type oneflow (line 19) | namespace oneflow {
type auto_parallel (line 20) | namespace auto_parallel {
function InverseOrder (line 27) | void InverseOrder(const std::vector<int32_t>& order, std::vector<int...
function CeilQuotient (line 43) | int64_t CeilQuotient(int64_t dividend, int64_t divisor) {
FILE: oneflow/core/auto_parallel/algorithm_util.h
function namespace (line 24) | namespace oneflow {
FILE: oneflow/core/auto_parallel/auto_memory.cpp
type oneflow (line 26) | namespace oneflow {
type auto_parallel (line 27) | namespace auto_parallel {
class TopoStruct (line 31) | class TopoStruct {
type comp (line 83) | struct comp {
function IsProducedRegisterReusable (line 96) | bool IsProducedRegisterReusable(const Operator& op) {
function ComputeAllMemoryIncrement (line 204) | void ComputeAllMemoryIncrement(std::vector<TopoStruct*>& topo_structs,
function UpdateSat (line 244) | void UpdateSat(const std::vector<TopoStruct*>& topo_structs, Straigh...
function InitInOutTopoStructs (line 264) | void InitInOutTopoStructs(std::vector<TopoStruct*>* topo_structs) {
function ComputeLayer (line 297) | void ComputeLayer(std::vector<TopoStruct*>* topo_structs) {
function InitAllParameters (line 310) | void InitAllParameters(std::vector<TopoStruct*>* topo_structs,
function StraightenOpNodes (line 349) | void StraightenOpNodes(HashMap<const OpNode*, TopoStruct>& op_node2t...
function InitMemory (line 393) | void InitMemory(const OpGraph& op_graph, SbpGraph* sbp_graph, bool n...
function StraightenSubGraph (line 514) | void StraightenSubGraph(const std::vector<const OpNode*>& sub_graph,
function StraightenOpGraph (line 541) | void StraightenOpGraph(const OpGraph& op_graph, std::vector<const Op...
FILE: oneflow/core/auto_parallel/auto_memory.h
function namespace (line 21) | namespace oneflow {
FILE: oneflow/core/auto_parallel/binary_set.cpp
type oneflow (line 18) | namespace oneflow {
type auto_parallel (line 19) | namespace auto_parallel {
function InitLog2 (line 23) | std::unordered_map<BinarySetEntryType, int32_t> InitLog2() {
FILE: oneflow/core/auto_parallel/binary_set.h
function namespace (line 24) | namespace oneflow {
FILE: oneflow/core/auto_parallel/boxing_collector.cpp
type oneflow (line 36) | namespace oneflow {
function DfsSetNdSbp (line 42) | void DfsSetNdSbp(const std::vector<SbpParallel>& id2sbp_parallel, int3...
function SetNdSbpDim (line 57) | Maybe<NdSbp> SetNdSbpDim(const NdSbp& nd_sbp, int32_t hierarchy_num) {
function TotalNumSplit (line 79) | int32_t TotalNumSplit(const NdSbp& nd_sbp, const ParallelDesc& paralle...
function AskSbpCombinationFor1DSbp (line 91) | Maybe<void> AskSbpCombinationFor1DSbp(const NdSbp& sbp_producer, const...
FILE: oneflow/core/auto_parallel/boxing_collector.h
function namespace (line 25) | namespace oneflow {
FILE: oneflow/core/auto_parallel/sbp_collector.cpp
type oneflow (line 23) | namespace oneflow {
type auto_parallel (line 25) | namespace auto_parallel {
function IfIntersectAll (line 29) | bool IfIntersectAll(
function FindUniqueSbpSets (line 40) | void FindUniqueSbpSets(
function FindUniqueSbpGroups (line 58) | void FindUniqueSbpGroups(
function No2SbpFromSameUniqueGroup (line 80) | bool No2SbpFromSameUniqueGroup(const BinarySet& bs,
FILE: oneflow/core/auto_parallel/sbp_collector.h
function namespace (line 33) | namespace oneflow {
FILE: oneflow/core/auto_parallel/sbp_constructor.cpp
type oneflow (line 33) | namespace oneflow {
type auto_parallel (line 35) | namespace auto_parallel {
function UpdateMemoryRatio (line 55) | double UpdateMemoryRatio() {
FILE: oneflow/core/auto_parallel/sbp_constructor.h
function namespace (line 24) | namespace oneflow {
FILE: oneflow/core/auto_parallel/sbp_edge.cpp
type oneflow (line 29) | namespace oneflow {
type auto_parallel (line 30) | namespace auto_parallel {
FILE: oneflow/core/auto_parallel/sbp_edge.h
function namespace (line 30) | namespace oneflow {
FILE: oneflow/core/auto_parallel/sbp_graph.cpp
type oneflow (line 25) | namespace oneflow {
type auto_parallel (line 26) | namespace auto_parallel {
function SbpNode (line 36) | SbpNode* SbpGraph::GenerateNode() {
FILE: oneflow/core/auto_parallel/sbp_graph.h
function namespace (line 27) | namespace oneflow {
FILE: oneflow/core/auto_parallel/sbp_node.cpp
type oneflow (line 33) | namespace oneflow {
type auto_parallel (line 34) | namespace auto_parallel {
function SbpEdge (line 839) | SbpEdge* SbpNode::FindEdgeWithNode(const SbpNode* other_node) const {
function NdSbpSignature (line 850) | const NdSbpSignature& SbpNode::FinalSbpSignature() const {
FILE: oneflow/core/auto_parallel/sbp_node.h
function namespace (line 32) | namespace oneflow {
FILE: oneflow/core/auto_parallel/sbp_util.cpp
type oneflow (line 23) | namespace oneflow {
type auto_parallel (line 24) | namespace auto_parallel {
function RequireSameSbp (line 27) | bool RequireSameSbp(const OpNode* consumer, const std::string& ibn) {
FILE: oneflow/core/auto_parallel/sbp_util.h
function namespace (line 21) | namespace oneflow {
FILE: oneflow/core/autograd/autograd_captured_tensor.h
function namespace (line 22) | namespace oneflow {
FILE: oneflow/core/autograd/autograd_engine.cpp
type oneflow (line 41) | namespace oneflow {
type one (line 42) | namespace one {
function GatherFunctionNodes (line 46) | void GatherFunctionNodes(FunctionNode* node, std::stack<std::shared_...
function FunctionNodeDeleter (line 65) | void FunctionNodeDeleter(FunctionNode* node) {
function IsReadyToRun (line 79) | bool IsReadyToRun(const std::vector<std::shared_ptr<AutogradMeta>>& ...
function CopyOrAccGrad (line 86) | Maybe<void> CopyOrAccGrad(AutogradMeta* autograd_meta, bool autograd...
function RawTouchGlobalTensor (line 108) | Maybe<void> RawTouchGlobalTensor(const std::shared_ptr<one::Tensor>&...
function CheckGlobalTensorsMeta (line 115) | Maybe<void> CheckGlobalTensorsMeta(const TensorTuple& tensor_tuple) {
function GetDebugGraphFileName (line 122) | std::string GetDebugGraphFileName(const std::string& mode, const std...
type NodeFrame (line 356) | struct NodeFrame {
method NodeFrame (line 357) | explicit NodeFrame(FunctionNode* node) : node_(node), next_functio...
method FunctionNode (line 361) | FunctionNode* GetNextFunction() {
function AutogradEngine (line 515) | AutogradEngine* GetThreadLocalAutogradEngine() {
function AddAccumulateFunctionNode (line 520) | Maybe<void> AddAccumulateFunctionNode(const std::shared_ptr<Tensor>&...
FILE: oneflow/core/autograd/autograd_engine.h
function namespace (line 31) | namespace oneflow {
function class (line 98) | class AutogradEngine {
function class (line 146) | class GraphTask final {
function ClearEngine (line 183) | void ClearEngine() override{}
FILE: oneflow/core/autograd/autograd_function.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
FILE: oneflow/core/autograd/autograd_function.h
function namespace (line 22) | namespace oneflow {
FILE: oneflow/core/autograd/autograd_meta.cpp
type oneflow (line 25) | namespace oneflow {
type one (line 27) | namespace one {
function GetSbpTuple (line 38) | Maybe<const std::vector<Symbol<SbpParallel>>&> GetSbpTuple(Symbol<Nd...
FILE: oneflow/core/autograd/autograd_meta.h
function namespace (line 27) | namespace oneflow {
FILE: oneflow/core/autograd/autograd_mode.cpp
type oneflow (line 19) | namespace oneflow {
type autograd (line 21) | namespace autograd {
FILE: oneflow/core/autograd/autograd_mode.h
function namespace (line 20) | namespace oneflow {
FILE: oneflow/core/autograd/gradient_funcs/activation.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type BaseActivationCaptureState (line 23) | struct BaseActivationCaptureState : public AutoGradCaptureState {
class BaseActivation (line 27) | class BaseActivation : public OpExprGradFunction<BaseActivationCaptu...
method Init (line 29) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 31) | Maybe<void> Capture(BaseActivationCaptureState* ctx, const TensorT...
class Silu (line 41) | class Silu : public BaseActivation {
method Apply (line 43) | Maybe<void> Apply(const BaseActivationCaptureState* ctx, const Ten...
class Mish (line 55) | class Mish : public BaseActivation {
method Apply (line 57) | Maybe<void> Apply(const BaseActivationCaptureState* ctx, const Ten...
class Selu (line 69) | class Selu : public BaseActivation {
method Apply (line 71) | Maybe<void> Apply(const BaseActivationCaptureState* ctx, const Ten...
class Softsign (line 83) | class Softsign : public BaseActivation {
method Apply (line 85) | Maybe<void> Apply(const BaseActivationCaptureState* ctx, const Ten...
class GeLU (line 97) | class GeLU : public BaseActivation {
method Apply (line 99) | Maybe<void> Apply(const BaseActivationCaptureState* ctx, const Ten...
class FastGeLU (line 111) | class FastGeLU : public BaseActivation {
method Apply (line 113) | Maybe<void> Apply(const BaseActivationCaptureState* ctx, const Ten...
type QuickGeluCaptureState (line 125) | struct QuickGeluCaptureState : public AutoGradCaptureState {
class QuickGeLU (line 129) | class QuickGeLU : public OpExprGradFunction<QuickGeluCaptureState> {
method Init (line 131) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 133) | Maybe<void> Capture(QuickGeluCaptureState* ctx, const TensorTuple&...
method Apply (line 143) | Maybe<void> Apply(const QuickGeluCaptureState* ctx, const TensorTu...
type SquareReLUCaptureState (line 155) | struct SquareReLUCaptureState : public AutoGradCaptureState {
class SquareReLU (line 159) | class SquareReLU : public OpExprGradFunction<SquareReLUCaptureState> {
method Init (line 161) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 163) | Maybe<void> Capture(SquareReLUCaptureState* ctx, const TensorTuple...
method Apply (line 173) | Maybe<void> Apply(const SquareReLUCaptureState* ctx, const TensorT...
class HardSigmoid (line 185) | class HardSigmoid : public BaseActivation {
method Apply (line 187) | Maybe<void> Apply(const BaseActivationCaptureState* ctx, const Ten...
type HardShrinkCaptureState (line 199) | struct HardShrinkCaptureState : public AutoGradCaptureState {
class HardShrink (line 204) | class HardShrink : public OpExprGradFunction<HardShrinkCaptureState> {
method Init (line 206) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 213) | Maybe<void> Capture(HardShrinkCaptureState* ctx, const TensorTuple...
method Apply (line 225) | Maybe<void> Apply(const HardShrinkCaptureState* ctx, const TensorT...
class HardSwish (line 241) | class HardSwish : public BaseActivation {
method Apply (line 243) | Maybe<void> Apply(const BaseActivationCaptureState* ctx, const Ten...
type ReLUCaptureState (line 256) | struct ReLUCaptureState : public AutoGradCaptureState {
class ReLU (line 260) | class ReLU : public OpExprGradFunction<ReLUCaptureState> {
method Init (line 262) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 264) | Maybe<void> Capture(ReLUCaptureState* ctx, const TensorTuple& inpu...
method Apply (line 273) | Maybe<void> Apply(const ReLUCaptureState* ctx, const TensorTuple& ...
type LeakyReluCaptureState (line 286) | struct LeakyReluCaptureState : public AutoGradCaptureState {
class LeakyRelu (line 291) | class LeakyRelu : public OpExprGradFunction<LeakyReluCaptureState> {
method Init (line 293) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 300) | Maybe<void> Capture(LeakyReluCaptureState* ctx, const TensorTuple&...
method Apply (line 312) | Maybe<void> Apply(const LeakyReluCaptureState* ctx, const TensorTu...
type SoftplusCaptureState (line 327) | struct SoftplusCaptureState : public AutoGradCaptureState {
class Softplus (line 333) | class Softplus : public OpExprGradFunction<SoftplusCaptureState> {
method Init (line 335) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 342) | Maybe<void> Capture(SoftplusCaptureState* ctx, const TensorTuple& ...
method Apply (line 353) | Maybe<void> Apply(const SoftplusCaptureState* ctx, const TensorTup...
type HardTanhCaptureState (line 369) | struct HardTanhCaptureState : public AutoGradCaptureState {
class HardTanh (line 375) | class HardTanh : public OpExprGradFunction<HardTanhCaptureState> {
method Init (line 377) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 384) | Maybe<void> Capture(HardTanhCaptureState* ctx, const TensorTuple& ...
method Apply (line 397) | Maybe<void> Apply(const HardTanhCaptureState* ctx, const TensorTup...
type EluCaptureState (line 413) | struct EluCaptureState : public AutoGradCaptureState {
class Elu (line 418) | class Elu : public OpExprGradFunction<EluCaptureState> {
method Init (line 420) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 427) | Maybe<void> Capture(EluCaptureState* ctx, const TensorTuple& input...
method Apply (line 439) | Maybe<void> Apply(const EluCaptureState* ctx, const TensorTuple& o...
type CeluCaptureState (line 454) | struct CeluCaptureState : public AutoGradCaptureState {
class Celu (line 459) | class Celu : public OpExprGradFunction<CeluCaptureState> {
method Init (line 461) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 468) | Maybe<void> Capture(CeluCaptureState* ctx, const TensorTuple& inpu...
method Apply (line 480) | Maybe<void> Apply(const CeluCaptureState* ctx, const TensorTuple& ...
type SoftShrinkCaptureState (line 495) | struct SoftShrinkCaptureState : public AutoGradCaptureState {
class SoftShrink (line 500) | class SoftShrink : public OpExprGradFunction<SoftShrinkCaptureState> {
method Init (line 502) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 509) | Maybe<void> Capture(SoftShrinkCaptureState* ctx, const TensorTuple...
method Apply (line 521) | Maybe<void> Apply(const SoftShrinkCaptureState* ctx, const TensorT...
type PReLUCaptureState (line 537) | struct PReLUCaptureState : public AutoGradCaptureState {
class PReLU (line 542) | class PReLU : public OpExprGradFunction<PReLUCaptureState> {
method Init (line 544) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 546) | Maybe<void> Capture(PReLUCaptureState* ctx, const TensorTuple& inp...
method Apply (line 557) | Maybe<void> Apply(const PReLUCaptureState* ctx, const TensorTuple&...
type ThresholdCaptureState (line 576) | struct ThresholdCaptureState : public AutoGradCaptureState {
class Threshold (line 581) | class Threshold : public OpExprGradFunction<ThresholdCaptureState> {
method Init (line 583) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 590) | Maybe<void> Capture(ThresholdCaptureState* ctx, const TensorTuple&...
method Apply (line 602) | Maybe<void> Apply(const ThresholdCaptureState* ctx, const TensorTu...
type FracCaptureState (line 618) | struct FracCaptureState : public AutoGradCaptureState {
class Frac (line 622) | class Frac : public OpExprGradFunction<FracCaptureState> {
method Init (line 624) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 631) | Maybe<void> Capture(FracCaptureState* ctx, const TensorTuple& inpu...
method Apply (line 639) | Maybe<void> Apply(const FracCaptureState* ctx, const TensorTuple& ...
FILE: oneflow/core/autograd/gradient_funcs/adaptive_avg_pool.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type AdaptivePoolCaptureState (line 25) | struct AdaptivePoolCaptureState : public AutoGradCaptureState {
class AdaptivePoolNdGrad (line 30) | class AdaptivePoolNdGrad : public OpExprGradFunction<AdaptivePoolCap...
class AdaptiveAvgPool1dGrad (line 76) | class AdaptiveAvgPool1dGrad final : public AdaptivePoolNdGrad {
method Init (line 78) | Maybe<void> Init(const OpExpr& op) override { return AdaptivePoolN...
class AdaptiveAvgPool2dGrad (line 81) | class AdaptiveAvgPool2dGrad final : public AdaptivePoolNdGrad {
method Init (line 83) | Maybe<void> Init(const OpExpr& op) override { return AdaptivePoolN...
class AdaptiveAvgPool3dGrad (line 86) | class AdaptiveAvgPool3dGrad final : public AdaptivePoolNdGrad {
method Init (line 88) | Maybe<void> Init(const OpExpr& op) override { return AdaptivePoolN...
FILE: oneflow/core/autograd/gradient_funcs/adaptive_max_pool.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type AdaptiveMaxPoolCaptureState (line 25) | struct AdaptiveMaxPoolCaptureState : public AutoGradCaptureState {
class AdaptiveMaxPoolNdGrad (line 30) | class AdaptiveMaxPoolNdGrad : public OpExprGradFunction<AdaptiveMaxP...
class AdaptiveMaxPool1dGrad (line 75) | class AdaptiveMaxPool1dGrad final : public AdaptiveMaxPoolNdGrad {
method Init (line 77) | Maybe<void> Init(const OpExpr& op) override { return AdaptiveMaxPo...
class AdaptiveMaxPool2dGrad (line 80) | class AdaptiveMaxPool2dGrad final : public AdaptiveMaxPoolNdGrad {
method Init (line 82) | Maybe<void> Init(const OpExpr& op) override { return AdaptiveMaxPo...
class AdaptiveMaxPool3dGrad (line 85) | class AdaptiveMaxPool3dGrad final : public AdaptiveMaxPoolNdGrad {
method Init (line 87) | Maybe<void> Init(const OpExpr& op) override { return AdaptiveMaxPo...
FILE: oneflow/core/autograd/gradient_funcs/add_n.cpp
type oneflow (line 18) | namespace oneflow {
type one (line 19) | namespace one {
type AddNCaptureState (line 21) | struct AddNCaptureState : public AutoGradCaptureState {
class AddN (line 26) | class AddN : public OpExprGradFunction<AddNCaptureState> {
method Init (line 28) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 30) | Maybe<void> Capture(AddNCaptureState* ctx, const TensorTuple& inpu...
method Apply (line 40) | Maybe<void> Apply(const AddNCaptureState* ctx, const TensorTuple& ...
FILE: oneflow/core/autograd/gradient_funcs/affine_grid.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type AffineGridInterpState (line 24) | struct AffineGridInterpState : public AutoGradCaptureState {
class AffineGrid (line 30) | class AffineGrid : public OpExprGradFunction<AffineGridInterpState> {
method Init (line 32) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 39) | Maybe<void> Capture(AffineGridInterpState* ctx, const TensorTuple&...
method Apply (line 51) | Maybe<void> Apply(const AffineGridInterpState* ctx, const TensorTu...
FILE: oneflow/core/autograd/gradient_funcs/amp_white_identity.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type AmpIdentityType (line 23) | enum class AmpIdentityType {
type AmpIdentityCaptureState (line 28) | struct AmpIdentityCaptureState : public AutoGradCaptureState {}
class AmpIdentityGrad (line 31) | class AmpIdentityGrad : public OpExprGradFunction<AmpIdentityCapture...
method Init (line 33) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 39) | Maybe<void> Capture(AmpIdentityCaptureState* ctx, const TensorTupl...
method Apply (line 44) | Maybe<void> Apply(const AmpIdentityCaptureState* ctx, const Tensor...
FILE: oneflow/core/autograd/gradient_funcs/as_strided.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type AsStridedCaptureState (line 24) | struct AsStridedCaptureState : public AutoGradCaptureState {
class AsStrided (line 31) | class AsStrided : public OpExprGradFunction<AsStridedCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/avg_pool.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type AvgPoolCaptureState (line 28) | struct AvgPoolCaptureState : public AutoGradCaptureState {
class AvgPoolNdGrad (line 41) | class AvgPoolNdGrad : public OpExprGradFunction<AvgPoolCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/batch_gather.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type BatchGatherCaptureState (line 23) | struct BatchGatherCaptureState : public AutoGradCaptureState {
class BatchGather (line 28) | class BatchGather : public OpExprGradFunction<BatchGatherCaptureStat...
FILE: oneflow/core/autograd/gradient_funcs/bias_add.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type BiasAddCaptureState (line 26) | struct BiasAddCaptureState : public AutoGradCaptureState {
class BiasAdd (line 32) | class BiasAdd : public OpExprGradFunction<BiasAddCaptureState> {
method Init (line 34) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 41) | Maybe<void> Capture(BiasAddCaptureState* ctx, const TensorTuple& i...
method Apply (line 51) | Maybe<void> Apply(const BiasAddCaptureState* ctx, const TensorTupl...
FILE: oneflow/core/autograd/gradient_funcs/binary_cross_entropy.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type BinaryCrossEntropyCaptureState (line 22) | struct BinaryCrossEntropyCaptureState : public AutoGradCaptureState {
class BinaryCrossEntropy (line 28) | class BinaryCrossEntropy : public OpExprGradFunction<BinaryCrossEntr...
FILE: oneflow/core/autograd/gradient_funcs/binary_cross_entropy_with_logits.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type BinaryCrossEntropyWithLogitsCaptureState (line 22) | struct BinaryCrossEntropyWithLogitsCaptureState : public AutoGradCap...
class BinaryCrossEntropyWithLogits (line 29) | class BinaryCrossEntropyWithLogits
FILE: oneflow/core/autograd/gradient_funcs/binary_cross_entropy_with_logits_reduce_mean.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type BinaryCrossEntropyWithLogitsReduceMeanCaptureState (line 23) | struct BinaryCrossEntropyWithLogitsReduceMeanCaptureState : public A...
class BinaryCrossEntropyWithLogitsReduceMean (line 28) | class BinaryCrossEntropyWithLogitsReduceMean
FILE: oneflow/core/autograd/gradient_funcs/broadcast_binary_ops.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type BroadcastBinaryCaptureState (line 25) | struct BroadcastBinaryCaptureState : public AutoGradCaptureState {
class BroadcastBinaryGrad (line 35) | class BroadcastBinaryGrad : public OpExprGradFunction<BroadcastBinar...
method BroadcastBinaryGrad (line 37) | BroadcastBinaryGrad() = default;
method Init (line 40) | virtual Maybe<void> Init(const OpExpr& op) override { return Maybe...
method Capture (line 42) | Maybe<void> Capture(BroadcastBinaryCaptureState* ctx, const Tensor...
class BroadcastAdd (line 59) | class BroadcastAdd : public BroadcastBinaryGrad {
method Apply (line 61) | Maybe<void> Apply(const BroadcastBinaryCaptureState* ctx, const Te...
method SaveTensorForBackward (line 84) | Maybe<void> SaveTensorForBackward(BroadcastBinaryCaptureState* ctx...
class BroadcastSub (line 98) | class BroadcastSub : public BroadcastBinaryGrad {
method Apply (line 100) | Maybe<void> Apply(const BroadcastBinaryCaptureState* ctx, const Te...
method SaveTensorForBackward (line 124) | Maybe<void> SaveTensorForBackward(BroadcastBinaryCaptureState* ctx...
class BroadcastMul (line 138) | class BroadcastMul : public BroadcastBinaryGrad {
method Apply (line 140) | Maybe<void> Apply(const BroadcastBinaryCaptureState* ctx, const Te...
method SaveTensorForBackward (line 167) | Maybe<void> SaveTensorForBackward(BroadcastBinaryCaptureState* ctx...
class BroadcastDiv (line 187) | class BroadcastDiv : public BroadcastBinaryGrad {
method Apply (line 189) | Maybe<void> Apply(const BroadcastBinaryCaptureState* ctx, const Te...
method SaveTensorForBackward (line 212) | Maybe<void> SaveTensorForBackward(BroadcastBinaryCaptureState* ctx...
class BroadcastPow (line 230) | class BroadcastPow : public BroadcastBinaryGrad {
method Apply (line 232) | Maybe<void> Apply(const BroadcastBinaryCaptureState* ctx, const Te...
method SaveTensorForBackward (line 247) | Maybe<void> SaveTensorForBackward(BroadcastBinaryCaptureState* ctx...
class BroadcastMinMax (line 257) | class BroadcastMinMax : public BroadcastBinaryGrad {
method Apply (line 259) | Maybe<void> Apply(const BroadcastBinaryCaptureState* ctx, const Te...
method SaveTensorForBackward (line 315) | Maybe<void> SaveTensorForBackward(BroadcastBinaryCaptureState* ctx...
class BroadcastMinimum (line 329) | class BroadcastMinimum : public BroadcastMinMax {
method Init (line 331) | Maybe<void> Init(const OpExpr& op) override {
class BroadcastMaximum (line 338) | class BroadcastMaximum : public BroadcastMinMax {
method Init (line 340) | Maybe<void> Init(const OpExpr& op) override {
class BroadcastFMod (line 350) | class BroadcastFMod : public BroadcastBinaryGrad {
method Apply (line 352) | Maybe<void> Apply(const BroadcastBinaryCaptureState* ctx, const Te...
method SaveTensorForBackward (line 410) | Maybe<void> SaveTensorForBackward(BroadcastBinaryCaptureState* ctx...
FILE: oneflow/core/autograd/gradient_funcs/broadcast_like.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type BroadCastLikeCaptureState (line 23) | struct BroadCastLikeCaptureState : public AutoGradCaptureState {
class BroadCastLike (line 30) | class BroadCastLike : public OpExprGradFunction<BroadCastLikeCapture...
FILE: oneflow/core/autograd/gradient_funcs/cast.cpp
type oneflow (line 25) | namespace oneflow {
type one (line 26) | namespace one {
type CastCaptureState (line 28) | struct CastCaptureState : public AutoGradCaptureState {
class Cast (line 33) | class Cast : public OpExprGradFunction<CastCaptureState> {
method Init (line 35) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 41) | Maybe<void> Capture(CastCaptureState* ctx, const TensorTuple& inpu...
method Apply (line 48) | Maybe<void> Apply(const CastCaptureState* ctx, const TensorTuple& ...
FILE: oneflow/core/autograd/gradient_funcs/clip_by_scalar.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type ClipByScalarCaptureState (line 22) | struct ClipByScalarCaptureState : public AutoGradCaptureState {
class ClipByScalar (line 28) | class ClipByScalar : public OpExprGradFunction<ClipByScalarCaptureSt...
method Init (line 30) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 37) | Maybe<void> Capture(ClipByScalarCaptureState* ctx, const TensorTup...
method Apply (line 57) | Maybe<void> Apply(const ClipByScalarCaptureState* ctx, const Tenso...
FILE: oneflow/core/autograd/gradient_funcs/clip_by_scalar_max.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type ClipByScalarMaxCaptureState (line 22) | struct ClipByScalarMaxCaptureState : public AutoGradCaptureState {
class ClipByScalarMax (line 27) | class ClipByScalarMax : public OpExprGradFunction<ClipByScalarMaxCap...
method Init (line 29) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 36) | Maybe<void> Capture(ClipByScalarMaxCaptureState* ctx, const Tensor...
method Apply (line 54) | Maybe<void> Apply(const ClipByScalarMaxCaptureState* ctx, const Te...
FILE: oneflow/core/autograd/gradient_funcs/clip_by_scalar_min.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type ClipByScalarMinCaptureState (line 22) | struct ClipByScalarMinCaptureState : public AutoGradCaptureState {
class ClipByScalarMin (line 27) | class ClipByScalarMin : public OpExprGradFunction<ClipByScalarMinCap...
method Init (line 29) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 36) | Maybe<void> Capture(ClipByScalarMinCaptureState* ctx, const Tensor...
method Apply (line 54) | Maybe<void> Apply(const ClipByScalarMinCaptureState* ctx, const Te...
FILE: oneflow/core/autograd/gradient_funcs/combined_margin_loss.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type CombinedMarginLossCaptureState (line 24) | struct CombinedMarginLossCaptureState : public AutoGradCaptureState {
class CombinedMarginLoss (line 34) | class CombinedMarginLoss : public OpExprGradFunction<CombinedMarginL...
method Init (line 36) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 43) | Maybe<void> Capture(CombinedMarginLossCaptureState* ctx, const Ten...
method Apply (line 60) | Maybe<void> Apply(const CombinedMarginLossCaptureState* ctx, const...
FILE: oneflow/core/autograd/gradient_funcs/complex.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type BaseComplexCaptureState (line 22) | struct BaseComplexCaptureState : public AutoGradCaptureState {
class RealGrad (line 27) | class RealGrad : public OpExprGradFunction<BaseComplexCaptureState> {
method Init (line 29) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 31) | Maybe<void> Capture(BaseComplexCaptureState* ctx, const TensorTupl...
method Apply (line 39) | Maybe<void> Apply(const BaseComplexCaptureState* ctx, const Tensor...
class ImagGrad (line 51) | class ImagGrad : public OpExprGradFunction<BaseComplexCaptureState> {
method Init (line 53) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 55) | Maybe<void> Capture(BaseComplexCaptureState* ctx, const TensorTupl...
method Apply (line 63) | Maybe<void> Apply(const BaseComplexCaptureState* ctx, const Tensor...
class ConjPhysicalGrad (line 75) | class ConjPhysicalGrad : public OpExprGradFunction<BaseComplexCaptur...
method Init (line 77) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 79) | Maybe<void> Capture(BaseComplexCaptureState* ctx, const TensorTupl...
method Apply (line 87) | Maybe<void> Apply(const BaseComplexCaptureState* ctx, const Tensor...
FILE: oneflow/core/autograd/gradient_funcs/concat.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type ConcatCaptureState (line 25) | struct ConcatCaptureState : public AutoGradCaptureState {
class Concat (line 31) | class Concat : public OpExprGradFunction<ConcatCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/conv.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type ConvolutionNdCaptureState (line 26) | struct ConvolutionNdCaptureState : public AutoGradCaptureState {
class ConvolutionNd (line 42) | class ConvolutionNd : public OpExprGradFunction<ConvolutionNdCapture...
FILE: oneflow/core/autograd/gradient_funcs/copy.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type CopyCaptureState (line 26) | struct CopyCaptureState : public AutoGradCaptureState {
class Copy (line 31) | class Copy : public OpExprGradFunction<CopyCaptureState> {
method Init (line 33) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 39) | Maybe<void> Capture(CopyCaptureState* ctx, const TensorTuple& inpu...
method Apply (line 52) | Maybe<void> Apply(const CopyCaptureState* ctx, const TensorTuple& ...
FILE: oneflow/core/autograd/gradient_funcs/ctc_loss.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type CTCLossCaptureState (line 26) | struct CTCLossCaptureState : public AutoGradCaptureState {
class CTCLoss (line 33) | class CTCLoss : public OpExprGradFunction<CTCLossCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/cublas_fused_mlp.cpp
type oneflow (line 28) | namespace oneflow {
type one (line 30) | namespace one {
type CublasFusedMLPCaptureState (line 32) | struct CublasFusedMLPCaptureState : public AutoGradCaptureState {
class CublasFusedMLP (line 40) | class CublasFusedMLP : public OpExprGradFunction<CublasFusedMLPCaptu...
FILE: oneflow/core/autograd/gradient_funcs/cum_ops.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type CumCaptureState (line 22) | struct CumCaptureState : public AutoGradCaptureState {
class CumGrad (line 28) | class CumGrad : public OpExprGradFunction<StateT> {
method Init (line 30) | Maybe<void> Init(const OpExpr& op) override {
class CumsumGrad (line 41) | class CumsumGrad : public CumGrad<CumCaptureState> {
method Capture (line 43) | Maybe<void> Capture(CumCaptureState* ctx, const TensorTuple& input...
method Apply (line 53) | Maybe<void> Apply(const CumCaptureState* ctx, const TensorTuple& o...
class CumProdGrad (line 70) | class CumProdGrad : public CumGrad<CumCaptureState> {
method Capture (line 72) | Maybe<void> Capture(CumCaptureState* ctx, const TensorTuple& input...
method Apply (line 85) | Maybe<void> Apply(const CumCaptureState* ctx, const TensorTuple& o...
FILE: oneflow/core/autograd/gradient_funcs/deconv.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type DeConvolutionNdCaptureState (line 25) | struct DeConvolutionNdCaptureState : public AutoGradCaptureState {
class DeConvolutionNd (line 37) | class DeConvolutionNd : public OpExprGradFunction<DeConvolutionNdCap...
FILE: oneflow/core/autograd/gradient_funcs/deform_conv.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type DeformConvNdCaptureState (line 23) | struct DeformConvNdCaptureState : public AutoGradCaptureState {
class DeformConvNd (line 40) | class DeformConvNd : public OpExprGradFunction<DeformConvNdCaptureSt...
FILE: oneflow/core/autograd/gradient_funcs/depand.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type DependCaptureState (line 22) | struct DependCaptureState : public AutoGradCaptureState {
class Depend (line 30) | class Depend : public OpExprGradFunction<DependCaptureState> {
method Init (line 32) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 34) | Maybe<void> Capture(DependCaptureState* ctx, const TensorTuple& in...
method Apply (line 48) | Maybe<void> Apply(const DependCaptureState* ctx, const TensorTuple...
FILE: oneflow/core/autograd/gradient_funcs/det.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type DetCaptureState (line 24) | struct DetCaptureState : public AutoGradCaptureState {
class Det (line 30) | class Det : public OpExprGradFunction<DetCaptureState> {
method Init (line 32) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 33) | Maybe<void> Capture(DetCaptureState* ctx, const TensorTuple& input...
method Apply (line 42) | Maybe<void> Apply(const DetCaptureState* ctx, const TensorTuple& o...
FILE: oneflow/core/autograd/gradient_funcs/diag.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type DiagCaptureState (line 23) | struct DiagCaptureState : public AutoGradCaptureState {
class Diag (line 28) | class Diag : public OpExprGradFunction<DiagCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/diagonal.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type DiagonalInterpState (line 23) | struct DiagonalInterpState : public AutoGradCaptureState {
class Diagonal (line 28) | class Diagonal : public OpExprGradFunction<DiagonalInterpState> {
FILE: oneflow/core/autograd/gradient_funcs/dim_gather.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type DimGatherCaptureState (line 23) | struct DimGatherCaptureState : public AutoGradCaptureState {
class DimGather (line 28) | class DimGather : public OpExprGradFunction<DimGatherCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/dim_scatter.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type DimScatterCaptureState (line 24) | struct DimScatterCaptureState : public AutoGradCaptureState {
type ScatterType (line 29) | enum class ScatterType { kUpdate, kAdd, kMultiply }
class DimScatter (line 32) | class DimScatter : public OpExprGradFunction<DimScatterCaptureState> {
class DimScatterUpdateScalar (line 103) | class DimScatterUpdateScalar : public OpExprGradFunction<DimScatterC...
FILE: oneflow/core/autograd/gradient_funcs/dot.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type DotCaptureState (line 22) | struct DotCaptureState : public AutoGradCaptureState {
class DotGrad (line 29) | class DotGrad : public OpExprGradFunction<DotCaptureState> {
method Init (line 31) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 33) | Maybe<void> Capture(DotCaptureState* ctx, const TensorTuple& input...
method Apply (line 44) | Maybe<void> Apply(const DotCaptureState* ctx, const TensorTuple& o...
FILE: oneflow/core/autograd/gradient_funcs/dropout.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type DropoutCaptureState (line 22) | struct DropoutCaptureState : public AutoGradCaptureState {
class Dropout (line 28) | class Dropout : public OpExprGradFunction<DropoutCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/eager_ccl_broadcast.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 24) | namespace one {
function EagerCclReduce (line 28) | Maybe<one::UserOpExpr> EagerCclReduce(Symbol<ParallelDesc> parallel_...
function FindOrCreatEagerCclReduceOpExpr (line 37) | Maybe<one::UserOpExpr> FindOrCreatEagerCclReduceOpExpr(Symbol<Parall...
type EagerCclBroadcastCaptureState (line 52) | struct EagerCclBroadcastCaptureState : public AutoGradCaptureState {...
class EagerCclBroadcast (line 57) | class EagerCclBroadcast : public OpExprGradFunction<EagerCclBroadcas...
method Init (line 59) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 65) | Maybe<void> Capture(EagerCclBroadcastCaptureState* ctx, const Tens...
method Apply (line 73) | Maybe<void> Apply(const EagerCclBroadcastCaptureState* ctx, const ...
FILE: oneflow/core/autograd/gradient_funcs/elementwise_minimum_maximum.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type ElementwiseXimumCaptureState (line 25) | struct ElementwiseXimumCaptureState : public AutoGradCaptureState {
class ElementwiseXimumOp (line 30) | class ElementwiseXimumOp : public OpExprGradFunction<ElementwiseXimu...
method Capture (line 32) | Maybe<void> Capture(ElementwiseXimumCaptureState* ctx, const Tenso...
method Apply (line 41) | Maybe<void> Apply(const ElementwiseXimumCaptureState* ctx, const T...
class ElementwiseMinimum (line 63) | class ElementwiseMinimum : public ElementwiseXimumOp {
method Init (line 65) | Maybe<void> Init(const OpExpr& op) override {
class ElementwiseMaximum (line 71) | class ElementwiseMaximum : public ElementwiseXimumOp {
method Init (line 73) | Maybe<void> Init(const OpExpr& op) override {
FILE: oneflow/core/autograd/gradient_funcs/embedding.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type EmbeddingCaptureState (line 26) | struct EmbeddingCaptureState : public AutoGradCaptureState {
class Embedding (line 32) | class Embedding : public OpExprGradFunction<EmbeddingCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/expand.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type ExpandCaptureState (line 23) | struct ExpandCaptureState : public AutoGradCaptureState {
class Expand (line 30) | class Expand : public OpExprGradFunction<ExpandCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/fake_quantization.cpp
type oneflow (line 18) | namespace oneflow {
type one (line 19) | namespace one {
type FakeQuantizationCaptureState (line 21) | struct FakeQuantizationCaptureState : public AutoGradCaptureState {
class FakeQuantization (line 25) | class FakeQuantization : public OpExprGradFunction<FakeQuantizationC...
method Init (line 27) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 29) | Maybe<void> Capture(FakeQuantizationCaptureState* ctx, const Tenso...
method Apply (line 36) | Maybe<void> Apply(const FakeQuantizationCaptureState* ctx, const T...
FILE: oneflow/core/autograd/gradient_funcs/fft.cpp
type oneflow (line 24) | namespace oneflow {
type one (line 25) | namespace one {
type FftR2CCaptureState (line 27) | struct FftR2CCaptureState : public AutoGradCaptureState {
class FftR2C (line 35) | class FftR2C : public OpExprGradFunction<FftR2CCaptureState> {
method Init (line 37) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 39) | Maybe<void> Capture(FftR2CCaptureState* ctx, const TensorTuple& in...
method Apply (line 53) | Maybe<void> Apply(const FftR2CCaptureState* ctx, const TensorTuple...
type FftC2CCaptureState (line 96) | struct FftC2CCaptureState : public AutoGradCaptureState {
class FftC2C (line 103) | class FftC2C : public OpExprGradFunction<FftC2CCaptureState> {
method Init (line 105) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 107) | Maybe<void> Capture(FftC2CCaptureState* ctx, const TensorTuple& in...
method Apply (line 121) | Maybe<void> Apply(const FftC2CCaptureState* ctx, const TensorTuple...
type FftC2RCaptureState (line 134) | struct FftC2RCaptureState : public AutoGradCaptureState {
class FftC2R (line 142) | class FftC2R : public OpExprGradFunction<FftC2RCaptureState> {
method Init (line 144) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 146) | Maybe<void> Capture(FftC2RCaptureState* ctx, const TensorTuple& in...
method Apply (line 160) | Maybe<void> Apply(const FftC2RCaptureState* ctx, const TensorTuple...
FILE: oneflow/core/autograd/gradient_funcs/fill.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type FillCaptureState (line 25) | struct FillCaptureState : public AutoGradCaptureState {
class Fill (line 30) | class Fill : public OpExprGradFunction<FillCaptureState> {
class FillTensor (line 63) | class FillTensor : public OpExprGradFunction<FillCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/flatten.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type FlattenCaptureState (line 26) | struct FlattenCaptureState : public AutoGradCaptureState {
class Flatten (line 30) | class Flatten : public OpExprGradFunction<FlattenCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/flip.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type FlipCaptureState (line 23) | struct FlipCaptureState : public AutoGradCaptureState {
class Flip (line 28) | class Flip : public OpExprGradFunction<FlipCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/fold.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type FoldInterpState (line 23) | struct FoldInterpState : public AutoGradCaptureState {
class Fold (line 32) | class Fold : public OpExprGradFunction<FoldInterpState> {
FILE: oneflow/core/autograd/gradient_funcs/fused_bias_add_dropout.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type FusedBiasAddDropoutInterpState (line 23) | struct FusedBiasAddDropoutInterpState : public AutoGradCaptureState {
class FusedBiasAddDropout (line 30) | class FusedBiasAddDropout : public OpExprGradFunction<FusedBiasAddDr...
FILE: oneflow/core/autograd/gradient_funcs/fused_bias_add_gelu.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type FusedBiasAddGeluInterpState (line 22) | struct FusedBiasAddGeluInterpState : public AutoGradCaptureState {
class FusedBiasAddGelu (line 28) | class FusedBiasAddGelu : public OpExprGradFunction<FusedBiasAddGeluI...
method Init (line 30) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 37) | Maybe<void> Capture(FusedBiasAddGeluInterpState* ctx, const Tensor...
method Apply (line 51) | Maybe<void> Apply(const FusedBiasAddGeluInterpState* ctx, const Te...
FILE: oneflow/core/autograd/gradient_funcs/fused_bias_add_scale_mask_softmax_dropout.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type FusedBiasAddScaleMaskSoftmaxDropoutCaptureState (line 22) | struct FusedBiasAddScaleMaskSoftmaxDropoutCaptureState : public Auto...
class FusedBiasAddScaleMaskSoftmaxDropoutGradFunction (line 34) | class FusedBiasAddScaleMaskSoftmaxDropoutGradFunction
method Init (line 37) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 44) | Maybe<void> Capture(FusedBiasAddScaleMaskSoftmaxDropoutCaptureStat...
method Apply (line 74) | Maybe<void> Apply(const FusedBiasAddScaleMaskSoftmaxDropoutCapture...
FILE: oneflow/core/autograd/gradient_funcs/fused_center.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type FusedCenterCaptureState (line 23) | struct FusedCenterCaptureState : public AutoGradCaptureState {
class FusedCenterGrad (line 27) | class FusedCenterGrad : public OpExprGradFunction<FusedCenterCapture...
method Init (line 29) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 31) | Maybe<void> Capture(FusedCenterCaptureState* ctx, const TensorTupl...
method Apply (line 43) | Maybe<void> Apply(const FusedCenterCaptureState* ctx, const Tensor...
FILE: oneflow/core/autograd/gradient_funcs/fused_cross_interaction.cpp
type oneflow (line 24) | namespace oneflow {
type one (line 25) | namespace one {
type FusedCrossFeatureInteractionInterpState (line 27) | struct FusedCrossFeatureInteractionInterpState : public AutoGradCapt...
class FusedCrossFeatureInteraction (line 40) | class FusedCrossFeatureInteraction
method Init (line 43) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 50) | Maybe<void> Capture(FusedCrossFeatureInteractionInterpState* ctx, ...
method Apply (line 70) | Maybe<void> Apply(const FusedCrossFeatureInteractionInterpState* ctx,
FILE: oneflow/core/autograd/gradient_funcs/fused_dot_feature_interaction.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type FusedDotFeatureInteractionCaptureState (line 26) | struct FusedDotFeatureInteractionCaptureState : public AutoGradCaptu...
class FusedDotFeatureInteraction (line 37) | class FusedDotFeatureInteraction
FILE: oneflow/core/autograd/gradient_funcs/fused_fast_gelu_mul.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type FusedFastGeluMulGradCaptureState (line 22) | struct FusedFastGeluMulGradCaptureState : public AutoGradCaptureState {
class FusedFastGeluMulGrad (line 26) | class FusedFastGeluMulGrad : public OpExprGradFunction<FusedFastGelu...
method Init (line 28) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 30) | Maybe<void> Capture(FusedFastGeluMulGradCaptureState* ctx, const T...
method Apply (line 42) | Maybe<void> Apply(const FusedFastGeluMulGradCaptureState* ctx, con...
FILE: oneflow/core/autograd/gradient_funcs/fused_get_boundding_boxes_coord.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type FusedGetBounddingBoxesCoordGradCaptureState (line 24) | struct FusedGetBounddingBoxesCoordGradCaptureState : public AutoGrad...
class FusedGetBounddingBoxesCoordGrad (line 28) | class FusedGetBounddingBoxesCoordGrad
method Init (line 31) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 33) | Maybe<void> Capture(FusedGetBounddingBoxesCoordGradCaptureState* c...
method Apply (line 44) | Maybe<void> Apply(const FusedGetBounddingBoxesCoordGradCaptureStat...
FILE: oneflow/core/autograd/gradient_funcs/fused_get_ciou_diagonal_angle.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type FusedCiouAngleCaptureState (line 24) | struct FusedCiouAngleCaptureState : public AutoGradCaptureState {
class FusedGetCiouDiagonalAngleGrad (line 29) | class FusedGetCiouDiagonalAngleGrad : public OpExprGradFunction<Fuse...
method Init (line 31) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 33) | Maybe<void> Capture(FusedCiouAngleCaptureState* ctx, const TensorT...
method Apply (line 49) | Maybe<void> Apply(const FusedCiouAngleCaptureState* ctx, const Ten...
FILE: oneflow/core/autograd/gradient_funcs/fused_get_ciou_result.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type FusedGetCiouResultGradCaptureState (line 23) | struct FusedGetCiouResultGradCaptureState : public AutoGradCaptureSt...
class FusedGetCiouResultGrad (line 30) | class FusedGetCiouResultGrad : public OpExprGradFunction<FusedGetCio...
method Init (line 32) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 34) | Maybe<void> Capture(FusedGetCiouResultGradCaptureState* ctx, const...
method Apply (line 51) | Maybe<void> Apply(const FusedGetCiouResultGradCaptureState* ctx, c...
FILE: oneflow/core/autograd/gradient_funcs/fused_get_convex_diagonal_squared.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type FusedGetConvexDiagonalSquaredCaptureState (line 23) | struct FusedGetConvexDiagonalSquaredCaptureState : public AutoGradCa...
class FusedGetConvexDiagonalSquaredGrad (line 28) | class FusedGetConvexDiagonalSquaredGrad
method Init (line 31) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 38) | Maybe<void> Capture(FusedGetConvexDiagonalSquaredCaptureState* ctx...
method Apply (line 52) | Maybe<void> Apply(const FusedGetConvexDiagonalSquaredCaptureState*...
FILE: oneflow/core/autograd/gradient_funcs/fused_get_intersection_area.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type FusedGetIntersectionAreaCaptureState (line 23) | struct FusedGetIntersectionAreaCaptureState : public AutoGradCapture...
class FusedGetIntersectionAreaGrad (line 27) | class FusedGetIntersectionAreaGrad
method Init (line 30) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 32) | Maybe<void> Capture(FusedGetIntersectionAreaCaptureState* ctx, con...
method Apply (line 44) | Maybe<void> Apply(const FusedGetIntersectionAreaCaptureState* ctx,...
FILE: oneflow/core/autograd/gradient_funcs/fused_get_iou.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type FusedGetIouGradCaptureState (line 24) | struct FusedGetIouGradCaptureState : public AutoGradCaptureState {
class FusedGetIouGrad (line 29) | class FusedGetIouGrad : public OpExprGradFunction<FusedGetIouGradCap...
method Init (line 31) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 38) | Maybe<void> Capture(FusedGetIouGradCaptureState* ctx, const Tensor...
method Apply (line 56) | Maybe<void> Apply(const FusedGetIouGradCaptureState* ctx, const Te...
FILE: oneflow/core/autograd/gradient_funcs/fused_glu.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type FusedGluGradCaptureState (line 23) | struct FusedGluGradCaptureState : public AutoGradCaptureState {
class FusedGluGrad (line 33) | class FusedGluGrad : public OpExprGradFunction<FusedGluGradCaptureSt...
FILE: oneflow/core/autograd/gradient_funcs/fused_gru_cell.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type FusedGruCellGradCaptureState (line 24) | struct FusedGruCellGradCaptureState : public AutoGradCaptureState {
class FusedGruCellGrad (line 29) | class FusedGruCellGrad : public OpExprGradFunction<FusedGruCellGradC...
method Init (line 31) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 37) | Maybe<void> Capture(FusedGruCellGradCaptureState* ctx, const Tenso...
method Apply (line 48) | Maybe<void> Apply(const FusedGruCellGradCaptureState* ctx, const T...
FILE: oneflow/core/autograd/gradient_funcs/fused_lstm_cell.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type FusedLstmCellGradCaptureState (line 24) | struct FusedLstmCellGradCaptureState : public AutoGradCaptureState {
class FusedLstmCellGrad (line 29) | class FusedLstmCellGrad : public OpExprGradFunction<FusedLstmCellGra...
method Init (line 31) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 37) | Maybe<void> Capture(FusedLstmCellGradCaptureState* ctx, const Tens...
method Apply (line 50) | Maybe<void> Apply(const FusedLstmCellGradCaptureState* ctx, const ...
FILE: oneflow/core/autograd/gradient_funcs/fused_matmul_bias.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 22) | namespace one {
type FusedMatmulBiasCaptureState (line 24) | struct FusedMatmulBiasCaptureState : public AutoGradCaptureState {
class FusedMatmulBias (line 30) | class FusedMatmulBias : public OpExprGradFunction<FusedMatmulBiasCap...
FILE: oneflow/core/autograd/gradient_funcs/fused_matmul_bias_add_relu_dropout.cpp
type oneflow (line 28) | namespace oneflow {
type one (line 30) | namespace one {
type FusedMatmulBiasAddReluDropoutCaptureState (line 32) | struct FusedMatmulBiasAddReluDropoutCaptureState : public AutoGradCa...
class FusedMatmulBiasAddReluDropout (line 41) | class FusedMatmulBiasAddReluDropout
FILE: oneflow/core/autograd/gradient_funcs/fused_scale_mask_bias_softmax.cpp
type oneflow (line 26) | namespace oneflow {
type one (line 27) | namespace one {
type FusedScaleMaskBiasSoftmaxCaptureState (line 29) | struct FusedScaleMaskBiasSoftmaxCaptureState : public AutoGradCaptur...
class FusedScaleMaskBiasSoftmax (line 36) | class FusedScaleMaskBiasSoftmax : public OpExprGradFunction<FusedSca...
FILE: oneflow/core/autograd/gradient_funcs/fused_scale_mask_softmax.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type FusedScaleMaskSoftmaxInterState (line 26) | struct FusedScaleMaskSoftmaxInterState : public AutoGradCaptureState {
class FusedScaleMaskSoftmax (line 31) | class FusedScaleMaskSoftmax : public OpExprGradFunction<FusedScaleMa...
FILE: oneflow/core/autograd/gradient_funcs/fused_scale_mask_softmax_dropout.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type FusedScaleMaskSoftmaxDropoutInterState (line 26) | struct FusedScaleMaskSoftmaxDropoutInterState : public AutoGradCaptu...
class FusedScaleMaskSoftmaxDropout (line 32) | class FusedScaleMaskSoftmaxDropout
FILE: oneflow/core/autograd/gradient_funcs/fused_scale_tril.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type FusedScaleTrilState (line 23) | struct FusedScaleTrilState : public AutoGradCaptureState {
class FusedScaleTril (line 31) | class FusedScaleTril : public OpExprGradFunction<FusedScaleTrilState> {
FILE: oneflow/core/autograd/gradient_funcs/fused_scale_tril_softmax_mask_scale.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type FusedScaleTrilSoftmaxMaskScaleInterpState (line 25) | struct FusedScaleTrilSoftmaxMaskScaleInterpState : public AutoGradCa...
class FusedScaleTrilSoftmaxMaskScale (line 32) | class FusedScaleTrilSoftmaxMaskScale
FILE: oneflow/core/autograd/gradient_funcs/fused_self_attention.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type FusedSelfAttentionInterpState (line 23) | struct FusedSelfAttentionInterpState : public AutoGradCaptureState {
class FusedSelfAttention (line 28) | class FusedSelfAttention : public OpExprGradFunction<FusedSelfAttent...
method Init (line 30) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 37) | Maybe<void> Capture(FusedSelfAttentionInterpState* ctx, const Tens...
method Apply (line 48) | Maybe<void> Apply(const FusedSelfAttentionInterpState* ctx, const ...
FILE: oneflow/core/autograd/gradient_funcs/fused_weighted_sum.cpp
type oneflow (line 18) | namespace oneflow {
type one (line 19) | namespace one {
type FusedWeightedSumCaptureState (line 21) | struct FusedWeightedSumCaptureState : public AutoGradCaptureState {
class FusedWeightedSum (line 27) | class FusedWeightedSum : public OpExprGradFunction<FusedWeightedSumC...
method Init (line 29) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 31) | Maybe<void> Capture(FusedWeightedSumCaptureState* ctx, const Tenso...
method Apply (line 41) | Maybe<void> Apply(const FusedWeightedSumCaptureState* ctx, const T...
FILE: oneflow/core/autograd/gradient_funcs/gather.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type GatherCaptureState (line 24) | struct GatherCaptureState : public AutoGradCaptureState {
class Gather (line 29) | class Gather : public OpExprGradFunction<GatherCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/gather_nd.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type GatherNdCaptureState (line 22) | struct GatherNdCaptureState : public AutoGradCaptureState {
class GatherNd (line 26) | class GatherNd : public OpExprGradFunction<GatherNdCaptureState> {
method Init (line 28) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 30) | Maybe<void> Capture(GatherNdCaptureState* ctx, const TensorTuple& ...
method Apply (line 42) | Maybe<void> Apply(const GatherNdCaptureState* ctx, const TensorTup...
FILE: oneflow/core/autograd/gradient_funcs/global_cast.cpp
type oneflow (line 25) | namespace oneflow {
type one (line 26) | namespace one {
type CastGlobalCaptureState (line 28) | struct CastGlobalCaptureState : public AutoGradCaptureState {
class LocalToGlobal (line 35) | class LocalToGlobal : public OpExprGradFunction<CastGlobalCaptureSta...
method Init (line 37) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 45) | Maybe<void> Capture(CastGlobalCaptureState* ctx, const TensorTuple...
method Apply (line 53) | Maybe<void> Apply(const CastGlobalCaptureState* ctx, const TensorT...
class GlobalToLocal (line 77) | class GlobalToLocal : public OpExprGradFunction<CastGlobalCaptureSta...
method Init (line 79) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 87) | Maybe<void> Capture(CastGlobalCaptureState* ctx, const TensorTuple...
method Apply (line 100) | Maybe<void> Apply(const CastGlobalCaptureState* ctx, const TensorT...
FILE: oneflow/core/autograd/gradient_funcs/global_to_global.cpp
type oneflow (line 25) | namespace oneflow {
type one (line 26) | namespace one {
type GlobalToGlobalState (line 28) | struct GlobalToGlobalState : public AutoGradCaptureState {
class GlobalToGlobalGradFunction (line 33) | class GlobalToGlobalGradFunction : public OpExprGradFunction<GlobalT...
method Init (line 35) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 42) | Maybe<void> Capture(GlobalToGlobalState* ctx, const TensorTuple& i...
method Apply (line 51) | Maybe<void> Apply(const GlobalToGlobalState* ctx, const TensorTupl...
FILE: oneflow/core/autograd/gradient_funcs/gradient_accumulation.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type GradAccRepeatCaptureState (line 23) | struct GradAccRepeatCaptureState : public AutoGradCaptureState {
class GradAccRepeat (line 27) | class GradAccRepeat : public OpExprGradFunction<GradAccRepeatCapture...
type GradAccCollectCaptureState (line 63) | struct GradAccCollectCaptureState : public AutoGradCaptureState {
class GradAccCollect (line 67) | class GradAccCollect : public OpExprGradFunction<GradAccCollectCaptu...
type GradAccPackCaptureState (line 103) | struct GradAccPackCaptureState : public AutoGradCaptureState {
class GradAccPack (line 107) | class GradAccPack : public OpExprGradFunction<GradAccPackCaptureStat...
type GradAccUnpackCaptureState (line 143) | struct GradAccUnpackCaptureState : public AutoGradCaptureState {
class GradAccUnpack (line 147) | class GradAccUnpack : public OpExprGradFunction<GradAccUnpackCapture...
FILE: oneflow/core/autograd/gradient_funcs/graph_feed_and_fetch.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type GraphFeedAndFetchCaptureState (line 22) | struct GraphFeedAndFetchCaptureState : public AutoGradCaptureState {
class GraphFeedAndFetch (line 26) | class GraphFeedAndFetch : public OpExprGradFunction<GraphFeedAndFetc...
method Init (line 28) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 30) | Maybe<void> Capture(GraphFeedAndFetchCaptureState* ctx, const Tens...
method Apply (line 37) | Maybe<void> Apply(const GraphFeedAndFetchCaptureState* ctx, const ...
FILE: oneflow/core/autograd/gradient_funcs/grid_sample.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type GridSampleInterpState (line 24) | struct GridSampleInterpState : public AutoGradCaptureState {
class GridSample (line 35) | class GridSample : public OpExprGradFunction<GridSampleInterpState> {
method Init (line 37) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 44) | Maybe<void> Capture(GridSampleInterpState* ctx, const TensorTuple&...
method Apply (line 62) | Maybe<void> Apply(const GridSampleInterpState* ctx, const TensorTu...
FILE: oneflow/core/autograd/gradient_funcs/group_norm.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type GroupNormCaptureState (line 23) | struct GroupNormCaptureState : public AutoGradCaptureState {
class GroupNorm (line 38) | class GroupNorm : public OpExprGradFunction<GroupNormCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/identity.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type IdentityCaptureState (line 22) | struct IdentityCaptureState : public AutoGradCaptureState {
class Identity (line 26) | class Identity : public OpExprGradFunction<IdentityCaptureState> {
method Init (line 28) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 30) | Maybe<void> Capture(IdentityCaptureState* ctx, const TensorTuple& ...
method Apply (line 37) | Maybe<void> Apply(const IdentityCaptureState* ctx, const TensorTup...
FILE: oneflow/core/autograd/gradient_funcs/inv.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type InvCaptureState (line 23) | struct InvCaptureState : public AutoGradCaptureState {
class Inv (line 27) | class Inv : public OpExprGradFunction<InvCaptureState> {
method Init (line 29) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 30) | Maybe<void> Capture(InvCaptureState* ctx, const TensorTuple& input...
method Apply (line 36) | Maybe<void> Apply(const InvCaptureState* ctx, const TensorTuple& o...
FILE: oneflow/core/autograd/gradient_funcs/kl_div.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type KLDivLossCaptureState (line 22) | struct KLDivLossCaptureState : public AutoGradCaptureState {
class KLDivLoss (line 28) | class KLDivLoss : public OpExprGradFunction<KLDivLossCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/l2_normalize.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type L2NormalizeCaptureState (line 25) | struct L2NormalizeCaptureState : public AutoGradCaptureState {
class L2Normalize (line 31) | class L2Normalize : public OpExprGradFunction<L2NormalizeCaptureStat...
FILE: oneflow/core/autograd/gradient_funcs/layer_norm.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 24) | namespace one {
type LayerNormCaptureState (line 26) | struct LayerNormCaptureState : public AutoGradCaptureState {
class LayerNorm (line 47) | class LayerNorm : public OpExprGradFunction<LayerNormCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/lerp.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type LerpCaptureState (line 24) | struct LerpCaptureState : public AutoGradCaptureState {
type ScalarLerpCaptureState (line 27) | struct ScalarLerpCaptureState : public AutoGradCaptureState {
class LerpGrad (line 32) | class LerpGrad : public OpExprGradFunction<LerpCaptureState> {
method Init (line 34) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 36) | Maybe<void> Capture(LerpCaptureState* ctx, const TensorTuple& inpu...
method Apply (line 49) | Maybe<void> Apply(const LerpCaptureState* ctx, const TensorTuple& ...
class ScalarLerpGrad (line 69) | class ScalarLerpGrad : public OpExprGradFunction<ScalarLerpCaptureSt...
method Init (line 71) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 78) | Maybe<void> Capture(ScalarLerpCaptureState* ctx, const TensorTuple...
method Apply (line 99) | Maybe<void> Apply(const ScalarLerpCaptureState* ctx, const TensorT...
FILE: oneflow/core/autograd/gradient_funcs/linalg_cross.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type LinalgCrossCaptureState (line 23) | struct LinalgCrossCaptureState : public AutoGradCaptureState {
class LinalgCross (line 29) | class LinalgCross : public OpExprGradFunction<LinalgCrossCaptureStat...
FILE: oneflow/core/autograd/gradient_funcs/log_softmax.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type LogSoftmaxCaptureState (line 24) | struct LogSoftmaxCaptureState : public AutoGradCaptureState {
class LogSoftmax (line 28) | class LogSoftmax : public OpExprGradFunction<LogSoftmaxCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/masked_fill.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type MaskedFillCaptureState (line 24) | struct MaskedFillCaptureState : public AutoGradCaptureState {
class MaskedFill (line 28) | class MaskedFill : public OpExprGradFunction<MaskedFillCaptureState> {
method Init (line 30) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 31) | Maybe<void> Capture(MaskedFillCaptureState* ctx, const TensorTuple...
method Apply (line 40) | Maybe<void> Apply(const MaskedFillCaptureState* ctx, const TensorT...
FILE: oneflow/core/autograd/gradient_funcs/math_binary_op.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type BinaryMathCaptureState (line 25) | struct BinaryMathCaptureState : public AutoGradCaptureState {
class BinaryMathOp (line 35) | class BinaryMathOp : public OpExprGradFunction<BinaryMathCaptureStat...
method Init (line 36) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 38) | Maybe<void> Capture(BinaryMathCaptureState* ctx, const TensorTuple...
method Apply (line 47) | Maybe<void> Apply(const BinaryMathCaptureState* ctx, const TensorT...
FILE: oneflow/core/autograd/gradient_funcs/math_unary_op.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type UnaryMathCaptureState (line 25) | struct UnaryMathCaptureState : public AutoGradCaptureState {
class UnaryMathBwdWithDyXOp (line 33) | class UnaryMathBwdWithDyXOp : public OpExprGradFunction<UnaryMathCap...
method Init (line 34) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 36) | Maybe<void> Capture(UnaryMathCaptureState* ctx, const TensorTuple&...
method Apply (line 43) | Maybe<void> Apply(const UnaryMathCaptureState* ctx, const TensorTu...
class UnaryMathBwdWithDyYOp (line 56) | class UnaryMathBwdWithDyYOp : public OpExprGradFunction<UnaryMathCap...
method Init (line 57) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 59) | Maybe<void> Capture(UnaryMathCaptureState* ctx, const TensorTuple&...
method Apply (line 66) | Maybe<void> Apply(const UnaryMathCaptureState* ctx, const TensorTu...
class UnaryMathBwdWithFillZeroOp (line 78) | class UnaryMathBwdWithFillZeroOp : public OpExprGradFunction<UnaryMa...
method Init (line 79) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 81) | Maybe<void> Capture(UnaryMathCaptureState* ctx, const TensorTuple&...
method Apply (line 87) | Maybe<void> Apply(const UnaryMathCaptureState* ctx, const TensorTu...
class NegativeOp (line 127) | class NegativeOp : public OpExprGradFunction<UnaryMathCaptureState> {
method Init (line 128) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 130) | Maybe<void> Capture(UnaryMathCaptureState* ctx, const TensorTuple&...
method Apply (line 136) | Maybe<void> Apply(const UnaryMathCaptureState* ctx, const TensorTu...
FILE: oneflow/core/autograd/gradient_funcs/matmul.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type MatmulCaptureState (line 26) | struct MatmulCaptureState : public AutoGradCaptureState {
class Matmul (line 36) | class Matmul : public OpExprGradFunction<MatmulCaptureState> {
type BroadcastMatmulCaptureState (line 106) | struct BroadcastMatmulCaptureState : public AutoGradCaptureState {
class BroadcastMatmul (line 119) | class BroadcastMatmul : public OpExprGradFunction<BroadcastMatmulCap...
FILE: oneflow/core/autograd/gradient_funcs/matrix_vector_product.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type MatrixVectorProductCaptureState (line 26) | struct MatrixVectorProductCaptureState : public AutoGradCaptureState {
class MatrixVectorProduct (line 33) | class MatrixVectorProduct : public OpExprGradFunction<MatrixVectorPr...
FILE: oneflow/core/autograd/gradient_funcs/max_pool.cpp
type oneflow (line 24) | namespace oneflow {
type one (line 25) | namespace one {
type MaxPoolCaptureState (line 29) | struct MaxPoolCaptureState : public AutoGradCaptureState {
class MaxPoolNdGrad (line 43) | class MaxPoolNdGrad : public OpExprGradFunction<MaxPoolCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/max_unpool.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type MaxUnpoolCaptureState (line 24) | struct MaxUnpoolCaptureState : public AutoGradCaptureState {
class MaxUnpoolNdGrad (line 33) | class MaxUnpoolNdGrad : public OpExprGradFunction<MaxUnpoolCaptureSt...
FILE: oneflow/core/autograd/gradient_funcs/median.cpp
type oneflow (line 24) | namespace oneflow {
type one (line 25) | namespace one {
type MedianCaptureState (line 27) | struct MedianCaptureState : public AutoGradCaptureState {
class Median (line 31) | class Median : public OpExprGradFunction<MedianCaptureState> {
method Init (line 33) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 34) | Maybe<void> Capture(MedianCaptureState* ctx, const TensorTuple& in...
method Apply (line 43) | Maybe<void> Apply(const MedianCaptureState* ctx, const TensorTuple...
type MedianWithIndicesCaptureState (line 72) | struct MedianWithIndicesCaptureState : public AutoGradCaptureState {
class MedianWithIndices (line 76) | class MedianWithIndices : public OpExprGradFunction<MedianWithIndice...
method Init (line 78) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 79) | Maybe<void> Capture(MedianWithIndicesCaptureState* ctx, const Tens...
method Apply (line 88) | Maybe<void> Apply(const MedianWithIndicesCaptureState* ctx, const ...
FILE: oneflow/core/autograd/gradient_funcs/mode.cpp
type oneflow (line 24) | namespace oneflow {
type one (line 25) | namespace one {
type ModeCaptureState (line 27) | struct ModeCaptureState : public AutoGradCaptureState {
class Mode (line 31) | class Mode : public OpExprGradFunction<ModeCaptureState> {
method Init (line 33) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 34) | Maybe<void> Capture(ModeCaptureState* ctx, const TensorTuple& inpu...
method Apply (line 43) | Maybe<void> Apply(const ModeCaptureState* ctx, const TensorTuple& ...
FILE: oneflow/core/autograd/gradient_funcs/narrow.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type NarrowCaptureState (line 26) | struct NarrowCaptureState : public AutoGradCaptureState {
class Narrow (line 34) | class Narrow : public OpExprGradFunction<NarrowCaptureState> {
method Init (line 36) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 43) | Maybe<void> Capture(NarrowCaptureState* ctx, const TensorTuple& in...
method Apply (line 62) | Maybe<void> Apply(const NarrowCaptureState* ctx, const TensorTuple...
FILE: oneflow/core/autograd/gradient_funcs/nll.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 22) | namespace one {
type NLLCaptureState (line 24) | struct NLLCaptureState : public AutoGradCaptureState {
class NLLGradFunction (line 29) | class NLLGradFunction : public OpExprGradFunction<NLLCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/noncontiguous_binary_op.cpp
type oneflow (line 26) | namespace oneflow {
type one (line 27) | namespace one {
type NonContiguousBinaryOpCaptureState (line 29) | struct NonContiguousBinaryOpCaptureState : public AutoGradCaptureSta...
class NonContiguousBinaryOp (line 36) | class NonContiguousBinaryOp : public OpExprGradFunction<NonContiguou...
FILE: oneflow/core/autograd/gradient_funcs/normalization.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type NormalizationGradCaptureState (line 24) | struct NormalizationGradCaptureState : public AutoGradCaptureState {
class NormalizationGrad (line 42) | class NormalizationGrad : public OpExprGradFunction<NormalizationGra...
method Init (line 44) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 51) | Maybe<void> Capture(NormalizationGradCaptureState* ctx, const Tens...
method Apply (line 90) | Maybe<void> Apply(const NormalizationGradCaptureState* ctx, const ...
FILE: oneflow/core/autograd/gradient_funcs/normalization_add_relu.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type NormalizationAddReluGradCaptureState (line 24) | struct NormalizationAddReluGradCaptureState : public AutoGradCapture...
class NormalizationAddReluGrad (line 45) | class NormalizationAddReluGrad : public OpExprGradFunction<Normaliza...
method Init (line 47) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 54) | Maybe<void> Capture(NormalizationAddReluGradCaptureState* ctx, con...
method Apply (line 127) | Maybe<void> Apply(const NormalizationAddReluGradCaptureState* ctx,...
FILE: oneflow/core/autograd/gradient_funcs/one_embedding_fused_lookup.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type OneEmbeddingFusedLookupCaptureState (line 23) | struct OneEmbeddingFusedLookupCaptureState : public AutoGradCaptureS...
class OneEmbeddingFusedLookup (line 33) | class OneEmbeddingFusedLookup : public OpExprGradFunction<OneEmbeddi...
method Init (line 35) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 37) | Maybe<void> Capture(OneEmbeddingFusedLookupCaptureState* ctx, cons...
method Apply (line 50) | Maybe<void> Apply(const OneEmbeddingFusedLookupCaptureState* ctx, ...
FILE: oneflow/core/autograd/gradient_funcs/padding.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type PadNdCaptureState (line 23) | struct PadNdCaptureState : public AutoGradCaptureState {
class PadNd (line 28) | class PadNd : public OpExprGradFunction<PadNdCaptureState> {
method Init (line 30) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 37) | Maybe<void> Capture(PadNdCaptureState* ctx, const TensorTuple& inp...
class ReflectionPadNd (line 53) | class ReflectionPadNd : public PadNd {
method Apply (line 55) | Maybe<void> Apply(const PadNdCaptureState* ctx, const TensorTuple&...
class ReplicationPadNd (line 67) | class ReplicationPadNd : public PadNd {
method Apply (line 69) | Maybe<void> Apply(const PadNdCaptureState* ctx, const TensorTuple&...
type ConstantPadNdCaptureState (line 81) | struct ConstantPadNdCaptureState : public AutoGradCaptureState {
class ConstantPadNd (line 86) | class ConstantPadNd : public OpExprGradFunction<ConstantPadNdCapture...
method Init (line 88) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 95) | Maybe<void> Capture(ConstantPadNdCaptureState* ctx, const TensorTu...
method Apply (line 108) | Maybe<void> Apply(const ConstantPadNdCaptureState* ctx, const Tens...
FILE: oneflow/core/autograd/gradient_funcs/partial_fc_sample.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type PartialFCSampleState (line 23) | struct PartialFCSampleState : public AutoGradCaptureState {
class PartialFCSample (line 29) | class PartialFCSample : public OpExprGradFunction<PartialFCSampleSta...
FILE: oneflow/core/autograd/gradient_funcs/reduce_ops.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type ReduceSumCaptureState (line 26) | struct ReduceSumCaptureState : public AutoGradCaptureState {
class ReduceSum (line 30) | class ReduceSum : public OpExprGradFunction<ReduceSumCaptureState> {
type ReduceProdOpInterpState (line 69) | struct ReduceProdOpInterpState : public AutoGradCaptureState {
class ReduceProdOp (line 74) | class ReduceProdOp : public OpExprGradFunction<ReduceProdOpInterpSta...
type ReduceMaxOrMinCaptureState (line 122) | struct ReduceMaxOrMinCaptureState : public AutoGradCaptureState {
class ReduceMaxOrMin (line 127) | class ReduceMaxOrMin : public OpExprGradFunction<ReduceMaxOrMinCaptu...
FILE: oneflow/core/autograd/gradient_funcs/reduce_sum_like.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type ReduceSumLikeCaptureState (line 26) | struct ReduceSumLikeCaptureState : public AutoGradCaptureState {
class ReduceSumLike (line 31) | class ReduceSumLike : public OpExprGradFunction<ReduceSumLikeCapture...
FILE: oneflow/core/autograd/gradient_funcs/reshape.cpp
type oneflow (line 24) | namespace oneflow {
type one (line 25) | namespace one {
type ReshapeCaptureState (line 27) | struct ReshapeCaptureState : public AutoGradCaptureState {
class ReshapeGrad (line 31) | class ReshapeGrad : public OpExprGradFunction<ReshapeCaptureState> {
method Init (line 33) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 39) | Maybe<void> Capture(ReshapeCaptureState* ctx, const TensorTuple& i...
method Apply (line 45) | Maybe<void> Apply(const ReshapeCaptureState* ctx, const TensorTupl...
class ReshapeLikeGrad (line 54) | class ReshapeLikeGrad : public OpExprGradFunction<ReshapeCaptureStat...
method Init (line 56) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 62) | Maybe<void> Capture(ReshapeCaptureState* ctx, const TensorTuple& i...
method Apply (line 71) | Maybe<void> Apply(const ReshapeCaptureState* ctx, const TensorTupl...
FILE: oneflow/core/autograd/gradient_funcs/rms_norm.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type RMSNormCaptureState (line 22) | struct RMSNormCaptureState : public AutoGradCaptureState {
class RMSNormGrad (line 30) | class RMSNormGrad : public OpExprGradFunction<RMSNormCaptureState> {
method Init (line 32) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
FILE: oneflow/core/autograd/gradient_funcs/roi_align.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type RoiAlignCaptureState (line 23) | struct RoiAlignCaptureState : public AutoGradCaptureState {
class RoiAlign (line 32) | class RoiAlign : public OpExprGradFunction<RoiAlignCaptureState> {
method Init (line 34) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 41) | Maybe<void> Capture(RoiAlignCaptureState* ctx, const TensorTuple& ...
method Apply (line 58) | Maybe<void> Apply(const RoiAlignCaptureState* ctx, const TensorTup...
FILE: oneflow/core/autograd/gradient_funcs/roll.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type RollCaptureState (line 23) | struct RollCaptureState : public AutoGradCaptureState {
class Roll (line 29) | class Roll : public OpExprGradFunction<RollCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/rrelu.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type RReluCaptureState (line 22) | struct RReluCaptureState : public AutoGradCaptureState {
class RRelu (line 31) | class RRelu : public OpExprGradFunction<RReluCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/scalar_add.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type ScalarAddCaptureState (line 22) | struct ScalarAddCaptureState : public AutoGradCaptureState {
class ScalarAdd (line 26) | class ScalarAdd : public OpExprGradFunction<ScalarAddCaptureState> {
method Init (line 28) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 30) | Maybe<void> Capture(ScalarAddCaptureState* ctx, const TensorTuple&...
method Apply (line 37) | Maybe<void> Apply(const ScalarAddCaptureState* ctx, const TensorTu...
FILE: oneflow/core/autograd/gradient_funcs/scalar_div.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type ScalarDivCaptureState (line 24) | struct ScalarDivCaptureState : public AutoGradCaptureState {
class ScalarDiv (line 29) | class ScalarDiv : public OpExprGradFunction<ScalarDivCaptureState> {
method Init (line 31) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 38) | Maybe<void> Capture(ScalarDivCaptureState* ctx, const TensorTuple&...
method Apply (line 53) | Maybe<void> Apply(const ScalarDivCaptureState* ctx, const TensorTu...
FILE: oneflow/core/autograd/gradient_funcs/scalar_floordiv.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type ScalarFloorDivCaptureState (line 24) | struct ScalarFloorDivCaptureState : public AutoGradCaptureState {
class ScalarFloorDiv (line 28) | class ScalarFloorDiv : public OpExprGradFunction<ScalarFloorDivCaptu...
method Init (line 30) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 32) | Maybe<void> Capture(ScalarFloorDivCaptureState* ctx, const TensorT...
method Apply (line 39) | Maybe<void> Apply(const ScalarFloorDivCaptureState* ctx, const Ten...
FILE: oneflow/core/autograd/gradient_funcs/scalar_fmod.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type ScalarFModGradCaptureState (line 23) | struct ScalarFModGradCaptureState : public AutoGradCaptureState {
class ScalarFModGrad (line 27) | class ScalarFModGrad : public OpExprGradFunction<ScalarFModGradCaptu...
method Init (line 29) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 31) | Maybe<void> Capture(ScalarFModGradCaptureState* ctx, const TensorT...
method Apply (line 38) | Maybe<void> Apply(const ScalarFModGradCaptureState* ctx, const Ten...
FILE: oneflow/core/autograd/gradient_funcs/scalar_mul.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type ScalarMulCaptureState (line 23) | struct ScalarMulCaptureState : public AutoGradCaptureState {
class ScalarMul (line 28) | class ScalarMul : public OpExprGradFunction<ScalarMulCaptureState> {
method Init (line 30) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 37) | Maybe<void> Capture(ScalarMulCaptureState* ctx, const TensorTuple&...
method Apply (line 52) | Maybe<void> Apply(const ScalarMulCaptureState* ctx, const TensorTu...
FILE: oneflow/core/autograd/gradient_funcs/scalar_pow.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type ScalarPowCaptureState (line 23) | struct ScalarPowCaptureState : public AutoGradCaptureState {
class ScalarPow (line 28) | class ScalarPow : public OpExprGradFunction<ScalarPowCaptureState> {
method Init (line 30) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 37) | Maybe<void> Capture(ScalarPowCaptureState* ctx, const TensorTuple&...
method Apply (line 55) | Maybe<void> Apply(const ScalarPowCaptureState* ctx, const TensorTu...
class ScalarReversePow (line 71) | class ScalarReversePow : public OpExprGradFunction<ScalarPowCaptureS...
method Init (line 73) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 80) | Maybe<void> Capture(ScalarPowCaptureState* ctx, const TensorTuple&...
method Apply (line 98) | Maybe<void> Apply(const ScalarPowCaptureState* ctx, const TensorTu...
FILE: oneflow/core/autograd/gradient_funcs/scalar_truncdiv.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type ScalarTruncDivCaptureState (line 24) | struct ScalarTruncDivCaptureState : public AutoGradCaptureState {
class ScalarTruncDiv (line 28) | class ScalarTruncDiv : public OpExprGradFunction<ScalarTruncDivCaptu...
method Init (line 30) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 32) | Maybe<void> Capture(ScalarTruncDivCaptureState* ctx, const TensorT...
method Apply (line 39) | Maybe<void> Apply(const ScalarTruncDivCaptureState* ctx, const Ten...
FILE: oneflow/core/autograd/gradient_funcs/scaled_dot_product_attention.cpp
type oneflow (line 25) | namespace oneflow {
type one (line 27) | namespace one {
type ScaledDotProductFlashAttentionCaptureState (line 29) | struct ScaledDotProductFlashAttentionCaptureState : public AutoGradC...
class ScaledDotProductFlashAttention (line 44) | class ScaledDotProductFlashAttention
method Init (line 47) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 54) | Maybe<void> Capture(ScaledDotProductFlashAttentionCaptureState* ct...
method Apply (line 73) | Maybe<void> Apply(const ScaledDotProductFlashAttentionCaptureState...
FILE: oneflow/core/autograd/gradient_funcs/scatter_nd.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type ScatterNdCaptureState (line 22) | struct ScatterNdCaptureState : public AutoGradCaptureState {
class ScatterNd (line 26) | class ScatterNd : public OpExprGradFunction<ScatterNdCaptureState> {
method Init (line 28) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 30) | Maybe<void> Capture(ScatterNdCaptureState* ctx, const TensorTuple&...
method Apply (line 41) | Maybe<void> Apply(const ScatterNdCaptureState* ctx, const TensorTu...
FILE: oneflow/core/autograd/gradient_funcs/select_top_n.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type SelectTopNCaptureState (line 25) | struct SelectTopNCaptureState : public AutoGradCaptureState {
class SelectTopN (line 31) | class SelectTopN : public OpExprGradFunction<SelectTopNCaptureState> {
method Init (line 33) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 35) | Maybe<void> Capture(SelectTopNCaptureState* ctx, const TensorTuple...
method Apply (line 46) | Maybe<void> Apply(const SelectTopNCaptureState* ctx, const TensorT...
FILE: oneflow/core/autograd/gradient_funcs/slice.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type SliceCaptureState (line 25) | struct SliceCaptureState : public AutoGradCaptureState {
class Slice (line 32) | class Slice : public OpExprGradFunction<SliceCaptureState> {
method Init (line 34) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 41) | Maybe<void> Capture(SliceCaptureState* ctx, const TensorTuple& inp...
method Apply (line 54) | Maybe<void> Apply(const SliceCaptureState* ctx, const TensorTuple&...
type SliceUpdateCaptureState (line 66) | struct SliceUpdateCaptureState : public AutoGradCaptureState {
class SliceUpdate (line 76) | class SliceUpdate : public OpExprGradFunction<SliceUpdateCaptureStat...
method Init (line 78) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 86) | Maybe<void> Capture(SliceUpdateCaptureState* ctx, const TensorTupl...
method Apply (line 106) | Maybe<void> Apply(const SliceUpdateCaptureState* ctx, const Tensor...
FILE: oneflow/core/autograd/gradient_funcs/smooth_l1_loss.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type SmoothL1LossCaptureState (line 24) | struct SmoothL1LossCaptureState : public AutoGradCaptureState {
class SmoothL1Loss (line 30) | class SmoothL1Loss : public OpExprGradFunction<SmoothL1LossCaptureSt...
method Init (line 32) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 39) | Maybe<void> Capture(SmoothL1LossCaptureState* ctx, const TensorTup...
method Apply (line 54) | Maybe<void> Apply(const SmoothL1LossCaptureState* ctx, const Tenso...
FILE: oneflow/core/autograd/gradient_funcs/softmax.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type SoftmaxCaptureState (line 23) | struct SoftmaxCaptureState : public AutoGradCaptureState {
class Softmax (line 27) | class Softmax : public OpExprGradFunction<SoftmaxCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/softmax_cross_entropy.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type SoftmaxCrossEntropyGradState (line 22) | struct SoftmaxCrossEntropyGradState : public AutoGradCaptureState {
class SoftmaxCrossEntropy (line 26) | class SoftmaxCrossEntropy : public OpExprGradFunction<SoftmaxCrossEn...
FILE: oneflow/core/autograd/gradient_funcs/sparse_cross_entropy.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type SparseCrossEntropyCaptureState (line 24) | struct SparseCrossEntropyCaptureState : public AutoGradCaptureState {
class SparseCrossEntropy (line 32) | class SparseCrossEntropy : public OpExprGradFunction<SparseCrossEntr...
method Init (line 34) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 41) | Maybe<void> Capture(SparseCrossEntropyCaptureState* ctx, const Ten...
method Apply (line 54) | Maybe<void> Apply(const SparseCrossEntropyCaptureState* ctx, const...
FILE: oneflow/core/autograd/gradient_funcs/sparse_softmax_cross_entropy.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type SparseSoftmaxCrossEntropyCaptureState (line 25) | struct SparseSoftmaxCrossEntropyCaptureState : public AutoGradCaptur...
class SparseSoftmaxCrossEntropy (line 29) | class SparseSoftmaxCrossEntropy : public OpExprGradFunction<SparseSo...
FILE: oneflow/core/autograd/gradient_funcs/sparse_softmax_cross_entropy_ms.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type SparseSoftmaxCrossEntropyMsCaptureState (line 25) | struct SparseSoftmaxCrossEntropyMsCaptureState : public AutoGradCapt...
class SparseSoftmaxCrossEntropyMs (line 29) | class SparseSoftmaxCrossEntropyMs
FILE: oneflow/core/autograd/gradient_funcs/split_like.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type SplitLikeCaptureState (line 25) | struct SplitLikeCaptureState : public AutoGradCaptureState {
class SplitLike (line 30) | class SplitLike : public OpExprGradFunction<SplitLikeCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/squeeze.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type SqueezeCaptureState (line 25) | struct SqueezeCaptureState : public AutoGradCaptureState {
class Squeeze (line 29) | class Squeeze : public OpExprGradFunction<SqueezeCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/stack.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type StackCaptureState (line 25) | struct StackCaptureState : public AutoGradCaptureState {
class Stack (line 31) | class Stack : public OpExprGradFunction<StackCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/tensor_scalar_binary.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type TensorScalarCaptureState (line 23) | struct TensorScalarCaptureState : public AutoGradCaptureState {
class TensorScalarAddOrSub (line 28) | class TensorScalarAddOrSub : public OpExprGradFunction<TensorScalarC...
method TensorScalarAddOrSub (line 30) | TensorScalarAddOrSub() = default;
class TensorScalarAdd (line 51) | class TensorScalarAdd : public TensorScalarAddOrSub {
method Apply (line 53) | Maybe<void> Apply(const TensorScalarCaptureState* ctx, const Tenso...
class TensorScalarSub (line 67) | class TensorScalarSub : public TensorScalarAddOrSub {
method Apply (line 69) | Maybe<void> Apply(const TensorScalarCaptureState* ctx, const Tenso...
class TensorScalarMul (line 88) | class TensorScalarMul : public OpExprGradFunction<TensorScalarCaptur...
class TensorScalarDiv (line 132) | class TensorScalarDiv : public OpExprGradFunction<TensorScalarCaptur...
FILE: oneflow/core/autograd/gradient_funcs/tensor_scatter_nd_update.cpp
type oneflow (line 19) | namespace oneflow {
type one (line 20) | namespace one {
type TensorScatterNdUpdateCaptureState (line 22) | struct TensorScatterNdUpdateCaptureState : public AutoGradCaptureSta...
class TensorScatterNdUpdate (line 27) | class TensorScatterNdUpdate : public OpExprGradFunction<TensorScatte...
method Init (line 29) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 31) | Maybe<void> Capture(TensorScatterNdUpdateCaptureState* ctx, const ...
method Apply (line 46) | Maybe<void> Apply(const TensorScatterNdUpdateCaptureState* ctx, co...
FILE: oneflow/core/autograd/gradient_funcs/tf_pool.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type TFPoolCaptureState (line 28) | struct TFPoolCaptureState : public AutoGradCaptureState {
class TFPoolNdGrad (line 42) | class TFPoolNdGrad : public OpExprGradFunction<TFPoolCaptureState> {
class TFMaxPoolNdGrad (line 105) | class TFMaxPoolNdGrad final : public TFPoolNdGrad {
method Init (line 107) | Maybe<void> Init(const OpExpr& op) override { return TFPoolNdGrad:...
class TFAvgPoolNdGrad (line 114) | class TFAvgPoolNdGrad final : public TFPoolNdGrad {
method Init (line 116) | Maybe<void> Init(const OpExpr& op) override { return TFPoolNdGrad:...
FILE: oneflow/core/autograd/gradient_funcs/to_contiguous.cpp
type oneflow (line 18) | namespace oneflow {
type one (line 19) | namespace one {
type ToContiguousCaptureState (line 21) | struct ToContiguousCaptureState : public AutoGradCaptureState {
class ToContiguous (line 25) | class ToContiguous : public OpExprGradFunction<ToContiguousCaptureSt...
method Init (line 27) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 29) | Maybe<void> Capture(ToContiguousCaptureState* ctx, const TensorTup...
method Apply (line 36) | Maybe<void> Apply(const ToContiguousCaptureState* ctx, const Tenso...
FILE: oneflow/core/autograd/gradient_funcs/transpose.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type TransposeCaptureState (line 25) | struct TransposeCaptureState : public AutoGradCaptureState {
class Transpose (line 30) | class Transpose : public OpExprGradFunction<TransposeCaptureState> {
function FOR_RANGE (line 65) | FOR_RANGE(int32_t, i, 0, ctx->perm.size()) { grad_perm.at(ctx->perm....
FILE: oneflow/core/autograd/gradient_funcs/tril.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type TrilCaptureState (line 23) | struct TrilCaptureState : public AutoGradCaptureState {
class Tril (line 28) | class Tril : public OpExprGradFunction<TrilCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/triu.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type TriuCaptureState (line 23) | struct TriuCaptureState : public AutoGradCaptureState {
class Triu (line 28) | class Triu : public OpExprGradFunction<TriuCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/trunc.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type TruncCaptureState (line 23) | struct TruncCaptureState : public AutoGradCaptureState {
class Trunc (line 27) | class Trunc : public OpExprGradFunction<TruncCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/two_stage_reduce.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type ReduceMode (line 24) | enum class ReduceMode : int32_t {
type ReduceDeviceCaptureState (line 29) | struct ReduceDeviceCaptureState : public AutoGradCaptureState {
class ReduceDevice (line 37) | class ReduceDevice : public OpExprGradFunction<ReduceDeviceCaptureSt...
method Init (line 39) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 46) | Maybe<void> Capture(ReduceDeviceCaptureState* ctx, const TensorTup...
method Apply (line 59) | Maybe<void> Apply(const ReduceDeviceCaptureState* ctx, const Tenso...
type ReduceGlobalCaptureState (line 84) | struct ReduceGlobalCaptureState : public AutoGradCaptureState {
class ReduceGlobal (line 93) | class ReduceGlobal : public OpExprGradFunction<ReduceGlobalCaptureSt...
method Init (line 95) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 102) | Maybe<void> Capture(ReduceGlobalCaptureState* ctx, const TensorTup...
method Apply (line 117) | Maybe<void> Apply(const ReduceGlobalCaptureState* ctx, const Tenso...
FILE: oneflow/core/autograd/gradient_funcs/unfold.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type UnfoldInterpState (line 23) | struct UnfoldInterpState : public AutoGradCaptureState {
class Unfold (line 33) | class Unfold : public OpExprGradFunction<UnfoldInterpState> {
FILE: oneflow/core/autograd/gradient_funcs/unfold_tensor.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 22) | namespace one {
type UnfoldTensorCaptureState (line 24) | struct UnfoldTensorCaptureState : public AutoGradCaptureState {
class UnfoldTensor (line 31) | class UnfoldTensor : public OpExprGradFunction<UnfoldTensorCaptureSt...
FILE: oneflow/core/autograd/gradient_funcs/unsqueeze.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type UnsqueezeCaptureState (line 26) | struct UnsqueezeCaptureState : public AutoGradCaptureState {
class Unsqueeze (line 31) | class Unsqueeze : public OpExprGradFunction<UnsqueezeCaptureState> {
FILE: oneflow/core/autograd/gradient_funcs/upsample.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type UpsampleCaptureState (line 24) | struct UpsampleCaptureState : public AutoGradCaptureState {
class Upsample (line 33) | class Upsample : public OpExprGradFunction<UpsampleCaptureState> {
type UpsampleNearest2DCaptureState (line 82) | struct UpsampleNearest2DCaptureState : public AutoGradCaptureState {
class UpsampleNearest2D (line 90) | class UpsampleNearest2D : public OpExprGradFunction<UpsampleNearest2...
method Init (line 92) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 94) | Maybe<void> Capture(UpsampleNearest2DCaptureState* ctx, const Tens...
method Apply (line 111) | Maybe<void> Apply(const UpsampleNearest2DCaptureState* ctx, const ...
type UpsampleBilinear2DCaptureState (line 130) | struct UpsampleBilinear2DCaptureState : public AutoGradCaptureState {
class UpsampleBilinear2D (line 139) | class UpsampleBilinear2D : public OpExprGradFunction<UpsampleBilinea...
method Init (line 141) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 143) | Maybe<void> Capture(UpsampleBilinear2DCaptureState* ctx, const Ten...
method Apply (line 161) | Maybe<void> Apply(const UpsampleBilinear2DCaptureState* ctx, const...
type UpsampleLinear1DCaptureState (line 180) | struct UpsampleLinear1DCaptureState : public AutoGradCaptureState {
class UpsampleLinear1D (line 188) | class UpsampleLinear1D : public OpExprGradFunction<UpsampleLinear1DC...
method Init (line 190) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 192) | Maybe<void> Capture(UpsampleLinear1DCaptureState* ctx, const Tenso...
method Apply (line 209) | Maybe<void> Apply(const UpsampleLinear1DCaptureState* ctx, const T...
type UpsampleNearest1DCaptureState (line 228) | struct UpsampleNearest1DCaptureState : public AutoGradCaptureState {
class UpsampleNearest1D (line 235) | class UpsampleNearest1D : public OpExprGradFunction<UpsampleNearest1...
method Init (line 237) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 239) | Maybe<void> Capture(UpsampleNearest1DCaptureState* ctx, const Tens...
method Apply (line 255) | Maybe<void> Apply(const UpsampleNearest1DCaptureState* ctx, const ...
type UpsampleBicubic2DCaptureState (line 274) | struct UpsampleBicubic2DCaptureState : public AutoGradCaptureState {
class UpsampleBicubic2D (line 283) | class UpsampleBicubic2D : public OpExprGradFunction<UpsampleBicubic2...
method Init (line 285) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 287) | Maybe<void> Capture(UpsampleBicubic2DCaptureState* ctx, const Tens...
method Apply (line 305) | Maybe<void> Apply(const UpsampleBicubic2DCaptureState* ctx, const ...
type UpsampleNearest3DCaptureState (line 323) | struct UpsampleNearest3DCaptureState : public AutoGradCaptureState {
class UpsampleNearest3D (line 332) | class UpsampleNearest3D : public OpExprGradFunction<UpsampleNearest3...
method Init (line 334) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 336) | Maybe<void> Capture(UpsampleNearest3DCaptureState* ctx, const Tens...
method Apply (line 354) | Maybe<void> Apply(const UpsampleNearest3DCaptureState* ctx, const ...
type UpsampleTrilinear3DCaptureState (line 373) | struct UpsampleTrilinear3DCaptureState : public AutoGradCaptureState {
class UpsampleTrilinear3D (line 383) | class UpsampleTrilinear3D : public OpExprGradFunction<UpsampleTrilin...
method Init (line 385) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 387) | Maybe<void> Capture(UpsampleTrilinear3DCaptureState* ctx, const Te...
method Apply (line 406) | Maybe<void> Apply(const UpsampleTrilinear3DCaptureState* ctx, cons...
FILE: oneflow/core/autograd/gradient_funcs/variance.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type VarianceState (line 26) | struct VarianceState : public AutoGradCaptureState {
method VarianceState (line 27) | VarianceState() : requires_grad(false), unbiased(true), keepdim(fa...
class Variance (line 34) | class Variance : public OpExprGradFunction<VarianceState> {
FILE: oneflow/core/autograd/gradient_funcs/vector_matrix_product.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type VectorMatrixProductCaptureState (line 26) | struct VectorMatrixProductCaptureState : public AutoGradCaptureState {
class VectorMatrixProduct (line 33) | class VectorMatrixProduct : public OpExprGradFunction<VectorMatrixPr...
FILE: oneflow/core/autograd/gradient_funcs/where.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type WhereCaptureState (line 23) | struct WhereCaptureState : public AutoGradCaptureState {
class Where (line 32) | class Where : public OpExprGradFunction<WhereCaptureState> {
FILE: oneflow/core/autograd/higher_order_gradient_funcs/activation.cpp
type oneflow (line 24) | namespace oneflow {
type one (line 25) | namespace one {
type BaseActivationGradGradCaptureState (line 27) | struct BaseActivationGradGradCaptureState : public AutoGradCaptureSt...
class NoParamActivationGradGrad (line 36) | class NoParamActivationGradGrad : public OpExprGradFunction<BaseActi...
method Init (line 38) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 40) | Maybe<void> Capture(BaseActivationGradGradCaptureState* ctx, const...
method Apply (line 57) | Maybe<void> Apply(const BaseActivationGradGradCaptureState* ctx, c...
type HardShrinkGradGradCaptureState (line 88) | struct HardShrinkGradGradCaptureState : public AutoGradCaptureState {
class HardShrinkGradGrad (line 94) | class HardShrinkGradGrad : public OpExprGradFunction<HardShrinkGradG...
method Init (line 96) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 102) | Maybe<void> Capture(HardShrinkGradGradCaptureState* ctx, const Ten...
method Apply (line 118) | Maybe<void> Apply(const HardShrinkGradGradCaptureState* ctx, const...
type SoftShrinkGradGradCaptureState (line 134) | struct SoftShrinkGradGradCaptureState : public AutoGradCaptureState {
class SoftShrinkGradGrad (line 140) | class SoftShrinkGradGrad : public OpExprGradFunction<SoftShrinkGradG...
method Init (line 142) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 148) | Maybe<void> Capture(SoftShrinkGradGradCaptureState* ctx, const Ten...
method Apply (line 164) | Maybe<void> Apply(const SoftShrinkGradGradCaptureState* ctx, const...
type ReluGradGradCaptureState (line 180) | struct ReluGradGradCaptureState : public AutoGradCaptureState {
class ReluGradGrad (line 185) | class ReluGradGrad : public OpExprGradFunction<ReluGradGradCaptureSt...
method Init (line 187) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 188) | Maybe<void> Capture(ReluGradGradCaptureState* ctx, const TensorTup...
method Apply (line 200) | Maybe<void> Apply(const ReluGradGradCaptureState* ctx, const Tenso...
type LeakyReluGradGradCaptureState (line 212) | struct LeakyReluGradGradCaptureState : public AutoGradCaptureState {
class LeakyReluGradGrad (line 218) | class LeakyReluGradGrad : public OpExprGradFunction<LeakyReluGradGra...
method Init (line 220) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 227) | Maybe<void> Capture(LeakyReluGradGradCaptureState* ctx, const Tens...
method Apply (line 244) | Maybe<void> Apply(const LeakyReluGradGradCaptureState* ctx, const ...
type SoftplusGradGradCaptureState (line 259) | struct SoftplusGradGradCaptureState : public AutoGradCaptureState {
class SoftplusGradGrad (line 266) | class SoftplusGradGrad : public OpExprGradFunction<SoftplusGradGradC...
method Init (line 268) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 275) | Maybe<void> Capture(SoftplusGradGradCaptureState* ctx, const Tenso...
method Apply (line 293) | Maybe<void> Apply(const SoftplusGradGradCaptureState* ctx, const T...
type HardTanhGradGradCaptureState (line 314) | struct HardTanhGradGradCaptureState : public AutoGradCaptureState {
class HardTanhGradGrad (line 321) | class HardTanhGradGrad : public OpExprGradFunction<HardTanhGradGradC...
method Init (line 323) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 329) | Maybe<void> Capture(HardTanhGradGradCaptureState* ctx, const Tenso...
method Apply (line 346) | Maybe<void> Apply(const HardTanhGradGradCaptureState* ctx, const T...
type EluGradGradCaptureState (line 363) | struct EluGradGradCaptureState : public AutoGradCaptureState {
class EluGradGrad (line 369) | class EluGradGrad : public OpExprGradFunction<EluGradGradCaptureStat...
method Init (line 371) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 378) | Maybe<void> Capture(EluGradGradCaptureState* ctx, const TensorTupl...
method Apply (line 395) | Maybe<void> Apply(const EluGradGradCaptureState* ctx, const Tensor...
class CeluGradGrad (line 415) | class CeluGradGrad : public EluGradGrad {
method Apply (line 417) | Maybe<void> Apply(const EluGradGradCaptureState* ctx, const Tensor...
type PReluGradGradCaptureState (line 434) | struct PReluGradGradCaptureState : public AutoGradCaptureState {
class PReluGradGrad (line 443) | class PReluGradGrad : public OpExprGradFunction<PReluGradGradCapture...
method Init (line 445) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 447) | Maybe<void> Capture(PReluGradGradCaptureState* ctx, const TensorTu...
method Apply (line 463) | Maybe<void> Apply(const PReluGradGradCaptureState* ctx, const Tens...
type ThresholdGradGradCaptureState (line 497) | struct ThresholdGradGradCaptureState : public AutoGradCaptureState {
class ThresholdGradGrad (line 503) | class ThresholdGradGrad : public OpExprGradFunction<ThresholdGradGra...
method Init (line 505) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 512) | Maybe<void> Capture(ThresholdGradGradCaptureState* ctx, const Tens...
method Apply (line 529) | Maybe<void> Apply(const ThresholdGradGradCaptureState* ctx, const ...
FILE: oneflow/core/autograd/higher_order_gradient_funcs/avg_pool.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type AdaptiveAvgPoolNDGradGradCaptureState (line 25) | struct AdaptiveAvgPoolNDGradGradCaptureState : public AutoGradCaptur...
class AdaptiveAvgPoolNdNdGradGrad (line 33) | class AdaptiveAvgPoolNdNdGradGrad
method Init (line 36) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 43) | Maybe<void> Capture(AdaptiveAvgPoolNDGradGradCaptureState* ctx, co...
method Apply (line 71) | Maybe<void> Apply(const AdaptiveAvgPoolNDGradGradCaptureState* ctx...
type AvgPoolGradGradCaptureState (line 98) | struct AvgPoolGradGradCaptureState : public AutoGradCaptureState {
class AvgPoolNdGradGrad (line 111) | class AvgPoolNdGradGrad : public OpExprGradFunction<AvgPoolGradGradC...
method Init (line 114) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 120) | Maybe<void> Capture(AvgPoolGradGradCaptureState* ctx, const Tensor...
method Apply (line 140) | Maybe<void> Apply(const AvgPoolGradGradCaptureState* ctx, const Te...
FILE: oneflow/core/autograd/higher_order_gradient_funcs/binary_cross_entropy_loss.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type BinaryCrossEntropyGradGradCaptureState (line 23) | struct BinaryCrossEntropyGradGradCaptureState : public AutoGradCaptu...
class BinaryCrossEntropyGradGrad (line 30) | class BinaryCrossEntropyGradGrad
FILE: oneflow/core/autograd/higher_order_gradient_funcs/binary_cross_entropy_with_logits.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 21) | namespace one {
type BinaryCrossEntropyWithLogitsGradGradCaptureState (line 23) | struct BinaryCrossEntropyWithLogitsGradGradCaptureState : public Aut...
class BinaryCrossEntropyWithLogitsGradGrad (line 31) | class BinaryCrossEntropyWithLogitsGradGrad
FILE: oneflow/core/autograd/higher_order_gradient_funcs/binary_cross_entropy_with_logits_reduce_mean.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type BinaryCrossEntropyWithLogitsReduceMeanGradGradCaptureState (line 24) | struct BinaryCrossEntropyWithLogitsReduceMeanGradGradCaptureState : ...
class BinaryCrossEntropyWithLogitsReduceMeanGradGrad (line 34) | class BinaryCrossEntropyWithLogitsReduceMeanGradGrad
FILE: oneflow/core/autograd/higher_order_gradient_funcs/conv.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type ConvDataGradGradCaptureState (line 26) | struct ConvDataGradGradCaptureState : public AutoGradCaptureState {
class ConvDataGradGrad (line 41) | class ConvDataGradGrad : public OpExprGradFunction<ConvDataGradGradC...
type ConvFilterGradGradCaptureState (line 124) | struct ConvFilterGradGradCaptureState : public AutoGradCaptureState {
class ConvFilterGradGrad (line 139) | class ConvFilterGradGrad : public OpExprGradFunction<ConvFilterGradG...
FILE: oneflow/core/autograd/higher_order_gradient_funcs/div.cpp
type oneflow (line 24) | namespace oneflow {
type one (line 25) | namespace one {
type DivGradGradCaptureState (line 27) | struct DivGradGradCaptureState : public AutoGradCaptureState {
class DivGradGrad (line 37) | class DivGradGrad : public OpExprGradFunction<DivGradGradCaptureStat...
method Init (line 43) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 45) | Maybe<void> Capture(DivGradGradCaptureState* ctx, const TensorTupl...
method Apply (line 65) | Maybe<void> Apply(const DivGradGradCaptureState* ctx, const Tensor...
FILE: oneflow/core/autograd/higher_order_gradient_funcs/kl_div_loss.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type KLDivLossGradGradCaptureState (line 24) | struct KLDivLossGradGradCaptureState : public AutoGradCaptureState {
class KLDivLossGradGrad (line 34) | class KLDivLossGradGrad : public OpExprGradFunction<KLDivLossGradGra...
FILE: oneflow/core/autograd/higher_order_gradient_funcs/log_softmax.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type LogSoftmaxGradGradCaptureState (line 25) | struct LogSoftmaxGradGradCaptureState : public AutoGradCaptureState {
class LogSoftmaxGradGrad (line 30) | class LogSoftmaxGradGrad : public OpExprGradFunction<LogSoftmaxGradG...
FILE: oneflow/core/autograd/higher_order_gradient_funcs/math_unary_op.cpp
type oneflow (line 24) | namespace oneflow {
type one (line 25) | namespace one {
type UnaryMathGradGradState (line 27) | struct UnaryMathGradGradState : public AutoGradCaptureState {
class UnaryMathGradGrad (line 36) | class UnaryMathGradGrad : public OpExprGradFunction<UnaryMathGradGra...
method Init (line 37) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 38) | Maybe<void> Capture(UnaryMathGradGradState* ctx, const TensorTuple...
method Apply (line 48) | Maybe<void> Apply(const UnaryMathGradGradState* ctx, const TensorT...
class UnaryMathGradGradWithZeroDDX (line 62) | class UnaryMathGradGradWithZeroDDX : public OpExprGradFunction<Unary...
method Init (line 63) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 64) | Maybe<void> Capture(UnaryMathGradGradState* ctx, const TensorTuple...
method Apply (line 73) | Maybe<void> Apply(const UnaryMathGradGradState* ctx, const TensorT...
FILE: oneflow/core/autograd/higher_order_gradient_funcs/matmul.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type BroadcastMatmulGradBGradCaptureState (line 26) | struct BroadcastMatmulGradBGradCaptureState : public AutoGradCapture...
class BroadcastMatmulGradBGrad (line 34) | class BroadcastMatmulGradBGrad : public OpExprGradFunction<Broadcast...
method Init (line 36) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 42) | Maybe<void> Capture(BroadcastMatmulGradBGradCaptureState* ctx, con...
method Apply (line 57) | Maybe<void> Apply(const BroadcastMatmulGradBGradCaptureState* ctx,...
FILE: oneflow/core/autograd/higher_order_gradient_funcs/max_pool.cpp
type oneflow (line 21) | namespace oneflow {
type one (line 22) | namespace one {
type MaxPoolGradGradCaptureState (line 24) | struct MaxPoolGradGradCaptureState : public AutoGradCaptureState {
class MaxPoolNdGradGrad (line 30) | class MaxPoolNdGradGrad : public OpExprGradFunction<MaxPoolGradGradC...
method Init (line 32) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 34) | Maybe<void> Capture(MaxPoolGradGradCaptureState* ctx, const Tensor...
method Apply (line 47) | Maybe<void> Apply(const MaxPoolGradGradCaptureState* ctx, const Te...
FILE: oneflow/core/autograd/higher_order_gradient_funcs/nll_loss.cpp
type oneflow (line 20) | namespace oneflow {
type one (line 22) | namespace one {
type NLLCaptureState (line 24) | struct NLLCaptureState : public AutoGradCaptureState {
class NLLLossGradGrad (line 31) | class NLLLossGradGrad : public OpExprGradFunction<NLLCaptureState> {
FILE: oneflow/core/autograd/higher_order_gradient_funcs/pow.cpp
type oneflow (line 24) | namespace oneflow {
type one (line 25) | namespace one {
type PowXGradGradCaptureState (line 26) | struct PowXGradGradCaptureState : public AutoGradCaptureState {
class PowXGradGrad (line 36) | class PowXGradGrad : public OpExprGradFunction<PowXGradGradCaptureSt...
method Init (line 38) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 39) | Maybe<void> Capture(PowXGradGradCaptureState* ctx, const TensorTup...
method Apply (line 56) | Maybe<void> Apply(const PowXGradGradCaptureState* ctx, const Tenso...
type PowYGradGradCaptureState (line 97) | struct PowYGradGradCaptureState : public AutoGradCaptureState {
class PowYGradGrad (line 108) | class PowYGradGrad : public OpExprGradFunction<PowYGradGradCaptureSt...
method Init (line 111) | Maybe<void> Init(const OpExpr& op) override { return Maybe<void>::...
method Capture (line 112) | Maybe<void> Capture(PowYGradGradCaptureState* ctx, const TensorTup...
method Apply (line 131) | Maybe<void> Apply(const PowYGradGradCaptureState* ctx, const Tenso...
FILE: oneflow/core/autograd/higher_order_gradient_funcs/scalar_pow.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type ScalarPowGradGradCaptureState (line 25) | struct ScalarPowGradGradCaptureState : public AutoGradCaptureState {
class ScalarPowGradGrad (line 31) | class ScalarPowGradGrad : public OpExprGradFunction<ScalarPowGradGra...
method Init (line 33) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 40) | Maybe<void> Capture(ScalarPowGradGradCaptureState* ctx, const Tens...
method Apply (line 61) | Maybe<void> Apply(const ScalarPowGradGradCaptureState* ctx, const ...
class ScalarReversePowGradGrad (line 91) | class ScalarReversePowGradGrad : public OpExprGradFunction<ScalarPow...
method Init (line 93) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 100) | Maybe<void> Capture(ScalarPowGradGradCaptureState* ctx, const Tens...
method Apply (line 121) | Maybe<void> Apply(const ScalarPowGradGradCaptureState* ctx, const ...
FILE: oneflow/core/autograd/higher_order_gradient_funcs/slice.cpp
type oneflow (line 23) | namespace oneflow {
type one (line 24) | namespace one {
type SliceGradGradCaptureState (line 26) | struct SliceGradGradCaptureState : public AutoGradCaptureState {
class SliceGradGrad (line 32) | class SliceGradGrad : public OpExprGradFunction<SliceGradGradCapture...
method Init (line 34) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 41) | Maybe<void> Capture(SliceGradGradCaptureState* ctx, const TensorTu...
method Apply (line 52) | Maybe<void> Apply(const SliceGradGradCaptureState* ctx, const Tens...
FILE: oneflow/core/autograd/higher_order_gradient_funcs/smooth_l1_loss.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type SmoothL1LossGradGradCaptureState (line 25) | struct SmoothL1LossGradGradCaptureState : public AutoGradCaptureState {
class SmoothL1LossGradGrad (line 35) | class SmoothL1LossGradGrad : public OpExprGradFunction<SmoothL1LossG...
method Init (line 37) | Maybe<void> Init(const OpExpr& op) override {
method Capture (line 44) | Maybe<void> Capture(SmoothL1LossGradGradCaptureState* ctx, const T...
method Apply (line 64) | Maybe<void> Apply(const SmoothL1LossGradGradCaptureState* ctx, con...
FILE: oneflow/core/autograd/higher_order_gradient_funcs/softmax.cpp
type oneflow (line 22) | namespace oneflow {
type one (line 23) | namespace one {
type SoftmaxGradGradCaptureState (line 25) | struct SoftmaxGradGradCaptureState : public AutoGradCaptureState {
class SoftmaxGradGrad (line 30) | class SoftmaxGradGrad : public OpExprGradFunction<SoftmaxGradGradCap...
FILE: oneflow/core/boxing/asymmetric_broadcast.cpp
type oneflow (line 29) | namespace oneflow {
function RawCheckAsymmetricBroadcast (line 33) | Maybe<void> RawCheckAsymmetricBroadcast(Symbol<PlacedNdSbp> in, Symbol...
function CalBroadcastRoot (line 51) | Maybe<int64_t> CalBroadcastRoot(Symbol<ParallelDesc> src_parallel_desc,
function EagerCclBroadcast (line 81) | Maybe<one::UserOpExpr> EagerCclBroadcast(Symbol<ParallelDesc> parallel...
function AsymmetricBroadcast (line 96) | Maybe<one::Tensor> AsymmetricBroadcast(const std::shared_ptr<one::Tens...
FILE: oneflow/core/boxing/boxing_dividor.h
function namespace (line 23) | namespace oneflow {
FILE: oneflow/core/boxing/boxing_dividor_util.cpp
type oneflow (line 23) | namespace oneflow {
function RawReplaceInDeviceType (line 27) | Maybe<BoxingDividor> RawReplaceInDeviceType(DeviceType device_type) {
function RawReplaceOutDeviceType (line 36) | Maybe<BoxingDividor> RawReplaceOutDeviceType(DeviceType device_type) {
function RawFlattenHierarchy (line 54) | Maybe<Symbol<PlacedNdSbp>> RawFlattenHierarchy(Symbol<PlacedNdSbp> pla...
function RawFlattenInHierarchy (line 75) | Maybe<BoxingDividor> RawFlattenInHierarchy() {
function RawUnflattenHierarchy (line 83) | Maybe<Symbol<PlacedNdSbp>> RawUnflattenHierarchy(Symbol<PlacedNdSbp> i...
function RawUnflattenInHierarchy (line 99) | Maybe<BoxingDividor> RawUnflattenInHierarchy() {
function RawUnflattenOutHierarchy (line 107) | Maybe<BoxingDividor> RawUnflattenOutHierarchy() {
function GetAllPartialSumNdSbp (line 126) | Maybe<Symbol<NdSbp>> GetAllPartialSumNdSbp(int64_t ndim) {
function RawReplaceNdSbpWithPartialSum (line 136) | Maybe<Symbol<PlacedNdSbp>> RawReplaceNdSbpWithPartialSum(Symbol<Placed...
function RawOutPlacementAndPartialSum (line 145) | Maybe<BoxingDividor> RawOutPlacementAndPartialSum() {
function GetAllBroadcastNdSbp (line 160) | Maybe<Symbol<NdSbp>> GetAllBroadcastNdSbp(int64_t ndim) {
function RawReplaceNdSbpWithBroadcast (line 170) | Maybe<Symbol<PlacedNdSbp>> RawReplaceNdSbpWithBroadcast(Symbol<PlacedN...
function RawInPlacementAndBroadcast (line 179) | Maybe<BoxingDividor> RawInPlacementAndBroadcast() {
function RawOutPlacementAndBroadcast (line 187) | Maybe<BoxingDividor> RawOutPlacementAndBroadcast() {
function GetSplitNdSbp (line 204) | Maybe<Symbol<NdSbp>> GetSplitNdSbp(int64_t axis) {
function RawInPlacementAndSplit (line 212) | Maybe<BoxingDividor> RawInPlacementAndSplit(int64_t axis) {
function RawOutPlacementAndSplit (line 221) | Maybe<BoxingDividor> RawOutPlacementAndSplit(int64_t axis) {
function GetFisrtDeviceOfPlacement (line 239) | Maybe<Symbol<ParallelDesc>> GetFisrtDeviceOfPlacement(Symbol<ParallelD...
function RawInFirstDeviceAndAllBroadcast (line 257) | Maybe<BoxingDividor> RawInFirstDeviceAndAllBroadcast() {
function RawOutFirstDeviceAndAllBroadcast (line 266) | Maybe<BoxingDividor> RawOutFirstDeviceAndAllBroadcast() {
function RawPlacementAndRepeatFirstSbp (line 285) | Maybe<Symbol<PlacedNdSbp>> RawPlacementAndRepeatFirstSbp(Symbol<Placed...
function RawInPlacementAndRepeatFirstSbp (line 297) | Maybe<BoxingDividor> RawInPlacementAndRepeatFirstSbp() {
FILE: oneflow/core/boxing/boxing_dividor_util.h
function namespace (line 22) | namespace oneflow {
FILE: oneflow/core/boxing/boxing_interpreter_status.cpp
type oneflow (line 22) | namespace oneflow {
function RawMakeBoxingInterpreterStatus (line 26) | Maybe<BoxingInterpreterStatus> RawMakeBoxingInterpreterStatus(const st...
function RawMakeComposedBoxingInterpreterStatus (line 35) | Maybe<BoxingInterpreterStatus> RawMakeComposedBoxingInterpreterStatus(
function RawGetNdSbpRouting (line 75) | Maybe<std::string> RawGetNdSbpRouting(Symbol<PlacedNdSbp> src_placed_n...
function RawGetPlacementRouting (line 87) | Maybe<std::string> RawGetPlacementRouting(
function RawGetBoxingDesc (line 100) | Maybe<std::string> RawGetBoxingDesc(Symbol<std::vector<std::string>> s...
FILE: oneflow/core/boxing/boxing_interpreter_status.h
function std (line 69) | const std::string& boxing_routing() const;
FILE: oneflow/core/boxing/ccl_boxing_function.cpp
type oneflow (line 24) | namespace oneflow {
class EagerBoxingKernelRegContext (line 28) | class EagerBoxingKernelRegContext final : public user_op::KernelRegCon...
method EagerBoxingKernelRegContext (line 30) | explicit EagerBoxingKernelRegContext(DeviceType device_type) : devic...
method DeviceType (line 33) | DeviceType device_type() const override { return device_type_; }
method ParallelContext (line 34) | const ParallelContext& parallel_ctx() const override { PRINT_BUG_PRO...
function RawCheckCclKernelRegistered (line 57) | Maybe<bool> RawCheckCclKernelRegistered(const std::string& op_type_nam...
function RawCheckCclP2B (line 65) | Maybe<void> RawCheckCclP2B(Symbol<PlacedNdSbp> in, Symbol<PlacedNdSbp>...
function RawCheckCclP2S (line 84) | Maybe<void> RawCheckCclP2S(Symbol<PlacedNdSbp> in, Symbol<PlacedNdSbp>...
function RawCheckCclS2B (line 105) | Maybe<void> RawCheckCclS2B(Symbol<PlacedNdSbp> in, Symbol<PlacedNdSbp>...
function RawCheckCclS2S (line 127) | Maybe<void> RawCheckCclS2S(Symbol<PlacedNdSbp> in, Symbol<PlacedNdSbp>...
function CclP2B (line 156) | Maybe<one::Tensor> CclP2B(const std::shared_ptr<one::Tensor>& tensor, ...
function CclP2S (line 170) | Maybe<one::Tensor> CclP2S(const std::shared_ptr<one::Tensor>& tensor, ...
function CclS2B (line 185) | Maybe<one::Tensor> CclS2B(const std::shared_ptr<one::Tensor>& tensor, ...
function CclS2S (line 199) | Maybe<one::Tensor> CclS2S(const std::shared_ptr<one::Tensor>& tensor, ...
FILE: oneflow/core/boxing/cuda_copy_boxing_interpreter.cpp
type oneflow (line 23) | namespace oneflow {
function IgnoringDeviceTypeEqual (line 27) | Maybe<bool> IgnoringDeviceTypeEqual(Symbol<ParallelDesc> lhs, Symbol<P...
function CheckCopyH2D (line 34) | Maybe<void> CheckCopyH2D(Symbol<PlacedNdSbp> in, Symbol<PlacedNdSbp> out,
function CheckCopyD2H (line 44) | Maybe<void> CheckCopyD2H(Symbol<PlacedNdSbp> in, Symbol<PlacedNdSbp> out,
function CopyBoxingFunction (line 55) | Maybe<one::Tensor> CopyBoxingFunction(const std::shared_ptr<one::Tenso...
FILE: oneflow/core/boxing/eager_boxing_interpreter.cpp
type oneflow (line 25) | namespace oneflow {
function CheckEagerBoxingDataType (line 28) | Maybe<void> CheckEagerBoxingDataType(DataType val) {
function RawGetBoxingFunction (line 68) | Maybe
Copy disabled (too large)
Download .json
Condensed preview — 4508 files, each showing path, character count, and a content snippet. Download the .json file for the full structured content (27,751K chars).
[
{
"path": ".clang-format",
"chars": 2802,
"preview": "---\nLanguage: Cpp\nAccessModifierOffset: -1\nAlignAfterOpenBracket: Align\nAlignConsecutiveAssignments: false\nAlignC"
},
{
"path": ".clang-tidy",
"chars": 2712,
"preview": "# `maybe-*` checks are only available on OneFlow custom clang-tidy and clangd\n# `-allow-enabling-analyzer-alpha-checkers"
},
{
"path": ".cmake-format.py",
"chars": 13813,
"preview": "# ----------------------------------\n# Options affecting listfile parsing\n# ----------------------------------\nwith sect"
},
{
"path": ".devcontainer/Dockerfile",
"chars": 219,
"preview": "# See here for image contents: https://github.com/Oneflow-Inc/docker-images/blob/main/oneflow/Dockerfile\n# [Choice] llvm"
},
{
"path": ".devcontainer/devcontainer.json",
"chars": 2365,
"preview": "// For format details, see https://aka.ms/devcontainer.json. For config options, see the README at:\n// https://github.co"
},
{
"path": ".dockerignore",
"chars": 404,
"preview": "**/.git\n/build\n/build-*\n/docs/build\n/cmake-build-*\n/third_party\n/examples/**/oneflow\n/benchmark/**/oneflow\n/.vscode\n/.id"
},
{
"path": ".github/CODEOWNERS",
"chars": 768,
"preview": "*.cu @liujuncheng\n*.py @BBuf @daquexian\n/oneflow/core/cuda @liujuncheng\n/oneflow/core/eager @daquexian\n/oneflow/core/fra"
},
{
"path": ".github/ISSUE_TEMPLATE/blank_issue.yml",
"chars": 291,
"preview": "name: Blank Issue\ndescription: Submit an issue about OneFlow.\nlabels: [Blank Issue]\nbody:\n - type: textarea\n id: des"
},
{
"path": ".github/ISSUE_TEMPLATE/bug_report.md",
"chars": 502,
"preview": "---\nname: Bug report\nabout: Create a report to help us improve\ntitle: ''\nlabels: bug, community\nassignees: ''\n\n---\n\n## S"
},
{
"path": ".github/ISSUE_TEMPLATE/documention_issue.yml",
"chars": 922,
"preview": "name: Documentation Issue\ndescription: Report an issue about OneFlow ducumention or require a documention.\ntitle: \"[Docu"
},
{
"path": ".github/ISSUE_TEMPLATE/feature_request.yml",
"chars": 2344,
"preview": "name: Feature Request\ndescription: Request/Propose a new OneFlow feature.\ntitle: \"[Feature Request]: \"\nlabels: [feature-"
},
{
"path": ".github/ISSUE_TEMPLATE/performance_issue.yml",
"chars": 1546,
"preview": "name: Performance Issue\ndescription: Submit an issue about performance problem or regression of OneFlow.\ntitle: \"[Perfor"
},
{
"path": ".github/ISSUE_TEMPLATE/question.yml",
"chars": 831,
"preview": "name: Question\ndescription: Ask a question about OneFlow and discuss with community members.\ntitle: \"[Question]: \"\nlabel"
},
{
"path": ".github/PULL_REQUEST_TEMPLATE/general_template.md",
"chars": 343,
"preview": "## 概述\n\n\n## PR Checklist\n - [ ] PR 标题语句通畅,明确表达 PR 内容,适合直接作为新版本发布时的 changelog\n - [ ] 代码格式化\n - [ ] 已经本地编译通过\n - [ ] 已本地针对改动测"
},
{
"path": ".github/PULL_REQUEST_TEMPLATE/op_template.md",
"chars": 1684,
"preview": "## 概述\n描述 op 的功能、公式等。若参考了其它框架的接口,应列出超链接。\n\n## 功能 CheckList\n**注意** : 功能复选框均为可选项,若未选择,说明理由即可。例如:该 Op 由 Python 接口拼接而成,因此无 `Se"
},
{
"path": ".github/actions/mac-build/action.yml",
"chars": 1310,
"preview": "name: \"Build OneFlow on macOS\"\ndescription: \"\"\nruns:\n using: \"composite\"\n steps:\n - name: Install dependencies\n "
},
{
"path": ".github/actions/setup/action.yml",
"chars": 424,
"preview": "inputs:\n name:\n description: 'Placeholder'\n default: 'Placeholder'\nruns:\n using: \"composite\"\n steps:\n - run:"
},
{
"path": ".github/actions/upload_oss/action.yml",
"chars": 1287,
"preview": "inputs:\n src_path:\n required: true\n oss_dst_path:\n required: true\n oss_access_key_id:\n required: true\n oss_"
},
{
"path": ".github/actions/upload_ssh/action.yml",
"chars": 678,
"preview": "name: \"Upload via ssh\"\ndescription: \"\"\ninputs:\n src_path:\n required: true\n description: \"\"\n dst_host:\n requir"
},
{
"path": ".github/actions/whl/action.yml",
"chars": 1049,
"preview": "inputs:\n tmp_dir:\n description: \"tmp dir\"\n required: true\n cuda_version:\n description: \"cuda_version\"\n def"
},
{
"path": ".github/scripts/requirements.txt",
"chars": 19,
"preview": "PyYAML>=5.1\nparsec\n"
},
{
"path": ".github/scripts/set_initial_variables.py",
"chars": 6144,
"preview": "import json\n\n\ndef create_one(name=None, allow_fail=None):\n return {\n \"test_suite\": name,\n \"cuda_version"
},
{
"path": ".github/workflows/canary.yml",
"chars": 3603,
"preview": "name: Canary\n\non:\n push:\n branches:\n - master\n - \"canary/*\"\n workflow_dispatch:\n inputs:\n oneflow"
},
{
"path": ".github/workflows/community_release.yml",
"chars": 1096,
"preview": "name: Community Release\n\non:\n push:\n branches:\n - \"community/*\"\n schedule:\n # beijing: 6 pm.\n # utc: 10 "
},
{
"path": ".github/workflows/on_merge.yml",
"chars": 484,
"preview": "name: Update Benchmark History\non:\n pull_request:\n types:\n - closed\n branches:\n - master\n\nenv:\n OSS_AC"
},
{
"path": ".github/workflows/pr.yml",
"chars": 1693,
"preview": "name: Check PR\n\non:\n pull_request:\n types: [opened, labeled, unlabeled, synchronize]\n\njobs:\n check_labels:\n runs"
},
{
"path": ".github/workflows/priv_release.yml",
"chars": 1031,
"preview": "name: Priv Release\n\non:\n push:\n branches:\n - \"pro/*\"\n schedule:\n # beijing: 12 pm.\n # utc: 4 am.\n - c"
},
{
"path": ".github/workflows/release.yml",
"chars": 10098,
"preview": "name: Release\n\non:\n push:\n branches:\n - \"release/*\"\n\n schedule:\n # beijing: 2 am.\n # utc: 6 pm.\n - cr"
},
{
"path": ".github/workflows/simple.yml",
"chars": 10618,
"preview": "name: Simple CI\non:\n pull_request:\n types: [review_requested]\n branches:\n - \"*\"\n push:\n branches:\n "
},
{
"path": ".github/workflows/test.yml",
"chars": 61486,
"preview": "name: Build and Test CI\non:\n pull_request:\n types: [opened, review_requested, ready_for_review, synchronize, unlocke"
},
{
"path": ".gitignore",
"chars": 754,
"preview": "/build\n/build-*\n/docs/build/\n/docs/build-cn/\n/docs/source/generated\n/cmake-build-*\n/dist\n/third_party/\n/examples/**/onef"
},
{
"path": ".lsan-suppressions",
"chars": 14,
"preview": "leak:CommandT\n"
},
{
"path": ".mergify.yml",
"chars": 510,
"preview": "pull_request_rules:\n - name: automatic update for PR with label “automerge“\n conditions:\n - \"#approved-reviews-"
},
{
"path": ".tsan-suppressions",
"chars": 406,
"preview": "# These four group of functions are designed to be thread unsafe,\n# it's user's responsibility to use them correctly.\nra"
},
{
"path": ".ubsan-suppressions",
"chars": 22,
"preview": "# llvm\nvptr:Class.cpp\n"
},
{
"path": "CMakeLists.txt",
"chars": 6737,
"preview": "# Minimum CMake required\nset(CMAKE_POLICY_DEFAULT_CMP0135 NEW)\ncmake_minimum_required(VERSION 3.18.0)\n\nset(CMAKE_INSTALL"
},
{
"path": "LICENSE",
"chars": 11414,
"preview": "Copyright 2020 The OneFlow Authors. All rights reserved.\n Apache License\n "
},
{
"path": "README.md",
"chars": 7326,
"preview": "# OneFlow\n\nOneFlow is a deep learning framework designed to be **user-friendly, scalable and efficient**. With OneFlow, "
},
{
"path": "ci/CMakeLists.txt",
"chars": 23,
"preview": "add_subdirectory(test)\n"
},
{
"path": "ci/build/ensure_img.py",
"chars": 2247,
"preview": "import os\nimport argparse\nfrom pathlib import Path\nimport re\nimport json\nimport subprocess\n\n\ndef check_and_download(tag,"
},
{
"path": "ci/build/make.sh",
"chars": 1926,
"preview": "set -ex\n\nsrc_dir=${ONEFLOW_SRC_DIR:-\"$PWD\"}\ntmp_dir=${ONEFLOW_CI_TMP_DIR:-\"$HOME/ci-tmp\"}\nextra_oneflow_cmake_args=${ONE"
},
{
"path": "ci/check/clang_tidy_warnings_as_errors_on_diff",
"chars": 99,
"preview": "*,-maybe-glog-fatal,-clang-analyzer-alpha.*,-clang-analyzer-cplusplus.NewDelete,-clang-diagnostic-*"
},
{
"path": "ci/check/lintutils.py",
"chars": 3425,
"preview": "# Licensed to the Apache Software Foundation (ASF) under one\n# or more contributor license agreements. See the NOTICE f"
},
{
"path": "ci/check/run_clang_format.py",
"chars": 5188,
"preview": "#!/usr/bin/env python3\n# Licensed to the Apache Software Foundation (ASF) under one\n# or more contributor license agreem"
},
{
"path": "ci/check/run_clang_tidy.py",
"chars": 4116,
"preview": "#!/usr/bin/env python3\n# Licensed to the Apache Software Foundation (ASF) under one\n# or more contributor license agreem"
},
{
"path": "ci/check/run_cmake_format.py",
"chars": 1749,
"preview": "from subprocess import call\nfrom argparse import ArgumentParser\nfrom glob import glob\nfrom pathlib import Path\nfrom mult"
},
{
"path": "ci/check/run_license_format.py",
"chars": 4076,
"preview": "import argparse\nimport os\nimport glob\nfrom multiprocessing import Pool\n\nLICENSE_TXT = \"\"\"Copyright 2020 The OneFlow Auth"
},
{
"path": "ci/check/run_py_format.py",
"chars": 1333,
"preview": "import argparse\nimport sys\nimport platform\nfrom subprocess import Popen\nimport os\n\nif __name__ == \"__main__\":\n\n major"
},
{
"path": "ci/clang/build-llvm.sh",
"chars": 975,
"preview": "set -ex\nexport PATH=/usr/lib/llvm-15/bin:/usr/lib64/ccache:/root/.local/bin:$PATH\n\n# clean python dir\ncd ${ONEFLOW_CI_SR"
},
{
"path": "ci/conda/build-clang.sh",
"chars": 240,
"preview": "set -ex\nconda activate oneflow-dev-clang10-v2\nmkdir -p build\ncd build\ncmake .. -C ../cmake/caches/cn/fast/cpu-clang.cmak"
},
{
"path": "ci/conda/tuna.condarc",
"chars": 641,
"preview": "channels:\n - defaults\nshow_channel_urls: true\ndefault_channels:\n - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/"
},
{
"path": "ci/fixed-dev-requirements.txt",
"chars": 178,
"preview": "numpy==1.26.4 ; python_version >= \"3.12\"\nnumpy==1.22.1 ; python_version >= \"3.10\" and python_version < \"3.12\"\nnumpy==1.2"
},
{
"path": "ci/manylinux/build-gcc7-xla.sh",
"chars": 1101,
"preview": "source scl_source enable devtoolset-7\nset -ex\nONEFLOW_CI_BUILD_PARALLEL=${ONEFLOW_CI_BUILD_PARALLEL:-$(nproc)}\ngcc --ver"
},
{
"path": "ci/manylinux/build-gcc9.sh",
"chars": 1583,
"preview": "source scl_source enable devtoolset-9\nset -ex\nONEFLOW_CI_BUILD_PARALLEL=${ONEFLOW_CI_BUILD_PARALLEL:-$(nproc)}\ngcc --ver"
},
{
"path": "ci/manylinux/build.sh",
"chars": 1438,
"preview": "set -ex\nONEFLOW_CI_BUILD_PARALLEL=${ONEFLOW_CI_BUILD_PARALLEL:-$(nproc)}\ngcc --version\nld --version\n# clean python dir\nc"
},
{
"path": "ci/requirements.txt",
"chars": 169,
"preview": "pycocotools\nopencv-python==4.3.0.38; sys_platform == 'darwin'\nopencv-python==4.2.0.34; sys_platform != 'darwin'\nscipy\npi"
},
{
"path": "ci/reset_submodule.sh",
"chars": 79,
"preview": "set -x\nset -e\ngit reset --hard\ngit submodule deinit -f .\nrm -rf .git/modules/*\n"
},
{
"path": "ci/setup_submodule.py",
"chars": 947,
"preview": "import configparser\nimport argparse\nimport os\n\nparser = argparse.ArgumentParser()\nparser.add_argument(\"-s\", \"--oneflow_s"
},
{
"path": "ci/setup_submodule.sh",
"chars": 184,
"preview": "set -x\nset -e\nsrc_dir=${ONEFLOW_CI_SRC_DIR:-\"$HOME/oneflow\"}\npython3 ci/setup_submodule.py --oneflow_src_local_path=$src"
},
{
"path": "ci/test/1node_benchmark_test.sh",
"chars": 1845,
"preview": "set -xe\n\nrm -rf /benchmarks\ncp -r python/oneflow/compatible/single_client/benchmarks /benchmarks\ncd /benchmarks\n\npython3"
},
{
"path": "ci/test/1node_benchmark_test_fp16.sh",
"chars": 1746,
"preview": "set -ex\n\nrm -rf /benchmarks\ncp -r python/oneflow/compatible/single_client/benchmarks /benchmarks\ncd /benchmarks\n\npython3"
},
{
"path": "ci/test/1node_custom_op_test.sh",
"chars": 356,
"preview": "\n#!/bin/bash\nset -xe\n\nsrc_dir=${ONEFLOW_SRC_DIR:-\"$PWD\"}\ntest_tmp_dir=${ONEFLOW_TEST_TMP_DIR:-\"./test_tmp_dir\"}\n\nrm -rf "
},
{
"path": "ci/test/1node_model_eager_test.sh",
"chars": 106,
"preview": "#!/bin/bash\nset -xe\n\ncp -r python/oneflow/test /test_dir\ncd /test_dir\n\npython3 models/eager_1node_test.py\n"
},
{
"path": "ci/test/1node_model_test.sh",
"chars": 125,
"preview": "#!/bin/bash\nset -xe\n\ncp -r python/oneflow/compatible/single_client/test /test_dir\ncd /test_dir\n\npython3 models/1node_tes"
},
{
"path": "ci/test/1node_op_test.sh",
"chars": 950,
"preview": "#!/bin/bash\nset -xe\n\nexport TF_CPP_MIN_LOG_LEVEL=3\nexport PYTHONUNBUFFERED=1\n\nsrc_dir=${ONEFLOW_SRC_DIR:-\"$PWD\"}\ntest_tm"
},
{
"path": "ci/test/2node_op_test.sh",
"chars": 888,
"preview": "#!/bin/bash\nset -xe\n\nexport PYTHONUNBUFFERED=1\n\nsrc_dir=${ONEFLOW_SRC_DIR:-\"$PWD\"}\ntest_tmp_dir=${ONEFLOW_TEST_TMP_DIR:-"
},
{
"path": "ci/test/2node_op_test_multi_client.sh",
"chars": 780,
"preview": "#!/bin/bash\n\nset -xeu\n\nexport PYTHONUNBUFFERED=1\n\nsrc_dir=${ONEFLOW_SRC_DIR:-\"$PWD\"}\nONEFLOW_CI_DEVICE_NUMS=${ONEFLOW_CI"
},
{
"path": "ci/test/CMakeLists.txt",
"chars": 1012,
"preview": "set(PYTHON_EXECUTABLE python3 CACHE STRING \"python3 exe to run test, usually is the python3 installation oneflow is link"
},
{
"path": "ci/test/build_docs.sh",
"chars": 206,
"preview": "set -ex\nsrc_dir=${ONEFLOW_SRC_DIR:-\"$PWD\"}\ntest_tmp_dir=${ONEFLOW_TEST_TMP_DIR:-\"$PWD/build-docs\"}\nrm -rf $test_tmp_dir\n"
},
{
"path": "ci/test/distributed_run.py",
"chars": 22992,
"preview": "from multiprocessing.connection import Listener\nimport os\nimport subprocess\nimport socket\nimport tempfile\nfrom contextli"
},
{
"path": "ci/test/doctest.sh",
"chars": 512,
"preview": "#!/bin/bash\nset -xe\nexport PYTHONUNBUFFERED=1\nsrc_dir=${ONEFLOW_SRC_DIR:-\"$PWD\"}\ntest_tmp_dir=${ONEFLOW_TEST_TMP_DIR:-\"."
},
{
"path": "ci/test/excludelist",
"chars": 9968,
"preview": "# This file lists libraries that we will assume to be present on the host system and hence\n# should NOT be bundled insid"
},
{
"path": "ci/test/expensive_generic_test_multi_client.sh",
"chars": 1230,
"preview": "#!/bin/bash\nset -xe\n\nexport PYTHONUNBUFFERED=1\n\nsrc_dir=${ONEFLOW_SRC_DIR:-\"$PWD\"}\nONEFLOW_TEST_DIR=${ONEFLOW_TEST_DIR:-"
},
{
"path": "ci/test/generic_test.sh",
"chars": 832,
"preview": "#!/bin/bash\nset -xe\n\nexport TF_CPP_MIN_LOG_LEVEL=3\nexport PYTHONUNBUFFERED=1\n\nsrc_dir=${ONEFLOW_SRC_DIR:-\"$PWD\"}\ntest_di"
},
{
"path": "ci/test/generic_test_multi_client.sh",
"chars": 2347,
"preview": "#!/bin/bash\nset -xe\n\nexport PYTHONUNBUFFERED=1\n\nsrc_dir=${ONEFLOW_SRC_DIR:-\"$PWD\"}\nONEFLOW_TEST_DIR=${ONEFLOW_TEST_DIR:-"
},
{
"path": "ci/test/ir_tests.sh",
"chars": 777,
"preview": "#!/bin/bash\nset -xe\n\nexport PYTHONUNBUFFERED=1\n\nsrc_dir=${ONEFLOW_SRC_DIR:-\"$PWD\"}\nONEFLOW_TEST_DIR=${ONEFLOW_TEST_DIR:-"
},
{
"path": "ci/test/multi_client_exception_test.sh",
"chars": 1128,
"preview": "#!/bin/bash\nset -xe\n\nexport PYTHONUNBUFFERED=1\n\nsrc_dir=${ONEFLOW_SRC_DIR:-\"$PWD\"}\ntest_dir=\"$PWD/python/oneflow/test/ex"
},
{
"path": "ci/test/multi_launch.py",
"chars": 6466,
"preview": "\"\"\"\nCopyright 2020 The OneFlow Authors. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Licen"
},
{
"path": "ci/test/parallel_run.py",
"chars": 4958,
"preview": "import asyncio\nimport os\nimport argparse\nfrom subprocess import PIPE, STDOUT\nimport glob\nimport sys\nimport time\nimport s"
},
{
"path": "ci/test/print_stack_from_core.sh",
"chars": 134,
"preview": "set -ex\nif compgen -G \"$2/core.*\" > /dev/null; then\n gdb --batch --quiet -ex \"thread apply all bt full\" -ex \"quit\" $1"
},
{
"path": "ci/test/print_stack_in_all_dirs.sh",
"chars": 120,
"preview": "set -ex\nfind . -type f -name \"core.*\" -exec gdb --batch --quiet -ex \"thread apply all bt full\" -ex \"quit\" python3 {} \\;\n"
},
{
"path": "ci/test/resource-spec/1x-gtx-1080.json",
"chars": 173,
"preview": "{\n \"version\": {\n \"major\": 1,\n \"minor\": 0\n },\n \"local\": [\n {\n \"vram\": [\n {\n \"id\": \"0\",\n "
},
{
"path": "ci/test/resource-spec/2x-rtx-2080.json",
"chars": 239,
"preview": "{\n \"version\": {\n \"major\": 1,\n \"minor\": 0\n },\n \"local\": [\n {\n \"vram\": [\n {\n \"id\": \"0\",\n "
},
{
"path": "ci/test/resource-spec/4x-rtx-2080ti.json",
"chars": 375,
"preview": "{\n \"version\": {\n \"major\": 1,\n \"minor\": 0\n },\n \"local\": [\n {\n \"vram\": [\n {\n \"id\": \"0\",\n "
},
{
"path": "ci/test/test_mock_function.sh",
"chars": 541,
"preview": "#!/bin/bash\nset -e\nMOCK_UNITTEST=$PWD/python/oneflow/test/misc/test_mock_scope.py\n\npython3 $MOCK_UNITTEST --failfast --v"
},
{
"path": "ci/test/test_mock_script.sh",
"chars": 2030,
"preview": "#!/bin/bash\nset -e\npython_version=$(python3 --version 2>&1 | awk '{print $2}')\n\nif [[ \"$python_version\" < \"3.8\" ]]; then"
},
{
"path": "ci/test/test_resnet50_graph_ddp.sh",
"chars": 1208,
"preview": "#!/usr/bin/env bash\n\nset -ex\n\ncd $ONEFLOW_MODELS_DIR\nONEFLOW_TEST_DATASET_DIR=${ONEFLOW_TEST_DATASET_DIR:-\"/dataset\"}\nOF"
},
{
"path": "ci/test/test_speed_multi_client.sh",
"chars": 4145,
"preview": "#!/usr/bin/env bash\n\nset -uxo pipefail\n\nrc=0\n# accumulate the score of every test\ntrap 'rc=$(($rc + $?))' ERR\n\ncd $ONEFL"
},
{
"path": "ci/test/try_install.sh",
"chars": 870,
"preview": "#!/bin/bash\nset -xe\n\nsrc_dir=${ONEFLOW_SRC_DIR:-\"$PWD\"}\nwheel_path=${ONEFLOW_WHEEL_PATH:-\"$PWD/wheelhouse\"}\nindex=${ONEF"
},
{
"path": "cmake/caches/ci/canary/cuda.cmake",
"chars": 819,
"preview": "set(BUILD_CUDA YES CACHE BOOL \"\")\nset(BUILD_GIT_VERSION YES CACHE BOOL \"\")\nset(BUILD_TESTING OFF CACHE BOOL \"\")\nset(BUIL"
},
{
"path": "cmake/caches/ci/cpu-asan-ubsan.cmake",
"chars": 799,
"preview": "set(BUILD_CUDA NO CACHE BOOL \"\")\nset(BUILD_GIT_VERSION YES CACHE BOOL \"\")\nset(BUILD_TESTING YES CACHE BOOL \"\")\nset(WITH_"
},
{
"path": "cmake/caches/ci/cpu-tsan.cmake",
"chars": 763,
"preview": "set(BUILD_CUDA NO CACHE BOOL \"\")\nset(BUILD_GIT_VERSION YES CACHE BOOL \"\")\nset(BUILD_TESTING YES CACHE BOOL \"\")\nset(WITH_"
},
{
"path": "cmake/caches/ci/cpu.cmake",
"chars": 728,
"preview": "set(BUILD_CUDA NO CACHE BOOL \"\")\nset(BUILD_NPU NO CACHE BOOL \"\")\nset(BUILD_MLU NO CACHE BOOL \"\")\nset(BUILD_GIT_VERSION Y"
},
{
"path": "cmake/caches/ci/cuda-xla.cmake",
"chars": 757,
"preview": "set(BUILD_CUDA YES CACHE BOOL \"\")\nset(BUILD_GIT_VERSION YES CACHE BOOL \"\")\nset(BUILD_TESTING YES CACHE BOOL \"\")\nset(BUIL"
},
{
"path": "cmake/caches/ci/cuda.cmake",
"chars": 987,
"preview": "set(BUILD_CUDA YES CACHE BOOL \"\")\nset(BUILD_GIT_VERSION YES CACHE BOOL \"\")\nset(BUILD_TESTING YES CACHE BOOL \"\")\nset(BUIL"
},
{
"path": "cmake/caches/ci/gh-hosted/cpu-clang.cmake",
"chars": 669,
"preview": "set(CMAKE_C_COMPILER \"clang\" CACHE STRING \"\")\nset(CMAKE_CXX_COMPILER \"clang++\" CACHE STRING \"\")\nset(CMAKE_EXE_LINKER_FLA"
},
{
"path": "cmake/caches/ci/gh-hosted/cpu-gcc.cmake",
"chars": 86,
"preview": "set(BUILD_CUDA NO CACHE BOOL \"\")\nset(CMAKE_BUILD_TYPE RelWithDebInfo CACHE STRING \"\")\n"
},
{
"path": "cmake/caches/ci/llvm/cuda-75-clang.cmake",
"chars": 1073,
"preview": "set(CMAKE_C_COMPILER \"clang\" CACHE STRING \"\")\nset(CMAKE_CXX_COMPILER \"clang++\" CACHE STRING \"\")\nset(CMAKE_CUDA_COMPILER "
},
{
"path": "cmake/caches/ci/profiler/cuda.cmake",
"chars": 856,
"preview": "set(BUILD_CUDA YES CACHE BOOL \"\")\nset(BUILD_GIT_VERSION YES CACHE BOOL \"\")\nset(BUILD_TESTING OFF CACHE BOOL \"\")\nset(BUIL"
},
{
"path": "cmake/caches/ci/release/cpu.cmake",
"chars": 866,
"preview": "set(BUILD_CUDA OFF CACHE BOOL \"\")\nset(BUILD_GIT_VERSION YES CACHE BOOL \"\")\nset(BUILD_TESTING OFF CACHE BOOL \"\")\nset(TREA"
},
{
"path": "cmake/caches/ci/release/cu118.cmake",
"chars": 988,
"preview": "set(BUILD_CUDA YES CACHE BOOL \"\")\nset(BUILD_GIT_VERSION YES CACHE BOOL \"\")\nset(BUILD_TESTING OFF CACHE BOOL \"\")\nset(BUIL"
},
{
"path": "cmake/caches/ci/release/cuda.cmake",
"chars": 900,
"preview": "set(BUILD_CUDA YES CACHE BOOL \"\")\nset(BUILD_GIT_VERSION YES CACHE BOOL \"\")\nset(BUILD_TESTING OFF CACHE BOOL \"\")\nset(BUIL"
},
{
"path": "cmake/caches/ci/serving/cuda-75.cmake",
"chars": 937,
"preview": "set(BUILD_CUDA YES CACHE BOOL \"\")\nset(BUILD_TESTING YES CACHE BOOL \"\")\nset(BUILD_CPP_API YES CACHE BOOL \"\")\nset(WITH_MLI"
},
{
"path": "cmake/caches/ci/serving/openvino.cmake",
"chars": 1169,
"preview": "set(BUILD_SHARED_LIBS YES CACHE BOOL \"\")\n# uncomment only if you know what you are doing\n# set(CMAKE_LINK_DEPENDS_NO_SHA"
},
{
"path": "cmake/caches/cn/cpu.cmake",
"chars": 266,
"preview": "set(BUILD_CUDA NO CACHE BOOL \"\")\nset(BUILD_NPU NO CACHE BOOL \"\")\nset(BUILD_MLU NO CACHE BOOL \"\")\nset(BUILD_SHARED_LIBS Y"
},
{
"path": "cmake/caches/cn/cuda.cmake",
"chars": 203,
"preview": "set(BUILD_CUDA YES CACHE BOOL \"\")\nset(BUILD_SHARED_LIBS YES CACHE BOOL \"\")\nset(THIRD_PARTY_MIRROR aliyun CACHE STRING \"\""
},
{
"path": "cmake/caches/cn/fast/cpu-clang.cmake",
"chars": 934,
"preview": "set(CMAKE_C_COMPILER \"clang\" CACHE STRING \"\")\nset(CMAKE_CXX_COMPILER \"clang++\" CACHE STRING \"\")\nset(CMAKE_EXE_LINKER_FLA"
},
{
"path": "cmake/caches/cn/fast/cpu.cmake",
"chars": 640,
"preview": "set(BUILD_SHARED_LIBS YES CACHE BOOL \"\")\n# uncomment only if you know what you are doing\n# set(CMAKE_LINK_DEPENDS_NO_SHA"
},
{
"path": "cmake/caches/cn/fast/cuda-61-clang.cmake",
"chars": 1043,
"preview": "set(CMAKE_C_COMPILER \"clang\" CACHE STRING \"\")\nset(CMAKE_CXX_COMPILER \"clang++\" CACHE STRING \"\")\nset(CMAKE_EXE_LINKER_FLA"
},
{
"path": "cmake/caches/cn/fast/cuda-61.cmake",
"chars": 749,
"preview": "set(BUILD_CUDA YES CACHE BOOL \"\")\nset(BUILD_TESTING YES CACHE BOOL \"\")\nset(BUILD_SHARED_LIBS YES CACHE BOOL \"\")\n# uncomm"
},
{
"path": "cmake/caches/cn/fast/cuda-75-clang.cmake",
"chars": 1122,
"preview": "set(CMAKE_C_COMPILER \"clang\" CACHE STRING \"\")\nset(WITH_MLIR YES CACHE BOOL \"\")\nset(WITH_MLIR_CUDA_CODEGEN YES CACHE BOOL"
},
{
"path": "cmake/caches/cn/fast/cuda-75.cmake",
"chars": 1307,
"preview": "set(BUILD_CUDA YES CACHE BOOL \"\")\nset(BUILD_TESTING YES CACHE BOOL \"\")\nset(BUILD_SHARED_LIBS YES CACHE BOOL \"\")\n# uncomm"
},
{
"path": "cmake/caches/cn/fast/cuda-86.cmake",
"chars": 749,
"preview": "set(BUILD_CUDA YES CACHE BOOL \"\")\nset(BUILD_TESTING YES CACHE BOOL \"\")\nset(BUILD_SHARED_LIBS YES CACHE BOOL \"\")\n# uncomm"
},
{
"path": "cmake/caches/cn/fast/mlir-cpu.cmake",
"chars": 1073,
"preview": "set(BUILD_SHARED_LIBS YES CACHE BOOL \"\")\n# uncomment only if you know what you are doing\n# set(CMAKE_LINK_DEPENDS_NO_SHA"
},
{
"path": "cmake/caches/cn/fast/mlir-cuda-61.cmake",
"chars": 1361,
"preview": "set(BUILD_SHARED_LIBS YES CACHE BOOL \"\")\n# uncomment only if you know what you are doing\n# set(CMAKE_LINK_DEPENDS_NO_SHA"
},
{
"path": "cmake/caches/cn/fast/mlir-cuda-75.cmake",
"chars": 1295,
"preview": "set(BUILD_SHARED_LIBS YES CACHE BOOL \"\")\n# uncomment only if you know what you are doing\n# set(CMAKE_LINK_DEPENDS_NO_SHA"
},
{
"path": "cmake/caches/cn/fast/mlir-cuda-80.cmake",
"chars": 1438,
"preview": "set(BUILD_SHARED_LIBS YES CACHE BOOL \"\")\n# uncomment only if you know what you are doing\n# set(CMAKE_LINK_DEPENDS_NO_SHA"
},
{
"path": "cmake/caches/cn/fast/mlir-cuda-86.cmake",
"chars": 1438,
"preview": "set(BUILD_SHARED_LIBS YES CACHE BOOL \"\")\n# uncomment only if you know what you are doing\n# set(CMAKE_LINK_DEPENDS_NO_SHA"
},
{
"path": "cmake/caches/international/cpu.cmake",
"chars": 127,
"preview": "set(BUILD_CUDA NO CACHE BOOL \"\")\nset(BUILD_SHARED_LIBS YES CACHE BOOL \"\")\nset(CMAKE_BUILD_TYPE RelWithDebInfo CACHE STRI"
},
{
"path": "cmake/caches/international/cuda.cmake",
"chars": 128,
"preview": "set(BUILD_CUDA YES CACHE BOOL \"\")\nset(BUILD_SHARED_LIBS YES CACHE BOOL \"\")\nset(CMAKE_BUILD_TYPE RelWithDebInfo CACHE STR"
},
{
"path": "cmake/cuda.cmake",
"chars": 5267,
"preview": "if(BUILD_CUDA)\n if(DEFINED CUDA_TOOLKIT_ROOT_DIR)\n message(WARNING \"CUDA_TOOLKIT_ROOT_DIR is deprecated, use CUDAToo"
},
{
"path": "cmake/functional.cmake",
"chars": 5824,
"preview": "function(GENERATE_FUNCTIONAL_API_AND_PYBIND11_CPP SRCS HDRS PYBIND_SRCS ROOT_DIR)\n set(YAML_FILE ${PROJECT_SOURCE_DIR}/"
},
{
"path": "cmake/git_version.cmake",
"chars": 748,
"preview": "cmake_minimum_required(VERSION 3.5)\nexecute_process(\n COMMAND git describe --tags --always --dirty=-snapshot\n WORKING_"
},
{
"path": "cmake/oneflow-config.cmake",
"chars": 794,
"preview": "if(DEFINED ENV{ONEFLOW_INSTALL_PREFIX})\n set(ONEFLOW_INSTALL_PREFIX $ENV{ONEFLOW_INSTALL_PREFIX})\nelse()\n get_filename"
},
{
"path": "cmake/oneflow.cmake",
"chars": 30748,
"preview": "include(python)\n\nfunction(oneflow_add_executable)\n add_executable(${ARGV})\n set_compile_options_to_oneflow_target(${AR"
},
{
"path": "cmake/op_schema.cmake",
"chars": 3131,
"preview": "get_property(LLVM_INSTALL_DIR GLOBAL PROPERTY LLVM_INSTALL_DIR)\nset(LLVM_INSTALL_DIR ${THIRD_PARTY_DIR}/llvm)\nset(LLVM_D"
},
{
"path": "cmake/platform.cmake",
"chars": 1534,
"preview": "if(WIN32)\n set(CMAKE_BUILD_TYPE Debug)\n add_definitions(-DNOMINMAX -D_WIN32_WINNT=0x0A00 -DLANG_CXX11 -DCOMPILER_MSVC\n"
},
{
"path": "cmake/proto2cpp.cmake",
"chars": 1530,
"preview": "function(RELATIVE_PROTOBUF_GENERATE_CPP SRCS HDRS ROOT_DIR)\n if(NOT ARGN)\n message(SEND_ERROR \"Error: RELATIVE_PROTO"
},
{
"path": "cmake/pybind11.cmake",
"chars": 299,
"preview": "include(FetchContent)\n\nset_mirror_url_with_hash(PYBIND11_URL https://github.com/pybind/pybind11/archive/v2.11.1.zip\n "
},
{
"path": "cmake/python.cmake",
"chars": 3423,
"preview": "if(NOT DEFINED Python3_EXECUTABLE)\n execute_process(\n COMMAND which python3\n RESULT_VARIABLE STATUS\n OUTPUT_VA"
},
{
"path": "cmake/third_party/FindBFD.cmake",
"chars": 1289,
"preview": "# - BFD Library module.\n#=============================================================================\n# This module fin"
},
{
"path": "cmake/third_party/FindBLAS.cmake",
"chars": 19421,
"preview": "#.rst:\n# FindBLAS\n# --------\n#\n# Find BLAS library\n#\n# This module finds an installed fortran library that implements th"
},
{
"path": "cmake/third_party/FindCUDNN.cmake",
"chars": 3654,
"preview": "# - Try to find cuDNN\n#\n# The following variables are optionally searched for defaults\n# CUDNN_ROOT_DIR: Bas"
},
{
"path": "cmake/third_party/FindUnwind.cmake",
"chars": 2169,
"preview": "# - Try to find libunwind\n# Once done this will define\n#\n# Unwind_FOUND - system has libunwind\n# unwind::unwind - cmak"
},
{
"path": "cmake/third_party/absl.cmake",
"chars": 2105,
"preview": "include(ExternalProject)\ninclude(GNUInstallDirs)\n\nset(ABSL_PROJECT absl)\nset(ABSL_TAR_URL https://github.com/abseil/abse"
},
{
"path": "cmake/third_party/cares.cmake",
"chars": 580,
"preview": "include(ExternalProject)\nset(CARES_TAR_URL\n https://github.com/c-ares/c-ares/releases/download/cares-1_15_0/c-ares-1."
},
{
"path": "cmake/third_party/cocoapi.cmake",
"chars": 2043,
"preview": "include(ExternalProject)\n\nset(COCOAPI_INCLUDE_DIR ${THIRD_PARTY_DIR}/cocoapi/include)\nset(COCOAPI_LIBRARY_DIR ${THIRD_PA"
},
{
"path": "cmake/third_party/cub.cmake",
"chars": 697,
"preview": "include(ExternalProject)\n\nset(CUB_INCLUDE_DIR ${THIRD_PARTY_DIR}/cub/include)\nset(CUB_BUILD_INCLUDE ${CMAKE_CURRENT_BINA"
},
{
"path": "cmake/third_party/cutlass.cmake",
"chars": 4434,
"preview": "include(ExternalProject)\n\nif(CMAKE_CXX_COMPILER_ID STREQUAL \"Clang\")\n set(WITH_CUTLASS_INIT OFF)\nelse()\n set(WITH_CUTL"
},
{
"path": "cmake/third_party/eigen.cmake",
"chars": 1122,
"preview": "include(ExternalProject)\n\nset(EIGEN_INCLUDE_DIR ${THIRD_PARTY_DIR}/eigen/include/eigen3)\nset(EIGEN_INSTALL_DIR ${THIRD_P"
},
{
"path": "cmake/third_party/flash_attention.cmake",
"chars": 1800,
"preview": "include(ExternalProject)\n\nfind_package(Threads)\n\n# NOTE: A git version of 1.6.5 or later is required if this download me"
},
{
"path": "cmake/third_party/flatbuffers.cmake",
"chars": 1763,
"preview": "include(ExternalProject)\n\nset(FLATBUFFERS_URL https://github.com/google/flatbuffers/archive/v1.12.0.tar.gz)\n\nset(FLATBUF"
},
{
"path": "cmake/third_party/glog.cmake",
"chars": 547,
"preview": "include(ExternalProject)\n\nset_mirror_url_with_hash(glog_URL https://github.com/google/glog/archive/refs/tags/v0.5.0.tar."
},
{
"path": "cmake/third_party/googletest.cmake",
"chars": 301,
"preview": "include(FetchContent)\n\nset_mirror_url_with_hash(\n googletest_URL https://github.com/google/googletest/archive/release-1"
},
{
"path": "cmake/third_party/grpc.cmake",
"chars": 3617,
"preview": "include(ExternalProject)\n\nset(GRPC_INSTALL_DIR ${THIRD_PARTY_DIR}/grpc)\nset(GRPC_INSTALL_INCLUDE_DIR include)\nset(GRPC_I"
},
{
"path": "cmake/third_party/half.cmake",
"chars": 1122,
"preview": "include(ExternalProject)\n\nset(HALF_INCLUDE_DIR ${THIRD_PARTY_DIR}/half/include)\n\nset(HALF_URL https://github.com/Oneflow"
},
{
"path": "cmake/third_party/header_index/cub_headers.txt",
"chars": 2794,
"preview": "config.cuh\ncub.cuh\nutil_allocator.cuh\nutil_arch.cuh\nutil_compiler.cuh\nutil_cpp_dialect.cuh\nutil_debug.cuh\nutil_deprecate"
},
{
"path": "cmake/third_party/header_index/grpc_headers.txt",
"chars": 9403,
"preview": "grpc++/alarm.h\ngrpc++/channel.h\ngrpc++/client_context.h\ngrpc++/completion_queue.h\ngrpc++/create_channel.h\ngrpc++/create_"
},
{
"path": "cmake/third_party/header_index/libpng_headers.txt",
"chars": 72,
"preview": "png.h\npngconf.h\npngdebug.h\npnginfo.h\npnglibconf.h\npngpriv.h\npngstruct.h\n"
},
{
"path": "cmake/third_party/header_index/opencv_headers.txt",
"chars": 2918,
"preview": "opencv2/cvconfig.h\nopencv2/core/cv_cpu_dispatch.h\nopencv2/core/types_c.h\nopencv2/core/cvdef.h\nopencv2/core/core_c.h\nopen"
},
{
"path": "cmake/third_party/hwloc.cmake",
"chars": 3891,
"preview": "include(ExternalProject)\n\nif(UNIX AND NOT APPLE)\n set(BUILD_HWLOC_DEFAULT ON)\nelse()\n set(BUILD_HWLOC_DEFAULT OFF)\nend"
},
{
"path": "cmake/third_party/json.cmake",
"chars": 330,
"preview": "include(FetchContent)\n\nset_mirror_url_with_hash(JSON_URL https://github.com/nlohmann/json/archive/refs/tags/v3.11.2.zip\n"
},
{
"path": "cmake/third_party/libjpeg-turbo.cmake",
"chars": 3821,
"preview": "include(ExternalProject)\n\nset(LIBJPEG_INCLUDE_DIR ${THIRD_PARTY_DIR}/libjpeg-turbo/include)\nset(LIBJPEG_LIBRARY_DIR ${TH"
},
{
"path": "cmake/third_party/nccl.cmake",
"chars": 2947,
"preview": "option(NCCL_STATIC \"\" ON)\nif(OF_CUDA_LINK_DYNAMIC_LIBRARY)\n set(NCCL_STATIC OFF)\nendif()\noption(USE_SYSTEM_NCCL \"\" OFF)"
},
{
"path": "cmake/third_party/oneDNN.cmake",
"chars": 2975,
"preview": "include(ExternalProject)\ninclude(GNUInstallDirs)\n\nset(ONEDNN_INSTALL_DIR ${THIRD_PARTY_DIR}/onednn)\nset(ONEDNN_INCLUDE_D"
},
{
"path": "cmake/third_party/opencv.cmake",
"chars": 6172,
"preview": "include(ExternalProject)\ninclude(GNUInstallDirs)\n\nset(OPENCV_INSTALL_DIR ${THIRD_PARTY_DIR}/opencv)\nset(OPENCV_INCLUDE_D"
},
{
"path": "cmake/third_party/openssl.cmake",
"chars": 1528,
"preview": "include(ExternalProject)\n\nset(OPENSSL_INSTALL ${THIRD_PARTY_DIR}/openssl)\nset(OPENSSL_INCLUDE_DIR ${THIRD_PARTY_DIR}/ope"
},
{
"path": "cmake/third_party/patches/tensorflow-logging.patch",
"chars": 534,
"preview": "--- ./build/third_party_install/tensorflow/include/tensorflow_inc/tensorflow/stream_executor/platform/logging.h\t2021-06-"
},
{
"path": "cmake/third_party/protobuf.cmake",
"chars": 3227,
"preview": "include(ExternalProject)\n\nset(PROTOBUF_INSTALL_DIR ${THIRD_PARTY_DIR}/protobuf)\nset(PROTOBUF_INSTALL_INCLUDEDIR include)"
},
{
"path": "cmake/third_party/re2.cmake",
"chars": 1395,
"preview": "include(ExternalProject)\n\nset(RE2_PROJECT re2)\n\nset(RE2_INSTALL_DIR ${THIRD_PARTY_DIR}/re2)\n\nset(RE2_INCLUDE_DIR ${RE2_I"
},
{
"path": "cmake/third_party/trt_flash_attention.cmake",
"chars": 1773,
"preview": "include(ExternalProject)\n\nfind_package(Threads)\n\nset(TRT_FLASH_ATTENTION_PROJECT trt_flash_attention)\n\nset(TRT_FLASH_ATT"
},
{
"path": "cmake/third_party/zlib.cmake",
"chars": 1889,
"preview": "include(ExternalProject)\n\nset(ZLIB_INSTALL ${THIRD_PARTY_DIR}/zlib)\nset(ZLIB_INCLUDE_DIR ${ZLIB_INSTALL}/include)\nset(ZL"
},
{
"path": "cmake/third_party.cmake",
"chars": 7922,
"preview": "cmake_policy(SET CMP0074 NEW)\nif(NOT WIN32)\n find_package(Threads)\nendif()\n\nif(WITH_ZLIB)\n include(zlib)\nendif()\ninclu"
},
{
"path": "cmake/threading.cmake",
"chars": 935,
"preview": "foreach(threading_runtime_item ${CPU_THREADING_RUNTIMES})\n if(NOT ${threading_runtime_item} MATCHES \"^(TBB|OMP)$\")\n "
},
{
"path": "cmake/util.cmake",
"chars": 12314,
"preview": "function(SHOW_VARIABLES)\n get_cmake_property(_variableNames VARIABLES)\n foreach(_variableName ${_variableNames})\n m"
},
{
"path": "dev-requirements.txt",
"chars": 549,
"preview": "black==19.10b0; python_version >= \"3.6\"\nclick==8.0.0; python_version >= \"3.6\" # https://github.com/psf/black/issues/2964"
},
{
"path": "docker/build/Dockerfile",
"chars": 2485,
"preview": "# warning: never share the container image this dockerfile produces\nARG CUDA=10.0\n\nFROM nvidia/cuda:${CUDA}-cudnn7-devel"
},
{
"path": "docker/build/build-ubuntu.sh",
"chars": 92,
"preview": "docker build \\\n --rm \\\n -t oneflow-build:ubuntu -f docker/build/build.ubuntu.dockerfile .\n"
},
{
"path": "docker/build/build.sh",
"chars": 72,
"preview": "docker build \\\n --rm \\\n -t oneflow-build -f docker/build/Dockerfile .\n"
},
{
"path": "docker/build/build.ubuntu.dockerfile",
"chars": 1377,
"preview": "ARG CUDA=10.0\nARG UBUNTU_VERSION=16.04\nFROM nvidia/cuda:${CUDA}-cudnn7-devel-ubuntu${UBUNTU_VERSION}\n\nUSER 0\n\nRUN apt-ge"
},
{
"path": "docker/build/launch.sh",
"chars": 63,
"preview": "docker run -it --rm \\\n\t-v /dataset:/dataset/ \\\n\toneflow-build \n"
},
{
"path": "docker/build/test.sh",
"chars": 96,
"preview": "docker run -it --rm \\\n\t-v /dataset:/dataset/ \\\n\toneflow-build \\\n python3 -c \"import oneflow\"\n"
},
{
"path": "docker/ci/base/Dockerfile",
"chars": 1218,
"preview": "# warning: never share the container image this dockerfile produces\nARG CUDA=10.0\n\nFROM nvidia/cuda:${CUDA}-cudnn7-devel"
},
{
"path": "docker/ci/fmt/Dockerfile",
"chars": 209,
"preview": "FROM python:3.7\nRUN curl https://oneflow-static.oss-cn-beijing.aliyuncs.com/bin/clang-format -o /usr/local/bin/clang-for"
},
{
"path": "docker/ci/fmt/build.sh",
"chars": 55,
"preview": "set -ex\ncd docker/ci/fmt\ndocker build -t oneflow-fmt .\n"
},
{
"path": "docker/ci/make/Dockerfile",
"chars": 519,
"preview": "ARG from\nFROM ${from}\nWORKDIR /workspace/build\n\n# BUILD ONEFLOW\nCOPY oneflow /workspace/oneflow\nCOPY tools /workspace/to"
},
{
"path": "docker/ci/test/Dockerfile",
"chars": 802,
"preview": "FROM ufoym/deepo\n\nRUN apt remove openmpi-common libfabric1 openmpi-bin librdmacm1:amd64 libopenmpi2 libopenmpi2:amd64 -y"
},
{
"path": "docker/ci/test/build.sh",
"chars": 571,
"preview": "set -ex\ntest_img_dir=\"$(dirname \"${BASH_SOURCE[0]}\")\"\ntest_img_dir=\"$(realpath \"${test_img_dir}\")\"\ncd $test_img_dir\n\npro"
},
{
"path": "docker/ci/test/launch.sh",
"chars": 184,
"preview": "docker run --shm-size=8g --privileged --network=host --rm -it -w $PWD -v $PWD:$PWD -v /dataset:/dataset -v /model_zoo:/m"
},
{
"path": "docker/ci/test/requirements.txt",
"chars": 323,
"preview": "sphinx==3.5.4\njinja2<3.1\nrecommonmark==0.6.0\nfuro==2021.4.11b34\nsphinx-copybutton==0.5.0\n# dependencies above must be id"
},
{
"path": "docker/ci/test-v2/Dockerfile",
"chars": 296,
"preview": "FROM pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime\nCOPY sources.list /etc/apt/sources.list\nRUN apt update && apt install"
},
{
"path": "docker/ci/test-v2/build.sh",
"chars": 574,
"preview": "set -ex\ntest_img_dir=\"$(dirname \"${BASH_SOURCE[0]}\")\"\ntest_img_dir=\"$(realpath \"${test_img_dir}\")\"\ncd $test_img_dir\n\npro"
},
{
"path": "docker/ci/test-v2/requirements.txt",
"chars": 298,
"preview": "sphinx==3.5.4\njinja2<3.1\nrecommonmark==0.6.0\nfuro==2021.4.11b34\nsphinx-copybutton==0.5.0\n# dependencies above must be id"
},
{
"path": "docker/ci/test-v2/sources.list",
"chars": 1071,
"preview": "# 默认注释了源码镜像以提高 apt update 速度,如有需要可自行取消注释\ndeb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ focal main restricted universe"
},
{
"path": "docker/ci/third_party/Dockerfile",
"chars": 426,
"preview": "ARG from\nFROM ${from}\nWORKDIR /workspace/build\n\nCOPY cmake /workspace/cmake\nCOPY CMakeLists.txt /workspace/CMakeLists.tx"
},
{
"path": "docker/package/manylinux/CentOS-Base.repo",
"chars": 2300,
"preview": "# CentOS-Base.repo\n#\n# From https://mirror.tuna.tsinghua.edu.cn/help/centos/\n#\n# The mirror system uses the connecting I"
},
{
"path": "docker/package/manylinux/CentOS7-Base-163.repo",
"chars": 1571,
"preview": "# CentOS-Base.repo\n#\n# The mirror system uses the connecting IP address of the client and the\n# update status of each mi"
},
{
"path": "docker/package/manylinux/Dockerfile",
"chars": 2522,
"preview": "ARG from\nFROM ${from}\nARG use_tuna_yum=0\nARG pip_args=\"\"\nARG bazel_url=\"https://github.com/bazelbuild/bazel/releases/dow"
},
{
"path": "docker/package/manylinux/README.md",
"chars": 1918,
"preview": "# 使用 docker 生成 OneFlow wheel 包\n\n### 创建 docker 容器\n\n在 OneFlow 源码根目录下运行:\n```\ndocker build -f docker/package/manylinux/Docke"
},
{
"path": "docker/package/manylinux/build_wheel.py",
"chars": 21160,
"preview": "import os\nimport subprocess\nimport tempfile\nfrom pathlib import Path\nimport getpass\nimport uuid\n\n\ndef get_arg_env(env_va"
},
{
"path": "docker/package/manylinux/launch.sh",
"chars": 105,
"preview": "set -ex\ndocker run --rm -it \\\n -v `pwd`:`pwd` \\\n -w `pwd` oneflow:rel-manylinux2014-cuda-11.0 bash\n"
},
{
"path": "docs/Makefile",
"chars": 596,
"preview": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line.\nSPHINXOPTS =\nSPHI"
},
{
"path": "docs/requirements.txt",
"chars": 215,
"preview": "sphinx==3.5.4\njinja2<3.1\nrecommonmark==0.6.0\nfuro==2021.4.11b34\nsphinx-copybutton==0.5.0\n# above are dev dependencies\n--"
},
{
"path": "docs/source/_static/.gitkeep",
"chars": 0,
"preview": ""
},
{
"path": "docs/source/auto_parallel.rst",
"chars": 2374,
"preview": "Auto Parallelism\n====================================================\n\nAs the scale of deep-learning models grows larger"
},
{
"path": "docs/source/autograd.rst",
"chars": 2760,
"preview": "oneflow.autograd\n====================================================\n\n.. The documentation is referenced from:\n https"
},
{
"path": "docs/source/cn/__init__.py",
"chars": 50,
"preview": "from .math_ops import *\nfrom .activation import *\n"
},
{
"path": "docs/source/cn/activation.py",
"chars": 701,
"preview": "import oneflow\nfrom oneflow.framework.docstr.utils import reset_docstr\n\nreset_docstr(\n oneflow.nn.ReLU,\n r\"\"\"ReLU("
},
{
"path": "docs/source/cn/math_ops.py",
"chars": 1027,
"preview": "import oneflow\nfrom oneflow.framework.docstr.utils import reset_docstr\n\nreset_docstr(\n oneflow.add,\n r\"\"\"add(input"
},
{
"path": "docs/source/conf.py",
"chars": 6116,
"preview": "# -*- coding: utf-8 -*-\n#\n# Configuration file for the Sphinx documentation builder.\n#\n# This file does only contain a s"
}
]
// ... and 4308 more files (download for full content)
About this extraction
This page contains the full source code of the Oneflow-Inc/oneflow GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 4508 files (25.3 MB), approximately 6.9M tokens, and a symbol index with 25463 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.