Full Code of mlc-ai/mlc-llm for AI

main 20d7fb309664 cached

661 files

4.0 MB

1.1M tokens

3167 symbols

1 requests

Download .txt

Showing preview only (4,296K chars total). Download the full file or copy to clipboard to get everything.

Repository: mlc-ai/mlc-llm
Branch: main
Commit: 20d7fb309664
Files: 661
Total size: 4.0 MB

Directory structure:
gitextract_s4bq7ahm/

├── .clang-format
├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug-report.md
│   │   ├── config.yml
│   │   ├── documentation.md
│   │   ├── feature-request.md
│   │   ├── general.md
│   │   ├── model-request.md
│   │   ├── speed-report.md
│   │   └── tracking.md
│   └── workflows/
│       ├── documentation.yaml
│       ├── update-relax.yaml
│       └── windows-build.yaml
├── .gitignore
├── .gitmodules
├── .pre-commit-config.yaml
├── .pylintrc
├── CMakeLists.txt
├── CONTRIBUTORS.md
├── LICENSE
├── NOTICE
├── README.md
├── android/
│   ├── .gitignore
│   ├── MLCChat/
│   │   ├── README.md
│   │   ├── app/
│   │   │   ├── .gitignore
│   │   │   ├── build.gradle
│   │   │   ├── proguard-rules.pro
│   │   │   └── src/
│   │   │       └── main/
│   │   │           ├── AndroidManifest.xml
│   │   │           ├── java/
│   │   │           │   └── ai/
│   │   │           │       └── mlc/
│   │   │           │           └── mlcchat/
│   │   │           │               ├── AppViewModel.kt
│   │   │           │               ├── ChatView.kt
│   │   │           │               ├── MainActivity.kt
│   │   │           │               ├── NavView.kt
│   │   │           │               ├── StartView.kt
│   │   │           │               └── ui/
│   │   │           │                   └── theme/
│   │   │           │                       ├── Color.kt
│   │   │           │                       ├── Theme.kt
│   │   │           │                       └── Type.kt
│   │   │           └── res/
│   │   │               ├── drawable/
│   │   │               │   ├── ic_android_black_24dp.xml
│   │   │               │   └── mlc_logo_108.xml
│   │   │               ├── values/
│   │   │               │   ├── colors.xml
│   │   │               │   ├── strings.xml
│   │   │               │   └── themes.xml
│   │   │               └── xml/
│   │   │                   ├── backup_rules.xml
│   │   │                   └── data_extraction_rules.xml
│   │   ├── build.gradle
│   │   ├── bundle_weight.py
│   │   ├── gradle/
│   │   │   └── wrapper/
│   │   │       ├── gradle-wrapper.jar
│   │   │       └── gradle-wrapper.properties
│   │   ├── gradle.properties
│   │   ├── gradlew
│   │   ├── gradlew.bat
│   │   ├── mlc-package-config.json
│   │   └── settings.gradle
│   ├── MLCEngineExample/
│   │   ├── README.md
│   │   ├── app/
│   │   │   ├── .gitignore
│   │   │   ├── build.gradle
│   │   │   ├── proguard-rules.pro
│   │   │   └── src/
│   │   │       └── main/
│   │   │           ├── AndroidManifest.xml
│   │   │           ├── java/
│   │   │           │   └── ai/
│   │   │           │       └── mlc/
│   │   │           │           └── mlcengineexample/
│   │   │           │               ├── MainActivity.kt
│   │   │           │               └── ui/
│   │   │           │                   └── theme/
│   │   │           │                       ├── Color.kt
│   │   │           │                       ├── Theme.kt
│   │   │           │                       └── Type.kt
│   │   │           └── res/
│   │   │               ├── drawable/
│   │   │               │   ├── ic_android_black_24dp.xml
│   │   │               │   └── mlc_logo_108.xml
│   │   │               ├── values/
│   │   │               │   ├── colors.xml
│   │   │               │   ├── strings.xml
│   │   │               │   └── themes.xml
│   │   │               └── xml/
│   │   │                   ├── backup_rules.xml
│   │   │                   └── data_extraction_rules.xml
│   │   ├── build.gradle
│   │   ├── bundle_weight.py
│   │   ├── gradle/
│   │   │   └── wrapper/
│   │   │       ├── gradle-wrapper.jar
│   │   │       └── gradle-wrapper.properties
│   │   ├── gradle.properties
│   │   ├── gradlew
│   │   ├── gradlew.bat
│   │   ├── mlc-package-config.json
│   │   └── settings.gradle
│   ├── README.md
│   └── mlc4j/
│       ├── .gitignore
│       ├── CMakeLists.txt
│       ├── build.gradle
│       ├── prepare_libs.py
│       └── src/
│           ├── cpp/
│           │   └── tvm_runtime.h
│           └── main/
│               ├── AndroidManifest.xml
│               └── java/
│                   └── ai/
│                       └── mlc/
│                           └── mlcllm/
│                               ├── JSONFFIEngine.java
│                               ├── MLCEngine.kt
│                               └── OpenAIProtocol.kt
├── ci/
│   ├── bash.sh
│   ├── build-environment.yaml
│   ├── jenkinsfile.groovy
│   └── task/
│       ├── black.sh
│       ├── build_clean.sh
│       ├── build_lib.sh
│       ├── build_win.bat
│       ├── clang-format.sh
│       ├── isort.sh
│       ├── mypy.sh
│       ├── pylint.sh
│       ├── test_model_compile.sh
│       └── test_unittest.sh
├── cmake/
│   └── gen_cmake_config.py
├── cpp/
│   ├── base.h
│   ├── json_ffi/
│   │   ├── conv_template.cc
│   │   ├── conv_template.h
│   │   ├── image_utils.cc
│   │   ├── image_utils.h
│   │   ├── json_ffi_engine.cc
│   │   ├── json_ffi_engine.h
│   │   ├── openai_api_protocol.cc
│   │   └── openai_api_protocol.h
│   ├── metadata/
│   │   ├── model.cc
│   │   └── model.h
│   ├── multi_gpu/
│   │   ├── builtin.cc
│   │   └── multi_gpu_loader.cc
│   ├── serve/
│   │   ├── config.cc
│   │   ├── config.h
│   │   ├── data.cc
│   │   ├── data.h
│   │   ├── draft_token_workspace_manager.cc
│   │   ├── draft_token_workspace_manager.h
│   │   ├── engine.cc
│   │   ├── engine.h
│   │   ├── engine_actions/
│   │   │   ├── action.cc
│   │   │   ├── action.h
│   │   │   ├── action_commons.cc
│   │   │   ├── action_commons.h
│   │   │   ├── auto_spec_decode.cc
│   │   │   ├── batch_decode.cc
│   │   │   ├── batch_draft.cc
│   │   │   ├── batch_jumpforward.cc
│   │   │   ├── batch_prefill_base.cc
│   │   │   ├── batch_prefill_base.h
│   │   │   ├── batch_verify.cc
│   │   │   ├── disagg_prepare_recv.cc
│   │   │   ├── disagg_remote_send.cc
│   │   │   ├── eagle_batch_draft.cc
│   │   │   ├── eagle_batch_verify.cc
│   │   │   ├── eagle_new_request_prefill.cc
│   │   │   └── new_request_prefill.cc
│   │   ├── engine_state.cc
│   │   ├── engine_state.h
│   │   ├── event_trace_recorder.cc
│   │   ├── event_trace_recorder.h
│   │   ├── function_table.cc
│   │   ├── function_table.h
│   │   ├── logit_processor.cc
│   │   ├── logit_processor.h
│   │   ├── metrics.cc
│   │   ├── metrics.h
│   │   ├── model.cc
│   │   ├── model.h
│   │   ├── prefix_cache.cc
│   │   ├── prefix_cache.h
│   │   ├── radix_tree.cc
│   │   ├── radix_tree.h
│   │   ├── request.cc
│   │   ├── request.h
│   │   ├── request_state.cc
│   │   ├── request_state.h
│   │   ├── sampler/
│   │   │   ├── cpu_sampler.cc
│   │   │   ├── gpu_sampler.cc
│   │   │   └── sampler.h
│   │   ├── threaded_engine.cc
│   │   └── threaded_engine.h
│   ├── support/
│   │   ├── debug_utils.h
│   │   ├── dynamic_bitset.h
│   │   ├── encoding.cc
│   │   ├── encoding.h
│   │   ├── json_parser.h
│   │   ├── load_bytes_from_file.h
│   │   ├── progress_bar.h
│   │   ├── random.h
│   │   ├── result.h
│   │   ├── utils.h
│   │   ├── vlm_utils.cc
│   │   └── vlm_utils.h
│   └── tokenizers/
│       ├── streamer.cc
│       ├── streamer.h
│       ├── tokenizers.cc
│       └── tokenizers.h
├── docs/
│   ├── .gitignore
│   ├── Makefile
│   ├── README.md
│   ├── community/
│   │   ├── faq.rst
│   │   └── guideline.rst
│   ├── compilation/
│   │   ├── compile_models.rst
│   │   ├── configure_quantization.rst
│   │   ├── convert_weights.rst
│   │   ├── define_new_models.rst
│   │   └── package_libraries_and_weights.rst
│   ├── conf.py
│   ├── deploy/
│   │   ├── android.rst
│   │   ├── cli.rst
│   │   ├── ide_integration.rst
│   │   ├── ios.rst
│   │   ├── mlc_chat_config.rst
│   │   ├── python_engine.rst
│   │   ├── rest.rst
│   │   └── webllm.rst
│   ├── get_started/
│   │   ├── introduction.rst
│   │   └── quick_start.rst
│   ├── index.rst
│   ├── install/
│   │   ├── conda.rst
│   │   ├── emcc.rst
│   │   ├── gpu.rst
│   │   ├── mlc_llm.rst
│   │   └── tvm.rst
│   ├── make.bat
│   ├── microserving/
│   │   └── tutorial.rst
│   ├── privacy.rst
│   └── requirements.txt
├── examples/
│   ├── python/
│   │   ├── microserving/
│   │   │   └── custom_router.py
│   │   └── sample_mlc_engine.py
│   └── rest/
│       ├── nodejs/
│       │   ├── README.MD
│       │   ├── dotenv.example
│       │   ├── package.json
│       │   ├── sample_client.js
│       │   ├── sample_langchain.ts
│       │   ├── sample_openai.js
│       │   └── tsconfig.json
│       ├── python/
│       │   ├── sample_client.py
│       │   ├── sample_langchain.py
│       │   └── sample_openai.py
│       └── resources/
│           ├── linux.txt
│           └── state_of_the_union.txt
├── ios/
│   ├── .gitignore
│   ├── MLCChat/
│   │   ├── MLCChat/
│   │   │   ├── Assets.xcassets/
│   │   │   │   ├── AccentColor.colorset/
│   │   │   │   │   └── Contents.json
│   │   │   │   ├── AppIcon.appiconset/
│   │   │   │   │   └── Contents.json
│   │   │   │   └── Contents.json
│   │   │   ├── Common/
│   │   │   │   └── Constants.swift
│   │   │   ├── Info.plist
│   │   │   ├── MLCChat.entitlements
│   │   │   ├── MLCChatApp.swift
│   │   │   ├── Models/
│   │   │   │   ├── AppConfig.swift
│   │   │   │   ├── ModelConfig.swift
│   │   │   │   └── ParamsConfig.swift
│   │   │   ├── Preview Content/
│   │   │   │   └── Preview Assets.xcassets/
│   │   │   │       └── Contents.json
│   │   │   ├── States/
│   │   │   │   ├── AppState.swift
│   │   │   │   ├── ChatState.swift
│   │   │   │   └── ModelState.swift
│   │   │   └── Views/
│   │   │       ├── ChatView.swift
│   │   │       ├── ImageProcessing.swift
│   │   │       ├── MessageView.swift
│   │   │       ├── ModelView.swift
│   │   │       └── StartView.swift
│   │   ├── MLCChat.xcodeproj/
│   │   │   ├── project.pbxproj
│   │   │   ├── project.xcworkspace/
│   │   │   │   ├── contents.xcworkspacedata
│   │   │   │   └── xcshareddata/
│   │   │   │       ├── IDEWorkspaceChecks.plist
│   │   │   │       ├── WorkspaceSettings.xcsettings
│   │   │   │       └── swiftpm/
│   │   │   │           └── Package.resolved
│   │   │   └── xcshareddata/
│   │   │       └── xcschemes/
│   │   │           └── MLCChat.xcscheme
│   │   ├── README.md
│   │   └── mlc-package-config.json
│   ├── MLCEngineExample/
│   │   ├── MLCEngineExample/
│   │   │   ├── Assets.xcassets/
│   │   │   │   ├── AccentColor.colorset/
│   │   │   │   │   └── Contents.json
│   │   │   │   ├── AppIcon.appiconset/
│   │   │   │   │   └── Contents.json
│   │   │   │   └── Contents.json
│   │   │   ├── ContentView.swift
│   │   │   ├── MLCEngineExample.entitlements
│   │   │   ├── MLCEngineExampleApp.swift
│   │   │   └── Preview Content/
│   │   │       └── Preview Assets.xcassets/
│   │   │           └── Contents.json
│   │   ├── MLCEngineExample.xcodeproj/
│   │   │   ├── project.pbxproj
│   │   │   └── project.xcworkspace/
│   │   │       ├── contents.xcworkspacedata
│   │   │       └── xcshareddata/
│   │   │           └── IDEWorkspaceChecks.plist
│   │   ├── README.md
│   │   └── mlc-package-config.json
│   ├── MLCSwift/
│   │   ├── Package.swift
│   │   ├── README.md
│   │   └── Sources/
│   │       ├── ObjC/
│   │       │   ├── LLMEngine.mm
│   │       │   └── include/
│   │       │       └── LLMEngine.h
│   │       └── Swift/
│   │           ├── LLMEngine.swift
│   │           └── OpenAIProtocol.swift
│   ├── README.md
│   └── prepare_libs.sh
├── pyproject.toml
├── python/
│   ├── mlc_llm/
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   ├── base.py
│   │   ├── bench/
│   │   │   ├── __init__.py
│   │   │   ├── __main__.py
│   │   │   ├── api_endpoint.py
│   │   │   ├── dataset.py
│   │   │   ├── evaluation/
│   │   │   │   ├── gsm8k.py
│   │   │   │   └── mmlu.py
│   │   │   ├── request_processor.py
│   │   │   └── request_record.py
│   │   ├── cli/
│   │   │   ├── __init__.py
│   │   │   ├── calibrate.py
│   │   │   ├── chat.py
│   │   │   ├── check_device.py
│   │   │   ├── compile.py
│   │   │   ├── convert_weight.py
│   │   │   ├── delivery.py
│   │   │   ├── disco_remote_socket_session.py
│   │   │   ├── gen_config.py
│   │   │   ├── lib_delivery.py
│   │   │   ├── model_metadata.py
│   │   │   ├── package.py
│   │   │   ├── router.py
│   │   │   ├── serve.py
│   │   │   └── worker.py
│   │   ├── compiler_pass/
│   │   │   ├── __init__.py
│   │   │   ├── attach_cuda_graph_alloc_init_func.py
│   │   │   ├── attach_embedding_allocator.py
│   │   │   ├── attach_logit_processor.py
│   │   │   ├── attach_sampler.py
│   │   │   ├── attach_softmax_with_temperature.py
│   │   │   ├── attach_spec_decode_aux_funcs.py
│   │   │   ├── attach_support_info.py
│   │   │   ├── blas_dispatch.py
│   │   │   ├── clean_up_tir_attrs.py
│   │   │   ├── dispatch_kv_cache_creation.py
│   │   │   ├── dispatch_triton_kernel.py
│   │   │   ├── estimate_memory_usage.py
│   │   │   ├── fuse_add_norm.py
│   │   │   ├── fuse_dequantize_matmul_ewise.py
│   │   │   ├── fuse_dequantize_take.py
│   │   │   ├── fuse_dequantize_transpose.py
│   │   │   ├── fuse_ft_dequantize_matmul_epilogue.py
│   │   │   ├── fuse_transpose_matmul.py
│   │   │   ├── lift_global_buffer_alloc.py
│   │   │   ├── low_batch_specialization.py
│   │   │   ├── pipeline.py
│   │   │   ├── pipeline_parallel_rewrite.py
│   │   │   └── scatter_tuple_get_item.py
│   │   ├── contrib/
│   │   │   ├── __init__.py
│   │   │   └── embeddings/
│   │   │       ├── __init__.py
│   │   │       ├── embeddings.py
│   │   │       └── openai.py
│   │   ├── conversation_template/
│   │   │   ├── __init__.py
│   │   │   ├── cohere.py
│   │   │   ├── deepseek.py
│   │   │   ├── dolly.py
│   │   │   ├── gemma.py
│   │   │   ├── glm.py
│   │   │   ├── gorilla.py
│   │   │   ├── gpt.py
│   │   │   ├── hermes.py
│   │   │   ├── llama.py
│   │   │   ├── llava.py
│   │   │   ├── llm_jp.py
│   │   │   ├── ministral3.py
│   │   │   ├── ministral3_reasoning.py
│   │   │   ├── mistral.py
│   │   │   ├── nemotron.py
│   │   │   ├── oasst.py
│   │   │   ├── olmo.py
│   │   │   ├── orion.py
│   │   │   ├── phi.py
│   │   │   ├── qwen2.py
│   │   │   ├── redpajama.py
│   │   │   ├── registry.py
│   │   │   ├── rwkv.py
│   │   │   ├── stablelm.py
│   │   │   ├── tinyllama.py
│   │   │   └── wizardlm.py
│   │   ├── interface/
│   │   │   ├── __init__.py
│   │   │   ├── calibrate.py
│   │   │   ├── chat.py
│   │   │   ├── compile.py
│   │   │   ├── compiler_flags.py
│   │   │   ├── convert_weight.py
│   │   │   ├── gen_config.py
│   │   │   ├── help.py
│   │   │   ├── jit.py
│   │   │   ├── package.py
│   │   │   ├── router.py
│   │   │   └── serve.py
│   │   ├── json_ffi/
│   │   │   ├── __init__.py
│   │   │   └── engine.py
│   │   ├── libinfo.py
│   │   ├── loader/
│   │   │   ├── __init__.py
│   │   │   ├── huggingface_loader.py
│   │   │   ├── loader.py
│   │   │   ├── mapping.py
│   │   │   ├── standard_loader.py
│   │   │   ├── stats.py
│   │   │   └── utils.py
│   │   ├── model/
│   │   │   ├── __init__.py
│   │   │   ├── baichuan/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── baichuan_loader.py
│   │   │   │   └── baichuan_model.py
│   │   │   ├── bert/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── bert_loader.py
│   │   │   │   └── bert_model.py
│   │   │   ├── chatglm3/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── chatglm3_loader.py
│   │   │   │   └── chatglm3_model.py
│   │   │   ├── cohere/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── cohere_loader.py
│   │   │   │   └── cohere_model.py
│   │   │   ├── deepseek/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── deepseek_loader.py
│   │   │   │   └── deepseek_model.py
│   │   │   ├── deepseek_v2/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── deepseek_v2_loader.py
│   │   │   │   └── deepseek_v2_model.py
│   │   │   ├── eagle/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── eagle_loader.py
│   │   │   │   └── eagle_model.py
│   │   │   ├── gemma/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── gemma_loader.py
│   │   │   │   └── gemma_model.py
│   │   │   ├── gemma2/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── gemma2_loader.py
│   │   │   │   └── gemma2_model.py
│   │   │   ├── gemma3/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── gemma3_loader.py
│   │   │   │   └── gemma3_model.py
│   │   │   ├── gpt2/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── gpt2_loader.py
│   │   │   │   └── gpt2_model.py
│   │   │   ├── gpt_bigcode/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── gpt_bigcode_loader.py
│   │   │   │   └── gpt_bigcode_model.py
│   │   │   ├── gpt_j/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── gpt_j_loader.py
│   │   │   │   └── gpt_j_model.py
│   │   │   ├── gpt_neox/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── gpt_neox_loader.py
│   │   │   │   └── gpt_neox_model.py
│   │   │   ├── internlm/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── internlm_loader.py
│   │   │   │   └── internlm_model.py
│   │   │   ├── internlm2/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── internlm2_loader.py
│   │   │   │   └── internlm2_model.py
│   │   │   ├── llama/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── llama_loader.py
│   │   │   │   └── llama_model.py
│   │   │   ├── llama4/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── llama4_loader.py
│   │   │   │   └── llama4_model.py
│   │   │   ├── llava/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── llava_loader.py
│   │   │   │   └── llava_model.py
│   │   │   ├── medusa/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── medusa_loader.py
│   │   │   │   └── medusa_model.py
│   │   │   ├── minicpm/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── minicpm_loader.py
│   │   │   │   └── minicpm_model.py
│   │   │   ├── ministral3/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── ministral3_loader.py
│   │   │   │   └── ministral3_model.py
│   │   │   ├── mistral/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── mistral_loader.py
│   │   │   │   └── mistral_model.py
│   │   │   ├── mixtral/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── mixtral_loader.py
│   │   │   │   └── mixtral_model.py
│   │   │   ├── model.py
│   │   │   ├── model_preset.py
│   │   │   ├── nemotron/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── nemotron_loader.py
│   │   │   │   └── nemotron_model.py
│   │   │   ├── olmo/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── olmo_loader.py
│   │   │   │   └── olmo_model.py
│   │   │   ├── orion/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── orion_loader.py
│   │   │   │   └── orion_model.py
│   │   │   ├── phi/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── phi_loader.py
│   │   │   │   └── phi_model.py
│   │   │   ├── phi3/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── phi3_loader.py
│   │   │   │   └── phi3_model.py
│   │   │   ├── phi3v/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── phi3v_image.py
│   │   │   │   ├── phi3v_loader.py
│   │   │   │   └── phi3v_model.py
│   │   │   ├── qwen/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── qwen_loader.py
│   │   │   │   └── qwen_model.py
│   │   │   ├── qwen2/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── qwen2_loader.py
│   │   │   │   └── qwen2_model.py
│   │   │   ├── qwen2_5_vl/
│   │   │   │   ├── __init__.py
│   │   │   │   └── qwen2_5_vl_model.py
│   │   │   ├── qwen2_moe/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── qwen2_moe_loader.py
│   │   │   │   └── qwen2_moe_model.py
│   │   │   ├── qwen3/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── qwen3_loader.py
│   │   │   │   └── qwen3_model.py
│   │   │   ├── qwen3_moe/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── qwen3_moe_loader.py
│   │   │   │   └── qwen3_moe_model.py
│   │   │   ├── rwkv5/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── rwkv5_loader.py
│   │   │   │   └── rwkv5_model.py
│   │   │   ├── rwkv6/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── rwkv6_loader.py
│   │   │   │   └── rwkv6_model.py
│   │   │   ├── stable_lm/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── stablelm_loader.py
│   │   │   │   └── stablelm_model.py
│   │   │   ├── starcoder2/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── starcoder2_loader.py
│   │   │   │   └── starcoder2_model.py
│   │   │   └── vision/
│   │   │       ├── __init__.py
│   │   │       ├── clip_vision.py
│   │   │       └── image_processing.py
│   │   ├── nn/
│   │   │   ├── __init__.py
│   │   │   ├── expert.py
│   │   │   ├── kv_cache.py
│   │   │   └── rnn_state.py
│   │   ├── op/
│   │   │   ├── __init__.py
│   │   │   ├── attention.py
│   │   │   ├── batch_matmul.py
│   │   │   ├── batch_spec_verify.py
│   │   │   ├── cutlass.py
│   │   │   ├── extern.py
│   │   │   ├── ft_gemm.py
│   │   │   ├── moe_matmul.py
│   │   │   ├── moe_misc.py
│   │   │   ├── mrope.py
│   │   │   ├── pipeline_parallel.py
│   │   │   ├── top_p_pivot.py
│   │   │   └── triton.py
│   │   ├── protocol/
│   │   │   ├── __init__.py
│   │   │   ├── conversation_protocol.py
│   │   │   ├── debug_protocol.py
│   │   │   ├── error_protocol.py
│   │   │   ├── generation_config.py
│   │   │   ├── microserving_protocol.py
│   │   │   ├── mlc_chat_config.py
│   │   │   └── openai_api_protocol.py
│   │   ├── quantization/
│   │   │   ├── __init__.py
│   │   │   ├── awq_quantization.py
│   │   │   ├── block_scale_quantization.py
│   │   │   ├── fp8_quantization.py
│   │   │   ├── ft_quantization.py
│   │   │   ├── group_quantization.py
│   │   │   ├── model_quantization.py
│   │   │   ├── no_quantization.py
│   │   │   ├── per_tensor_quantization.py
│   │   │   ├── quantization.py
│   │   │   └── utils.py
│   │   ├── router/
│   │   │   ├── __init__.py
│   │   │   └── router.py
│   │   ├── serve/
│   │   │   ├── __init__.py
│   │   │   ├── _ffi_api.py
│   │   │   ├── config.py
│   │   │   ├── data.py
│   │   │   ├── embedding_engine.py
│   │   │   ├── engine.py
│   │   │   ├── engine_base.py
│   │   │   ├── engine_utils.py
│   │   │   ├── entrypoints/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── debug_entrypoints.py
│   │   │   │   ├── metrics_entrypoints.py
│   │   │   │   ├── microserving_entrypoints.py
│   │   │   │   └── openai_entrypoints.py
│   │   │   ├── event_trace_recorder.py
│   │   │   ├── radix_tree.py
│   │   │   ├── request.py
│   │   │   ├── server/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── popen_server.py
│   │   │   │   └── server_context.py
│   │   │   └── sync_engine.py
│   │   ├── support/
│   │   │   ├── __init__.py
│   │   │   ├── argparse.py
│   │   │   ├── auto_config.py
│   │   │   ├── auto_device.py
│   │   │   ├── auto_target.py
│   │   │   ├── auto_weight.py
│   │   │   ├── config.py
│   │   │   ├── constants.py
│   │   │   ├── convert_tiktoken.py
│   │   │   ├── download_cache.py
│   │   │   ├── logging.py
│   │   │   ├── max_thread_check.py
│   │   │   ├── preshard.py
│   │   │   ├── random.py
│   │   │   ├── style.py
│   │   │   ├── tensor_parallel.py
│   │   │   └── tqdm.py
│   │   ├── testing/
│   │   │   ├── __init__.py
│   │   │   ├── debug_chat.py
│   │   │   ├── debug_compare.py
│   │   │   └── pytest_utils.py
│   │   └── tokenizers/
│   │       ├── __init__.py
│   │       ├── _ffi_api.py
│   │       ├── streamer.py
│   │       └── tokenizers.py
│   ├── requirements.txt
│   └── setup.py
├── scripts/
│   ├── build_mlc_for_docs.sh
│   ├── build_site.sh
│   ├── check_url_validity.py
│   ├── gh_deploy_site.sh
│   └── local_deploy_site.sh
├── site/
│   ├── .gitignore
│   ├── CNAME
│   ├── Gemfile
│   ├── _config.yml
│   ├── _includes/
│   │   ├── head.html
│   │   └── hero.html
│   ├── assets/
│   │   └── css/
│   │       └── hero.scss
│   ├── index.md
│   └── privacy.md
├── tests/
│   ├── README.md
│   ├── cpp/
│   │   └── conv_template_unittest.cc
│   └── python/
│       ├── __init__.py
│       ├── compiler_pass/
│       │   └── test_fuse_ft_dequantize_matmul_epilogue.py
│       ├── conftest.py
│       ├── conversation_template/
│       │   ├── test_conversation_protocol.py
│       │   └── test_llama_template.py
│       ├── integration/
│       │   └── test_model_compile.py
│       ├── json_ffi/
│       │   ├── test_json_ffi_engine.py
│       │   ├── test_json_ffi_engine_image.py
│       │   └── test_json_ffi_engine_mock.py
│       ├── loader/
│       │   ├── test_awq.py
│       │   └── test_huggingface.py
│       ├── model/
│       │   ├── test_gemma3.py
│       │   ├── test_gpt2.py
│       │   ├── test_gptNeox.py
│       │   ├── test_kv_cache.py
│       │   ├── test_llama.py
│       │   ├── test_llama_quantization.py
│       │   ├── test_mistral.py
│       │   ├── test_phi.py
│       │   └── test_qwen3_embedding.py
│       ├── op/
│       │   ├── test_batch_spec_verify.py
│       │   ├── test_fp8_block_matmul.py
│       │   ├── test_mrope.py
│       │   ├── test_top_p_pivot.py
│       │   ├── test_tree_attn.py
│       │   └── test_two_stage_softmax.py
│       ├── quantization/
│       │   ├── test_awq_quantization.py
│       │   └── test_group_quantization.py
│       ├── router/
│       │   └── test_router.py
│       ├── serve/
│       │   ├── evaluate_engine.py
│       │   ├── server/
│       │   │   ├── conftest.py
│       │   │   ├── test_embedding_server.py
│       │   │   ├── test_server.py
│       │   │   ├── test_server_function_call.py
│       │   │   └── test_server_image.py
│       │   ├── test_embedding_engine.py
│       │   ├── test_event_trace_recorder.py
│       │   ├── test_radix_tree.py
│       │   ├── test_serve_async_engine.py
│       │   ├── test_serve_async_engine_spec.py
│       │   ├── test_serve_engine.py
│       │   ├── test_serve_engine_grammar.py
│       │   ├── test_serve_engine_image.py
│       │   ├── test_serve_engine_mock.py
│       │   ├── test_serve_engine_prefix_cache.py
│       │   ├── test_serve_engine_rnn.py
│       │   ├── test_serve_engine_spec.py
│       │   └── test_serve_sync_engine.py
│       ├── support/
│       │   ├── test_auto_config.py
│       │   ├── test_auto_weight.py
│       │   ├── test_cli_convert_weight.py
│       │   └── test_convert_weight_lora_merge.py
│       └── tokenizers/
│           └── test_streamer.py
├── version.py
└── web/
    ├── Makefile
    ├── README.md
    ├── emcc/
    │   └── mlc_wasm_runtime.cc
    └── prep_emcc_deps.sh

================================================
FILE CONTENTS
================================================

================================================
FILE: .clang-format
================================================
# Run the following command to reformat a file:
# clang-format -i -style=Google <file>
# Or use clang-format-diff to only reformat the changed lines:
# https://clang.llvm.org/docs/ClangFormat.html
BasedOnStyle: Google
DerivePointerAlignment: false
ColumnLimit:     100
PointerAlignment: Left


================================================
FILE: .github/ISSUE_TEMPLATE/bug-report.md
================================================
---
name: "🐛 Bug Report"
about: Submit a bug report to help us improve MLC-LLM
title: '[Bug] '
labels: ['bug']
assignees: ''

---

## 🐛 Bug

<!-- A clear and concise description of what the bug is. -->

## To Reproduce

Steps to reproduce the behavior:

1.
1.
1.

<!-- If you have a code sample, error messages, stack traces, please provide it here as well -->

## Expected behavior

<!-- A clear and concise description of what you expected to happen. -->

## Environment

 - Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA):
 - Operating system (e.g. Ubuntu/Windows/MacOS/...):
 - Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...)
 - How you installed MLC-LLM (`conda`, source):
 - How you installed TVM (`pip`, source):
 - Python version (e.g. 3.10):
 - GPU driver version (if applicable):
 - CUDA/cuDNN version (if applicable):
 - TVM Hash Tag (`python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"`, applicable if you compile models):
 - Any other relevant information:

## Additional context

<!-- Add any other context about the problem here. -->


================================================
FILE: .github/ISSUE_TEMPLATE/config.yml
================================================
blank_issues_enabled: false

contact_links:
  - name: Check the MLC-LLM Documentation
    url: https://llm.mlc.ai/docs/
    about: Our documentation might provide answers to your questions.
  - name: Chat on Discord
    url: https://discord.gg/9Xpy2HGBuD
    about: Join the Discord Server to live chat with the community.


================================================
FILE: .github/ISSUE_TEMPLATE/documentation.md
================================================
---
name: "\U0001F4DA Documentation"
about: Report an issue related to https://llm.mlc.ai/docs/
title: '[Doc] '
labels: ['documentation']
assignees: ''

---

## 📚 Documentation

### Suggestion
<!-- Please leave your general suggestion to our documentation here. -->

### Bug
- Link to the buggy documentation/tutorial:
- Description of the bug:


================================================
FILE: .github/ISSUE_TEMPLATE/feature-request.md
================================================
---
name: "\U0001F680 Feature Request"
about: Submit a proposal/request for a new MLC-LLM feature, or an enhancement on existing features.
title: '[Feature Request] '
labels: ['feature request']
assignees: ''

---

## 🚀 Feature
<!-- A brief description of the feature proposal -->

## Motivation

<!-- Please outline the motivation for the proposal, and how could this feature benefit the MLC-LLM project/community. -->

## Alternatives

<!-- A clear and concise description of any alternative solutions or features you've considered, if any. -->

## Additional context

<!-- Add any other context or screenshots about the feature request here. -->


================================================
FILE: .github/ISSUE_TEMPLATE/general.md
================================================
---
name: "❓ General Questions"
about: General questions you have about MLC-LLM.
title: '[Question] '
labels: ['question']
assignees: ''

---

## ❓ General Questions

<!-- Describe your questions -->


================================================
FILE: .github/ISSUE_TEMPLATE/model-request.md
================================================
---
name: "️️⚙️  Model Request"
about: Request a new model in MLC-LLM
title: '[Model Request] '
labels: ['new-models']
assignees: ''

---

## ⚙️  Request New Models

- Link to an existing implementation (e.g. Hugging Face/Github): <!-- Link to the model -->
- Is this model architecture supported by MLC-LLM? (the list of [supported models](https://llm.mlc.ai/docs/prebuilt_models.html)) <!-- Yes/No -->

## Additional context

<!-- Add any other context that you think would be helpful for the community to add this model -->


================================================
FILE: .github/ISSUE_TEMPLATE/speed-report.md
================================================
---
name: " 🏎️  Speed Report"
about: Submit a speed report of an model running in MLC-LLM
title: '[Speed] '
labels: ['performance']
assignees: ''

---

# 🏎️  Speed Report

<!-- Please search if there are existing issues discuss the speed of the model you are using, if there are, we encourage you reply in the existed issue instead of creating a new one. -->

- The model code: <!-- e.g. vicuna-7b-1.1 -->


- The model configuration (e.g. quantization mode, running data type, etc.):
- Device (e.g. MacBook Pro M2, PC+RTX 3080):
- OS (if applicable):
- Encode speed (Token/s):
- Decode speed (Token/s):
- Memory usage (if applicable):

<!-- Note that the measured speed might reflect peak performance if the prompt/chat history is short. -->


================================================
FILE: .github/ISSUE_TEMPLATE/tracking.md
================================================
---
name: "Tracking"
about: A tracking issue that tracks ongoing item in the project
title: '[Tracking] '
labels: ['status: tracking']
assignees: ''

---

<!--

A tracking issue contains a list of action items
that can be executed to complete a feature or fix.

We use tracking issues when we have a clear list of action items
related to feature items as they provide fine-grained
view of action items and provide clarity on what it takes to implement a feature.

When to open a tracking issue: Open a new tracking issue when you have
clear, actionable items (as a rule of thumb, make sure action items
items can be carried through if you are assigned to work on it and
you can provide enough guides to others who plan to work on these actions).
-->


## Overview
<!-- A brief overview of the task  -->



## Action Items
<!-- Please list set of action items to complete -->

- [ ]


## Links to Related Issues and PRs

<!-- Cross link feature requests bug report issues related to the tracking item -->
<!-- When there are new PRs, open up new PRs -->


================================================
FILE: .github/workflows/documentation.yaml
================================================
name: Build Docs

on:
  push:
    branches:
      - main

jobs:
  test_linux:
    name: Deploy Docs
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4
      with:
        submodules: recursive

    - name: Configuring build Environment
      run: |
        sudo apt-get update
        python -m pip install -U pip wheel

    - name: Setup Ruby
      uses: ruby/setup-ruby@v1
      with:
        ruby-version: '3.0'

    - name: Installing dependencies
      run: |
        python -m pip install -r docs/requirements.txt
        gem install jekyll jekyll-remote-theme

    - name: Deploying on GitHub Pages
      if: github.ref == 'refs/heads/main'
      run: |
        git remote set-url origin https://x-access-token:${{ secrets.MLC_GITHUB_TOKEN }}@github.com/$GITHUB_REPOSITORY
        git config --global user.email "mlc-gh-actions-bot@nomail"
        git config --global user.name "mlc-gh-actions-bot"
        ./scripts/gh_deploy_site.sh


================================================
FILE: .github/workflows/update-relax.yaml
================================================
name: 'Relax Submodule Sync'

on:
  workflow_dispatch:

jobs:
  sync:
    name: 'Relax Submodule Sync'
    runs-on: ubuntu-latest

    defaults:
      run:
        shell: bash

    steps:
    - name: Checkout
      uses: actions/checkout@v4
      with:
        submodules: true

    - name: Git Sumbodule Update
      run: |
        git submodule update --remote 3rdparty/tvm

    - name: Commit update
      env:
        GITHUB_TOKEN: ${{ secrets.MLC_GITHUB_TOKEN }}
      run: |
        git config --global user.name 'Git bot'
        git config --global user.email 'bot@noreply.github.com'
        git remote set-url origin https://$GITHUB_TOKEN@github.com/mlc-ai/mlc-llm
        git commit -am "Auto updated submodule references" && git push || echo "No changes to commit"


================================================
FILE: .github/workflows/windows-build.yaml
================================================
# GH actions.
# We use it to cover windows builds
# Jenkins is still the primary CI
name: Windows CI

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  Windows:
    runs-on: windows-latest
    defaults:
      run:
        shell: 'cmd /C call {0}'

    steps:
    - name: Git config
      run: >-
        git config --system core.longpaths true
    - uses: actions/checkout@v3
      with:
        submodules: 'recursive'
    - uses: conda-incubator/setup-miniconda@v3
      with:
        activate-environment: mlc-llm-build
        channel-priority: strict
        environment-file: ci/build-environment.yaml
        auto-activate-base: false
    - name: Conda info
      run: |
        conda info
        conda list
        python --version
    - name: Build MLC-LLM
      run: >-
        ci/task/build_win.bat


================================================
FILE: .gitignore
================================================
tmp/
dist/
params/
debug/
*.bak
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

.DS_Store

*.S
# C extensions
*.so

build/

*.ll
.npm
# Distribution / packaging
.Python
env/
build/
build-*/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

.conda/
# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Generated by python/gen_requirements.py
python/requirements/*.txt

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/
docs/_staging/

# PyBuilder
target/
/target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject
*~
*.pyc
*~
config.mk
config.cmake
Win32
*.dir
perf
*.wasm
.emscripten

## IOS
DerivedData/

## Java
*.class
jvm/*/target/
jvm/*/*/target/
jvm/native/*/generated
jvm/native/src/main/native/org_apache_tvm_native_c_api.h
*.worksheet
*.idea
*.iml
*.classpath
*.project
*.settings
*/node_modules/

## Various settings
*.pbxuser
!default.pbxuser
*.mode1v3
!default.mode1v3
*.mode2v3
!default.mode2v3
*.perspectivev3
!default.perspectivev3
xcuserdata/
.pkl_memoize_*

.emscripten*
.m2

# Compiled Dynamic libraries
*.so
*.dylib
*.dll

# Compiled Object files
*.slo
*.lo
*.o
*.obj

# Precompiled Headers
*.gch
*.pch

# Compiled Static libraries
*.lai
*.la
*.a
*.lib

# Executables
*.exe
*.out
*.app

## Other
*.moved-aside
*.xccheckout
*.xcscmblueprint
.DS_Store
tags
cscope*
*.lock

# vim temporary files
*.swp
*.swo

# TVM generated code
perf
.bash_history
# *.json
*.params
*.ro
*.onnx
*.h5
synset.txt
cat.jpg
cat.png
docs.tgz
cat.png
*.mlmodel
tvm_u.*
tvm_t.*
# Mac OS X
.DS_Store

# Jetbrain
.idea
.ipython
.jupyter
.nv
.pylint.d
.python_history
.pytest_cache
.local
cmake-build-debug

# Visual Studio
.vs

# Visual Studio Code
.vscode

# tmp file
.nfs*

# keys
*.pem
*.p12
*.pfx
*.cer
*.crt
*.der

# patch sentinel
patched.txt

# Python type checking
.mypy_cache/
.pyre/

# pipenv files
Pipfile
Pipfile.lock

# conda package artifacts
conda/Dockerfile.cuda*
conda/pkg
.node_repl_history
# nix files
.envrc
*.nix

# Docker files
.sudo_as_admin_successful

# Downloaded models/datasets
.tvm_test_data
.dgl
.caffe2

# Local docs build
_docs/
jvm/target
.config/configstore/
.ci-py-scripts/

# Generated Hexagon files
src/runtime/hexagon/rpc/hexagon_rpc.h
src/runtime/hexagon/rpc/hexagon_rpc_skel.c
src/runtime/hexagon/rpc/hexagon_rpc_stub.c

# Local tvm-site checkout
tvm-site/

# Generated docs files
gallery/how_to/work_with_microtvm/micro_tvmc.py

# Test sample data files
!tests/python/ci/sample_prs/*.json

# Used in CI to communicate between Python and Jenkins
.docker-image-names/

# Printed TIR code on disk
*.tir

# GDB history file
.gdb_history

dist


================================================
FILE: .gitmodules
================================================
[submodule "3rdparty/argparse"]
	path = 3rdparty/argparse
	url = https://github.com/p-ranav/argparse
[submodule "3rdparty/tokenizers-cpp"]
	path = 3rdparty/tokenizers-cpp
	url = https://github.com/mlc-ai/tokenizers-cpp
[submodule "3rdparty/googletest"]
	path = 3rdparty/googletest
	url = https://github.com/google/googletest.git
[submodule "3rdparty/tvm"]
	path = 3rdparty/tvm
	url = https://github.com/mlc-ai/relax.git
[submodule "3rdparty/stb"]
	path = 3rdparty/stb
	url = https://github.com/nothings/stb.git
[submodule "3rdparty/xgrammar"]
	path = 3rdparty/xgrammar
	url = https://github.com/mlc-ai/xgrammar.git


================================================
FILE: .pre-commit-config.yaml
================================================
# To use:
#
#     pre-commit run -a
#
# Or:
#
#     pre-commit install  # (runs every time you commit in git)
#
# To update this file:
#
#     pre-commit autoupdate
#
# See https://github.com/pre-commit/pre-commit
# Note the pre-commit hooks shoule only be used for formatting, but not for linting.
# For linting consider using CI.
repos:
  # Standard hooks
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: check-added-large-files
      - id: check-case-conflict
      - id: check-merge-conflict
      - id: check-symlinks
      - id: end-of-file-fixer
      - id: mixed-line-ending
      - id: requirements-txt-fixer
      - id: trailing-whitespace

  # Changes tabs to spaces
  - repo: https://github.com/Lucas-C/pre-commit-hooks
    rev: v1.5.5
    hooks:
      - id: remove-tabs
      - id: remove-crlf

  # Formatters
  - repo: https://github.com/psf/black-pre-commit-mirror
    rev: 24.8.0
    hooks:
      - id: black

  - repo: https://github.com/pycqa/isort
    rev: 5.13.2
    hooks:
      - id: isort

  - repo: https://github.com/pre-commit/mirrors-clang-format
    rev: v19.1.1
    hooks:
      - id: clang-format
        types_or: [c++, c, cuda]
        exclude: |
          (?x)^(.*cubin.cpp$ | .*fmha_cubin.h | 3rdparty/.*)$

  - repo: https://github.com/cheshirekow/cmake-format-precommit
    rev: v0.6.13
    hooks:
      - id: cmake-format
        additional_dependencies: [pyyaml>=5.1]


================================================
FILE: .pylintrc
================================================
[MESSAGES CONTROL]
disable=too-many-positional-arguments,duplicate-code


================================================
FILE: CMakeLists.txt
================================================
cmake_minimum_required(VERSION 3.18)
project(mlc_llm C CXX)

include(CheckCXXCompilerFlag)
if(MSVC)
  set(CMAKE_CXX_FLAGS "/fp:fast ${CMAKE_CXX_FLAGS}")
else()
  set(CMAKE_CXX_FLAGS "-ffast-math ${CMAKE_CXX_FLAGS}")
endif()

if(EXISTS ${CMAKE_BINARY_DIR}/config.cmake)
  include(${CMAKE_BINARY_DIR}/config.cmake)
else()
  if(EXISTS ${CMAKE_SOURCE_DIR}/config.cmake)
    include(${CMAKE_SOURCE_DIR}/config.cmake)
  endif()
endif()

if(NOT CMAKE_BUILD_TYPE)
  set(CMAKE_BUILD_TYPE
      RelWithDebInfo
      CACHE STRING "Build type" FORCE)
  message(STATUS "Setting default build type to " ${CMAKE_BUILD_TYPE})
endif(NOT CMAKE_BUILD_TYPE)

option(MLC_HIDE_PRIVATE_SYMBOLS "Hide private symbols" ON)
option(MLC_LLM_BUILD_PYTHON_MODULE "Build Python module with scikit-build-core"
       OFF)

if(MLC_LLM_INSTALL_STATIC_LIB)
  set(BUILD_STATIC_RUNTIME ON)
endif()

set(MLC_VISIBILITY_FLAG "")
if(MLC_HIDE_PRIVATE_SYMBOLS)
  set(HIDE_PRIVATE_SYMBOLS ON)
  if(NOT MSVC)
    set(MLC_VISIBILITY_FLAG "-fvisibility=hidden")
  endif()
  message(STATUS "Hide private symbols")
endif()

option(BUILD_CPP_TEST "Build cpp unittests" OFF)

set(CMAKE_CUDA_STANDARD 17)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)

# tvm runtime config: minimize runtime components
set(USE_RPC OFF)
set(USE_MICRO OFF)
set(USE_GRAPH_EXECUTOR OFF)
set(USE_GRAPH_EXECUTOR_DEBUG OFF)
set(USE_AOT_EXECUTOR OFF)
set(USE_PROFILER OFF)
set(USE_GTEST OFF)
set(USE_LIBBACKTRACE OFF)
set(BUILD_DUMMY_LIBTVM ON)
if(NOT DEFINED TVM_SOURCE_DIR)
  if(DEFINED ENV{TVM_SOURCE_DIR})
    set(TVM_SOURCE_DIR "$ENV{TVM_SOURCE_DIR}")
  else()
    set(TVM_SOURCE_DIR 3rdparty/tvm)
  endif(DEFINED ENV{TVM_SOURCE_DIR})
endif(NOT DEFINED TVM_SOURCE_DIR)
message(STATUS "TVM_SOURCE_DIR: ${TVM_SOURCE_DIR}")
add_subdirectory(${TVM_SOURCE_DIR} tvm EXCLUDE_FROM_ALL)

set(MLC_LLM_RUNTIME_LINKER_LIB "")
set(TOKENZIER_CPP_PATH 3rdparty/tokenizers-cpp)
add_subdirectory(${TOKENZIER_CPP_PATH} tokenizers EXCLUDE_FROM_ALL)

set(XGRAMMAR_PATH 3rdparty/xgrammar)
tvm_file_glob(GLOB_RECURSE MLC_LLM_SRCS cpp/*.cc)
tvm_file_glob(GLOB_RECURSE XGRAMMAR_SRCS ${XGRAMMAR_PATH}/cpp/*.cc)
list(FILTER XGRAMMAR_SRCS EXCLUDE REGEX "${XGRAMMAR_PATH}/cpp/pybind/.*\\.cc")
list(APPEND MLC_LLM_SRCS ${XGRAMMAR_SRCS})
add_library(mlc_llm_objs OBJECT ${MLC_LLM_SRCS})

set(MLC_LLM_INCLUDES
    ${TVM_SOURCE_DIR}/include ${TVM_SOURCE_DIR}/3rdparty/dlpack/include)
set(MLC_LLM_COMPILE_DEFS ${MLC_LLM_COMPILE_DEFS} __STDC_FORMAT_MACROS=1)
set(MLC_LLM_COMPILE_DEFS ${MLC_LLM_COMPILE_DEFS} XGRAMMAR_ENABLE_LOG_DEBUG=0)

target_compile_definitions(mlc_llm_objs PRIVATE ${MLC_LLM_COMPILE_DEFS})
target_compile_definitions(mlc_llm_objs PRIVATE -DMLC_LLM_EXPORTS)
target_include_directories(mlc_llm_objs PRIVATE ${MLC_LLM_INCLUDES})
target_include_directories(mlc_llm_objs PRIVATE 3rdparty/stb)
target_include_directories(mlc_llm_objs PRIVATE ${TOKENZIER_CPP_PATH}/include)
target_include_directories(mlc_llm_objs PRIVATE ${XGRAMMAR_PATH}/include)
# xgrammar still depends on picojson - use its bundled copy
target_include_directories(mlc_llm_objs
                           PRIVATE ${XGRAMMAR_PATH}/3rdparty/picojson)
target_link_libraries(mlc_llm_objs PRIVATE tvm_ffi_header)

add_library(mlc_llm SHARED $<TARGET_OBJECTS:mlc_llm_objs>)
add_library(mlc_llm_static STATIC $<TARGET_OBJECTS:mlc_llm_objs>)
add_dependencies(mlc_llm_static tokenizers_cpp sentencepiece-static
                 tokenizers_c tvm_runtime)
set_target_properties(mlc_llm_static PROPERTIES OUTPUT_NAME mlc_llm)

target_link_libraries(mlc_llm PUBLIC tvm_runtime)
target_link_libraries(mlc_llm PRIVATE tokenizers_cpp)

find_library(FLASH_ATTN_LIBRARY flash_attn
             HINTS ${TVM_SOURCE_DIR}/*/3rdparty/libflash_attn/src)

if(FLASH_ATTN_LIBRARY STREQUAL "FLASH_ATTN_LIBRARY-NOTFOUND")
  message(
    WARNING
      "Cannot find libflash_attn. The model must not have been built with --use-flash-attn-mqa option."
  )
else()
  target_link_libraries(mlc_llm PUBLIC -Wl,--no-as-needed ${FLASH_ATTN_LIBRARY})
endif()

if(CMAKE_BUILD_TYPE STREQUAL "Debug")
  target_compile_definitions(mlc_llm PRIVATE "TVM_LOG_DEBUG")
  target_compile_definitions(mlc_llm_objs PRIVATE "TVM_LOG_DEBUG")
  target_compile_definitions(mlc_llm_static PRIVATE "TVM_LOG_DEBUG")
endif()

if(BUILD_CPP_TEST)
  message(STATUS "Building cpp unittests")
  add_subdirectory(3rdparty/googletest)
  file(GLOB_RECURSE MLC_LLM_TEST_SRCS
       ${PROJECT_SOURCE_DIR}/tests/cpp/*unittest.cc)
  add_executable(mlc_llm_cpp_tests ${MLC_LLM_TEST_SRCS})
  target_include_directories(mlc_llm_cpp_tests PRIVATE ${MLC_LLM_INCLUDES})
  target_include_directories(mlc_llm_cpp_tests
                             PRIVATE ${PROJECT_SOURCE_DIR}/cpp)
  target_include_directories(
    mlc_llm_cpp_tests PRIVATE ${gtest_SOURCE_DIR}/include ${gtest_SOURCE_DIR})
  target_link_libraries(mlc_llm_cpp_tests PUBLIC mlc_llm gtest gtest_main)
endif(BUILD_CPP_TEST)

if(CMAKE_SYSTEM_NAME STREQUAL "Android")
  target_link_libraries(mlc_llm PRIVATE log)
  target_link_libraries(tokenizers_cpp PRIVATE log)
endif()

add_library(mlc_llm_module SHARED $<TARGET_OBJECTS:mlc_llm_objs>)
target_link_libraries(mlc_llm_module PUBLIC tvm)
target_link_libraries(mlc_llm_module PRIVATE tokenizers_cpp)

set_property(
  TARGET mlc_llm_module
  APPEND
  PROPERTY LINK_OPTIONS "${MLC_VISIBILITY_FLAG}")
set_property(
  TARGET mlc_llm
  APPEND
  PROPERTY LINK_OPTIONS "${MLC_VISIBILITY_FLAG}")

find_program(CARGO_EXECUTABLE cargo)

if(NOT CARGO_EXECUTABLE)
  message(FATAL_ERROR "Cargo is not found! Please install cargo.")
endif()

# when this option is on, we install all static lib deps into lib
if(MLC_LLM_INSTALL_STATIC_LIB)
  install(TARGETS mlc_llm_static tokenizers_cpp sentencepiece-static tvm_runtime
          LIBRARY DESTINATION lib${LIB_SUFFIX})
  # tokenizers need special handling as it builds from rust
  if(MSVC)
    install(FILES ${CMAKE_CURRENT_BINARY_DIR}/tokenizers/libtokenizers_c.lib
            DESTINATION lib${LIB_SUFFIX})
  else()
    install(FILES ${CMAKE_CURRENT_BINARY_DIR}/tokenizers/libtokenizers_c.a
            DESTINATION lib${LIB_SUFFIX})
  endif()
else()
  install(
    TARGETS tvm_runtime
            mlc_llm
            mlc_llm_module
            mlc_llm_static
            tokenizers_cpp
            sentencepiece-static
            RUNTIME_DEPENDENCY_SET
            tokenizers_c
    RUNTIME DESTINATION bin
    LIBRARY DESTINATION lib${LIB_SUFFIX})
endif()

# Python package installation configuration This section ensures that all
# necessary files are installed for the Python wheel
if(MLC_LLM_BUILD_PYTHON_MODULE)
  message(STATUS "Configuring Python package installation")

  # Set RPATH for mlc_llm and mlc_llm_module to find other libraries relatively
  if(APPLE)
    # macOS uses @loader_path
    set_target_properties(mlc_llm PROPERTIES INSTALL_RPATH "@loader_path")
    set_target_properties(mlc_llm_module PROPERTIES INSTALL_RPATH
                                                    "@loader_path")
  elseif(LINUX)
    # Linux uses $ORIGIN
    set_target_properties(mlc_llm PROPERTIES INSTALL_RPATH "\$ORIGIN")
    set_target_properties(mlc_llm_module PROPERTIES INSTALL_RPATH "\$ORIGIN")
  endif()

  # Install compiled shared libraries
  install(TARGETS mlc_llm DESTINATION ".")
  install(TARGETS mlc_llm_module DESTINATION ".")
  install(DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/cpp/" DESTINATION "cpp/")
  install(DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/web/" DESTINATION "web/")
  install(FILES "${CMAKE_CURRENT_SOURCE_DIR}/README.md"
                "${CMAKE_CURRENT_SOURCE_DIR}/LICENSE"
                "${CMAKE_CURRENT_SOURCE_DIR}/NOTICE" DESTINATION ".")

  message(STATUS "Python package installation configured")
endif()


================================================
FILE: CONTRIBUTORS.md
================================================
MLC LLM Contributors
====================


## List of Contributors
- [Full List of Contributors](https://github.com/mlc-ai/mlc-llm/graphs/contributors)


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: NOTICE
================================================
MLC LLM

Copyright (c) 2023-2025 by MLC LLM Contributors


================================================
FILE: README.md
================================================
<div align="center">

# MLC LLM

[![Installation](https://img.shields.io/badge/docs-latest-green)](https://llm.mlc.ai/docs/)
[![License](https://img.shields.io/badge/license-apache_2-blue)](https://github.com/mlc-ai/mlc-llm/blob/main/LICENSE)
[![Join Discoard](https://img.shields.io/badge/Join-Discord-7289DA?logo=discord&logoColor=white)](https://discord.gg/9Xpy2HGBuD)
[![Related Repository: WebLLM](https://img.shields.io/badge/Related_Repo-WebLLM-fafbfc?logo=github)](https://github.com/mlc-ai/web-llm/)

**Universal LLM Deployment Engine with ML Compilation**

[Get Started](https://llm.mlc.ai/docs/get_started/quick_start) | [Documentation](https://llm.mlc.ai/docs) | [Blog](https://blog.mlc.ai/)

</div>

## About

MLC LLM is a machine learning compiler and high-performance deployment engine for large language models.  The mission of this project is to enable everyone to develop, optimize, and deploy AI models natively on everyone's platforms. 

<div align="center">
<table style="width:100%">
  <thead>
    <tr>
      <th style="width:15%"> </th>
      <th style="width:20%">AMD GPU</th>
      <th style="width:20%">NVIDIA GPU</th>
      <th style="width:20%">Apple GPU</th>
      <th style="width:24%">Intel GPU</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Linux / Win</td>
      <td>✅ Vulkan, ROCm</td>
      <td>✅ Vulkan, CUDA</td>
      <td>N/A</td>
      <td>✅ Vulkan</td>
    </tr>
    <tr>
      <td>macOS</td>
      <td>✅ Metal (dGPU)</td>
      <td>N/A</td>
      <td>✅ Metal</td>
      <td>✅ Metal (iGPU)</td>
    </tr>
    <tr>
      <td>Web Browser</td>
      <td colspan=4>✅ WebGPU and WASM </td>
    </tr>
    <tr>
      <td>iOS / iPadOS</td>
      <td colspan=4>✅ Metal on Apple A-series GPU</td>
    </tr>
    <tr>
      <td>Android</td>
      <td colspan=2>✅ OpenCL on Adreno GPU</td>
      <td colspan=2>✅ OpenCL on Mali GPU</td>
    </tr>
  </tbody>
</table>
</div>

MLC LLM compiles and runs code on MLCEngine -- a unified high-performance LLM inference engine across the above platforms. MLCEngine provides OpenAI-compatible API available through REST server, python, javascript, iOS, Android, all backed by the same engine and compiler that we keep improving with the community.

## Get Started

Please visit our [documentation](https://llm.mlc.ai/docs/) to get started with MLC LLM.
- [Installation](https://llm.mlc.ai/docs/install/mlc_llm)
- [Quick start](https://llm.mlc.ai/docs/get_started/quick_start)
- [Introduction](https://llm.mlc.ai/docs/get_started/introduction)

## Citation

Please consider citing our project if you find it useful:

```bibtex
@software{mlc-llm,
    author = {{MLC team}},
    title = {{MLC-LLM}},
    url = {https://github.com/mlc-ai/mlc-llm},
    year = {2023-2025}
}
```

The underlying techniques of MLC LLM include:

<details>
  <summary>References (Click to expand)</summary>

  ```bibtex
  @inproceedings{tensorir,
      author = {Feng, Siyuan and Hou, Bohan and Jin, Hongyi and Lin, Wuwei and Shao, Junru and Lai, Ruihang and Ye, Zihao and Zheng, Lianmin and Yu, Cody Hao and Yu, Yong and Chen, Tianqi},
      title = {TensorIR: An Abstraction for Automatic Tensorized Program Optimization},
      year = {2023},
      isbn = {9781450399166},
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      url = {https://doi.org/10.1145/3575693.3576933},
      doi = {10.1145/3575693.3576933},
      booktitle = {Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2},
      pages = {804–817},
      numpages = {14},
      keywords = {Tensor Computation, Machine Learning Compiler, Deep Neural Network},
      location = {Vancouver, BC, Canada},
      series = {ASPLOS 2023}
  }

  @inproceedings{metaschedule,
      author = {Shao, Junru and Zhou, Xiyou and Feng, Siyuan and Hou, Bohan and Lai, Ruihang and Jin, Hongyi and Lin, Wuwei and Masuda, Masahiro and Yu, Cody Hao and Chen, Tianqi},
      booktitle = {Advances in Neural Information Processing Systems},
      editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
      pages = {35783--35796},
      publisher = {Curran Associates, Inc.},
      title = {Tensor Program Optimization with Probabilistic Programs},
      url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/e894eafae43e68b4c8dfdacf742bcbf3-Paper-Conference.pdf},
      volume = {35},
      year = {2022}
  }

  @inproceedings{tvm,
      author = {Tianqi Chen and Thierry Moreau and Ziheng Jiang and Lianmin Zheng and Eddie Yan and Haichen Shen and Meghan Cowan and Leyuan Wang and Yuwei Hu and Luis Ceze and Carlos Guestrin and Arvind Krishnamurthy},
      title = {{TVM}: An Automated {End-to-End} Optimizing Compiler for Deep Learning},
      booktitle = {13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)},
      year = {2018},
      isbn = {978-1-939133-08-3},
      address = {Carlsbad, CA},
      pages = {578--594},
      url = {https://www.usenix.org/conference/osdi18/presentation/chen},
      publisher = {USENIX Association},
      month = oct,
  }
  ```
</details>


================================================
FILE: android/.gitignore
================================================
app/src/main/jni/*.h
app/src/main/jni/*.cc
app/src/main/obj

*.iml
.gradle
/local.properties
/.idea/caches
/.idea/libraries
/.idea/modules.xml
/.idea/workspace.xml
/.idea/navEditor.xml
/.idea/assetWizardSettings.xml
.DS_Store
/build
/captures
.externalNativeBuild
.cxx
local.properties


================================================
FILE: android/MLCChat/README.md
================================================
# MLC-LLM Android

Checkout [Documentation page](https://llm.mlc.ai/docs/deploy/android.html) for more information.

- run `mlc_llm package`
- open this `MLCChat/` folder as a project in Android Studio


================================================
FILE: android/MLCChat/app/.gitignore
================================================
/build
/src/main/libs


================================================
FILE: android/MLCChat/app/build.gradle
================================================
plugins {
    id 'com.android.application'
    id 'org.jetbrains.kotlin.android'
}

android {
    namespace 'ai.mlc.mlcchat'
    compileSdk 35

    defaultConfig {
        applicationId "ai.mlc.mlcchat"
        minSdk 26
        targetSdk 33
        versionCode 1
        versionName "1.0"

        testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"
        vectorDrawables {
            useSupportLibrary true
        }
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
        }
    }
    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }
    kotlinOptions {
        jvmTarget = '1.8'
    }
    buildFeatures {
        compose true
    }
    composeOptions {
        kotlinCompilerExtensionVersion '1.4.3'
    }
    packagingOptions {
        resources {
            excludes += '/META-INF/{AL2.0,LGPL2.1}'
        }
    }
}

dependencies {
    implementation project(":mlc4j")
    implementation 'androidx.core:core-ktx:1.10.1'
    implementation 'androidx.lifecycle:lifecycle-runtime-ktx:2.6.1'
    implementation 'com.github.jeziellago:compose-markdown:0.5.2'
    implementation 'androidx.activity:activity-compose:1.7.1'
    implementation platform('androidx.compose:compose-bom:2022.10.00')
    implementation 'androidx.lifecycle:lifecycle-viewmodel-compose:2.6.1'
    implementation 'androidx.compose.ui:ui'
    implementation 'androidx.compose.ui:ui-graphics'
    implementation 'androidx.compose.ui:ui-tooling-preview'
    implementation 'androidx.compose.material3:material3:1.1.0'
    implementation 'androidx.compose.material:material-icons-extended'
    implementation 'androidx.appcompat:appcompat:1.6.1'
    implementation 'androidx.navigation:navigation-compose:2.5.3'
    implementation 'com.google.code.gson:gson:2.10.1'
    implementation fileTree(dir: 'src/main/libs', include: ['*.aar', '*.jar'], exclude: [])
    testImplementation 'junit:junit:4.13.2'
    androidTestImplementation 'androidx.test.ext:junit:1.1.5'
    androidTestImplementation 'androidx.test.espresso:espresso-core:3.5.1'
    androidTestImplementation platform('androidx.compose:compose-bom:2022.10.00')
    androidTestImplementation 'androidx.compose.ui:ui-test-junit4'
    debugImplementation 'androidx.compose.ui:ui-tooling'
    debugImplementation 'androidx.compose.ui:ui-test-manifest'

}


================================================
FILE: android/MLCChat/app/proguard-rules.pro
================================================
# Add project specific ProGuard rules here.
# You can control the set of applied configuration files using the
# proguardFiles setting in build.gradle.
#
# For more details, see
#   http://developer.android.com/guide/developing/tools/proguard.html

# If your project uses WebView with JS, uncomment the following
# and specify the fully qualified class name to the JavaScript interface
# class:
#-keepclassmembers class fqcn.of.javascript.interface.for.webview {
#   public *;
#}

# Uncomment this to preserve the line number information for
# debugging stack traces.
#-keepattributes SourceFile,LineNumberTable

# If you keep the line number information, uncomment this to
# hide the original source file name.
#-renamesourcefileattribute SourceFile


================================================
FILE: android/MLCChat/app/src/main/AndroidManifest.xml
================================================
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    package="ai.mlc.mlcchat">

    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.READ_MEDIA_IMAGES" />
    <uses-permission
        android:name="android.permission.WRITE_EXTERNAL_STORAGE"
        android:maxSdkVersion="32"
        tools:ignore="ScopedStorage" />

    <application
        android:allowBackup="true"
        android:dataExtractionRules="@xml/data_extraction_rules"
        android:fullBackupContent="@xml/backup_rules"
        android:icon="@drawable/mlc_logo_108"
        android:label="@string/app_name"
        android:roundIcon="@drawable/mlc_logo_108"
        android:supportsRtl="true"
        android:theme="@style/Theme.MLCChat"
        tools:targetApi="31">
        <uses-native-library
            android:name="libOpenCL.so"
            android:required="false"/>

        <uses-native-library
            android:name="libOpenCL-pixel.so"
            android:required="false" />
        <activity
            android:name=".MainActivity"
            android:exported="true"
            android:label="@string/app_name"
            android:theme="@android:style/Theme.Material.NoActionBar">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>

</manifest>


================================================
FILE: android/MLCChat/app/src/main/java/ai/mlc/mlcchat/AppViewModel.kt
================================================
package ai.mlc.mlcchat

import ai.mlc.mlcllm.MLCEngine
import ai.mlc.mlcllm.OpenAIProtocol
import android.app.Application
import android.content.ClipData
import android.content.ClipboardManager
import android.content.Context
import android.os.Environment
import android.widget.Toast
import androidx.compose.runtime.mutableStateOf
import androidx.compose.runtime.toMutableStateList
import androidx.lifecycle.AndroidViewModel
import androidx.lifecycle.viewModelScope
import com.google.gson.Gson
import com.google.gson.annotations.SerializedName
import kotlinx.coroutines.launch
import java.io.File
import java.io.FileOutputStream
import java.net.URL
import java.nio.channels.Channels
import java.util.UUID
import java.util.concurrent.Executors
import kotlin.concurrent.thread
import ai.mlc.mlcllm.OpenAIProtocol.ChatCompletionMessage
import ai.mlc.mlcllm.OpenAIProtocol.ChatCompletionMessageContent
import android.app.Activity
import kotlinx.coroutines.*
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.net.Uri
import java.io.ByteArrayOutputStream
import android.util.Base64
import android.util.Log

class AppViewModel(application: Application) : AndroidViewModel(application) {
    val modelList = emptyList<ModelState>().toMutableStateList()
    val chatState = ChatState()
    val modelSampleList = emptyList<ModelRecord>().toMutableStateList()
    private var showAlert = mutableStateOf(false)
    private var alertMessage = mutableStateOf("")
    private var appConfig = AppConfig(
        emptyList<String>().toMutableList(),
        emptyList<ModelRecord>().toMutableList()
    )
    private val application = getApplication<Application>()
    private val appDirFile = application.getExternalFilesDir("")
    private val gson = Gson()
    private val modelIdSet = emptySet<String>().toMutableSet()

    companion object {
        const val AppConfigFilename = "mlc-app-config.json"
        const val ModelConfigFilename = "mlc-chat-config.json"
        const val ParamsConfigFilename = "tensor-cache.json"
        const val ModelUrlSuffix = "resolve/main/"
    }

    init {
        loadAppConfig()
    }

    fun isShowingAlert(): Boolean {
        return showAlert.value
    }

    fun errorMessage(): String {
        return alertMessage.value
    }

    fun dismissAlert() {
        require(showAlert.value)
        showAlert.value = false
    }

    fun copyError() {
        require(showAlert.value)
        val clipboard =
            application.getSystemService(Context.CLIPBOARD_SERVICE) as ClipboardManager
        clipboard.setPrimaryClip(ClipData.newPlainText("MLCChat", errorMessage()))
    }

    private fun issueAlert(error: String) {
        showAlert.value = true
        alertMessage.value = error
    }

    fun requestDeleteModel(modelId: String) {
        deleteModel(modelId)
        issueAlert("Model: $modelId has been deleted")
    }


    private fun loadAppConfig() {
        val appConfigFile = File(appDirFile, AppConfigFilename)
        val jsonString: String = if (!appConfigFile.exists()) {
            application.assets.open(AppConfigFilename).bufferedReader().use { it.readText() }
        } else {
            appConfigFile.readText()
        }
        appConfig = gson.fromJson(jsonString, AppConfig::class.java)
        appConfig.modelLibs = emptyList<String>().toMutableList()
        modelList.clear()
        modelIdSet.clear()
        modelSampleList.clear()
        for (modelRecord in appConfig.modelList) {
            appConfig.modelLibs.add(modelRecord.modelLib)
            val modelDirFile = File(appDirFile, modelRecord.modelId)
            val modelConfigFile = File(modelDirFile, ModelConfigFilename)
            if (modelConfigFile.exists()) {
                val modelConfigString = modelConfigFile.readText()
                val modelConfig = gson.fromJson(modelConfigString, ModelConfig::class.java)
                modelConfig.modelId = modelRecord.modelId
                modelConfig.modelLib = modelRecord.modelLib
                modelConfig.estimatedVramBytes = modelRecord.estimatedVramBytes
                addModelConfig(modelConfig, modelRecord.modelUrl, true)
            } else {
                downloadModelConfig(
                    if (modelRecord.modelUrl.endsWith("/")) modelRecord.modelUrl else "${modelRecord.modelUrl}/",
                    modelRecord,
                    true
                )
            }
        }
    }

    private fun updateAppConfig(action: () -> Unit) {
        action()
        val jsonString = gson.toJson(appConfig)
        val appConfigFile = File(appDirFile, AppConfigFilename)
        appConfigFile.writeText(jsonString)
    }

    private fun addModelConfig(modelConfig: ModelConfig, modelUrl: String, isBuiltin: Boolean) {
        require(!modelIdSet.contains(modelConfig.modelId))
        modelIdSet.add(modelConfig.modelId)
        modelList.add(
            ModelState(
                modelConfig,
                modelUrl + if (modelUrl.endsWith("/")) "" else "/",
                File(appDirFile, modelConfig.modelId)
            )
        )
        if (!isBuiltin) {
            updateAppConfig {
                appConfig.modelList.add(
                    ModelRecord(
                        modelUrl,
                        modelConfig.modelId,
                        modelConfig.estimatedVramBytes,
                        modelConfig.modelLib
                    )
                )
            }
        }
    }

    private fun deleteModel(modelId: String) {
        val modelDirFile = File(appDirFile, modelId)
        modelDirFile.deleteRecursively()
        require(!modelDirFile.exists())
        modelIdSet.remove(modelId)
        modelList.removeIf { modelState -> modelState.modelConfig.modelId == modelId }
        updateAppConfig {
            appConfig.modelList.removeIf { modelRecord -> modelRecord.modelId == modelId }
        }
    }

    private fun isModelConfigAllowed(modelConfig: ModelConfig): Boolean {
        if (appConfig.modelLibs.contains(modelConfig.modelLib)) return true
        viewModelScope.launch {
            issueAlert("Model lib ${modelConfig.modelLib} is not supported.")
        }
        return false
    }


    private fun downloadModelConfig(
        modelUrl: String,
        modelRecord: ModelRecord,
        isBuiltin: Boolean
    ) {
        thread(start = true) {
            try {
                val url = URL("${modelUrl}${ModelUrlSuffix}${ModelConfigFilename}")
                val tempId = UUID.randomUUID().toString()
                val tempFile = File(
                    application.getExternalFilesDir(Environment.DIRECTORY_DOWNLOADS),
                    tempId
                )
                url.openStream().use {
                    Channels.newChannel(it).use { src ->
                        FileOutputStream(tempFile).use { fileOutputStream ->
                            fileOutputStream.channel.transferFrom(src, 0, Long.MAX_VALUE)
                        }
                    }
                }
                require(tempFile.exists())
                viewModelScope.launch {
                    try {
                        val modelConfigString = tempFile.readText()
                        val modelConfig = gson.fromJson(modelConfigString, ModelConfig::class.java)
                        modelConfig.modelId = modelRecord.modelId
                        modelConfig.modelLib = modelRecord.modelLib
                        modelConfig.estimatedVramBytes = modelRecord.estimatedVramBytes
                        if (modelIdSet.contains(modelConfig.modelId)) {
                            tempFile.delete()
                            issueAlert("${modelConfig.modelId} has been used, please consider another local ID")
                            return@launch
                        }
                        if (!isModelConfigAllowed(modelConfig)) {
                            tempFile.delete()
                            return@launch
                        }
                        val modelDirFile = File(appDirFile, modelConfig.modelId)
                        val modelConfigFile = File(modelDirFile, ModelConfigFilename)
                        tempFile.copyTo(modelConfigFile, overwrite = true)
                        tempFile.delete()
                        require(modelConfigFile.exists())
                        addModelConfig(modelConfig, modelUrl, isBuiltin)
                    } catch (e: Exception) {
                        viewModelScope.launch {
                            issueAlert("Add model failed: ${e.localizedMessage}")
                        }
                    }
                }
            } catch (e: Exception) {
                viewModelScope.launch {
                    issueAlert("Download model config failed: ${e.localizedMessage}")
                }
            }

        }
    }

    inner class ModelState(
        val modelConfig: ModelConfig,
        private val modelUrl: String,
        private val modelDirFile: File
    ) {
        var modelInitState = mutableStateOf(ModelInitState.Initializing)
        private var paramsConfig = ParamsConfig(emptyList())
        val progress = mutableStateOf(0)
        val total = mutableStateOf(1)
        val id: UUID = UUID.randomUUID()
        private val remainingTasks = emptySet<DownloadTask>().toMutableSet()
        private val downloadingTasks = emptySet<DownloadTask>().toMutableSet()
        private val maxDownloadTasks = 3
        private val gson = Gson()


        init {
            switchToInitializing()
        }

        private fun switchToInitializing() {
            val paramsConfigFile = File(modelDirFile, ParamsConfigFilename)
            if (paramsConfigFile.exists()) {
                loadParamsConfig()
                switchToIndexing()
            } else {
                downloadParamsConfig()
            }
        }

        private fun loadParamsConfig() {
            val paramsConfigFile = File(modelDirFile, ParamsConfigFilename)
            require(paramsConfigFile.exists())
            val jsonString = paramsConfigFile.readText()
            paramsConfig = gson.fromJson(jsonString, ParamsConfig::class.java)
        }

        private fun downloadParamsConfig() {
            thread(start = true) {
                val url = URL("${modelUrl}${ModelUrlSuffix}${ParamsConfigFilename}")
                val tempId = UUID.randomUUID().toString()
                val tempFile = File(modelDirFile, tempId)
                url.openStream().use {
                    Channels.newChannel(it).use { src ->
                        FileOutputStream(tempFile).use { fileOutputStream ->
                            fileOutputStream.channel.transferFrom(src, 0, Long.MAX_VALUE)
                        }
                    }
                }
                require(tempFile.exists())
                val paramsConfigFile = File(modelDirFile, ParamsConfigFilename)
                tempFile.renameTo(paramsConfigFile)
                require(paramsConfigFile.exists())
                viewModelScope.launch {
                    loadParamsConfig()
                    switchToIndexing()
                }
            }
        }

        fun handleStart() {
            switchToDownloading()
        }

        fun handlePause() {
            switchToPausing()
        }

        fun handleClear() {
            require(
                modelInitState.value == ModelInitState.Downloading ||
                        modelInitState.value == ModelInitState.Paused ||
                        modelInitState.value == ModelInitState.Finished
            )
            switchToClearing()
        }

        private fun switchToClearing() {
            if (modelInitState.value == ModelInitState.Paused) {
                modelInitState.value = ModelInitState.Clearing
                clear()
            } else if (modelInitState.value == ModelInitState.Finished) {
                modelInitState.value = ModelInitState.Clearing
                if (chatState.modelName.value == modelConfig.modelId) {
                    chatState.requestTerminateChat { clear() }
                } else {
                    clear()
                }
            } else {
                modelInitState.value = ModelInitState.Clearing
            }
        }

        fun handleDelete() {
            require(
                modelInitState.value == ModelInitState.Downloading ||
                        modelInitState.value == ModelInitState.Paused ||
                        modelInitState.value == ModelInitState.Finished
            )
            switchToDeleting()
        }

        private fun switchToDeleting() {
            if (modelInitState.value == ModelInitState.Paused) {
                modelInitState.value = ModelInitState.Deleting
                delete()
            } else if (modelInitState.value == ModelInitState.Finished) {
                modelInitState.value = ModelInitState.Deleting
                if (chatState.modelName.value == modelConfig.modelId) {
                    chatState.requestTerminateChat { delete() }
                } else {
                    delete()
                }
            } else {
                modelInitState.value = ModelInitState.Deleting
            }
        }

        private fun switchToIndexing() {
            modelInitState.value = ModelInitState.Indexing
            progress.value = 0
            total.value = modelConfig.tokenizerFiles.size + paramsConfig.paramsRecords.size
            for (tokenizerFilename in modelConfig.tokenizerFiles) {
                val file = File(modelDirFile, tokenizerFilename)
                if (file.exists()) {
                    ++progress.value
                } else {
                    remainingTasks.add(
                        DownloadTask(
                            URL("${modelUrl}${ModelUrlSuffix}${tokenizerFilename}"),
                            file
                        )
                    )
                }
            }
            for (paramsRecord in paramsConfig.paramsRecords) {
                val file = File(modelDirFile, paramsRecord.dataPath)
                if (file.exists()) {
                    ++progress.value
                } else {
                    remainingTasks.add(
                        DownloadTask(
                            URL("${modelUrl}${ModelUrlSuffix}${paramsRecord.dataPath}"),
                            file
                        )
                    )
                }
            }
            if (progress.value < total.value) {
                switchToPaused()
            } else {
                switchToFinished()
            }
        }

        private fun switchToDownloading() {
            modelInitState.value = ModelInitState.Downloading
            for (downloadTask in remainingTasks) {
                if (downloadingTasks.size < maxDownloadTasks) {
                    handleNewDownload(downloadTask)
                } else {
                    return
                }
            }
        }

        private fun handleNewDownload(downloadTask: DownloadTask) {
            require(modelInitState.value == ModelInitState.Downloading)
            require(!downloadingTasks.contains(downloadTask))
            downloadingTasks.add(downloadTask)
            thread(start = true) {
                val tempId = UUID.randomUUID().toString()
                val tempFile = File(modelDirFile, tempId)
                downloadTask.url.openStream().use {
                    Channels.newChannel(it).use { src ->
                        FileOutputStream(tempFile).use { fileOutputStream ->
                            fileOutputStream.channel.transferFrom(src, 0, Long.MAX_VALUE)
                        }
                    }
                }
                require(tempFile.exists())
                tempFile.renameTo(downloadTask.file)
                require(downloadTask.file.exists())
                viewModelScope.launch {
                    handleFinishDownload(downloadTask)
                }
            }
        }

        private fun handleNextDownload() {
            require(modelInitState.value == ModelInitState.Downloading)
            for (downloadTask in remainingTasks) {
                if (!downloadingTasks.contains(downloadTask)) {
                    handleNewDownload(downloadTask)
                    break
                }
            }
        }

        private fun handleFinishDownload(downloadTask: DownloadTask) {
            remainingTasks.remove(downloadTask)
            downloadingTasks.remove(downloadTask)
            ++progress.value
            require(
                modelInitState.value == ModelInitState.Downloading ||
                        modelInitState.value == ModelInitState.Pausing ||
                        modelInitState.value == ModelInitState.Clearing ||
                        modelInitState.value == ModelInitState.Deleting
            )
            if (modelInitState.value == ModelInitState.Downloading) {
                if (remainingTasks.isEmpty()) {
                    if (downloadingTasks.isEmpty()) {
                        switchToFinished()
                    }
                } else {
                    handleNextDownload()
                }
            } else if (modelInitState.value == ModelInitState.Pausing) {
                if (downloadingTasks.isEmpty()) {
                    switchToPaused()
                }
            } else if (modelInitState.value == ModelInitState.Clearing) {
                if (downloadingTasks.isEmpty()) {
                    clear()
                }
            } else if (modelInitState.value == ModelInitState.Deleting) {
                if (downloadingTasks.isEmpty()) {
                    delete()
                }
            }
        }

        private fun clear() {
            val files = modelDirFile.listFiles { dir, name ->
                !(dir == modelDirFile && name == ModelConfigFilename)
            }
            require(files != null)
            for (file in files) {
                file.deleteRecursively()
                require(!file.exists())
            }
            val modelConfigFile = File(modelDirFile, ModelConfigFilename)
            require(modelConfigFile.exists())
            switchToIndexing()
        }

        private fun delete() {
            modelDirFile.deleteRecursively()
            require(!modelDirFile.exists())
            requestDeleteModel(modelConfig.modelId)
        }

        private fun switchToPausing() {
            modelInitState.value = ModelInitState.Pausing
        }

        private fun switchToPaused() {
            modelInitState.value = ModelInitState.Paused
        }


        private fun switchToFinished() {
            modelInitState.value = ModelInitState.Finished
        }

        fun startChat() {
            chatState.requestReloadChat(
                modelConfig,
                modelDirFile.absolutePath,
            )
        }

    }

    inner class ChatState {
        val messages = emptyList<MessageData>().toMutableStateList()
        val report = mutableStateOf("")
        val modelName = mutableStateOf("")
        private var modelChatState = mutableStateOf(ModelChatState.Ready)
            @Synchronized get
            @Synchronized set
        private val engine = MLCEngine()
        private var historyMessages = mutableListOf<ChatCompletionMessage>()
        private var modelLib = ""
        private var modelPath = ""
        private val executorService = Executors.newSingleThreadExecutor()
        private val viewModelScope = CoroutineScope(Dispatchers.Main + Job())
        private var imageUri: Uri? = null
        private fun mainResetChat() {
            imageUri = null
            executorService.submit {
                callBackend { engine.reset() }
                historyMessages = mutableListOf<ChatCompletionMessage>()
                viewModelScope.launch {
                    clearHistory()
                    switchToReady()
                }
            }
        }

        private fun clearHistory() {
            messages.clear()
            report.value = ""
            historyMessages.clear()
        }


        private fun switchToResetting() {
            modelChatState.value = ModelChatState.Resetting
        }

        private fun switchToGenerating() {
            modelChatState.value = ModelChatState.Generating
        }

        private fun switchToReloading() {
            modelChatState.value = ModelChatState.Reloading
        }

        private fun switchToReady() {
            modelChatState.value = ModelChatState.Ready
        }

        private fun switchToFailed() {
            modelChatState.value = ModelChatState.Falied
        }

        private fun callBackend(callback: () -> Unit): Boolean {
            try {
                callback()
            } catch (e: Exception) {
                viewModelScope.launch {
                    val stackTrace = e.stackTraceToString()
                    val errorMessage = e.localizedMessage
                    appendMessage(
                        MessageRole.Assistant,
                        "MLCChat failed\n\nStack trace:\n$stackTrace\n\nError message:\n$errorMessage"
                    )
                    switchToFailed()
                }
                return false
            }
            return true
        }

        fun requestResetChat() {
            require(interruptable())
            interruptChat(
                prologue = {
                    switchToResetting()
                },
                epilogue = {
                    mainResetChat()
                }
            )
        }

        private fun interruptChat(prologue: () -> Unit, epilogue: () -> Unit) {
            // prologue runs before interruption
            // epilogue runs after interruption
            require(interruptable())
            if (modelChatState.value == ModelChatState.Ready) {
                prologue()
                epilogue()
            } else if (modelChatState.value == ModelChatState.Generating) {
                prologue()
                executorService.submit {
                    viewModelScope.launch { epilogue() }
                }
            } else {
                require(false)
            }
        }

        fun requestTerminateChat(callback: () -> Unit) {
            require(interruptable())
            interruptChat(
                prologue = {
                    switchToTerminating()
                },
                epilogue = {
                    mainTerminateChat(callback)
                }
            )
        }

        private fun mainTerminateChat(callback: () -> Unit) {
            executorService.submit {
                callBackend { engine.unload() }
                viewModelScope.launch {
                    clearHistory()
                    switchToReady()
                    callback()
                }
            }
        }

        private fun switchToTerminating() {
            modelChatState.value = ModelChatState.Terminating
        }


        fun requestReloadChat(modelConfig: ModelConfig, modelPath: String) {

            if (this.modelName.value == modelConfig.modelId && this.modelLib == modelConfig.modelLib && this.modelPath == modelPath) {
                return
            }
            require(interruptable())
            interruptChat(
                prologue = {
                    switchToReloading()
                },
                epilogue = {
                    mainReloadChat(modelConfig, modelPath)
                }
            )
        }

        private fun mainReloadChat(modelConfig: ModelConfig, modelPath: String) {
            clearHistory()
            this.modelName.value = modelConfig.modelId
            this.modelLib = modelConfig.modelLib
            this.modelPath = modelPath
            executorService.submit {
                viewModelScope.launch {
                    Toast.makeText(application, "Initialize...", Toast.LENGTH_SHORT).show()
                }
                if (!callBackend {
                        engine.unload()
                        engine.reload(modelPath, modelConfig.modelLib)
                    }) return@submit
                viewModelScope.launch {
                    Toast.makeText(application, "Ready to chat", Toast.LENGTH_SHORT).show()
                    switchToReady()
                }
            }
        }

        fun requestImageBitmap(uri: Uri?) {
            require(chatable())
            switchToGenerating()
            executorService.submit {
                imageUri = uri
                viewModelScope.launch {
                    report.value = "Image process is done, ask any question."
                    if (modelChatState.value == ModelChatState.Generating) switchToReady()
                }
            }
        }

        fun bitmapToURL(bm: Bitmap): String {
            val targetSize = 336
            val scaledBitmap = Bitmap.createScaledBitmap(bm, targetSize, targetSize, true)

            val outputStream = ByteArrayOutputStream()
            scaledBitmap.compress(Bitmap.CompressFormat.JPEG, 100, outputStream)
            scaledBitmap.recycle()

            val imageBytes = outputStream.toByteArray()
            val imageBase64 = Base64.encodeToString(imageBytes, Base64.NO_WRAP)
            return "data:image/jpg;base64,$imageBase64"
        }

        fun requestGenerate(prompt: String, activity: Activity) {
            require(chatable())
            switchToGenerating()
            appendMessage(MessageRole.User, prompt)
            appendMessage(MessageRole.Assistant, "")
            var content = ChatCompletionMessageContent(text=prompt)
            if (imageUri != null) {
                val uri = imageUri
                val bitmap = uri?.let {
                    activity.contentResolver.openInputStream(it)?.use { input ->
                        BitmapFactory.decodeStream(input)
                    }
                }
                val imageBase64URL = bitmapToURL(bitmap!!)
                Log.v("requestGenerate", "image base64 url: $imageBase64URL")
                val parts = listOf(
                    mapOf("type" to "text", "text" to prompt),
                    mapOf("type" to "image_url", "image_url" to imageBase64URL)
                )
                content = ChatCompletionMessageContent(parts=parts)
                imageUri = null
            }

            executorService.submit {
                historyMessages.add(ChatCompletionMessage(
                    role = OpenAIProtocol.ChatCompletionRole.user,
                    content = content
                ))

                viewModelScope.launch {
                    val responses = engine.chat.completions.create(
                        messages = historyMessages,
                        stream_options = OpenAIProtocol.StreamOptions(include_usage = true)
                    )

                    var finishReasonLength = false
                    var streamingText = ""

                    for (res in responses) {
                        if (!callBackend {
                            for (choice in res.choices) {
                                choice.delta.content?.let { content ->
                                    streamingText += content.asText()
                                }
                                choice.finish_reason?.let { finishReason ->
                                    if (finishReason == "length") {
                                        finishReasonLength = true
                                    }
                                }
                            }
                            updateMessage(MessageRole.Assistant, streamingText)
                            res.usage?.let { finalUsage ->
                                report.value = finalUsage.extra?.asTextLabel() ?: ""
                            }
                            if (finishReasonLength) {
                                streamingText += " [output truncated due to context length limit...]"
                                updateMessage(MessageRole.Assistant, streamingText)
                            }
                        });
                    }
                    if (streamingText.isNotEmpty()) {
                        historyMessages.add(ChatCompletionMessage(
                            role = OpenAIProtocol.ChatCompletionRole.assistant,
                            content = streamingText
                        ))
                        streamingText = ""
                    } else {
                        if (historyMessages.isNotEmpty()) {
                            historyMessages.removeAt(historyMessages.size - 1)
                        }
                    }

                    if (modelChatState.value == ModelChatState.Generating) switchToReady()
                }
            }
        }

        private fun appendMessage(role: MessageRole, text: String) {
            messages.add(MessageData(role, text))
        }


        private fun updateMessage(role: MessageRole, text: String) {
            messages[messages.size - 1] = MessageData(role, text)
        }

        fun chatable(): Boolean {
            return modelChatState.value == ModelChatState.Ready
        }

        fun interruptable(): Boolean {
            return modelChatState.value == ModelChatState.Ready
                    || modelChatState.value == ModelChatState.Generating
                    || modelChatState.value == ModelChatState.Falied
        }
    }
}

enum class ModelInitState {
    Initializing,
    Indexing,
    Paused,
    Downloading,
    Pausing,
    Clearing,
    Deleting,
    Finished
}

enum class ModelChatState {
    Generating,
    Resetting,
    Reloading,
    Terminating,
    Ready,
    Falied
}

enum class MessageRole {
    Assistant,
    User
}

data class DownloadTask(val url: URL, val file: File)

data class MessageData(val role: MessageRole, val text: String, val id: UUID = UUID.randomUUID(), var imageUri: Uri? = null)

data class AppConfig(
    @SerializedName("model_libs") var modelLibs: MutableList<String>,
    @SerializedName("model_list") val modelList: MutableList<ModelRecord>,
)

data class ModelRecord(
    @SerializedName("model_url") val modelUrl: String,
    @SerializedName("model_id") val modelId: String,
    @SerializedName("estimated_vram_bytes") val estimatedVramBytes: Long?,
    @SerializedName("model_lib") val modelLib: String
)

data class ModelConfig(
    @SerializedName("model_lib") var modelLib: String,
    @SerializedName("model_id") var modelId: String,
    @SerializedName("estimated_vram_bytes") var estimatedVramBytes: Long?,
    @SerializedName("tokenizer_files") val tokenizerFiles: List<String>,
    @SerializedName("context_window_size") val contextWindowSize: Int,
    @SerializedName("prefill_chunk_size") val prefillChunkSize: Int,
)

data class ParamsRecord(
    @SerializedName("dataPath") val dataPath: String
)

data class ParamsConfig(
    @SerializedName("records") val paramsRecords: List<ParamsRecord>
)


================================================
FILE: android/MLCChat/app/src/main/java/ai/mlc/mlcchat/ChatView.kt
================================================
package ai.mlc.mlcchat

import android.app.Activity
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import androidx.compose.foundation.Image
import androidx.compose.foundation.background
import androidx.compose.foundation.gestures.detectTapGestures
import androidx.compose.foundation.layout.Arrangement
import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.IntrinsicSize
import androidx.compose.foundation.layout.Row
import androidx.compose.foundation.layout.aspectRatio
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.foundation.layout.fillMaxWidth
import androidx.compose.foundation.layout.height
import androidx.compose.foundation.layout.padding
import androidx.compose.foundation.layout.widthIn
import androidx.compose.foundation.layout.wrapContentHeight
import androidx.compose.foundation.layout.wrapContentWidth
import androidx.compose.foundation.lazy.LazyColumn
import androidx.compose.foundation.lazy.items
import androidx.compose.foundation.lazy.rememberLazyListState
import androidx.compose.foundation.shape.RoundedCornerShape
import androidx.compose.foundation.text.selection.SelectionContainer
import androidx.compose.material.icons.Icons
import androidx.compose.material.icons.filled.AddAPhoto
import androidx.compose.material.icons.filled.ArrowBack
import androidx.compose.material.icons.filled.Photo
import androidx.compose.material.icons.filled.Replay
import androidx.compose.material.icons.filled.Send
import androidx.compose.material3.Divider
import androidx.compose.material3.ExperimentalMaterial3Api
import androidx.compose.material3.Icon
import androidx.compose.material3.IconButton
import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.OutlinedTextField
import androidx.compose.material3.Scaffold
import androidx.compose.material3.Switch
import androidx.compose.material3.Text
import androidx.compose.material3.TopAppBar
import androidx.compose.material3.TopAppBarDefaults
import androidx.compose.runtime.Composable
import androidx.compose.runtime.getValue
import androidx.compose.runtime.mutableStateOf
import androidx.compose.runtime.remember
import androidx.compose.runtime.rememberCoroutineScope
import androidx.compose.runtime.saveable.rememberSaveable
import androidx.compose.runtime.setValue
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import androidx.compose.ui.graphics.asImageBitmap
import androidx.compose.ui.input.pointer.pointerInput
import androidx.compose.ui.platform.LocalFocusManager
import androidx.compose.ui.text.style.TextAlign
import androidx.compose.ui.tooling.preview.Preview
import androidx.compose.ui.unit.dp
import androidx.navigation.NavController
import dev.jeziellago.compose.markdowntext.MarkdownText
import kotlinx.coroutines.launch

@ExperimentalMaterial3Api
@Composable
fun ChatView(
    navController: NavController, chatState: AppViewModel.ChatState, activity: Activity
) {
    val localFocusManager = LocalFocusManager.current
    (activity as MainActivity).chatState = chatState
    Scaffold(topBar = {
        TopAppBar(
            title = {
                Text(
                    text = "MLCChat: " + chatState.modelName.value.split("-")[0],
                    color = MaterialTheme.colorScheme.onPrimary
                )
            },
            colors = TopAppBarDefaults.topAppBarColors(containerColor = MaterialTheme.colorScheme.primary),
            navigationIcon = {
                IconButton(
                    onClick = { navController.popBackStack() },
                    enabled = chatState.interruptable()
                ) {
                    Icon(
                        imageVector = Icons.Filled.ArrowBack,
                        contentDescription = "back home page",
                        tint = MaterialTheme.colorScheme.onPrimary
                    )
                }
            },
            actions = {
                IconButton(
                    onClick = {
                        chatState.requestResetChat()
                        activity.hasImage = false },
                    enabled = chatState.interruptable()
                ) {
                    Icon(
                        imageVector = Icons.Filled.Replay,
                        contentDescription = "reset the chat",
                        tint = MaterialTheme.colorScheme.onPrimary
                    )
                }
            })
    }, modifier = Modifier.pointerInput(Unit) {
        detectTapGestures(onTap = {
            localFocusManager.clearFocus()
        })
    }) { paddingValues ->
        Column(
            modifier = Modifier
                .fillMaxSize()
                .padding(paddingValues)
                .padding(horizontal = 10.dp)
        ) {
            val lazyColumnListState = rememberLazyListState()
            val coroutineScope = rememberCoroutineScope()
            Text(
                text = chatState.report.value,
                textAlign = TextAlign.Center,
                modifier = Modifier
                    .fillMaxWidth()
                    .wrapContentHeight()
                    .padding(top = 5.dp)
            )
            Divider(thickness = 1.dp, modifier = Modifier.padding(vertical = 5.dp))
            LazyColumn(
                modifier = Modifier.weight(9f),
                verticalArrangement = Arrangement.spacedBy(5.dp, alignment = Alignment.Bottom),
                state = lazyColumnListState
            ) {
                coroutineScope.launch {
                    lazyColumnListState.animateScrollToItem(chatState.messages.size)
                }
                items(
                    items = chatState.messages,
                    key = { message -> message.id },
                ) { message ->
                    MessageView(messageData = message, activity)
                }
                item {
                    // place holder item for scrolling to the bottom
                }
            }
            Divider(thickness = 1.dp, modifier = Modifier.padding(top = 5.dp))
            SendMessageView(chatState = chatState, activity)
        }
    }
}

@Composable
fun MessageView(messageData: MessageData, activity: Activity?) {
    // default render the Assistant text as MarkdownText
    var useMarkdown by remember { mutableStateOf(true) }
    var localActivity : MainActivity = activity as MainActivity
    SelectionContainer {
        if (messageData.role == MessageRole.Assistant) {
            Column {
                if (messageData.text.isNotEmpty()) {
                    Row(
                        verticalAlignment = Alignment.CenterVertically,
                    ) {
                        Text(
                            text = "Show as Markdown",
                            color = MaterialTheme.colorScheme.onSecondaryContainer,
                            modifier = Modifier
                                .wrapContentWidth()
                                .padding(end = 8.dp)
                                .widthIn(max = 300.dp)
                        )
                        Switch(
                            checked = useMarkdown,
                            onCheckedChange = { useMarkdown = it }
                        )
                    }
                }
                Row(
                    horizontalArrangement = Arrangement.Start,
                    modifier = Modifier.fillMaxWidth()
                ) {
                    if (useMarkdown) {
                        MarkdownText(
                            isTextSelectable = true,
                            modifier = Modifier
                                .wrapContentWidth()
                                .background(
                                    color = MaterialTheme.colorScheme.secondaryContainer,
                                    shape = RoundedCornerShape(5.dp)
                                )
                                .padding(5.dp)
                                .widthIn(max = 300.dp),
                            markdown = messageData.text,
                        )
                    } else {
                        Text(
                            text = messageData.text,
                            textAlign = TextAlign.Left,
                            color = MaterialTheme.colorScheme.onSecondaryContainer,
                            modifier = Modifier
                                .wrapContentWidth()
                                .background(
                                    color = MaterialTheme.colorScheme.secondaryContainer,
                                    shape = RoundedCornerShape(5.dp)
                                )
                                .padding(5.dp)
                                .widthIn(max = 300.dp)
                        )
                    }
                }
            }
        } else {
            Row(
                horizontalArrangement = Arrangement.End,
                modifier = Modifier.fillMaxWidth()
            ) {
                if (messageData.imageUri != null) {
                    val uri = messageData.imageUri
                    val bitmap = uri?.let {
                        activity.contentResolver.openInputStream(it)?.use { input ->
                            BitmapFactory.decodeStream(input)
                        }
                    }
                    val displayBitmap = bitmap?.let { Bitmap.createScaledBitmap(it, 224, 224, true) }
                    if (displayBitmap != null) {
                        Image(
                            displayBitmap.asImageBitmap(),
                            "",
                            modifier = Modifier
                                .wrapContentWidth()
                                .background(
                                    color = MaterialTheme.colorScheme.secondaryContainer,
                                    shape = RoundedCornerShape(5.dp)
                                )
                                .padding(5.dp)
                                .widthIn(max = 300.dp)
                        )
                    }
                    if (!localActivity.hasImage) {
                        localActivity.chatState.requestImageBitmap(messageData.imageUri)
                    }
                    localActivity.hasImage = true
                } else {
                    Text(
                        text = messageData.text,
                        textAlign = TextAlign.Right,
                        color = MaterialTheme.colorScheme.onPrimaryContainer,
                        modifier = Modifier
                            .wrapContentWidth()
                            .background(
                                color = MaterialTheme.colorScheme.primaryContainer,
                                shape = RoundedCornerShape(5.dp)
                            )
                            .padding(5.dp)
                            .widthIn(max = 300.dp)
                    )
                }

            }
        }
    }
}

@ExperimentalMaterial3Api
@Composable
fun SendMessageView(chatState: AppViewModel.ChatState, activity: Activity) {
    val localFocusManager = LocalFocusManager.current
    val localActivity : MainActivity = activity as MainActivity
    Row(
        horizontalArrangement = Arrangement.spacedBy(5.dp),
        verticalAlignment = Alignment.CenterVertically,
        modifier = Modifier
            .height(IntrinsicSize.Max)
            .fillMaxWidth()
            .padding(bottom = 5.dp)
    ) {
        var text by rememberSaveable { mutableStateOf("") }
        OutlinedTextField(
            value = text,
            onValueChange = { text = it },
            label = { Text(text = "Input") },
            modifier = Modifier
                .weight(9f),
        )
        IconButton(
            onClick = {
                activity.takePhoto()
            },
            modifier = Modifier
                .aspectRatio(1f)
                .weight(1f),
            enabled = (chatState.chatable() && !localActivity.hasImage)
        ) {
            Icon(
                imageVector = Icons.Filled.AddAPhoto,
                contentDescription = "use camera",
            )
        }
        IconButton(
            onClick = {
                activity.pickImageFromGallery()
            },
            modifier = Modifier
                .aspectRatio(1f)
                .weight(1f),
            enabled = (chatState.chatable() && !localActivity.hasImage)
        ) {
            Icon(
                imageVector = Icons.Filled.Photo,
                contentDescription = "select image",
            )
        }
        IconButton(
            onClick = {
                localFocusManager.clearFocus()
                chatState.requestGenerate(text, activity)
                text = ""
            },
            modifier = Modifier
                .aspectRatio(1f)
                .weight(1f),
            enabled = (text != "" && chatState.chatable())
        ) {
            Icon(
                imageVector = Icons.Filled.Send,
                contentDescription = "send message",
            )
        }
    }
}

@Preview
@Composable
fun MessageViewPreviewWithMarkdown() {
    MessageView(
        messageData = MessageData(
            role = MessageRole.Assistant, text = """
# Sample  Header
* Markdown
* [Link](https://example.com)
<a href="https://www.google.com/">Google</a>
"""
        ), null
    )
}


================================================
FILE: android/MLCChat/app/src/main/java/ai/mlc/mlcchat/MainActivity.kt
================================================
package ai.mlc.mlcchat

import android.Manifest
import android.content.ContentValues
import android.content.pm.PackageManager
import android.net.Uri
import android.os.Build
import android.os.Bundle
import android.provider.MediaStore
import android.util.Log
import androidx.activity.ComponentActivity
import androidx.activity.compose.setContent
import androidx.activity.result.contract.ActivityResultContracts
import androidx.annotation.RequiresApi
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.material3.ExperimentalMaterial3Api
import androidx.compose.material3.Surface
import androidx.compose.ui.Modifier
import androidx.core.content.ContextCompat
import ai.mlc.mlcchat.ui.theme.MLCChatTheme
import java.text.SimpleDateFormat
import java.util.Date
import java.util.Locale
import java.util.UUID

class MainActivity : ComponentActivity() {
    var hasImage = false

    private val pickImageLauncher = registerForActivityResult(
        ActivityResultContracts.GetContent()
    ) { uri: Uri? ->
        uri?.let {
            Log.v("pickImageLauncher", "Selected image uri: $it")
            chatState.messages.add(
                MessageData(
                    role = MessageRole.User,
                    text = "",
                    id = UUID.randomUUID(),
                    imageUri = it
                )
            )
        }
    }

    private var cameraImageUri: Uri? = null
    private val takePictureLauncher = registerForActivityResult(
        ActivityResultContracts.TakePicture()
    ) { success: Boolean ->
        if (success && cameraImageUri != null) {
            Log.v("takePictureLauncher", "Camera image uri: $cameraImageUri")
            chatState.messages.add(
                MessageData(
                    role = MessageRole.User,
                    text = "",
                    id = UUID.randomUUID(),
                    imageUri = cameraImageUri
                )
            )
        }
    }

    private val requestPermissionLauncher =
        registerForActivityResult(ActivityResultContracts.RequestMultiplePermissions()) { permissions ->
            permissions.entries.forEach {
                Log.d("Permissions", "${it.key} = ${it.value}")
            }
        }

    lateinit var chatState: AppViewModel.ChatState

    @RequiresApi(Build.VERSION_CODES.TIRAMISU)
    @ExperimentalMaterial3Api
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)

        chatState = AppViewModel(this.application).ChatState()
        requestNeededPermissions()

        setContent {
            Surface(
                modifier = Modifier.fillMaxSize()
            ) {
                MLCChatTheme {
                    NavView(this)
                }
            }
        }
    }

    private fun requestNeededPermissions() {
        val permissionsToRequest = mutableListOf<String>()

        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.TIRAMISU) {
            if (ContextCompat.checkSelfPermission(
                    this,
                    Manifest.permission.READ_MEDIA_IMAGES
                ) != PackageManager.PERMISSION_GRANTED
            ) {
                permissionsToRequest.add(Manifest.permission.READ_MEDIA_IMAGES)
            }
            if (ContextCompat.checkSelfPermission(
                    this,
                    Manifest.permission.CAMERA
                ) != PackageManager.PERMISSION_GRANTED
            ) {
                permissionsToRequest.add(Manifest.permission.CAMERA)
            }
        } else {
            if (ContextCompat.checkSelfPermission(
                    this,
                    Manifest.permission.READ_EXTERNAL_STORAGE
                ) != PackageManager.PERMISSION_GRANTED
            ) {
                permissionsToRequest.add(Manifest.permission.READ_EXTERNAL_STORAGE)
            }
            if (ContextCompat.checkSelfPermission(
                    this,
                    Manifest.permission.WRITE_EXTERNAL_STORAGE
                ) != PackageManager.PERMISSION_GRANTED
            ) {
                permissionsToRequest.add(Manifest.permission.WRITE_EXTERNAL_STORAGE)
            }
            if (ContextCompat.checkSelfPermission(
                    this,
                    Manifest.permission.CAMERA
                ) != PackageManager.PERMISSION_GRANTED
            ) {
                permissionsToRequest.add(Manifest.permission.CAMERA)
            }
        }

        if (permissionsToRequest.isNotEmpty()) {
            requestPermissionLauncher.launch(permissionsToRequest.toTypedArray())
        }
    }

    fun pickImageFromGallery() {
        pickImageLauncher.launch("image/*")
    }

    fun takePhoto() {
        val contentValues = ContentValues().apply {
            val timeFormatter = SimpleDateFormat("yyyyMMdd_HHmmss", Locale.getDefault())
            val fileName = "IMG_${timeFormatter.format(Date())}.jpg"
            put(MediaStore.Images.Media.DISPLAY_NAME, fileName)
            put(MediaStore.Images.Media.MIME_TYPE, "image/jpeg")
            put(MediaStore.Images.Media.DATE_ADDED, System.currentTimeMillis() / 1000)
        }

        cameraImageUri = contentResolver.insert(
            MediaStore.Images.Media.EXTERNAL_CONTENT_URI,
            contentValues
        )

        takePictureLauncher.launch(cameraImageUri)
    }
}


================================================
FILE: android/MLCChat/app/src/main/java/ai/mlc/mlcchat/NavView.kt
================================================
package ai.mlc.mlcchat

import android.app.Activity
import androidx.compose.material3.ExperimentalMaterial3Api
import androidx.compose.runtime.Composable
import androidx.lifecycle.viewmodel.compose.viewModel
import androidx.navigation.compose.NavHost
import androidx.navigation.compose.composable
import androidx.navigation.compose.rememberNavController

@ExperimentalMaterial3Api
@Composable
fun NavView(activity: Activity, appViewModel: AppViewModel = viewModel()) {
    val navController = rememberNavController()
    NavHost(navController = navController, startDestination = "home") {
        composable("home") { StartView(navController, appViewModel) }
        composable("chat") { ChatView(navController, appViewModel.chatState, activity) }
    }
}


================================================
FILE: android/MLCChat/app/src/main/java/ai/mlc/mlcchat/StartView.kt
================================================
package ai.mlc.mlcchat

import androidx.compose.foundation.gestures.detectTapGestures
import androidx.compose.foundation.layout.Arrangement
import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.Row
import androidx.compose.foundation.layout.aspectRatio
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.foundation.layout.fillMaxWidth
import androidx.compose.foundation.layout.height
import androidx.compose.foundation.layout.padding
import androidx.compose.foundation.layout.width
import androidx.compose.foundation.layout.wrapContentHeight
import androidx.compose.foundation.lazy.LazyColumn
import androidx.compose.foundation.lazy.items
import androidx.compose.foundation.text.selection.SelectionContainer
import androidx.compose.material.icons.Icons
import androidx.compose.material.icons.outlined.Chat
import androidx.compose.material.icons.outlined.Delete
import androidx.compose.material.icons.outlined.Download
import androidx.compose.material.icons.outlined.Pause
import androidx.compose.material.icons.outlined.Schedule
import androidx.compose.material3.AlertDialog
import androidx.compose.material3.Divider
import androidx.compose.material3.ExperimentalMaterial3Api
import androidx.compose.material3.Icon
import androidx.compose.material3.IconButton
import androidx.compose.material3.LinearProgressIndicator
import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.OutlinedTextField
import androidx.compose.material3.Scaffold
import androidx.compose.material3.Text
import androidx.compose.material3.TextButton
import androidx.compose.material3.TopAppBar
import androidx.compose.material3.TopAppBarDefaults
import androidx.compose.runtime.Composable
import androidx.compose.runtime.getValue
import androidx.compose.runtime.mutableStateOf
import androidx.compose.runtime.saveable.rememberSaveable
import androidx.compose.runtime.setValue
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import androidx.compose.ui.input.pointer.pointerInput
import androidx.compose.ui.platform.LocalFocusManager
import androidx.compose.ui.text.style.TextAlign
import androidx.compose.ui.unit.dp
import androidx.navigation.NavController


@ExperimentalMaterial3Api
@Composable
fun StartView(
    navController: NavController,
    appViewModel: AppViewModel
) {
    val localFocusManager = LocalFocusManager.current
    Scaffold(
        topBar = {
            TopAppBar(
                title = { Text(text = "MLCChat", color = MaterialTheme.colorScheme.onPrimary) },
                colors = TopAppBarDefaults.topAppBarColors(containerColor = MaterialTheme.colorScheme.primary)
            )
        },
        modifier = Modifier.pointerInput(Unit) {
            detectTapGestures(onTap = {
                localFocusManager.clearFocus()
            })
        }
    )
    { paddingValues ->
        Column(
            modifier = Modifier
                .fillMaxSize()
                .padding(paddingValues)
                .padding(horizontal = 10.dp)
        ) {
            Text(text = "Model List", modifier = Modifier.padding(top = 10.dp))
            LazyColumn() {
                items(items = appViewModel.modelList,
                    key = { modelState -> modelState.id }
                ) { modelState ->
                    ModelView(
                        navController = navController,
                        modelState = modelState,
                        appViewModel = appViewModel
                    )
                }
            }
        }
        if (appViewModel.isShowingAlert()) {
            AlertDialog(
                onDismissRequest = { appViewModel.dismissAlert() },
                onConfirmation = { appViewModel.copyError() },
                error = appViewModel.errorMessage()
            )
        }
    }
}

@ExperimentalMaterial3Api
@Composable
fun AlertDialog(
    onDismissRequest: () -> Unit,
    onConfirmation: () -> Unit,
    error: String,
) {
    AlertDialog(
        title = { Text(text = "Error") },
        text = { Text(text = error) },
        onDismissRequest = { onDismissRequest() },
        confirmButton = {
            TextButton(onClick = { onConfirmation() }) { Text("Copy") }
        },
        dismissButton = {
            TextButton(onClick = { onDismissRequest() }) { Text("Dismiss") }
        }
    )
}

@Composable
fun ModelView(
    navController: NavController,
    modelState: AppViewModel.ModelState,
    appViewModel: AppViewModel
) {
    var isDeletingModel by rememberSaveable { mutableStateOf(false) }
    Column(
        verticalArrangement = Arrangement.SpaceBetween,
        modifier = Modifier
            .wrapContentHeight()
    ) {
        Row(
            horizontalArrangement = Arrangement.spacedBy(5.dp),
            verticalAlignment = Alignment.CenterVertically,
            modifier = Modifier
                .fillMaxWidth()
                .wrapContentHeight()
        ) {
            Text(
                text = modelState.modelConfig.modelId,
                textAlign = TextAlign.Left,
                modifier = Modifier
                    .wrapContentHeight()
                    .weight(8f)
            )
            Divider(
                modifier = Modifier
                    .height(20.dp)
                    .width(1.dp)
            )
            if (modelState.modelInitState.value == ModelInitState.Paused) {
                IconButton(
                    onClick = { modelState.handleStart() }, modifier = Modifier
                        .aspectRatio(1f)
                        .weight(1f)
                ) {
                    Icon(
                        imageVector = Icons.Outlined.Download,
                        contentDescription = "start downloading",
                    )
                }

            } else if (modelState.modelInitState.value == ModelInitState.Downloading) {
                IconButton(
                    onClick = { modelState.handlePause() }, modifier = Modifier
                        .aspectRatio(1f)
                        .weight(1f)
                ) {
                    Icon(
                        imageVector = Icons.Outlined.Pause,
                        contentDescription = "pause downloading",
                    )
                }
            } else if (modelState.modelInitState.value == ModelInitState.Finished) {
                IconButton(
                    onClick = {
                        modelState.startChat()
                        navController.navigate("chat")
                    },
                    enabled = appViewModel.chatState.interruptable(),
                    modifier = Modifier
                        .aspectRatio(1f)
                        .weight(1f)
                ) {
                    Icon(
                        imageVector = Icons.Outlined.Chat,
                        contentDescription = "start chatting",
                    )
                }
            } else {
                IconButton(
                    enabled = false, onClick = {}, modifier = Modifier
                        .aspectRatio(1f)
                        .weight(1f)
                ) {
                    Icon(
                        imageVector = Icons.Outlined.Schedule,
                        contentDescription = "pending",
                    )
                }
            }
            if (modelState.modelInitState.value == ModelInitState.Downloading ||
                modelState.modelInitState.value == ModelInitState.Paused ||
                modelState.modelInitState.value == ModelInitState.Finished
            ) {
                IconButton(
                    onClick = { isDeletingModel = true },
                    modifier = Modifier
                        .aspectRatio(1f)
                        .weight(1f)
                ) {
                    Icon(
                        imageVector = Icons.Outlined.Delete,
                        contentDescription = "start downloading",
                        tint = MaterialTheme.colorScheme.error
                    )
                }
            }
        }
        LinearProgressIndicator(
            progress = modelState.progress.value.toFloat() / modelState.total.value,
            modifier = Modifier.fillMaxWidth()
        )
        if (isDeletingModel) {
            Row(
                horizontalArrangement = Arrangement.End,
                verticalAlignment = Alignment.CenterVertically,
                modifier = Modifier
                    .fillMaxWidth()
                    .wrapContentHeight()
            ) {
                TextButton(onClick = { isDeletingModel = false }) {
                    Text(text = "cancel")
                }
                TextButton(onClick = {
                    isDeletingModel = false
                    modelState.handleClear()
                }) {
                    Text(text = "clear data", color = MaterialTheme.colorScheme.error)
                }
                TextButton(onClick = {
                    isDeletingModel = false
                    modelState.handleDelete()
                }) {
                    Text(text = "delete model", color = MaterialTheme.colorScheme.error)
                }
            }
        }
    }
}


================================================
FILE: android/MLCChat/app/src/main/java/ai/mlc/mlcchat/ui/theme/Color.kt
================================================
package ai.mlc.mlcchat.ui.theme

import androidx.compose.ui.graphics.Color

val Blue10 = Color(0xFF000F5E)
val Blue20 = Color(0xFF001E92)
val Blue30 = Color(0xFF002ECC)
val Blue40 = Color(0xFF1546F6)
val Blue80 = Color(0xFFB8C3FF)
val Blue90 = Color(0xFFDDE1FF)

val DarkBlue10 = Color(0xFF00036B)
val DarkBlue20 = Color(0xFF000BA6)
val DarkBlue30 = Color(0xFF1026D3)
val DarkBlue40 = Color(0xFF3648EA)
val DarkBlue80 = Color(0xFFBBC2FF)
val DarkBlue90 = Color(0xFFDEE0FF)

val Yellow10 = Color(0xFF261900)
val Yellow20 = Color(0xFF402D00)
val Yellow30 = Color(0xFF5C4200)
val Yellow40 = Color(0xFF7A5900)
val Yellow80 = Color(0xFFFABD1B)
val Yellow90 = Color(0xFFFFDE9C)

val Red10 = Color(0xFF410001)
val Red20 = Color(0xFF680003)
val Red30 = Color(0xFF930006)
val Red40 = Color(0xFFBA1B1B)
val Red80 = Color(0xFFFFB4A9)
val Red90 = Color(0xFFFFDAD4)

val Grey10 = Color(0xFF191C1D)
val Grey20 = Color(0xFF2D3132)
val Grey80 = Color(0xFFC4C7C7)
val Grey90 = Color(0xFFE0E3E3)
val Grey95 = Color(0xFFEFF1F1)
val Grey99 = Color(0xFFFBFDFD)

val BlueGrey30 = Color(0xFF45464F)
val BlueGrey50 = Color(0xFF767680)
val BlueGrey60 = Color(0xFF90909A)
val BlueGrey80 = Color(0xFFC6C5D0)
val BlueGrey90 = Color(0xFFE2E1EC)


================================================
FILE: android/MLCChat/app/src/main/java/ai/mlc/mlcchat/ui/theme/Theme.kt
================================================
package ai.mlc.mlcchat.ui.theme

import android.app.Activity
import android.os.Build
import androidx.compose.foundation.isSystemInDarkTheme
import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.darkColorScheme
import androidx.compose.material3.dynamicDarkColorScheme
import androidx.compose.material3.dynamicLightColorScheme
import androidx.compose.material3.lightColorScheme
import androidx.compose.runtime.Composable
import androidx.compose.runtime.SideEffect
import androidx.compose.ui.graphics.Color
import androidx.compose.ui.graphics.toArgb
import androidx.compose.ui.platform.LocalContext
import androidx.compose.ui.platform.LocalView
import androidx.core.view.WindowCompat

private val DarkColorScheme = darkColorScheme(
    primary = Blue80,
    onPrimary = Blue20,
    primaryContainer = Blue30,
    onPrimaryContainer = Blue90,
    inversePrimary = Blue40,
    secondary = DarkBlue80,
    onSecondary = DarkBlue20,
    secondaryContainer = DarkBlue30,
    onSecondaryContainer = DarkBlue90,
    tertiary = Yellow80,
    onTertiary = Yellow20,
    tertiaryContainer = Yellow30,
    onTertiaryContainer = Yellow90,
    error = Red80,
    onError = Red20,
    errorContainer = Red30,
    onErrorContainer = Red90,
    background = Grey10,
    onBackground = Grey90,
    surface = Grey10,
    onSurface = Grey80,
    inverseSurface = Grey90,
    inverseOnSurface = Grey20,
    surfaceVariant = BlueGrey30,
    onSurfaceVariant = BlueGrey80,
    outline = BlueGrey60
)

private val LightColorScheme = lightColorScheme(
    primary = Blue40,
    onPrimary = Color.White,
    primaryContainer = Blue90,
    onPrimaryContainer = Blue10,
    inversePrimary = Blue80,
    secondary = DarkBlue40,
    onSecondary = Color.White,
    secondaryContainer = DarkBlue90,
    onSecondaryContainer = DarkBlue10,
    tertiary = Yellow40,
    onTertiary = Color.White,
    tertiaryContainer = Yellow90,
    onTertiaryContainer = Yellow10,
    error = Red40,
    onError = Color.White,
    errorContainer = Red90,
    onErrorContainer = Red10,
    background = Grey99,
    onBackground = Grey10,
    surface = Grey99,
    onSurface = Grey10,
    inverseSurface = Grey20,
    inverseOnSurface = Grey95,
    surfaceVariant = BlueGrey90,
    onSurfaceVariant = BlueGrey30,
    outline = BlueGrey50
)

@Composable
fun MLCChatTheme(
    darkTheme: Boolean = isSystemInDarkTheme(),
    // Dynamic color is available on Android 12+
    dynamicColor: Boolean = true,
    content: @Composable () -> Unit
) {
    val colorScheme = when {
        dynamicColor && Build.VERSION.SDK_INT >= Build.VERSION_CODES.S -> {
            val context = LocalContext.current
            if (darkTheme) dynamicDarkColorScheme(context) else dynamicLightColorScheme(context)
        }

        darkTheme -> DarkColorScheme
        else -> LightColorScheme
    }
    val view = LocalView.current
    if (!view.isInEditMode) {
        SideEffect {
            val window = (view.context as Activity).window
            window.statusBarColor = colorScheme.primary.toArgb()
            WindowCompat.getInsetsController(window, view).isAppearanceLightStatusBars = darkTheme
        }
    }

    MaterialTheme(
        colorScheme = colorScheme,
        typography = Typography,
        content = content
    )
}


================================================
FILE: android/MLCChat/app/src/main/java/ai/mlc/mlcchat/ui/theme/Type.kt
================================================
package ai.mlc.mlcchat.ui.theme

import androidx.compose.material3.Typography
import androidx.compose.ui.text.TextStyle
import androidx.compose.ui.text.font.FontFamily
import androidx.compose.ui.text.font.FontWeight
import androidx.compose.ui.unit.sp

// Set of Material typography styles to start with
val Typography = Typography(
    bodyLarge = TextStyle(
        fontFamily = FontFamily.Default,
        fontWeight = FontWeight.Normal,
        fontSize = 16.sp,
        lineHeight = 24.sp,
        letterSpacing = 0.5.sp
    )
    /* Other default text styles to override
    titleLarge = TextStyle(
        fontFamily = FontFamily.Default,
        fontWeight = FontWeight.Normal,
        fontSize = 22.sp,
        lineHeight = 28.sp,
        letterSpacing = 0.sp
    ),
    labelSmall = TextStyle(
        fontFamily = FontFamily.Default,
        fontWeight = FontWeight.Medium,
        fontSize = 11.sp,
        lineHeight = 16.sp,
        letterSpacing = 0.5.sp
    )
    */
)


================================================
FILE: android/MLCChat/app/src/main/res/drawable/ic_android_black_24dp.xml
================================================
<vector android:height="24dp" android:tint="#000000"
    android:viewportHeight="24" android:viewportWidth="24"
    android:width="24dp" xmlns:android="http://schemas.android.com/apk/res/android">
    <path android:fillColor="#FF000000" android:pathData="M17.6,11.48 L19.44,8.3a0.63,0.63 0,0 0,-1.09 -0.63l-1.88,3.24a11.43,11.43 0,0 0,-8.94 0L5.65,7.67a0.63,0.63 0,0 0,-1.09 0.63L6.4,11.48A10.81,10.81 0,0 0,1 20L23,20A10.81,10.81 0,0 0,17.6 11.48ZM7,17.25A1.25,1.25 0,1 1,8.25 16,1.25 1.25,0 0,1 7,17.25ZM17,17.25A1.25,1.25 0,1 1,18.25 16,1.25 1.25,0 0,1 17,17.25Z"/>
</vector>


================================================
FILE: android/MLCChat/app/src/main/res/drawable/mlc_logo_108.xml
================================================
<vector xmlns:android="http://schemas.android.com/apk/res/android"
    android:width="108dp"
    android:height="108dp"
    android:viewportWidth="108"
    android:viewportHeight="108">
  <path
      android:pathData="M100.93,47.91L58.41,47.91C57,47.91 55.82,49.05 55.82,50.5L55.82,69.17C57.54,68.98 59.14,69.2 60.55,70.04L60.55,52.72L98.79,52.72L98.79,103.09L60.55,103.09L60.55,87.29C59.48,88.09 58.26,88.75 57.08,89.28C56.7,89.47 56.27,89.66 55.82,89.81L55.82,105.23C55.82,106.64 56.96,107.82 58.41,107.82L100.93,107.82C102.34,107.82 103.52,106.68 103.52,105.23L103.52,50.5C103.52,49.09 102.34,47.91 100.93,47.91ZM55.93,86.72C52.88,88.13 47.57,89.39 44.29,90.12C40.63,90.92 30.02,93.36 29.1,87.6C28.34,82.87 40.82,77.6 44.02,76.23C46.96,74.97 49.94,73.79 52.92,72.64C56.12,71.42 59.56,70.88 61.16,74.93C61.85,76.68 62.04,78.14 62.07,80L62.07,80.2L62.04,80.39C61.66,83.55 58.57,85.5 55.93,86.72ZM66.58,35.01C68.03,34.9 69.29,35.96 69.4,37.38L69.82,42.3C69.94,43.75 68.87,45.01 67.46,45.13C66.01,45.24 64.75,44.17 64.63,42.76L64.21,37.84C64.06,36.42 65.13,35.16 66.58,35.01ZM85.55,45.96C85.55,43.03 85.39,40.2 85.13,37.53L85.13,37.57C85.01,36.65 84.9,35.74 84.78,34.82L84.78,34.75L84.75,34.59C84.25,31.23 83.6,27.91 82.8,24.55C82,21.2 79.1,19.1 75.7,19.02C69.52,18.87 63.3,18.79 57.11,19.02C56.05,17.07 52.88,15.86 49.18,16.12L44.9,6.5C45.7,5.78 46.13,4.67 46.05,3.53C45.86,1.5 44.1,0.02 42.08,0.21C40.05,0.4 38.57,2.15 38.76,4.18C38.91,6.09 40.52,7.5 42.38,7.5L46.43,16.58C43.76,17.3 41.77,18.79 41.24,20.47C35.4,21.31 29.64,22.46 23.88,23.6C23.19,23.75 22.54,23.95 21.93,24.25C15.44,25.81 8.76,29.36 8.76,29.36C8.69,30.2 8.61,31.04 8.54,31.92C8.84,31.84 9.18,31.8 9.53,31.77C14.6,31.31 19.22,36.54 19.79,43.41C20.4,50.28 16.78,56.23 11.7,56.65C11.32,56.69 10.94,56.69 10.55,56.65C10.79,57.57 11.02,58.44 11.24,59.32C15.48,61.61 21.2,63.18 24.75,63.94C25.59,64.24 26.47,64.43 27.43,64.43C36.13,64.63 44.82,64.74 53.53,63.98L53.87,63.94L53.87,57.57C47.99,58.06 41.96,58.1 32.92,57.91C31.2,57.87 29.94,56.84 29.52,55.35C27.5,48.02 27,40.43 27.46,32.23C27.54,30.66 28.64,29.44 30.36,29.1C32,28.75 33.57,28.45 35.02,28.18C35.71,28.07 37.5,27.72 38.91,27.46L38.95,27.46C40.02,27.27 41.05,27.04 42.12,26.88C45.44,27.46 47.73,32.3 52.69,31.5C57.69,31.43 59.1,26.23 62.3,25.09C64.63,25.09 66.92,25.13 69.25,25.24C70.36,25.24 71.43,25.28 73.14,25.32C74.86,25.36 76.16,26.39 76.54,27.88C76.92,29.52 77.27,31.12 77.57,32.72C78.18,38.22 78.45,42.64 78.41,46.04L85.55,46.04ZM9.79,38.06C11.78,37.88 13.65,40.58 13.95,44.09C14.26,47.61 12.92,50.58 10.94,50.73C9.98,50.81 9.03,50.24 8.3,49.17C8.72,49.51 9.18,49.66 9.64,49.63C11.2,49.48 12.27,47.07 12.01,44.25C11.74,41.42 10.29,39.25 8.72,39.36C8.27,39.4 7.85,39.63 7.5,40.05C8,38.9 8.8,38.18 9.79,38.06ZM52.65,21.88C54.29,21.88 55.59,23.22 55.59,24.82C55.59,26.46 54.25,27.76 52.65,27.76C51.01,27.76 49.71,26.43 49.71,24.82C49.67,23.22 51.01,21.88 52.65,21.88ZM42.31,37.19C43.76,37.07 45.02,38.14 45.13,39.55L45.55,44.48C45.66,45.93 44.6,47.18 43.18,47.3C41.73,47.41 40.48,46.34 40.36,44.93L39.94,40.01C39.79,38.56 40.86,37.3 42.31,37.19ZM9.75,34.06C13.5,33.71 16.97,37.95 17.43,43.52C17.92,49.09 15.29,53.86 11.51,54.17C7.77,54.51 4.3,50.28 3.84,44.7C3.34,39.17 5.98,34.4 9.75,34.06ZM53.91,100.73C49.98,99.7 46.54,97.1 45.02,92.79C47.77,92.18 51.01,91.45 53.91,90.46ZM42.84,73.79L42.19,66.46L53.91,65.85L53.91,69.47C53.26,69.63 52.61,69.86 51.96,70.08C48.95,71.23 45.93,72.45 42.96,73.71ZM29.64,73.59C33.19,71 37.61,71.04 39.52,73.67C39.83,74.09 40.02,74.51 40.17,74.97C35.82,76.91 29.83,80 27.43,83.86C27.12,83.63 26.85,83.32 26.62,83.02C24.71,80.39 26.05,76.15 29.64,73.59ZM78.68,84.28C79.36,84.13 80.09,84.13 80.77,84.28L81.58,82.91L81.92,83.02C82.61,83.29 83.25,83.63 83.83,84.13L84.09,84.36L83.33,85.77C83.56,86.04 83.79,86.3 83.94,86.61C84.13,86.91 84.25,87.22 84.36,87.56L85.96,87.56L86.04,87.94C86.16,88.67 86.16,89.43 86.04,90.16L85.96,90.5L84.36,90.54C84.13,91.23 83.79,91.84 83.33,92.33L84.13,93.71L83.87,93.93C83.56,94.16 83.25,94.39 82.95,94.58C82.64,94.77 82.3,94.93 81.96,95.04L81.61,95.16L80.77,93.78C80.09,93.93 79.36,93.93 78.68,93.78L77.88,95.16L77.53,95.04C76.84,94.77 76.2,94.43 75.63,93.93L75.36,93.71L76.12,92.29C75.89,92.03 75.66,91.76 75.51,91.45C75.32,91.15 75.2,90.84 75.09,90.5L73.48,90.5L73.41,90.12C73.3,89.39 73.3,88.63 73.41,87.91L73.48,87.56L75.09,87.52C75.32,86.84 75.66,86.23 76.12,85.73L75.32,84.36L75.59,84.13C75.89,83.9 76.2,83.67 76.5,83.48C76.8,83.29 77.15,83.13 77.49,83.02L77.84,82.91ZM64.18,57.76L94.36,57.76L94.36,61L64.18,61ZM64.18,64.97L76.39,64.97L76.39,68.21L64.18,68.21ZM64.18,72.34L74.02,72.34L74.02,75.58L64.25,75.58L64.18,75.31ZM90.09,67.49C91,67.79 91.84,68.29 92.57,68.9L94.48,67.79L94.82,68.18C95.47,68.94 96,69.86 96.34,70.81L96.54,71.27L94.67,72.41C94.78,72.87 94.82,73.36 94.82,73.86C94.82,74.36 94.78,74.82 94.67,75.27L96.57,76.38L96.38,76.84C96.04,77.79 95.51,78.67 94.86,79.47L94.55,79.85L92.64,78.79C91.92,79.43 91.08,79.93 90.16,80.23L90.16,82.41L89.67,82.48C89.17,82.56 88.64,82.64 88.14,82.64C87.64,82.64 87.15,82.6 86.65,82.52L86.16,82.45L86.12,80.23C85.2,79.93 84.36,79.43 83.64,78.82L81.73,79.93L81.39,79.55C80.74,78.79 80.2,77.87 79.86,76.91L79.67,76.46L81.54,75.31C81.43,74.85 81.39,74.36 81.39,73.86C81.39,73.36 81.43,72.91 81.54,72.45L79.63,71.34L79.82,70.85C80.16,69.89 80.7,69.02 81.35,68.21L81.65,67.83L83.56,68.9C84.29,68.25 85.13,67.75 86.04,67.45L86.04,65.31L86.54,65.23C87.04,65.16 87.57,65.08 88.06,65.08C88.56,65.08 89.05,65.12 89.55,65.2L90.05,65.27ZM88.06,70.54C86.2,70.54 84.71,72.03 84.71,73.9C84.71,75.77 86.2,77.26 88.06,77.26C89.93,77.26 91.42,75.77 91.42,73.9C91.42,72.03 89.89,70.54 88.06,70.54ZM78.48,86.95C77.3,87.64 76.92,89.13 77.61,90.31C78.29,91.49 79.78,91.88 80.96,91.19C82.15,90.5 82.53,89.01 81.84,87.83C81.16,86.68 79.67,86.26 78.48,86.95ZM78.48,86.95"
      android:fillColor="#062578"
      android:fillType="evenOdd"
      android:strokeColor="#00000000"/>
</vector>


================================================
FILE: android/MLCChat/app/src/main/res/values/colors.xml
================================================
<?xml version="1.0" encoding="utf-8"?>
<resources>
    <color name="purple_200">#FFBB86FC</color>
    <color name="purple_500">#FF6200EE</color>
    <color name="purple_700">#FF3700B3</color>
    <color name="teal_200">#FF03DAC5</color>
    <color name="teal_700">#FF018786</color>
    <color name="black">#FF000000</color>
    <color name="white">#FFFFFFFF</color>
</resources>


================================================
FILE: android/MLCChat/app/src/main/res/values/strings.xml
================================================
<resources>
    <string name="app_name">MLCChat</string>
</resources>


================================================
FILE: android/MLCChat/app/src/main/res/values/themes.xml
================================================
<?xml version="1.0" encoding="utf-8"?>
<resources>

    <style name="Theme.MLCChat" parent="android:Theme.Material.Light" />

</resources>


================================================
FILE: android/MLCChat/app/src/main/res/xml/backup_rules.xml
================================================
<?xml version="1.0" encoding="utf-8"?><!--
   Sample backup rules file; uncomment and customize as necessary.
   See https://developer.android.com/guide/topics/data/autobackup
   for details.
   Note: This file is ignored for devices older that API 31
   See https://developer.android.com/about/versions/12/backup-restore
-->
<full-backup-content>
    <!--
   <include domain="sharedpref" path="."/>
   <exclude domain="sharedpref" path="device.xml"/>
-->
</full-backup-content>


================================================
FILE: android/MLCChat/app/src/main/res/xml/data_extraction_rules.xml
================================================
<?xml version="1.0" encoding="utf-8"?><!--
   Sample data extraction rules file; uncomment and customize as necessary.
   See https://developer.android.com/about/versions/12/backup-restore#xml-changes
   for details.
-->
<data-extraction-rules>
    <cloud-backup>
        <!-- TODO: Use <include> and <exclude> to control what is backed up.
        <include .../>
        <exclude .../>
        -->
    </cloud-backup>
    <!--
    <device-transfer>
        <include .../>
        <exclude .../>
    </device-transfer>
    -->
</data-extraction-rules>


================================================
FILE: android/MLCChat/build.gradle
================================================
plugins {
    id 'com.android.application' version '8.2.0' apply false
    id 'com.android.library' version '8.2.0' apply false
    id 'org.jetbrains.kotlin.android' version '1.8.10' apply false
}


================================================
FILE: android/MLCChat/bundle_weight.py
================================================
import argparse
import os
import subprocess
from pathlib import Path

from mlc_llm.support import logging

logging.enable_logging()
logger = logging.getLogger(__name__)


def main(apk_path: Path, package_output_path: Path):
    """Push weights to the android device with adb"""
    # - Install the apk on device.
    logger.info('Install apk "%s" to device', str(apk_path.absolute()))
    subprocess.run(["adb", "install", str(apk_path)], check=True, env=os.environ)
    # - Create the weight directory for the app.
    device_weihgt_dir = "/storage/emulated/0/Android/data/ai.mlc.mlcchat/files/"
    logger.info('Creating directory "%s" on device', device_weihgt_dir)
    subprocess.run(
        ["adb", "shell", "mkdir", "-p", device_weihgt_dir],
        check=True,
        env=os.environ,
    )
    for model_weight_dir in (package_output_path / "bundle").iterdir():
        if model_weight_dir.is_dir():
            src_path = str(model_weight_dir.absolute())
            dst_path = "/data/local/tmp/" + model_weight_dir.name
            logger.info('Pushing local weights "%s" to device location "%s"', src_path, dst_path)
            subprocess.run(["adb", "push", src_path, dst_path], check=True, env=os.environ)

            src_path = dst_path
            dst_path = "/storage/emulated/0/Android/data/ai.mlc.mlcchat/files/"
            logger.info('Move weights from "%s" to "%s"', src_path, dst_path)
            subprocess.run(["adb", "shell", "mv", src_path, dst_path], check=True, env=os.environ)
    logger.info("All finished.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser("MLC LLM Android Weight Bundle")

    def _parse_apk_path(path: str) -> Path:
        path = Path(path)
        if not path.exists():
            raise ValueError(
                f"Path {str(path)} is expected to be an apk file, but the file does not exist."
            )
        if not path.is_file():
            raise ValueError(f"Path {str(path)} is expected to be an apk file.")
        return path

    parser.add_argument(
        "--apk-path",
        type=_parse_apk_path,
        default="app/release/app-release.apk",
        help="The path to generated MLCChat apk file.",
    )
    parser.add_argument(
        "--package-output-path",
        type=Path,
        default="dist",
        help='The path to the output directory of "mlc_llm package".',
    )
    args = parser.parse_args()
    main(args.apk_path, args.package_output_path)


================================================
FILE: android/MLCChat/gradle/wrapper/gradle-wrapper.properties
================================================
#Thu Jan 25 10:19:50 EST 2024
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-8.5-bin.zip
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists


================================================
FILE: android/MLCChat/gradle.properties
================================================
# Project-wide Gradle settings.
# IDE (e.g. Android Studio) users:
# Gradle settings configured through the IDE *will override*
# any settings specified in this file.
# For more details on how to configure your build environment visit
# http://www.gradle.org/docs/current/userguide/build_environment.html
# Specifies the JVM arguments used for the daemon process.
# The setting is particularly useful for tweaking memory settings.
org.gradle.jvmargs=-Xmx2048m -Dfile.encoding=UTF-8
# When configured, Gradle will run in incubating parallel mode.
# This option should only be used with decoupled projects. More details, visit
# http://www.gradle.org/docs/current/userguide/multi_project_builds.html#sec:decoupled_projects
# org.gradle.parallel=true
# AndroidX package structure to make it clearer which packages are bundled with the
# Android operating system, and which are packaged with your app's APK
# https://developer.android.com/topic/libraries/support-library/androidx-rn
android.useAndroidX=true
# Kotlin code style for this project: "official" or "obsolete":
kotlin.code.style=official
# Enables namespacing of each library's R class so that its R class includes only the
# resources declared in the library itself and none from the library's dependencies,
# thereby reducing the size of the R class for that library
android.nonTransitiveRClass=true


================================================
FILE: android/MLCChat/gradlew
================================================
#!/usr/bin/env sh

#
# Copyright 2015 the original author or authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

##############################################################################
##
##  Gradle start up script for UN*X
##
##############################################################################

# Attempt to set APP_HOME
# Resolve links: $0 may be a link
PRG="$0"
# Need this for relative symlinks.
while [ -h "$PRG" ] ; do
    ls=`ls -ld "$PRG"`
    link=`expr "$ls" : '.*-> \(.*\)$'`
    if expr "$link" : '/.*' > /dev/null; then
        PRG="$link"
    else
        PRG=`dirname "$PRG"`"/$link"
    fi
done
SAVED="`pwd`"
cd "`dirname \"$PRG\"`/" >/dev/null
APP_HOME="`pwd -P`"
cd "$SAVED" >/dev/null

APP_NAME="Gradle"
APP_BASE_NAME=`basename "$0"`

# Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
DEFAULT_JVM_OPTS='"-Xmx64m" "-Xms64m"'

# Use the maximum available, or set MAX_FD != -1 to use that value.
MAX_FD="maximum"

warn () {
    echo "$*"
}

die () {
    echo
    echo "$*"
    echo
    exit 1
}

# OS specific support (must be 'true' or 'false').
cygwin=false
msys=false
darwin=false
nonstop=false
case "`uname`" in
  CYGWIN* )
    cygwin=true
    ;;
  Darwin* )
    darwin=true
    ;;
  MINGW* )
    msys=true
    ;;
  NONSTOP* )
    nonstop=true
    ;;
esac

CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar


# Determine the Java command to use to start the JVM.
if [ -n "$JAVA_HOME" ] ; then
    if [ -x "$JAVA_HOME/jre/sh/java" ] ; then
        # IBM's JDK on AIX uses strange locations for the executables
        JAVACMD="$JAVA_HOME/jre/sh/java"
    else
        JAVACMD="$JAVA_HOME/bin/java"
    fi
    if [ ! -x "$JAVACMD" ] ; then
        die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME

Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
    fi
else
    JAVACMD="java"
    which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.

Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
fi

# Increase the maximum file descriptors if we can.
if [ "$cygwin" = "false" -a "$darwin" = "false" -a "$nonstop" = "false" ] ; then
    MAX_FD_LIMIT=`ulimit -H -n`
    if [ $? -eq 0 ] ; then
        if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then
            MAX_FD="$MAX_FD_LIMIT"
        fi
        ulimit -n $MAX_FD
        if [ $? -ne 0 ] ; then
            warn "Could not set maximum file descriptor limit: $MAX_FD"
        fi
    else
        warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT"
    fi
fi

# For Darwin, add options to specify how the application appears in the dock
if $darwin; then
    GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\""
fi

# For Cygwin or MSYS, switch paths to Windows format before running java
if [ "$cygwin" = "true" -o "$msys" = "true" ] ; then
    APP_HOME=`cygpath --path --mixed "$APP_HOME"`
    CLASSPATH=`cygpath --path --mixed "$CLASSPATH"`

    JAVACMD=`cygpath --unix "$JAVACMD"`

    # We build the pattern for arguments to be converted via cygpath
    ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null`
    SEP=""
    for dir in $ROOTDIRSRAW ; do
        ROOTDIRS="$ROOTDIRS$SEP$dir"
        SEP="|"
    done
    OURCYGPATTERN="(^($ROOTDIRS))"
    # Add a user-defined pattern to the cygpath arguments
    if [ "$GRADLE_CYGPATTERN" != "" ] ; then
        OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)"
    fi
    # Now convert the arguments - kludge to limit ourselves to /bin/sh
    i=0
    for arg in "$@" ; do
        CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -`
        CHECK2=`echo "$arg"|egrep -c "^-"`                                 ### Determine if an option

        if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then                    ### Added a condition
            eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"`
        else
            eval `echo args$i`="\"$arg\""
        fi
        i=`expr $i + 1`
    done
    case $i in
        0) set -- ;;
        1) set -- "$args0" ;;
        2) set -- "$args0" "$args1" ;;
        3) set -- "$args0" "$args1" "$args2" ;;
        4) set -- "$args0" "$args1" "$args2" "$args3" ;;
        5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;;
        6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;;
        7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;;
        8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;;
        9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;;
    esac
fi

# Escape application args
save () {
    for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done
    echo " "
}
APP_ARGS=`save "$@"`

# Collect all arguments for the java command, following the shell quoting and substitution rules
eval set -- $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS "\"-Dorg.gradle.appname=$APP_BASE_NAME\"" -classpath "\"$CLASSPATH\"" org.gradle.wrapper.GradleWrapperMain "$APP_ARGS"

exec "$JAVACMD" "$@"


================================================
FILE: android/MLCChat/gradlew.bat
================================================
@rem
@rem Copyright 2015 the original author or authors.
@rem
@rem Licensed under the Apache License, Version 2.0 (the "License");
@rem you may not use this file except in compliance with the License.
@rem You may obtain a copy of the License at
@rem
@rem      https://www.apache.org/licenses/LICENSE-2.0
@rem
@rem Unless required by applicable law or agreed to in writing, software
@rem distributed under the License is distributed on an "AS IS" BASIS,
@rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@rem See the License for the specific language governing permissions and
@rem limitations under the License.
@rem

@if "%DEBUG%" == "" @echo off
@rem ##########################################################################
@rem
@rem  Gradle startup script for Windows
@rem
@rem ##########################################################################

@rem Set local scope for the variables with windows NT shell
if "%OS%"=="Windows_NT" setlocal

set DIRNAME=%~dp0
if "%DIRNAME%" == "" set DIRNAME=.
set APP_BASE_NAME=%~n0
set APP_HOME=%DIRNAME%

@rem Resolve any "." and ".." in APP_HOME to make it shorter.
for %%i in ("%APP_HOME%") do set APP_HOME=%%~fi

@rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
set DEFAULT_JVM_OPTS="-Xmx64m" "-Xms64m"

@rem Find java.exe
if defined JAVA_HOME goto findJavaFromJavaHome

set JAVA_EXE=java.exe
%JAVA_EXE% -version >NUL 2>&1
if "%ERRORLEVEL%" == "0" goto execute

echo.
echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
echo.
echo Please set the JAVA_HOME variable in your environment to match the
echo location of your Java installation.

goto fail

:findJavaFromJavaHome
set JAVA_HOME=%JAVA_HOME:"=%
set JAVA_EXE=%JAVA_HOME%/bin/java.exe

if exist "%JAVA_EXE%" goto execute

echo.
echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME%
echo.
echo Please set the JAVA_HOME variable in your environment to match the
echo location of your Java installation.

goto fail

:execute
@rem Setup the command line

set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar


@rem Execute Gradle
"%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %*

:end
@rem End local scope for the variables with windows NT shell
if "%ERRORLEVEL%"=="0" goto mainEnd

:fail
rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of
rem the _cmd.exe /c_ return code!
if  not "" == "%GRADLE_EXIT_CONSOLE%" exit 1
exit /b 1

:mainEnd
if "%OS%"=="Windows_NT" endlocal

:omega


================================================
FILE: android/MLCChat/mlc-package-config.json
================================================
{
    "device": "android",
    "model_list": [
        {
            "model": "HF://mlc-ai/Phi-3.5-mini-instruct-q4f16_0-MLC",
            "estimated_vram_bytes": 4250586449,
            "model_id": "Phi-3.5-mini-instruct-q4f16_0-MLC",
            "overrides": {
                "prefill_chunk_size": 128
            }
        },
        {
            "model": "HF://mlc-ai/Qwen3-0.6B-q0f16-MLC",
            "model_id": "Qwen3-0.6B-q0f16-MLC",
            "estimated_vram_bytes": 3000000000,
            "overrides": {
                "prefill_chunk_size": 128,
                "context_window_size": 2048
            }
        },
        {
            "model": "HF://mlc-ai/Qwen3-1.7B-q4f16_1-MLC",
            "model_id": "Qwen3-1.7B-q4f16_1-MLC",
            "estimated_vram_bytes": 3000000000,
            "overrides": {
                "prefill_chunk_size": 128,
                "context_window_size": 2048
            }
        },
        {
            "model": "HF://mlc-ai/gemma-2-2b-it-q4f16_1-MLC",
            "model_id": "gemma-2-2b-it-q4f16_1-MLC",
            "estimated_vram_bytes": 3000000000
        },
        {
            "model": "HF://mlc-ai/Llama-3.2-3B-Instruct-q4f16_0-MLC",
            "estimated_vram_bytes": 4679979417,
            "model_id": "Llama-3.2-3B-Instruct-q4f16_0-MLC"
        },
        {
            "model": "HF://mlc-ai/Mistral-7B-Instruct-v0.3-q4f16_1-MLC",
            "estimated_vram_bytes": 4115131883,
            "model_id": "Mistral-7B-Instruct-v0.3-q4f16_1-MLC",
            "overrides": {
                "sliding_window_size": 768,
                "prefill_chunk_size": 256
            }
        }
    ]
}


================================================
FILE: android/MLCChat/settings.gradle
================================================
pluginManagement {
    repositories {
        google()
        mavenCentral()
        gradlePluginPortal()
    }
}
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
    repositories {
        google()
        mavenCentral()
        maven { url "https://jitpack.io" }
    }
}
rootProject.name = "MLCChat"
include ':app'
include ':mlc4j'
project(':mlc4j').projectDir = file('dist/lib/mlc4j')
include ':mlcengineexample'


================================================
FILE: android/MLCEngineExample/README.md
================================================
# MLC-LLM Android

Checkout [Documentation page](https://llm.mlc.ai/docs/deploy/android.html) for more information.

- run `mlc_llm package`
- open this `MLCEngineExample/` folder as a project in Android Studio


================================================
FILE: android/MLCEngineExample/app/.gitignore
================================================
/build
/src/main/libs


================================================
FILE: android/MLCEngineExample/app/build.gradle
================================================
plugins {
    id 'com.android.application'
    id 'org.jetbrains.kotlin.android'
}

android {
    namespace 'ai.mlc.mlcengineexample'
    compileSdk 34

    defaultConfig {
        applicationId "ai.mlc.mlcengineexample"
        minSdk 26
        targetSdk 33
        versionCode 1
        versionName "1.0"

        testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"
        vectorDrawables {
            useSupportLibrary true
        }
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
        }
    }
    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }
    kotlinOptions {
        jvmTarget = '1.8'
    }
    buildFeatures {
        compose true
    }
    composeOptions {
        kotlinCompilerExtensionVersion '1.4.3'
    }
    packagingOptions {
        resources {
            excludes += '/META-INF/{AL2.0,LGPL2.1}'
        }
    }
}

dependencies {
    implementation project(":mlc4j")
    implementation 'androidx.core:core-ktx:1.10.1'
    implementation 'androidx.lifecycle:lifecycle-runtime-ktx:2.6.1'
    implementation 'androidx.activity:activity-compose:1.7.1'
    implementation platform('androidx.compose:compose-bom:2022.10.00')
    implementation 'androidx.lifecycle:lifecycle-viewmodel-compose:2.6.1'
    implementation 'androidx.compose.ui:ui'
    implementation 'androidx.compose.ui:ui-graphics'
    implementation 'androidx.compose.ui:ui-tooling-preview'
    implementation 'androidx.compose.material3:material3:1.1.0'
    implementation 'androidx.compose.material:material-icons-extended'
    implementation 'androidx.appcompat:appcompat:1.6.1'
    implementation 'androidx.navigation:navigation-compose:2.5.3'
    implementation 'com.google.code.gson:gson:2.10.1'
    implementation fileTree(dir: 'src/main/libs', include: ['*.aar', '*.jar'], exclude: [])
    testImplementation 'junit:junit:4.13.2'
    androidTestImplementation 'androidx.test.ext:junit:1.1.5'
    androidTestImplementation 'androidx.test.espresso:espresso-core:3.5.1'
    androidTestImplementation platform('androidx.compose:compose-bom:2022.10.00')
    androidTestImplementation 'androidx.compose.ui:ui-test-junit4'
    debugImplementation 'androidx.compose.ui:ui-tooling'
    debugImplementation 'androidx.compose.ui:ui-test-manifest'

}


================================================
FILE: android/MLCEngineExample/app/proguard-rules.pro
================================================
# Add project specific ProGuard rules here.
# You can control the set of applied configuration files using the
# proguardFiles setting in build.gradle.
#
# For more details, see
#   http://developer.android.com/guide/developing/tools/proguard.html

# If your project uses WebView with JS, uncomment the following
# and specify the fully qualified class name to the JavaScript interface
# class:
#-keepclassmembers class fqcn.of.javascript.interface.for.webview {
#   public *;
#}

# Uncomment this to preserve the line number information for
# debugging stack traces.
#-keepattributes SourceFile,LineNumberTable

# If you keep the line number information, uncomment this to
# hide the original source file name.
#-renamesourcefileattribute SourceFile


================================================
FILE: android/MLCEngineExample/app/src/main/AndroidManifest.xml
================================================
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    package="ai.mlc.mlcengineexample">

    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission
        android:name="android.permission.WRITE_EXTERNAL_STORAGE"
        android:maxSdkVersion="32"
        tools:ignore="ScopedStorage" />

    <application
        android:allowBackup="true"
        android:dataExtractionRules="@xml/data_extraction_rules"
        android:fullBackupContent="@xml/backup_rules"
        android:icon="@drawable/mlc_logo_108"
        android:label="@string/app_name"
        android:roundIcon="@drawable/mlc_logo_108"
        android:supportsRtl="true"
        android:theme="@style/Theme.MLCEngineExample"
        tools:targetApi="31">
        <uses-native-library
            android:name="libOpenCL.so"
            android:required="false"/>

        <uses-native-library
            android:name="libOpenCL-pixel.so"
            android:required="false" />
        <activity
            android:name=".MainActivity"
            android:exported="true"
            android:label="@string/app_name"
            android:theme="@android:style/Theme.Material.NoActionBar">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>

</manifest>


================================================
FILE: android/MLCEngineExample/app/src/main/java/ai/mlc/mlcengineexample/MainActivity.kt
================================================
package ai.mlc.mlcengineexample

import ai.mlc.mlcengineexample.ui.theme.MLCEngineExampleTheme
import ai.mlc.mlcllm.MLCEngine
import ai.mlc.mlcllm.OpenAIProtocol
import ai.mlc.mlcllm.OpenAIProtocol.*
import android.annotation.SuppressLint
import android.os.Bundle
import android.util.Log
import androidx.activity.ComponentActivity
import androidx.activity.compose.setContent
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.material3.ExperimentalMaterial3Api
import androidx.compose.material3.Surface
import androidx.compose.material3.Text
import androidx.compose.runtime.mutableStateOf
import androidx.compose.runtime.remember
import androidx.compose.runtime.rememberCoroutineScope
import androidx.compose.ui.Modifier
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.channels.ReceiveChannel
import kotlinx.coroutines.launch
import java.io.File


class MainActivity : ComponentActivity() {
    @SuppressLint("CoroutineCreationDuringComposition")
    @ExperimentalMaterial3Api
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)

        val modelName = "phi-2-q4f16_1-MLC"
        var modelPath = File(application.getExternalFilesDir(""), modelName).toString()
        Log.i("MLC", "model path: $modelPath")
        // need to be changed to the custom system lib prefix used while compiling the model
        val modelLib = "phi_msft_q4f16_1_686d8979c6ebf05d142d9081f1b87162"
        Log.i("MLC", "engine loaded")

        setContent {
            val responseText = remember { mutableStateOf("") }
            val coroutineScope = rememberCoroutineScope()
            val engine = MLCEngine()
            engine.unload()
            engine.reload(modelPath, modelLib)
            coroutineScope.launch {
                var channel = engine.chat.completions.create(
                    messages = listOf(
                        ChatCompletionMessage(
                            role = OpenAIProtocol.ChatCompletionRole.user,
                            content = "What is the meaning of life?"
                        )
                    ),
                    stream_options = OpenAIProtocol.StreamOptions(include_usage = true)
                )


                for (response in channel) {
                    val finalusage = response.usage
                    if (finalusage != null) {
                        responseText.value += "\n" + (finalusage.extra?.asTextLabel() ?: "")
                    } else {
                        if (response.choices.size > 0) {
                            responseText.value += response.choices[0].delta.content?.asText()
                                .orEmpty()
                        }
                    }

                }
            }

            Surface(
                modifier = Modifier
                    .fillMaxSize()
            ) {
                MLCEngineExampleTheme {
                    Text(text = responseText.value)
                }
            }
        }
    }
}


================================================
FILE: android/MLCEngineExample/app/src/main/java/ai/mlc/mlcengineexample/ui/theme/Color.kt
================================================
package ai.mlc.mlcengineexample.ui.theme

import androidx.compose.ui.graphics.Color

val Blue10 = Color(0xFF000F5E)
val Blue20 = Color(0xFF001E92)
val Blue30 = Color(0xFF002ECC)
val Blue40 = Color(0xFF1546F6)
val Blue80 = Color(0xFFB8C3FF)
val Blue90 = Color(0xFFDDE1FF)

val DarkBlue10 = Color(0xFF00036B)
val DarkBlue20 = Color(0xFF000BA6)
val DarkBlue30 = Color(0xFF1026D3)
val DarkBlue40 = Color(0xFF3648EA)
val DarkBlue80 = Color(0xFFBBC2FF)
val DarkBlue90 = Color(0xFFDEE0FF)

val Yellow10 = Color(0xFF261900)
val Yellow20 = Color(0xFF402D00)
val Yellow30 = Color(0xFF5C4200)
val Yellow40 = Color(0xFF7A5900)
val Yellow80 = Color(0xFFFABD1B)
val Yellow90 = Color(0xFFFFDE9C)

val Red10 = Color(0xFF410001)
val Red20 = Color(0xFF680003)
val Red30 = Color(0xFF930006)
val Red40 = Color(0xFFBA1B1B)
val Red80 = Color(0xFFFFB4A9)
val Red90 = Color(0xFFFFDAD4)

val Grey10 = Color(0xFF191C1D)
val Grey20 = Color(0xFF2D3132)
val Grey80 = Color(0xFFC4C7C7)
val Grey90 = Color(0xFFE0E3E3)
val Grey95 = Color(0xFFEFF1F1)
val Grey99 = Color(0xFFFBFDFD)

val BlueGrey30 = Color(0xFF45464F)
val BlueGrey50 = Color(0xFF767680)
val BlueGrey60 = Color(0xFF90909A)
val BlueGrey80 = Color(0xFFC6C5D0)
val BlueGrey90 = Color(0xFFE2E1EC)


================================================
FILE: android/MLCEngineExample/app/src/main/java/ai/mlc/mlcengineexample/ui/theme/Theme.kt
================================================
package ai.mlc.mlcengineexample.ui.theme

import android.app.Activity
import android.os.Build
import androidx.compose.foundation.isSystemInDarkTheme
import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.darkColorScheme
import androidx.compose.material3.dynamicDarkColorScheme
import androidx.compose.material3.dynamicLightColorScheme
import androidx.compose.material3.lightColorScheme
import androidx.compose.runtime.Composable
import androidx.compose.runtime.SideEffect
import androidx.compose.ui.graphics.Color
import androidx.compose.ui.graphics.toArgb
import androidx.compose.ui.platform.LocalContext
import androidx.compose.ui.platform.LocalView
import androidx.core.view.WindowCompat

private val DarkColorScheme = darkColorScheme(
    primary = Blue80,
    onPrimary = Blue20,
    primaryContainer = Blue30,
    onPrimaryContainer = Blue90,
    inversePrimary = Blue40,
    secondary = DarkBlue80,
    onSecondary = DarkBlue20,
    secondaryContainer = DarkBlue30,
    onSecondaryContainer = DarkBlue90,
    tertiary = Yellow80,
    onTertiary = Yellow20,
    tertiaryContainer = Yellow30,
    onTertiaryContainer = Yellow90,
    error = Red80,
    onError = Red20,
    errorContainer = Red30,
    onErrorContainer = Red90,
    background = Grey10,
    onBackground = Grey90,
    surface = Grey10,
    onSurface = Grey80,
    inverseSurface = Grey90,
    inverseOnSurface = Grey20,
    surfaceVariant = BlueGrey30,
    onSurfaceVariant = BlueGrey80,
    outline = BlueGrey60
)

private val LightColorScheme = lightColorScheme(
    primary = Blue40,
    onPrimary = Color.White,
    primaryContainer = Blue90,
    onPrimaryContainer = Blue10,
    inversePrimary = Blue80,
    secondary = DarkBlue40,
    onSecondary = Color.White,
    secondaryContainer = DarkBlue90,
    onSecondaryContainer = DarkBlue10,
    tertiary = Yellow40,
    onTertiary = Color.White,
    tertiaryContainer = Yellow90,
    onTertiaryContainer = Yellow10,
    error = Red40,
    onError = Color.White,
    errorContainer = Red90,
    onErrorContainer = Red10,
    background = Grey99,
    onBackground = Grey10,
    surface = Grey99,
    onSurface = Grey10,
    inverseSurface = Grey20,
    inverseOnSurface = Grey95,
    surfaceVariant = BlueGrey90,
    onSurfaceVariant = BlueGrey30,
    outline = BlueGrey50
)

@Composable
fun MLCEngineExampleTheme(
    darkTheme: Boolean = isSystemInDarkTheme(),
    // Dynamic color is available on Android 12+
    dynamicColor: Boolean = true,
    content: @Composable () -> Unit
) {
    val colorScheme = when {
        dynamicColor && Build.VERSION.SDK_INT >= Build.VERSION_CODES.S -> {
            val context = LocalContext.current
            if (darkTheme) dynamicDarkColorScheme(context) else dynamicLightColorScheme(context)
        }

        darkTheme -> DarkColorScheme
        else -> LightColorScheme
    }
    val view = LocalView.current
    if (!view.isInEditMode) {
        SideEffect {
            val window = (view.context as Activity).window
            window.statusBarColor = colorScheme.primary.toArgb()
            WindowCompat.getInsetsController(window, view).isAppearanceLightStatusBars = darkTheme
        }
    }

    MaterialTheme(
        colorScheme = colorScheme,
        typography = Typography,
        content = content
    )
}


================================================
FILE: android/MLCEngineExample/app/src/main/java/ai/mlc/mlcengineexample/ui/theme/Type.kt
================================================
package ai.mlc.mlcengineexample.ui.theme

import androidx.compose.material3.Typography
import androidx.compose.ui.text.TextStyle
import androidx.compose.ui.text.font.FontFamily
import androidx.compose.ui.text.font.FontWeight
import androidx.compose.ui.unit.sp

// Set of Material typography styles to start with
val Typography = Typography(
    bodyLarge = TextStyle(
        fontFamily = FontFamily.Default,
        fontWeight = FontWeight.Normal,
        fontSize = 16.sp,
        lineHeight = 24.sp,
        letterSpacing = 0.5.sp
    )
    /* Other default text styles to override
    titleLarge = TextStyle(
        fontFamily = FontFamily.Default,
        fontWeight = FontWeight.Normal,
        fontSize = 22.sp,
        lineHeight = 28.sp,
        letterSpacing = 0.sp
    ),
    labelSmall = TextStyle(
        fontFamily = FontFamily.Default,
        fontWeight = FontWeight.Medium,
        fontSize = 11.sp,
        lineHeight = 16.sp,
        letterSpacing = 0.5.sp
    )
    */
)


================================================
FILE: android/MLCEngineExample/app/src/main/res/drawable/ic_android_black_24dp.xml
================================================
<vector android:height="24dp" android:tint="#000000"
    android:viewportHeight="24" android:viewportWidth="24"
    android:width="24dp" xmlns:android="http://schemas.android.com/apk/res/android">
    <path android:fillColor="#FF000000" android:pathData="M17.6,11.48 L19.44,8.3a0.63,0.63 0,0 0,-1.09 -0.63l-1.88,3.24a11.43,11.43 0,0 0,-8.94 0L5.65,7.67a0.63,0.63 0,0 0,-1.09 0.63L6.4,11.48A10.81,10.81 0,0 0,1 20L23,20A10.81,10.81 0,0 0,17.6 11.48ZM7,17.25A1.25,1.25 0,1 1,8.25 16,1.25 1.25,0 0,1 7,17.25ZM17,17.25A1.25,1.25 0,1 1,18.25 16,1.25 1.25,0 0,1 17,17.25Z"/>
</vector>


================================================
FILE: android/MLCEngineExample/app/src/main/res/drawable/mlc_logo_108.xml
================================================
<vector xmlns:android="http://schemas.android.com/apk/res/android"
    android:width="108dp"
    android:height="108dp"
    android:viewportWidth="108"
    android:viewportHeight="108">
  <path
      android:pathData="M100.93,47.91L58.41,47.91C57,47.91 55.82,49.05 55.82,50.5L55.82,69.17C57.54,68.98 59.14,69.2 60.55,70.04L60.55,52.72L98.79,52.72L98.79,103.09L60.55,103.09L60.55,87.29C59.48,88.09 58.26,88.75 57.08,89.28C56.7,89.47 56.27,89.66 55.82,89.81L55.82,105.23C55.82,106.64 56.96,107.82 58.41,107.82L100.93,107.82C102.34,107.82 103.52,106.68 103.52,105.23L103.52,50.5C103.52,49.09 102.34,47.91 100.93,47.91ZM55.93,86.72C52.88,88.13 47.57,89.39 44.29,90.12C40.63,90.92 30.02,93.36 29.1,87.6C28.34,82.87 40.82,77.6 44.02,76.23C46.96,74.97 49.94,73.79 52.92,72.64C56.12,71.42 59.56,70.88 61.16,74.93C61.85,76.68 62.04,78.14 62.07,80L62.07,80.2L62.04,80.39C61.66,83.55 58.57,85.5 55.93,86.72ZM66.58,35.01C68.03,34.9 69.29,35.96 69.4,37.38L69.82,42.3C69.94,43.75 68.87,45.01 67.46,45.13C66.01,45.24 64.75,44.17 64.63,42.76L64.21,37.84C64.06,36.42 65.13,35.16 66.58,35.01ZM85.55,45.96C85.55,43.03 85.39,40.2 85.13,37.53L85.13,37.57C85.01,36.65 84.9,35.74 84.78,34.82L84.78,34.75L84.75,34.59C84.25,31.23 83.6,27.91 82.8,24.55C82,21.2 79.1,19.1 75.7,19.02C69.52,18.87 63.3,18.79 57.11,19.02C56.05,17.07 52.88,15.86 49.18,16.12L44.9,6.5C45.7,5.78 46.13,4.67 46.05,3.53C45.86,1.5 44.1,0.02 42.08,0.21C40.05,0.4 38.57,2.15 38.76,4.18C38.91,6.09 40.52,7.5 42.38,7.5L46.43,16.58C43.76,17.3 41.77,18.79 41.24,20.47C35.4,21.31 29.64,22.46 23.88,23.6C23.19,23.75 22.54,23.95 21.93,24.25C15.44,25.81 8.76,29.36 8.76,29.36C8.69,30.2 8.61,31.04 8.54,31.92C8.84,31.84 9.18,31.8 9.53,31.77C14.6,31.31 19.22,36.54 19.79,43.41C20.4,50.28 16.78,56.23 11.7,56.65C11.32,56.69 10.94,56.69 10.55,56.65C10.79,57.57 11.02,58.44 11.24,59.32C15.48,61.61 21.2,63.18 24.75,63.94C25.59,64.24 26.47,64.43 27.43,64.43C36.13,64.63 44.82,64.74 53.53,63.98L53.87,63.94L53.87,57.57C47.99,58.06 41.96,58.1 32.92,57.91C31.2,57.87 29.94,56.84 29.52,55.35C27.5,48.02 27,40.43 27.46,32.23C27.54,30.66 28.64,29.44 30.36,29.1C32,28.75 33.57,28.45 35.02,28.18C35.71,28.07 37.5,27.72 38.91,27.46L38.95,27.46C40.02,27.27 41.05,27.04 42.12,26.88C45.44,27.46 47.73,32.3 52.69,31.5C57.69,31.43 59.1,26.23 62.3,25.09C64.63,25.09 66.92,25.13 69.25,25.24C70.36,25.24 71.43,25.28 73.14,25.32C74.86,25.36 76.16,26.39 76.54,27.88C76.92,29.52 77.27,31.12 77.57,32.72C78.18,38.22 78.45,42.64 78.41,46.04L85.55,46.04ZM9.79,38.06C11.78,37.88 13.65,40.58 13.95,44.09C14.26,47.61 12.92,50.58 10.94,50.73C9.98,50.81 9.03,50.24 8.3,49.17C8.72,49.51 9.18,49.66 9.64,49.63C11.2,49.48 12.27,47.07 12.01,44.25C11.74,41.42 10.29,39.25 8.72,39.36C8.27,39.4 7.85,39.63 7.5,40.05C8,38.9 8.8,38.18 9.79,38.06ZM52.65,21.88C54.29,21.88 55.59,23.22 55.59,24.82C55.59,26.46 54.25,27.76 52.65,27.76C51.01,27.76 49.71,26.43 49.71,24.82C49.67,23.22 51.01,21.88 52.65,21.88ZM42.31,37.19C43.76,37.07 45.02,38.14 45.13,39.55L45.55,44.48C45.66,45.93 44.6,47.18 43.18,47.3C41.73,47.41 40.48,46.34 40.36,44.93L39.94,40.01C39.79,38.56 40.86,37.3 42.31,37.19ZM9.75,34.06C13.5,33.71 16.97,37.95 17.43,43.52C17.92,49.09 15.29,53.86 11.51,54.17C7.77,54.51 4.3,50.28 3.84,44.7C3.34,39.17 5.98,34.4 9.75,34.06ZM53.91,100.73C49.98,99.7 46.54,97.1 45.02,92.79C47.77,92.18 51.01,91.45 53.91,90.46ZM42.84,73.79L42.19,66.46L53.91,65.85L53.91,69.47C53.26,69.63 52.61,69.86 51.96,70.08C48.95,71.23 45.93,72.45 42.96,73.71ZM29.64,73.59C33.19,71 37.61,71.04 39.52,73.67C39.83,74.09 40.02,74.51 40.17,74.97C35.82,76.91 29.83,80 27.43,83.86C27.12,83.63 26.85,83.32 26.62,83.02C24.71,80.39 26.05,76.15 29.64,73.59ZM78.68,84.28C79.36,84.13 80.09,84.13 80.77,84.28L81.58,82.91L81.92,83.02C82.61,83.29 83.25,83.63 83.83,84.13L84.09,84.36L83.33,85.77C83.56,86.04 83.79,86.3 83.94,86.61C84.13,86.91 84.25,87.22 84.36,87.56L85.96,87.56L86.04,87.94C86.16,88.67 86.16,89.43 86.04,90.16L85.96,90.5L84.36,90.54C84.13,91.23 83.79,91.84 83.33,92.33L84.13,93.71L83.87,93.93C83.56,94.16 83.25,94.39 82.95,94.58C82.64,94.77 82.3,94.93 81.96,95.04L81.61,95.16L80.77,93.78C80.09,93.93 79.36,93.93 78.68,93.78L77.88,95.16L77.53,95.04C76.84,94.77 76.2,94.43 75.63,93.93L75.36,93.71L76.12,92.29C75.89,92.03 75.66,91.76 75.51,91.45C75.32,91.15 75.2,90.84 75.09,90.5L73.48,90.5L73.41,90.12C73.3,89.39 73.3,88.63 73.41,87.91L73.48,87.56L75.09,87.52C75.32,86.84 75.66,86.23 76.12,85.73L75.32,84.36L75.59,84.13C75.89,83.9 76.2,83.67 76.5,83.48C76.8,83.29 77.15,83.13 77.49,83.02L77.84,82.91ZM64.18,57.76L94.36,57.76L94.36,61L64.18,61ZM64.18,64.97L76.39,64.97L76.39,68.21L64.18,68.21ZM64.18,72.34L74.02,72.34L74.02,75.58L64.25,75.58L64.18,75.31ZM90.09,67.49C91,67.79 91.84,68.29 92.57,68.9L94.48,67.79L94.82,68.18C95.47,68.94 96,69.86 96.34,70.81L96.54,71.27L94.67,72.41C94.78,72.87 94.82,73.36 94.82,73.86C94.82,74.36 94.78,74.82 94.67,75.27L96.57,76.38L96.38,76.84C96.04,77.79 95.51,78.67 94.86,79.47L94.55,79.85L92.64,78.79C91.92,79.43 91.08,79.93 90.16,80.23L90.16,82.41L89.67,82.48C89.17,82.56 88.64,82.64 88.14,82.64C87.64,82.64 87.15,82.6 86.65,82.52L86.16,82.45L86.12,80.23C85.2,79.93 84.36,79.43 83.64,78.82L81.73,79.93L81.39,79.55C80.74,78.79 80.2,77.87 79.86,76.91L79.67,76.46L81.54,75.31C81.43,74.85 81.39,74.36 81.39,73.86C81.39,73.36 81.43,72.91 81.54,72.45L79.63,71.34L79.82,70.85C80.16,69.89 80.7,69.02 81.35,68.21L81.65,67.83L83.56,68.9C84.29,68.25 85.13,67.75 86.04,67.45L86.04,65.31L86.54,65.23C87.04,65.16 87.57,65.08 88.06,65.08C88.56,65.08 89.05,65.12 89.55,65.2L90.05,65.27ZM88.06,70.54C86.2,70.54 84.71,72.03 84.71,73.9C84.71,75.77 86.2,77.26 88.06,77.26C89.93,77.26 91.42,75.77 91.42,73.9C91.42,72.03 89.89,70.54 88.06,70.54ZM78.48,86.95C77.3,87.64 76.92,89.13 77.61,90.31C78.29,91.49 79.78,91.88 80.96,91.19C82.15,90.5 82.53,89.01 81.84,87.83C81.16,86.68 79.67,86.26 78.48,86.95ZM78.48,86.95"
      android:fillColor="#062578"
      android:fillType="evenOdd"
      android:strokeColor="#00000000"/>
</vector>


================================================
FILE: android/MLCEngineExample/app/src/main/res/values/colors.xml
================================================
<?xml version="1.0" encoding="utf-8"?>
<resources>
    <color name="purple_200">#FFBB86FC</color>
    <color name="purple_500">#FF6200EE</color>
    <color name="purple_700">#FF3700B3</color>
    <color name="teal_200">#FF03DAC5</color>
    <color name="teal_700">#FF018786</color>
    <color name="black">#FF000000</color>
    <color name="white">#FFFFFFFF</color>
</resources>


================================================
FILE: android/MLCEngineExample/app/src/main/res/values/strings.xml
================================================
<resources>
    <string name="app_name">MLCEngineExample</string>
</resources>


================================================
FILE: android/MLCEngineExample/app/src/main/res/values/themes.xml
================================================
<?xml version="1.0" encoding="utf-8"?>
<resources>

    <style name="Theme.MLCEngineExample" parent="android:Theme.Material.Light" />

</resources>


================================================
FILE: android/MLCEngineExample/app/src/main/res/xml/backup_rules.xml
================================================
<?xml version="1.0" encoding="utf-8"?><!--
   Sample backup rules file; uncomment and customize as necessary.
   See https://developer.android.com/guide/topics/data/autobackup
   for details.
   Note: This file is ignored for devices older that API 31
   See https://developer.android.com/about/versions/12/backup-restore
-->
<full-backup-content>
    <!--
   <include domain="sharedpref" path="."/>
   <exclude domain="sharedpref" path="device.xml"/>
-->
</full-backup-content>


================================================
FILE: android/MLCEngineExample/app/src/main/res/xml/data_extraction_rules.xml
================================================
<?xml version="1.0" encoding="utf-8"?><!--
   Sample data extraction rules file; uncomment and customize as necessary.
   See https://developer.android.com/about/versions/12/backup-restore#xml-changes
   for details.
-->
<data-extraction-rules>
    <cloud-backup>
        <!-- TODO: Use <include> and <exclude> to control what is backed up.
        <include .../>
        <exclude .../>
        -->
    </cloud-backup>
    <!--
    <device-transfer>
        <include .../>
        <exclude .../>
    </device-transfer>
    -->
</data-extraction-rules>


================================================
FILE: android/MLCEngineExample/build.gradle
================================================
plugins {
    id 'com.android.application' version '8.2.0' apply false
    id 'com.android.library' version '8.2.0' apply false
    id 'org.jetbrains.kotlin.android' version '1.8.10' apply false
}


================================================
FILE: android/MLCEngineExample/bundle_weight.py
================================================
import argparse
import os
import subprocess
from pathlib import Path

from mlc_llm.support import logging

logging.enable_logging()
logger = logging.getLogger(__name__)


def main(apk_path: Path, package_output_path: Path):
    """Push weights to the android device with adb"""
    # - Install the apk on device.
    logger.info('Install apk "%s" to device', str(apk_path.absolute()))
    subprocess.run(["adb", "install", str(apk_path)], check=True, env=os.environ)
    # - Create the weight directory for the app.
    device_weihgt_dir = "/storage/emulated/0/Android/data/ai.mlc.mlcengineexample/files/"
    logger.info('Creating directory "%s" on device', device_weihgt_dir)
    subprocess.run(
        ["adb", "shell", "mkdir", "-p", device_weihgt_dir],
        check=True,
        env=os.environ,
    )
    for model_weight_dir in (package_output_path / "bundle").iterdir():
        if model_weight_dir.is_dir():
            src_path = str(model_weight_dir.absolute())
            dst_path = "/data/local/tmp/" + model_weight_dir.name
            logger.info('Pushing local weights "%s" to device location "%s"', src_path, dst_path)
            subprocess.run(["adb", "push", src_path, dst_path], check=True, env=os.environ)

            src_path = dst_path
            dst_path = "/storage/emulated/0/Android/data/ai.mlc.mlcengineexample/files/"
            logger.info('Move weights from "%s" to "%s"', src_path, dst_path)
            subprocess.run(["adb", "shell", "mv", src_path, dst_path], check=True, env=os.environ)
    logger.info("All finished.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser("MLC LLM Android Weight Bundle")

    def _parse_apk_path(path: str) -> Path:
        path = Path(path)
        if not path.exists():
            raise ValueError(
                f"Path {str(path)} is expected to be an apk file, but the file does not exist."
            )
        if not path.is_file():
            raise ValueError(f"Path {str(path)} is expected to be an apk file.")
        return path

    parser.add_argument(
        "--apk-path",
        type=_parse_apk_path,
        default="app/release/app-release.apk",
        help="The path to generated MLCEngineExample apk file.",
    )
    parser.add_argument(
        "--package-output-path",
        type=Path,
        default="dist",
        help='The path to the output directory of "mlc_llm package".',
    )
    args = parser.parse_args()
    main(args.apk_path, args.package_output_path)


================================================
FILE: android/MLCEngineExample/gradle/wrapper/gradle-wrapper.properties
================================================
#Thu Jan 25 10:19:50 EST 2024
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-8.5-bin.zip
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists


================================================
FILE: android/MLCEngineExample/gradle.properties
================================================
# Project-wide Gradle settings.
# IDE (e.g. Android Studio) users:
# Gradle settings configured through the IDE *will override*
# any settings specified in this file.
# For more details on how to configure your build environment visit
# http://www.gradle.org/docs/current/userguide/build_environment.html
# Specifies the JVM arguments used for the daemon process.
# The setting is particularly useful for tweaking memory settings.
org.gradle.jvmargs=-Xmx2048m -Dfile.encoding=UTF-8
# When configured, Gradle will run in incubating parallel mode.
# This option should only be used with decoupled projects. More details, visit
# http://www.gradle.org/docs/current/userguide/multi_project_builds.html#sec:decoupled_projects
# org.gradle.parallel=true
# AndroidX package structure to make it clearer which packages are bundled with the
# Android operating system, and which are packaged with your app's APK
# https://developer.android.com/topic/libraries/support-library/androidx-rn
android.useAndroidX=true
# Kotlin code style for this project: "official" or "obsolete":
kotlin.code.style=official
# Enables namespacing of each library's R class so that its R class includes only the
# resources declared in the library itself and none from the library's dependencies,
# thereby reducing the size of the R class for that library
android.nonTransitiveRClass=true


================================================
FILE: android/MLCEngineExample/gradlew
================================================
#!/usr/bin/env sh

#
# Copyright 2015 the original author or authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

##############################################################################
##
##  Gradle start up script for UN*X
##
##############################################################################

# Attempt to set APP_HOME
# Resolve links: $0 may be a link
PRG="$0"
# Need this for relative symlinks.
while [ -h "$PRG" ] ; do
    ls=`ls -ld "$PRG"`
    link=`expr "$ls" : '.*-> \(.*\)$'`
    if expr "$link" : '/.*' > /dev/null; then
        PRG="$link"
    else
        PRG=`dirname "$PRG"`"/$link"
    fi
done
SAVED="`pwd`"
cd "`dirname \"$PRG\"`/" >/dev/null
APP_HOME="`pwd -P`"
cd "$SAVED" >/dev/null

APP_NAME="Gradle"
APP_BASE_NAME=`basename "$0"`

# Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
DEFAULT_JVM_OPTS='"-Xmx64m" "-Xms64m"'

# Use the maximum available, or set MAX_FD != -1 to use that value.
MAX_FD="maximum"

warn () {
    echo "$*"
}

die () {
    echo
    echo "$*"
    echo
    exit 1
}

# OS specific support (must be 'true' or 'false').
cygwin=false
msys=false
darwin=false
nonstop=false
case "`uname`" in
  CYGWIN* )
    cygwin=true
    ;;
  Darwin* )
    darwin=true
    ;;
  MINGW* )
    msys=true
    ;;
  NONSTOP* )
    nonstop=true
    ;;
esac

CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar


# Determine the Java command to use to start the JVM.
if [ -n "$JAVA_HOME" ] ; then
    if [ -x "$JAVA_HOME/jre/sh/java" ] ; then
        # IBM's JDK on AIX uses strange locations for the executables
        JAVACMD="$JAVA_HOME/jre/sh/java"
    else
        JAVACMD="$JAVA_HOME/bin/java"
    fi
    if [ ! -x "$JAVACMD" ] ; then
        die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME

Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
    fi
else
    JAVACMD="java"
    which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.

Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
fi

# Increase the maximum file descriptors if we can.
if [ "$cygwin" = "false" -a "$darwin" = "false" -a "$nonstop" = "false" ] ; then
    MAX_FD_LIMIT=`ulimit -H -n`
    if [ $? -eq 0 ] ; then
        if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then
            MAX_FD="$MAX_FD_LIMIT"
        fi
        ulimit -n $MAX_FD
        if [ $? -ne 0 ] ; then
            warn "Could not set maximum file descriptor limit: $MAX_FD"
        fi
    else
        warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT"
    fi
fi

# For Darwin, add options to specify how the application appears in the dock
if $darwin; then
    GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\""
fi

# For Cygwin or MSYS, switch paths to Windows format before running java
if [ "$cygwin" = "true" -o "$msys" = "true" ] ; then
    APP_HOME=`cygpath --path --mixed "$APP_HOME"`
    CLASSPATH=`cygpath --path --mixed "$CLASSPATH"`

    JAVACMD=`cygpath --unix "$JAVACMD"`

    # We build the pattern for arguments to be converted via cygpath
    ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null`
    SEP=""
    for dir in $ROOTDIRSRAW ; do
        ROOTDIRS="$ROOTDIRS$SEP$dir"
        SEP="|"
    done
    OURCYGPATTERN="(^($ROOTDIRS))"
    # Add a user-defined pattern to the cygpath arguments
    if [ "$GRADLE_CYGPATTERN" != "" ] ; then
        OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)"
    fi
    # Now convert the arguments - kludge to limit ourselves to /bin/sh
    i=0
    for arg in "$@" ; do
        CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -`
        CHECK2=`echo "$arg"|egrep -c "^-"`                                 ### Determine if an option

        if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then                    ### Added a condition
            eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"`
        else
            eval `echo args$i`="\"$arg\""
        fi
        i=`expr $i + 1`
    done
    case $i in
        0) set -- ;;
        1) set -- "$args0" ;;
        2) set -- "$args0" "$args1" ;;
        3) set -- "$args0" "$args1" "$args2" ;;
        4) set -- "$args0" "$args1" "$args2" "$args3" ;;
        5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;;
        6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;;
        7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;;
        8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;;
        9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;;
    esac
fi

# Escape application args
save () {
    for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done
    echo " "
}
APP_ARGS=`save "$@"`

# Collect all arguments for the java command, following the shell quoting and substitution rules
eval set -- $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS "\"-Dorg.gradle.appname=$APP_BASE_NAME\"" -classpath "\"$CLASSPATH\"" org.gradle.wrapper.GradleWrapperMain "$APP_ARGS"

exec "$JAVACMD" "$@"


================================================
FILE: android/MLCEngineExample/gradlew.bat
================================================
@rem
@rem Copyright 2015 the original author or authors.
@rem
@rem Licensed under the Apache License, Version 2.0 (the "License");
@rem you may not use this file except in compliance with the License.
@rem You may obtain a copy of the License at
@rem
@rem      https://www.apache.org/licenses/LICENSE-2.0
@rem
@rem Unless required by applicable law or agreed to in writing, software
@rem distributed under the License is distributed on an "AS IS" BASIS,
@rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@rem See the License for the specific language governing permissions and
@rem limitations under the License.
@rem

@if "%DEBUG%" == "" @echo off
@rem ##########################################################################
@rem
@rem  Gradle startup script for Windows
@rem
@rem ##########################################################################

@rem Set local scope for the variables with windows NT shell
if "%OS%"=="Windows_NT" setlocal

set DIRNAME=%~dp0
if "%DIRNAME%" == "" set DIRNAME=.
set APP_BASE_NAME=%~n0
set APP_HOME=%DIRNAME%

@rem Resolve any "." and ".." in APP_HOME to make it shorter.
for %%i in ("%APP_HOME%") do set APP_HOME=%%~fi

@rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
set DEFAULT_JVM_OPTS="-Xmx64m" "-Xms64m"

@rem Find java.exe
if defined JAVA_HOME goto findJavaFromJavaHome

set JAVA_EXE=java.exe
%JAVA_EXE% -version >NUL 2>&1
if "%ERRORLEVEL%" == "0" goto execute

echo.
echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
echo.
echo Please set the JAVA_HOME variable in your environment to match the
echo location of your Java installation.

goto fail

:findJavaFromJavaHome
set JAVA_HOME=%JAVA_HOME:"=%
set JAVA_EXE=%JAVA_HOME%/bin/java.exe

if exist "%JAVA_EXE%" goto execute

echo.
echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME%
echo.
echo Please set the JAVA_HOME variable in your environment to match the
echo location of your Java installation.

goto fail

:execute
@rem Setup the command line

set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar


@rem Execute Gradle
"%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %*

:end
@rem End local scope for the variables with windows NT shell
if "%ERRORLEVEL%"=="0" goto mainEnd

:fail
rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of
rem the _cmd.exe /c_ return code!
if  not "" == "%GRADLE_EXIT_CONSOLE%" exit 1
exit /b 1

:mainEnd
if "%OS%"=="Windows_NT" endlocal

:omega


================================================
FILE: android/MLCEngineExample/mlc-package-config.json
================================================
{
    "device": "android",
    "model_list": [
        {
            "model": "HF://mlc-ai/phi-2-q4f16_1-MLC",
            "estimated_vram_bytes": 2036816936,
            "model_id": "phi-2-q4f16_1-MLC",
            "overrides": {
                "prefill_chunk_size": 1024
            }
        }
    ]
}


================================================
FILE: android/MLCEngineExample/settings.gradle
================================================
pluginManagement {
    repositories {
        google()
        mavenCentral()
        gradlePluginPortal()
    }
}
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
    repositories {
        google()
        mavenCentral()
    }
}
rootProject.name = "MLCEngineExample"
include ':app'
include ':mlc4j'
project(':mlc4j').projectDir = file('dist/lib/mlc4j')


================================================
FILE: android/README.md
================================================
# MLC-LLM Android

[Documentation page](https://llm.mlc.ai/docs/deploy/android.html)


================================================
FILE: android/mlc4j/.gitignore
================================================
/build


================================================
FILE: android/mlc4j/CMakeLists.txt
================================================
cmake_minimum_required(VERSION 3.18)

project(mlc-chat C CXX)

set(ANDROID_DIR ${CMAKE_CURRENT_LIST_DIR})
set(ANDROID_BIN_DIR ${CMAKE_CURRENT_BINARY_DIR})

set(MLC_LLM_DIR ${ANDROID_DIR}/../..)
set(MLC_LLM_BINARY_DIR mlc_llm)
set(MLC_LLM_COMPILE_DEFS TVM_LOG_CUSTOMIZE=1)
add_subdirectory(${MLC_LLM_DIR} ${MLC_LLM_BINARY_DIR} EXCLUDE_FROM_ALL)

if(NOT DEFINED TVM_SOURCE_DIR)
  set(TVM_SOURCE_DIR ${MLC_LLM_DIR}/3rdparty/tvm)
endif(NOT DEFINED TVM_SOURCE_DIR)
message(STATUS "TVM_SOURCE_DIR: ${TVM_SOURCE_DIR}")

find_package(Java REQUIRED)
include(UseJava)

find_package(JNI)
if(JNI_FOUND)
  message(STATUS "JNI_INCLUDE_DIRS=${JNI_INCLUDE_DIRS}")
else()
  message(STATUS "Try to find jni directly from android env")
  # try to find JNI_LIBRARY
  find_path(JNI_INCLUDE_DIRS NAMES "jni.h")
  message(STATUS "JNI_INCLUDE_DIRS=${JNI_INCLUDE_DIRS}")
endif()

file(GLOB_RECURSE javasources
     ${TVM_SOURCE_DIR}/jvm/core/src/main/java/org/apache/tvm/*.java
     ${ANDROID_DIR}/src/java/*.java)
set(JNI_HEADER ${CMAKE_BINARY_DIR}/jni_header)
add_jar(tvm4j_core ${javasources} GENERATE_NATIVE_HEADERS tvm4jheaders
        DESTINATION ${JNI_HEADER})

add_custom_command(
  TARGET tvm4j_core
  POST_BUILD
  COMMAND ${CMAKE_COMMAND} -E copy ${JNI_HEADER}/org_apache_tvm_LibInfo.h
          ${JNI_HEADER}/org_apache_tvm_native_c_api.h)

add_library(model_android STATIC IMPORTED)
set_target_properties(
  model_android PROPERTIES IMPORTED_LOCATION
                           ${ANDROID_BIN_DIR}/lib/libmodel_android.a)

add_library(
  tvm4j_runtime_packed SHARED
  ${TVM_SOURCE_DIR}/jvm/native/src/main/native/org_apache_tvm_native_c_api.cc)
set(MLC_LLM_COMPILE_DEFS ${MLC_LLM_COMPILE_DEFS}
                         TVM_SOURCE_DIR=${TVM_SOURCE_DIR})

target_include_directories(
  tvm4j_runtime_packed
  PUBLIC ${JNI_INCLUDE_DIRS}
         ${JNI_HEADER}
         ${ANDROID_DIR}/src/cpp
         ${TVM_SOURCE_DIR}/3rdparty/tvm-ffi/3rdparty/dlpack/include
         ${TVM_SOURCE_DIR}/3rdparty/OpenCL-Headers
         ${TVM_SOURCE_DIR}/include
         ${TVM_SOURCE_DIR}/src
         ${TVM_SOURCE_DIR}/3rdparty/tvm-ffi/include
         ${TVM_SOURCE_DIR}/3rdparty/tvm-ffi/src)
target_compile_definitions(tvm4j_runtime_packed PUBLIC ${MLC_LLM_COMPILE_DEFS})
target_compile_definitions(
  tvm4j_runtime_packed
  PUBLIC TVM_VM_ENABLE_PROFILER=0
  PUBLIC TVM_FFI_USE_LIBBACKTRACE=0
  PUBLIC TVM_FFI_BACKTRACE_ON_SEGFAULT=0)

set(MLC_ENABLE_SENTENCEPIECE_TOKENIZER OFF)
target_link_libraries(
  tvm4j_runtime_packed
  tokenizers_c
  tokenizers_cpp
  log
  -Wl,--whole-archive
  mlc_llm_static
  model_android
  -Wl,--no-whole-archive)

target_compile_definitions(tvm4j_runtime_packed PUBLIC TVM4J_ANDROID)
add_dependencies(tvm4j_runtime_packed tvm4j_core)

target_compile_definitions(mlc_llm_objs PUBLIC MLC_SINGLE_GPU_ONLY)

install_jar(tvm4j_core output)
install(TARGETS tvm4j_runtime_packed LIBRARY DESTINATION output/${ANDROID_ABI})


================================================
FILE: android/mlc4j/build.gradle
================================================
plugins {
    id 'com.android.library'
    id 'org.jetbrains.kotlin.android'
    id 'org.jetbrains.kotlin.plugin.serialization' version '1.8.0'
}

android {
    namespace 'ai.mlc.mlcllm'
    compileSdk 34

    defaultConfig {
        minSdk 22
    }
    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }
    kotlinOptions {
        jvmTarget = '1.8'
    }
    sourceSets {
        main {
            jniLibs.srcDirs = ['output']
        }
    }
}

dependencies {
    implementation fileTree(dir: 'output', include: ['*.jar'])
    implementation 'androidx.core:core-ktx:1.9.0'
    implementation 'androidx.appcompat:appcompat:1.6.1'
    implementation 'com.google.android.material:material:1.10.0'
    implementation 'org.jetbrains.kotlinx:kotlinx-serialization-json:1.6.3'
}


================================================
FILE: android/mlc4j/prepare_libs.py
================================================
"""The build script for mlc4j (MLC LLM and tvm4j)"""

import argparse
import json
import os
import subprocess
import sys
from pathlib import Path

from mlc_llm.support import logging

logging.enable_logging()
logger = logging.getLogger(__name__)


def run_cmake(mlc4j_path: Path):
    if "ANDROID_NDK" not in os.environ:
        raise ValueError(
            f'Environment variable "ANDROID_NDK" is required but not found.'
            "Please follow https://llm.mlc.ai/docs/deploy/android.html to properly "
            'specify "ANDROID_NDK".'
        )
    logger.info("Running cmake")
    # use pathlib so it is cross platform
    android_ndk_path = (
        Path(os.environ["ANDROID_NDK"]) / "build" / "cmake" / "android.toolchain.cmake"
    )
    cmd = [
        "cmake",
        str(mlc4j_path),
        "-DCMAKE_BUILD_TYPE=Release",
        f"-DCMAKE_TOOLCHAIN_FILE={str(android_ndk_path)}",
        "-DCMAKE_INSTALL_PREFIX=.",
        '-DCMAKE_CXX_FLAGS="-O3"',
        "-DANDROID_ABI=arm64-v8a",
        "-DANDROID_NATIVE_API_LEVEL=android-24",
        "-DANDROID_PLATFORM=android-24",
        "-DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON",
        "-DANDROID_STL=c++_static",
        "-DUSE_HEXAGON_SDK=OFF",
        "-DMLC_LLM_INSTALL_STATIC_LIB=ON",
        "-DCMAKE_SKIP_INSTALL_ALL_DEPENDENCY=ON",
        "-DUSE_OPENCL=ON",
        "-DUSE_OPENCL_ENABLE_HOST_PTR=ON",
        "-DUSE_CUSTOM_LOGGING=ON",
        "-DTVM_FFI_USE_LIBBACKTRACE=OFF",
        "-DTVM_FFI_BACKTRACE_ON_SEGFAULT=OFF",
    ]

    if sys.platform == "win32":
        logger.info("Using ninja in windows, make sure you installed ninja in conda")
        cmd += ["-G", "Ninja"]
    subprocess.run(cmd, check=True, env=os.environ)


def run_cmake_build():
    logger.info("Running cmake build")
    cmd = [
        "cmake",
        "--build",
        ".",
        "--target",
        "tvm4j_runtime_packed",
        "--config",
        "release",
        f"-j{os.cpu_count()}",
    ]
    subprocess.run(cmd, check=True, env=os.environ)


def run_cmake_install():
    logger.info("Running cmake install")
    cmd = [
        "cmake",
        "--build",
        ".",
        "--target",
        "install",
        "--config",
        "release",
        f"-j{os.cpu_count()}",
    ]
    subprocess.run(cmd, check=True, env=os.environ)


def main(mlc_llm_source_dir: Path):
    # - Setup rust.
    subprocess.run(["rustup", "target", "add", "aarch64-linux-android"], check=True, env=os.environ)

    # - Build MLC LLM and tvm4j.
    build_path = Path("build")
    os.makedirs(build_path / "lib", exist_ok=True)
    logger.info('Entering "%s" for MLC LLM and tvm4j build.', os.path.abspath(build_path))
    os.chdir(build_path)
    # Generate config.cmake if TVM Home is set.
    if "TVM_SOURCE_DIR" in os.environ:
        logger.info('Set TVM_SOURCE_DIR to "%s"', os.environ["TVM_SOURCE_DIR"])
        with open("config.cmake", "w", encoding="utf-8") as file:
            # We use "json.dumps" to escape backslashes and quotation marks
            tvm_source_dir_str_with_escape = json.dumps(os.environ["TVM_SOURCE_DIR"])
            print("set(TVM_SOURCE_DIR %s)" % tvm_source_dir_str_with_escape, file=file)

    # - Run cmake, build and install
    run_cmake(mlc_llm_source_dir / "android" / "mlc4j")
    run_cmake_build()
    run_cmake_install()


if __name__ == "__main__":
    parser = argparse.ArgumentParser("MLC LLM Android Lib Preparation")

    parser.add_argument(
        "--mlc-llm-source-dir",
        type=Path,
        default=os.environ.get("MLC_LLM_SOURCE_DIR", None),
        help="The path to MLC LLM source",
    )
    parsed = parser.parse_args()
    if parsed.mlc_llm_source_dir is None:
        parsed.mlc_llm_source_dir = Path(os.path.abspath(os.path.curdir)).parent.parent
    os.environ["MLC_LLM_SOURCE_DIR"] = str(parsed.mlc_llm_source_dir)
    main(parsed.mlc_llm_source_dir)


================================================
FILE: android/mlc4j/src/cpp/tvm_runtime.h
================================================
#define TVM_USE_LIBBACKTRACE 0

#include <android/log.h>
#include <dlfcn.h>
#include <tvm/runtime/logging.h>

#include <ffi/backtrace.cc>
#include <ffi/container.cc>
#include <ffi/dtype.cc>
#include <ffi/error.cc>
#include <ffi/extra/env_c_api.cc>
#include <ffi/extra/env_context.cc>
#include <ffi/extra/json_parser.cc>
#include <ffi/extra/json_writer.cc>
#include <ffi/extra/library_module.cc>
#include <ffi/extra/library_module_dynamic_lib.cc>
#include <ffi/extra/library_module_system_lib.cc>
#include <ffi/extra/module.cc>
#include <ffi/function.cc>
#include <ffi/object.cc>
#include <runtime/cpu_device_api.cc>
#include <runtime/device_api.cc>
#include <runtime/file_utils.cc>
#include <runtime/logging.cc>
#include <runtime/memory/memory_manager.cc>
#include <runtime/module.cc>
#include <runtime/nvtx.cc>
#include <runtime/opencl/opencl_device_api.cc>
#include <runtime/opencl/opencl_module.cc>
#include <runtime/opencl/opencl_wrapper/opencl_wrapper.cc>
#include <runtime/profiling.cc>
#include <runtime/source_utils.cc>
#include <runtime/tensor.cc>
#include <runtime/thread_pool.cc>
#include <runtime/threading_backend.cc>
#include <runtime/vm/attn_backend.cc>
#include <runtime/vm/builtin.cc>
#include <runtime/vm/bytecode.cc>
#include <runtime/vm/executable.cc>
#include <runtime/vm/kv_state.cc>
#include <runtime/vm/paged_kv_cache.cc>
#include <runtime/vm/rnn_state.cc>
#include <runtime/vm/tensor_cache_support.cc>
#include <runtime/vm/vm.cc>
#include <runtime/workspace_pool.cc>

static_assert(TVM_LOG_CUSTOMIZE == 1, "TVM_LOG_CUSTOMIZE must be 1");

namespace tvm {
namespace runtime {
namespace detail {
// Override logging mechanism
[[noreturn]] void LogFatalImpl(const std::string& file, int lineno, const std::string& message) {
  std::string m = file + ":" + std::to_string(lineno) + ": " + message;
  __android_log_write(ANDROID_LOG_FATAL, "TVM_RUNTIME", m.c_str());
  throw InternalError(file, lineno, message);
}
void LogMessageImpl(const std::string& file, int lineno, int level, const std::string& message) {
  std::string m = file + ":" + std::to_string(lineno) + ": " + message;
  __android_log_write(ANDROID_LOG_DEBUG + level, "TVM_RUNTIME", m.c_str());
}

}  // namespace detail
}  // namespace runtime
}  // namespace tvm


================================================
FILE: android/mlc4j/src/main/AndroidManifest.xml
================================================
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android">

</manifest>


================================================
FILE: android/mlc4j/src/main/java/ai/mlc/mlcllm/JSONFFIEngine.java
================================================
package ai.mlc.mlcllm;

import org.apache.tvm.Device;
import org.apache.tvm.Function;
import org.apache.tvm.Module;
import org.apache.tvm.TVMValue;
import android.util.Log;

public class JSONFFIEngine {
    private Module jsonFFIEngine;
    private Function initBackgroundEngineFunc;
    private Function reloadFunc;
    private Function unloadFunc;
    private Function resetFunc;
    private Function chatCompletionFunc;
    private Function abortFunc;
    private Function getLastErrorFunc;
    private Function runBackgroundLoopFunc;
    private Function runBackgroundStreamBackLoopFunc;
    private Function exitBackgroundLoopFunc;
    private Function requestStreamCallback;

    public JSONFFIEngine() {
        Function createFunc = Function.getFunction("mlc.json_ffi.CreateJSONFFIEngine");
        assert createFunc != null;
        jsonFFIEngine = createFunc.invoke().asModule();
        initBackgroundEngineFunc = jsonFFIEngine.getFunction("init_background_engine");
        reloadFunc = jsonFFIEngine.getFunction("reload");
        unloadFunc = jsonFFIEngine.getFunction("unload");
        resetFunc = jsonFFIEngine.getFunction("reset");
        chatCompletionFunc = jsonFFIEngine.getFunction("chat_completion");
        abortFunc = jsonFFIEngine.getFunction("abort");
        getLastErrorFunc = jsonFFIEngine.getFunction("get_last_error");
        runBackgroundLoopFunc = jsonFFIEngine.getFunction("run_background_loop");
        runBackgroundStreamBackLoopFunc = jsonFFIEngine.getFunction("run_background_stream_back_loop");
        exitBackgroundLoopFunc = jsonFFIEngine.getFunction("exit_background_loop");
    }

    public void initBackgroundEngine(KotlinFunction callback) {
        Device device = Device.opencl();

        requestStreamCallback = Function.convertFunc(new Function.Callback() {
            @Override
            public Object invoke(TVMValue... args) {
                final String chatCompletionStreamResponsesJSONStr = args[0].asString();
                callback.invoke(chatCompletionStreamResponsesJSONStr);
                return 1;
            }
        });

        initBackgroundEngineFunc.pushArg(device.deviceType).pushArg(device.deviceId).pushArg(requestStreamCallback)
                .invoke();
    }

    public void reload(String engineConfigJSONStr) {
        reloadFunc.pushArg(engineConfigJSONStr).invoke();
    }

    public void chatCompletion(String requestJSONStr, String requestId) {
        chatCompletionFunc.pushArg(requestJSONStr).pushArg(requestId).invoke();
    }

    public void runBackgroundLoop() {
        runBackgroundLoopFunc.invoke();
    }

    public void runBackgroundStreamBackLoop() {
        runBackgroundStreamBackLoopFunc.invoke();
    }

    public void exitBackgroundLoop() {
        exitBackgroundLoopFunc.invoke();
    }

    public void unload() {
        unloadFunc.invoke();
    }

    public interface KotlinFunction {
        void invoke(String arg);
    }

    public void reset() {
        resetFunc.invoke();
    }

}


================================================
FILE: android/mlc4j/src/main/java/ai/mlc/mlcllm/MLCEngine.kt
================================================
package ai.mlc.mlcllm

import ai.mlc.mlcllm.OpenAIProtocol.*
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.channels.ReceiveChannel
import kotlinx.coroutines.launch
import kotlinx.serialization.json.Json
import kotlinx.serialization.encodeToString
import kotlinx.serialization.decodeFromString
import kotlin.concurrent.thread
import java.util.UUID
import java.util.logging.Logger

class BackgroundWorker(private val task: () -> Unit) {

    fun start() {
        thread(start = true) {
            task()
        }
    }
}

class MLCEngine {

    private val state: EngineState
    private val jsonFFIEngine: JSONFFIEngine
    val chat: Chat
    private val threads = mutableListOf<BackgroundWorker>()

    init {
        state = EngineState()
        jsonFFIEngine = JSONFFIEngine()
        chat = Chat(jsonFFIEngine, state)

        jsonFFIEngine.initBackgroundEngine { result ->
            state.streamCallback(result)
        }

        val backgroundWorker = BackgroundWorker {
            Thread.currentThread().priority = Thread.MAX_PRIORITY
            jsonFFIEngine.runBackgroundLoop()
        }

        val backgroundStreamBackWorker = BackgroundWorker {
            jsonFFIEngine.runBackgroundStreamBackLoop()
        }

        threads.add(backgroundWorker)
        threads.add(backgroundStreamBackWorker)

        backgroundWorker.start()
        backgroundStreamBackWorker.start()
    }

    fun reload(modelPath: String, modelLib: String) {
        val engineConfig = """
            {
                "model": "$modelPath",
                "model_lib": "system://$modelLib",
                "mode": "interactive"
            }
        """
        jsonFFIEngine.reload(engineConfig)
    }

    fun reset() {
        jsonFFIEngine.reset()
    }

    fun unload() {
        jsonFFIEngine.unload()
    }
}

data class RequestState(
    val request: ChatCompletionRequest,
    val continuation: Channel<ChatCompletionStreamResponse>
)

class EngineState {

    private val logger = Logger.getLogger(EngineState::class.java.name)
    private val requestStateMap = mutableMapOf<String, RequestState>()

    suspend fun chatCompletion(
        jsonFFIEngine: JSONFFIEngine,
        request: ChatCompletionRequest
    ): ReceiveChannel<ChatCompletionStreamResponse> {
        val json = Json { encodeDefaults = true }
        val jsonRequest = json.encodeToString(request)
        val requestID = UUID.randomUUID().toString()
        val channel = Channel<ChatCompletionStreamResponse>(Channel.UNLIMITED)

        requestStateMap[requestID] = RequestState(request, channel)

        jsonFFIEngine.chatCompletion(jsonRequest, requestID)

        return channel
    }

    fun streamCallback(result: String?) {
        val json = Json { ignoreUnknownKeys = true }
        try {
            val responses: List<ChatCompletionStreamResponse> = json.decodeFromString(result ?: return)

            responses.forEach { res ->
                val requestState = requestStateMap[res.id] ?: return@forEach
                GlobalScope.launch {

                    res.usage?.let { finalUsage ->
                        requestState.request.stream_options?.include_usage?.let { includeUsage ->
                            if (includeUsage) {
                                requestState.continuation.send(res)
                            }
                        }
                        requestState.continuation.close()
                        requestStateMap.remove(res.id)
                    } ?: run {
                        val sendResult = requestState.continuation.trySend(res)
                        if (sendResult.isFailure) {
                            // Handle the failure case if needed
                            logger.severe("Failed to send the response: ${sendResult.exceptionOrNull()}")
                        }
                    }
                }
            }
        } catch (e: Exception) {
            logger.severe("Kotlin JSON parsing error: $e, jsonsrc=$result")
        }
    }
}

class Chat(
    private val jsonFFIEngine: JSONFFIEngine,
    private val state: EngineState
) {
    val completions = Completions(jsonFFIEngine, state)
}

class Completions(
    private val jsonFFIEngine: JSONFFIEngine,
    private val state: EngineState
) {

    suspend fun create(request: ChatCompletionRequest): ReceiveChannel<ChatCompletionStreamResponse> {
        return state.chatCompletion(jsonFFIEngine, request)
    }

    suspend fun create(
        messages: List<ChatCompletionMessage>,
        model: String? = null,
        frequency_penalty: Float? = null,
        presence_penalty: Float? = null,
        logprobs: Boolean = false,
        top_logprobs: Int = 0,
        logit_bias: Map<Int, Float>? = null,
        max_tokens: Int? = null,
        n: Int = 1,
        seed: Int? = null,
        stop: List<String>? = null,
        stream: Boolean = true,
        stream_options: StreamOptions? = null,
        temperature: Float? = null,
        top_p: Float? = null,
        tools: List<ChatTool>? = null,
        user: String? = null,
        response_format: ResponseFormat? = null
    ): ReceiveChannel<ChatCompletionStreamResponse> {
        if (!stream) {
            throw IllegalArgumentException("Only stream=true is supported in MLCKotlin")
        }

        val request = ChatCompletionRequest(
            messages = messages,
            model = model,
            frequency_penalty = frequency_penalty,
            presence_penalty = presence_penalty,
            logprobs = logprobs,
            top_logprobs = top_logprobs,
            logit_bias = logit_bias,
            max_tokens = max_tokens,
            n = n,
            seed = seed,
            stop = stop,
            stream = stream,
            stream_options = stream_options,
            temperature = temperature,
            top_p = top_p,
            tools = tools,
            user = user,
            response_format = response_format
        )
        return create(request)
    }
}


================================================
FILE: android/mlc4j/src/main/java/ai/mlc/mlcllm/OpenAIProtocol.kt
================================================
package ai.mlc.mlcllm

import kotlinx.serialization.KSerializer
import kotlinx.serialization.Serializable
import kotlinx.serialization.builtins.ListSerializer
import kotlinx.serialization.builtins.MapSerializer
import kotlinx.serialization.builtins.serializer
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.descriptors.buildClassSerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.JsonArray
import kotlinx.serialization.json.JsonElement
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.JsonPrimitive
import kotlinx.serialization.json.jsonPrimitive
import java.util.*

// Data classes for v1/chat/completions
// API reference: https://platform.openai.com/docs/api-reference/chat/create

class OpenAIProtoc

Download .txt

gitextract_s4bq7ahm/

├── .clang-format
├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug-report.md
│   │   ├── config.yml
│   │   ├── documentation.md
│   │   ├── feature-request.md
│   │   ├── general.md
│   │   ├── model-request.md
│   │   ├── speed-report.md
│   │   └── tracking.md
│   └── workflows/
│       ├── documentation.yaml
│       ├── update-relax.yaml
│       └── windows-build.yaml
├── .gitignore
├── .gitmodules
├── .pre-commit-config.yaml
├── .pylintrc
├── CMakeLists.txt
├── CONTRIBUTORS.md
├── LICENSE
├── NOTICE
├── README.md
├── android/
│   ├── .gitignore
│   ├── MLCChat/
│   │   ├── README.md
│   │   ├── app/
│   │   │   ├── .gitignore
│   │   │   ├── build.gradle
│   │   │   ├── proguard-rules.pro
│   │   │   └── src/
│   │   │       └── main/
│   │   │           ├── AndroidManifest.xml
│   │   │           ├── java/
│   │   │           │   └── ai/
│   │   │           │       └── mlc/
│   │   │           │           └── mlcchat/
│   │   │           │               ├── AppViewModel.kt
│   │   │           │               ├── ChatView.kt
│   │   │           │               ├── MainActivity.kt
│   │   │           │               ├── NavView.kt
│   │   │           │               ├── StartView.kt
│   │   │           │               └── ui/
│   │   │           │                   └── theme/
│   │   │           │                       ├── Color.kt
│   │   │           │                       ├── Theme.kt
│   │   │           │                       └── Type.kt
│   │   │           └── res/
│   │   │               ├── drawable/
│   │   │               │   ├── ic_android_black_24dp.xml
│   │   │               │   └── mlc_logo_108.xml
│   │   │               ├── values/
│   │   │               │   ├── colors.xml
│   │   │               │   ├── strings.xml
│   │   │               │   └── themes.xml
│   │   │               └── xml/
│   │   │                   ├── backup_rules.xml
│   │   │                   └── data_extraction_rules.xml
│   │   ├── build.gradle
│   │   ├── bundle_weight.py
│   │   ├── gradle/
│   │   │   └── wrapper/
│   │   │       ├── gradle-wrapper.jar
│   │   │       └── gradle-wrapper.properties
│   │   ├── gradle.properties
│   │   ├── gradlew
│   │   ├── gradlew.bat
│   │   ├── mlc-package-config.json
│   │   └── settings.gradle
│   ├── MLCEngineExample/
│   │   ├── README.md
│   │   ├── app/
│   │   │   ├── .gitignore
│   │   │   ├── build.gradle
│   │   │   ├── proguard-rules.pro
│   │   │   └── src/
│   │   │       └── main/
│   │   │           ├── AndroidManifest.xml
│   │   │           ├── java/
│   │   │           │   └── ai/
│   │   │           │       └── mlc/
│   │   │           │           └── mlcengineexample/
│   │   │           │               ├── MainActivity.kt
│   │   │           │               └── ui/
│   │   │           │                   └── theme/
│   │   │           │                       ├── Color.kt
│   │   │           │                       ├── Theme.kt
│   │   │           │                       └── Type.kt
│   │   │           └── res/
│   │   │               ├── drawable/
│   │   │               │   ├── ic_android_black_24dp.xml
│   │   │               │   └── mlc_logo_108.xml
│   │   │               ├── values/
│   │   │               │   ├── colors.xml
│   │   │               │   ├── strings.xml
│   │   │               │   └── themes.xml
│   │   │               └── xml/
│   │   │                   ├── backup_rules.xml
│   │   │                   └── data_extraction_rules.xml
│   │   ├── build.gradle
│   │   ├── bundle_weight.py
│   │   ├── gradle/
│   │   │   └── wrapper/
│   │   │       ├── gradle-wrapper.jar
│   │   │       └── gradle-wrapper.properties
│   │   ├── gradle.properties
│   │   ├── gradlew
│   │   ├── gradlew.bat
│   │   ├── mlc-package-config.json
│   │   └── settings.gradle
│   ├── README.md
│   └── mlc4j/
│       ├── .gitignore
│       ├── CMakeLists.txt
│       ├── build.gradle
│       ├── prepare_libs.py
│       └── src/
│           ├── cpp/
│           │   └── tvm_runtime.h
│           └── main/
│               ├── AndroidManifest.xml
│               └── java/
│                   └── ai/
│                       └── mlc/
│                           └── mlcllm/
│                               ├── JSONFFIEngine.java
│                               ├── MLCEngine.kt
│                               └── OpenAIProtocol.kt
├── ci/
│   ├── bash.sh
│   ├── build-environment.yaml
│   ├── jenkinsfile.groovy
│   └── task/
│       ├── black.sh
│       ├── build_clean.sh
│       ├── build_lib.sh
│       ├── build_win.bat
│       ├── clang-format.sh
│       ├── isort.sh
│       ├── mypy.sh
│       ├── pylint.sh
│       ├── test_model_compile.sh
│       └── test_unittest.sh
├── cmake/
│   └── gen_cmake_config.py
├── cpp/
│   ├── base.h
│   ├── json_ffi/
│   │   ├── conv_template.cc
│   │   ├── conv_template.h
│   │   ├── image_utils.cc
│   │   ├── image_utils.h
│   │   ├── json_ffi_engine.cc
│   │   ├── json_ffi_engine.h
│   │   ├── openai_api_protocol.cc
│   │   └── openai_api_protocol.h
│   ├── metadata/
│   │   ├── model.cc
│   │   └── model.h
│   ├── multi_gpu/
│   │   ├── builtin.cc
│   │   └── multi_gpu_loader.cc
│   ├── serve/
│   │   ├── config.cc
│   │   ├── config.h
│   │   ├── data.cc
│   │   ├── data.h
│   │   ├── draft_token_workspace_manager.cc
│   │   ├── draft_token_workspace_manager.h
│   │   ├── engine.cc
│   │   ├── engine.h
│   │   ├── engine_actions/
│   │   │   ├── action.cc
│   │   │   ├── action.h
│   │   │   ├── action_commons.cc
│   │   │   ├── action_commons.h
│   │   │   ├── auto_spec_decode.cc
│   │   │   ├── batch_decode.cc
│   │   │   ├── batch_draft.cc
│   │   │   ├── batch_jumpforward.cc
│   │   │   ├── batch_prefill_base.cc
│   │   │   ├── batch_prefill_base.h
│   │   │   ├── batch_verify.cc
│   │   │   ├── disagg_prepare_recv.cc
│   │   │   ├── disagg_remote_send.cc
│   │   │   ├── eagle_batch_draft.cc
│   │   │   ├── eagle_batch_verify.cc
│   │   │   ├── eagle_new_request_prefill.cc
│   │   │   └── new_request_prefill.cc
│   │   ├── engine_state.cc
│   │   ├── engine_state.h
│   │   ├── event_trace_recorder.cc
│   │   ├── event_trace_recorder.h
│   │   ├── function_table.cc
│   │   ├── function_table.h
│   │   ├── logit_processor.cc
│   │   ├── logit_processor.h
│   │   ├── metrics.cc
│   │   ├── metrics.h
│   │   ├── model.cc
│   │   ├── model.h
│   │   ├── prefix_cache.cc
│   │   ├── prefix_cache.h
│   │   ├── radix_tree.cc
│   │   ├── radix_tree.h
│   │   ├── request.cc
│   │   ├── request.h
│   │   ├── request_state.cc
│   │   ├── request_state.h
│   │   ├── sampler/
│   │   │   ├── cpu_sampler.cc
│   │   │   ├── gpu_sampler.cc
│   │   │   └── sampler.h
│   │   ├── threaded_engine.cc
│   │   └── threaded_engine.h
│   ├── support/
│   │   ├── debug_utils.h
│   │   ├── dynamic_bitset.h
│   │   ├── encoding.cc
│   │   ├── encoding.h
│   │   ├── json_parser.h
│   │   ├── load_bytes_from_file.h
│   │   ├── progress_bar.h
│   │   ├── random.h
│   │   ├── result.h
│   │   ├── utils.h
│   │   ├── vlm_utils.cc
│   │   └── vlm_utils.h
│   └── tokenizers/
│       ├── streamer.cc
│       ├── streamer.h
│       ├── tokenizers.cc
│       └── tokenizers.h
├── docs/
│   ├── .gitignore
│   ├── Makefile
│   ├── README.md
│   ├── community/
│   │   ├── faq.rst
│   │   └── guideline.rst
│   ├── compilation/
│   │   ├── compile_models.rst
│   │   ├── configure_quantization.rst
│   │   ├── convert_weights.rst
│   │   ├── define_new_models.rst
│   │   └── package_libraries_and_weights.rst
│   ├── conf.py
│   ├── deploy/
│   │   ├── android.rst
│   │   ├── cli.rst
│   │   ├── ide_integration.rst
│   │   ├── ios.rst
│   │   ├── mlc_chat_config.rst
│   │   ├── python_engine.rst
│   │   ├── rest.rst
│   │   └── webllm.rst
│   ├── get_started/
│   │   ├── introduction.rst
│   │   └── quick_start.rst
│   ├── index.rst
│   ├── install/
│   │   ├── conda.rst
│   │   ├── emcc.rst
│   │   ├── gpu.rst
│   │   ├── mlc_llm.rst
│   │   └── tvm.rst
│   ├── make.bat
│   ├── microserving/
│   │   └── tutorial.rst
│   ├── privacy.rst
│   └── requirements.txt
├── examples/
│   ├── python/
│   │   ├── microserving/
│   │   │   └── custom_router.py
│   │   └── sample_mlc_engine.py
│   └── rest/
│       ├── nodejs/
│       │   ├── README.MD
│       │   ├── dotenv.example
│       │   ├── package.json
│       │   ├── sample_client.js
│       │   ├── sample_langchain.ts
│       │   ├── sample_openai.js
│       │   └── tsconfig.json
│       ├── python/
│       │   ├── sample_client.py
│       │   ├── sample_langchain.py
│       │   └── sample_openai.py
│       └── resources/
│           ├── linux.txt
│           └── state_of_the_union.txt
├── ios/
│   ├── .gitignore
│   ├── MLCChat/
│   │   ├── MLCChat/
│   │   │   ├── Assets.xcassets/
│   │   │   │   ├── AccentColor.colorset/
│   │   │   │   │   └── Contents.json
│   │   │   │   ├── AppIcon.appiconset/
│   │   │   │   │   └── Contents.json
│   │   │   │   └── Contents.json
│   │   │   ├── Common/
│   │   │   │   └── Constants.swift
│   │   │   ├── Info.plist
│   │   │   ├── MLCChat.entitlements
│   │   │   ├── MLCChatApp.swift
│   │   │   ├── Models/
│   │   │   │   ├── AppConfig.swift
│   │   │   │   ├── ModelConfig.swift
│   │   │   │   └── ParamsConfig.swift
│   │   │   ├── Preview Content/
│   │   │   │   └── Preview Assets.xcassets/
│   │   │   │       └── Contents.json
│   │   │   ├── States/
│   │   │   │   ├── AppState.swift
│   │   │   │   ├── ChatState.swift
│   │   │   │   └── ModelState.swift
│   │   │   └── Views/
│   │   │       ├── ChatView.swift
│   │   │       ├── ImageProcessing.swift
│   │   │       ├── MessageView.swift
│   │   │       ├── ModelView.swift
│   │   │       └── StartView.swift
│   │   ├── MLCChat.xcodeproj/
│   │   │   ├── project.pbxproj
│   │   │   ├── project.xcworkspace/
│   │   │   │   ├── contents.xcworkspacedata
│   │   │   │   └── xcshareddata/
│   │   │   │       ├── IDEWorkspaceChecks.plist
│   │   │   │       ├── WorkspaceSettings.xcsettings
│   │   │   │       └── swiftpm/
│   │   │   │           └── Package.resolved
│   │   │   └── xcshareddata/
│   │   │       └── xcschemes/
│   │   │           └── MLCChat.xcscheme
│   │   ├── README.md
│   │   └── mlc-package-config.json
│   ├── MLCEngineExample/
│   │   ├── MLCEngineExample/
│   │   │   ├── Assets.xcassets/
│   │   │   │   ├── AccentColor.colorset/
│   │   │   │   │   └── Contents.json
│   │   │   │   ├── AppIcon.appiconset/
│   │   │   │   │   └── Contents.json
│   │   │   │   └── Contents.json
│   │   │   ├── ContentView.swift
│   │   │   ├── MLCEngineExample.entitlements
│   │   │   ├── MLCEngineExampleApp.swift
│   │   │   └── Preview Content/
│   │   │       └── Preview Assets.xcassets/
│   │   │           └── Contents.json
│   │   ├── MLCEngineExample.xcodeproj/
│   │   │   ├── project.pbxproj
│   │   │   └── project.xcworkspace/
│   │   │       ├── contents.xcworkspacedata
│   │   │       └── xcshareddata/
│   │   │           └── IDEWorkspaceChecks.plist
│   │   ├── README.md
│   │   └── mlc-package-config.json
│   ├── MLCSwift/
│   │   ├── Package.swift
│   │   ├── README.md
│   │   └── Sources/
│   │       ├── ObjC/
│   │       │   ├── LLMEngine.mm
│   │       │   └── include/
│   │       │       └── LLMEngine.h
│   │       └── Swift/
│   │           ├── LLMEngine.swift
│   │           └── OpenAIProtocol.swift
│   ├── README.md
│   └── prepare_libs.sh
├── pyproject.toml
├── python/
│   ├── mlc_llm/
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   ├── base.py
│   │   ├── bench/
│   │   │   ├── __init__.py
│   │   │   ├── __main__.py
│   │   │   ├── api_endpoint.py
│   │   │   ├── dataset.py
│   │   │   ├── evaluation/
│   │   │   │   ├── gsm8k.py
│   │   │   │   └── mmlu.py
│   │   │   ├── request_processor.py
│   │   │   └── request_record.py
│   │   ├── cli/
│   │   │   ├── __init__.py
│   │   │   ├── calibrate.py
│   │   │   ├── chat.py
│   │   │   ├── check_device.py
│   │   │   ├── compile.py
│   │   │   ├── convert_weight.py
│   │   │   ├── delivery.py
│   │   │   ├── disco_remote_socket_session.py
│   │   │   ├── gen_config.py
│   │   │   ├── lib_delivery.py
│   │   │   ├── model_metadata.py
│   │   │   ├── package.py
│   │   │   ├── router.py
│   │   │   ├── serve.py
│   │   │   └── worker.py
│   │   ├── compiler_pass/
│   │   │   ├── __init__.py
│   │   │   ├── attach_cuda_graph_alloc_init_func.py
│   │   │   ├── attach_embedding_allocator.py
│   │   │   ├── attach_logit_processor.py
│   │   │   ├── attach_sampler.py
│   │   │   ├── attach_softmax_with_temperature.py
│   │   │   ├── attach_spec_decode_aux_funcs.py
│   │   │   ├── attach_support_info.py
│   │   │   ├── blas_dispatch.py
│   │   │   ├── clean_up_tir_attrs.py
│   │   │   ├── dispatch_kv_cache_creation.py
│   │   │   ├── dispatch_triton_kernel.py
│   │   │   ├── estimate_memory_usage.py
│   │   │   ├── fuse_add_norm.py
│   │   │   ├── fuse_dequantize_matmul_ewise.py
│   │   │   ├── fuse_dequantize_take.py
│   │   │   ├── fuse_dequantize_transpose.py
│   │   │   ├── fuse_ft_dequantize_matmul_epilogue.py
│   │   │   ├── fuse_transpose_matmul.py
│   │   │   ├── lift_global_buffer_alloc.py
│   │   │   ├── low_batch_specialization.py
│   │   │   ├── pipeline.py
│   │   │   ├── pipeline_parallel_rewrite.py
│   │   │   └── scatter_tuple_get_item.py
│   │   ├── contrib/
│   │   │   ├── __init__.py
│   │   │   └── embeddings/
│   │   │       ├── __init__.py
│   │   │       ├── embeddings.py
│   │   │       └── openai.py
│   │   ├── conversation_template/
│   │   │   ├── __init__.py
│   │   │   ├── cohere.py
│   │   │   ├── deepseek.py
│   │   │   ├── dolly.py
│   │   │   ├── gemma.py
│   │   │   ├── glm.py
│   │   │   ├── gorilla.py
│   │   │   ├── gpt.py
│   │   │   ├── hermes.py
│   │   │   ├── llama.py
│   │   │   ├── llava.py
│   │   │   ├── llm_jp.py
│   │   │   ├── ministral3.py
│   │   │   ├── ministral3_reasoning.py
│   │   │   ├── mistral.py
│   │   │   ├── nemotron.py
│   │   │   ├── oasst.py
│   │   │   ├── olmo.py
│   │   │   ├── orion.py
│   │   │   ├── phi.py
│   │   │   ├── qwen2.py
│   │   │   ├── redpajama.py
│   │   │   ├── registry.py
│   │   │   ├── rwkv.py
│   │   │   ├── stablelm.py
│   │   │   ├── tinyllama.py
│   │   │   └── wizardlm.py
│   │   ├── interface/
│   │   │   ├── __init__.py
│   │   │   ├── calibrate.py
│   │   │   ├── chat.py
│   │   │   ├── compile.py
│   │   │   ├── compiler_flags.py
│   │   │   ├── convert_weight.py
│   │   │   ├── gen_config.py
│   │   │   ├── help.py
│   │   │   ├── jit.py
│   │   │   ├── package.py
│   │   │   ├── router.py
│   │   │   └── serve.py
│   │   ├── json_ffi/
│   │   │   ├── __init__.py
│   │   │   └── engine.py
│   │   ├── libinfo.py
│   │   ├── loader/
│   │   │   ├── __init__.py
│   │   │   ├── huggingface_loader.py
│   │   │   ├── loader.py
│   │   │   ├── mapping.py
│   │   │   ├── standard_loader.py
│   │   │   ├── stats.py
│   │   │   └── utils.py
│   │   ├── model/
│   │   │   ├── __init__.py
│   │   │   ├── baichuan/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── baichuan_loader.py
│   │   │   │   └── baichuan_model.py
│   │   │   ├── bert/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── bert_loader.py
│   │   │   │   └── bert_model.py
│   │   │   ├── chatglm3/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── chatglm3_loader.py
│   │   │   │   └── chatglm3_model.py
│   │   │   ├── cohere/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── cohere_loader.py
│   │   │   │   └── cohere_model.py
│   │   │   ├── deepseek/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── deepseek_loader.py
│   │   │   │   └── deepseek_model.py
│   │   │   ├── deepseek_v2/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── deepseek_v2_loader.py
│   │   │   │   └── deepseek_v2_model.py
│   │   │   ├── eagle/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── eagle_loader.py
│   │   │   │   └── eagle_model.py
│   │   │   ├── gemma/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── gemma_loader.py
│   │   │   │   └── gemma_model.py
│   │   │   ├── gemma2/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── gemma2_loader.py
│   │   │   │   └── gemma2_model.py
│   │   │   ├── gemma3/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── gemma3_loader.py
│   │   │   │   └── gemma3_model.py
│   │   │   ├── gpt2/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── gpt2_loader.py
│   │   │   │   └── gpt2_model.py
│   │   │   ├── gpt_bigcode/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── gpt_bigcode_loader.py
│   │   │   │   └── gpt_bigcode_model.py
│   │   │   ├── gpt_j/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── gpt_j_loader.py
│   │   │   │   └── gpt_j_model.py
│   │   │   ├── gpt_neox/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── gpt_neox_loader.py
│   │   │   │   └── gpt_neox_model.py
│   │   │   ├── internlm/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── internlm_loader.py
│   │   │   │   └── internlm_model.py
│   │   │   ├── internlm2/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── internlm2_loader.py
│   │   │   │   └── internlm2_model.py
│   │   │   ├── llama/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── llama_loader.py
│   │   │   │   └── llama_model.py
│   │   │   ├── llama4/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── llama4_loader.py
│   │   │   │   └── llama4_model.py
│   │   │   ├── llava/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── llava_loader.py
│   │   │   │   └── llava_model.py
│   │   │   ├── medusa/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── medusa_loader.py
│   │   │   │   └── medusa_model.py
│   │   │   ├── minicpm/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── minicpm_loader.py
│   │   │   │   └── minicpm_model.py
│   │   │   ├── ministral3/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── ministral3_loader.py
│   │   │   │   └── ministral3_model.py
│   │   │   ├── mistral/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── mistral_loader.py
│   │   │   │   └── mistral_model.py
│   │   │   ├── mixtral/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── mixtral_loader.py
│   │   │   │   └── mixtral_model.py
│   │   │   ├── model.py
│   │   │   ├── model_preset.py
│   │   │   ├── nemotron/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── nemotron_loader.py
│   │   │   │   └── nemotron_model.py
│   │   │   ├── olmo/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── olmo_loader.py
│   │   │   │   └── olmo_model.py
│   │   │   ├── orion/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── orion_loader.py
│   │   │   │   └── orion_model.py
│   │   │   ├── phi/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── phi_loader.py
│   │   │   │   └── phi_model.py
│   │   │   ├── phi3/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── phi3_loader.py
│   │   │   │   └── phi3_model.py
│   │   │   ├── phi3v/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── phi3v_image.py
│   │   │   │   ├── phi3v_loader.py
│   │   │   │   └── phi3v_model.py
│   │   │   ├── qwen/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── qwen_loader.py
│   │   │   │   └── qwen_model.py
│   │   │   ├── qwen2/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── qwen2_loader.py
│   │   │   │   └── qwen2_model.py
│   │   │   ├── qwen2_5_vl/
│   │   │   │   ├── __init__.py
│   │   │   │   └── qwen2_5_vl_model.py
│   │   │   ├── qwen2_moe/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── qwen2_moe_loader.py
│   │   │   │   └── qwen2_moe_model.py
│   │   │   ├── qwen3/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── qwen3_loader.py
│   │   │   │   └── qwen3_model.py
│   │   │   ├── qwen3_moe/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── qwen3_moe_loader.py
│   │   │   │   └── qwen3_moe_model.py
│   │   │   ├── rwkv5/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── rwkv5_loader.py
│   │   │   │   └── rwkv5_model.py
│   │   │   ├── rwkv6/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── rwkv6_loader.py
│   │   │   │   └── rwkv6_model.py
│   │   │   ├── stable_lm/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── stablelm_loader.py
│   │   │   │   └── stablelm_model.py
│   │   │   ├── starcoder2/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── starcoder2_loader.py
│   │   │   │   └── starcoder2_model.py
│   │   │   └── vision/
│   │   │       ├── __init__.py
│   │   │       ├── clip_vision.py
│   │   │       └── image_processing.py
│   │   ├── nn/
│   │   │   ├── __init__.py
│   │   │   ├── expert.py
│   │   │   ├── kv_cache.py
│   │   │   └── rnn_state.py
│   │   ├── op/
│   │   │   ├── __init__.py
│   │   │   ├── attention.py
│   │   │   ├── batch_matmul.py
│   │   │   ├── batch_spec_verify.py
│   │   │   ├── cutlass.py
│   │   │   ├── extern.py
│   │   │   ├── ft_gemm.py
│   │   │   ├── moe_matmul.py
│   │   │   ├── moe_misc.py
│   │   │   ├── mrope.py
│   │   │   ├── pipeline_parallel.py
│   │   │   ├── top_p_pivot.py
│   │   │   └── triton.py
│   │   ├── protocol/
│   │   │   ├── __init__.py
│   │   │   ├── conversation_protocol.py
│   │   │   ├── debug_protocol.py
│   │   │   ├── error_protocol.py
│   │   │   ├── generation_config.py
│   │   │   ├── microserving_protocol.py
│   │   │   ├── mlc_chat_config.py
│   │   │   └── openai_api_protocol.py
│   │   ├── quantization/
│   │   │   ├── __init__.py
│   │   │   ├── awq_quantization.py
│   │   │   ├── block_scale_quantization.py
│   │   │   ├── fp8_quantization.py
│   │   │   ├── ft_quantization.py
│   │   │   ├── group_quantization.py
│   │   │   ├── model_quantization.py
│   │   │   ├── no_quantization.py
│   │   │   ├── per_tensor_quantization.py
│   │   │   ├── quantization.py
│   │   │   └── utils.py
│   │   ├── router/
│   │   │   ├── __init__.py
│   │   │   └── router.py
│   │   ├── serve/
│   │   │   ├── __init__.py
│   │   │   ├── _ffi_api.py
│   │   │   ├── config.py
│   │   │   ├── data.py
│   │   │   ├── embedding_engine.py
│   │   │   ├── engine.py
│   │   │   ├── engine_base.py
│   │   │   ├── engine_utils.py
│   │   │   ├── entrypoints/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── debug_entrypoints.py
│   │   │   │   ├── metrics_entrypoints.py
│   │   │   │   ├── microserving_entrypoints.py
│   │   │   │   └── openai_entrypoints.py
│   │   │   ├── event_trace_recorder.py
│   │   │   ├── radix_tree.py
│   │   │   ├── request.py
│   │   │   ├── server/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── popen_server.py
│   │   │   │   └── server_context.py
│   │   │   └── sync_engine.py
│   │   ├── support/
│   │   │   ├── __init__.py
│   │   │   ├── argparse.py
│   │   │   ├── auto_config.py
│   │   │   ├── auto_device.py
│   │   │   ├── auto_target.py
│   │   │   ├── auto_weight.py
│   │   │   ├── config.py
│   │   │   ├── constants.py
│   │   │   ├── convert_tiktoken.py
│   │   │   ├── download_cache.py
│   │   │   ├── logging.py
│   │   │   ├── max_thread_check.py
│   │   │   ├── preshard.py
│   │   │   ├── random.py
│   │   │   ├── style.py
│   │   │   ├── tensor_parallel.py
│   │   │   └── tqdm.py
│   │   ├── testing/
│   │   │   ├── __init__.py
│   │   │   ├── debug_chat.py
│   │   │   ├── debug_compare.py
│   │   │   └── pytest_utils.py
│   │   └── tokenizers/
│   │       ├── __init__.py
│   │       ├── _ffi_api.py
│   │       ├── streamer.py
│   │       └── tokenizers.py
│   ├── requirements.txt
│   └── setup.py
├── scripts/
│   ├── build_mlc_for_docs.sh
│   ├── build_site.sh
│   ├── check_url_validity.py
│   ├── gh_deploy_site.sh
│   └── local_deploy_site.sh
├── site/
│   ├── .gitignore
│   ├── CNAME
│   ├── Gemfile
│   ├── _config.yml
│   ├── _includes/
│   │   ├── head.html
│   │   └── hero.html
│   ├── assets/
│   │   └── css/
│   │       └── hero.scss
│   ├── index.md
│   └── privacy.md
├── tests/
│   ├── README.md
│   ├── cpp/
│   │   └── conv_template_unittest.cc
│   └── python/
│       ├── __init__.py
│       ├── compiler_pass/
│       │   └── test_fuse_ft_dequantize_matmul_epilogue.py
│       ├── conftest.py
│       ├── conversation_template/
│       │   ├── test_conversation_protocol.py
│       │   └── test_llama_template.py
│       ├── integration/
│       │   └── test_model_compile.py
│       ├── json_ffi/
│       │   ├── test_json_ffi_engine.py
│       │   ├── test_json_ffi_engine_image.py
│       │   └── test_json_ffi_engine_mock.py
│       ├── loader/
│       │   ├── test_awq.py
│       │   └── test_huggingface.py
│       ├── model/
│       │   ├── test_gemma3.py
│       │   ├── test_gpt2.py
│       │   ├── test_gptNeox.py
│       │   ├── test_kv_cache.py
│       │   ├── test_llama.py
│       │   ├── test_llama_quantization.py
│       │   ├── test_mistral.py
│       │   ├── test_phi.py
│       │   └── test_qwen3_embedding.py
│       ├── op/
│       │   ├── test_batch_spec_verify.py
│       │   ├── test_fp8_block_matmul.py
│       │   ├── test_mrope.py
│       │   ├── test_top_p_pivot.py
│       │   ├── test_tree_attn.py
│       │   └── test_two_stage_softmax.py
│       ├── quantization/
│       │   ├── test_awq_quantization.py
│       │   └── test_group_quantization.py
│       ├── router/
│       │   └── test_router.py
│       ├── serve/
│       │   ├── evaluate_engine.py
│       │   ├── server/
│       │   │   ├── conftest.py
│       │   │   ├── test_embedding_server.py
│       │   │   ├── test_server.py
│       │   │   ├── test_server_function_call.py
│       │   │   └── test_server_image.py
│       │   ├── test_embedding_engine.py
│       │   ├── test_event_trace_recorder.py
│       │   ├── test_radix_tree.py
│       │   ├── test_serve_async_engine.py
│       │   ├── test_serve_async_engine_spec.py
│       │   ├── test_serve_engine.py
│       │   ├── test_serve_engine_grammar.py
│       │   ├── test_serve_engine_image.py
│       │   ├── test_serve_engine_mock.py
│       │   ├── test_serve_engine_prefix_cache.py
│       │   ├── test_serve_engine_rnn.py
│       │   ├── test_serve_engine_spec.py
│       │   └── test_serve_sync_engine.py
│       ├── support/
│       │   ├── test_auto_config.py
│       │   ├── test_auto_weight.py
│       │   ├── test_cli_convert_weight.py
│       │   └── test_convert_weight_lora_merge.py
│       └── tokenizers/
│           └── test_streamer.py
├── version.py
└── web/
    ├── Makefile
    ├── README.md
    ├── emcc/
    │   └── mlc_wasm_runtime.cc
    └── prep_emcc_deps.sh

Download .txt

Showing preview only (295K chars total). Download the full file or copy to clipboard to get everything.

SYMBOL INDEX (3167 symbols across 352 files)

FILE: android/MLCChat/bundle_weight.py
  function main (line 12) | def main(apk_path: Path, package_output_path: Path):
  function _parse_apk_path (line 42) | def _parse_apk_path(path: str) -> Path:

FILE: android/MLCEngineExample/bundle_weight.py
  function main (line 12) | def main(apk_path: Path, package_output_path: Path):
  function _parse_apk_path (line 42) | def _parse_apk_path(path: str) -> Path:

FILE: android/mlc4j/prepare_libs.py
  function run_cmake (line 16) | def run_cmake(mlc4j_path: Path):
  function run_cmake_build (line 56) | def run_cmake_build():
  function run_cmake_install (line 71) | def run_cmake_install():
  function main (line 86) | def main(mlc_llm_source_dir: Path):

FILE: android/mlc4j/src/cpp/tvm_runtime.h
  function namespace (line 49) | namespace tvm {

FILE: android/mlc4j/src/main/java/ai/mlc/mlcllm/JSONFFIEngine.java
  class JSONFFIEngine (line 9) | public class JSONFFIEngine {
    method JSONFFIEngine (line 23) | public JSONFFIEngine() {
    method initBackgroundEngine (line 39) | public void initBackgroundEngine(KotlinFunction callback) {
    method reload (line 55) | public void reload(String engineConfigJSONStr) {
    method chatCompletion (line 59) | public void chatCompletion(String requestJSONStr, String requestId) {
    method runBackgroundLoop (line 63) | public void runBackgroundLoop() {
    method runBackgroundStreamBackLoop (line 67) | public void runBackgroundStreamBackLoop() {
    method exitBackgroundLoop (line 71) | public void exitBackgroundLoop() {
    method unload (line 75) | public void unload() {
    type KotlinFunction (line 79) | public interface KotlinFunction {
      method invoke (line 80) | void invoke(String arg);
    method reset (line 83) | public void reset() {

FILE: cpp/json_ffi/conv_template.cc
  type mlc (line 8) | namespace mlc {
    type llm (line 9) | namespace llm {
      type json_ffi (line 10) | namespace json_ffi {
        function ModelVisionConfig (line 16) | ModelVisionConfig ModelVisionConfig::FromJSON(const tvm::ffi::json...
        function ModelConfig (line 85) | ModelConfig ModelConfig::FromJSON(const tvm::ffi::json::Object& js...
        function MessagePlaceholders (line 147) | MessagePlaceholders MessagePlaceholderFromString(const std::string...
        function TryGetFunctionCallingString (line 193) | Result<std::optional<std::string>> TryGetFunctionCallingString(
        function CreatePrompt (line 224) | Result<std::vector<Data>> CreatePrompt(const Conversation& conv,

FILE: cpp/json_ffi/conv_template.h
  function namespace (line 21) | namespace llm {

FILE: cpp/json_ffi/image_utils.cc
  type mlc (line 9) | namespace mlc {
    type llm (line 10) | namespace llm {
      type json_ffi (line 11) | namespace json_ffi {
        class MemoryBufferStream (line 15) | class MemoryBufferStream : public tvm::support::Stream {
          method MemoryBufferStream (line 20) | MemoryBufferStream(const char* data, size_t size) : data_(data),...
          method Read (line 22) | size_t Read(void* ptr, size_t size) override {
          method Write (line 35) | size_t Write(const void* ptr, size_t size) override {
        function Base64DecodedSize (line 46) | size_t Base64DecodedSize(const std::string& base64_str) {
        function LoadImageFromBase64 (line 58) | Result<Tensor> LoadImageFromBase64(const std::string& base64_str) {
        function Tensor (line 78) | Tensor ClipPreprocessor(Tensor image_data, int target_size, DLDevi...

FILE: cpp/json_ffi/image_utils.h
  function namespace (line 16) | namespace mlc {

FILE: cpp/json_ffi/json_ffi_engine.cc
  type mlc (line 15) | namespace mlc {
    type llm (line 16) | namespace llm {
      type json_ffi (line 17) | namespace json_ffi {
        class JSONFFIEngineImpl (line 154) | class JSONFFIEngineImpl : public JSONFFIEngine, public ffi::Module...
          method InitBackgroundEngine (line 170) | void InitBackgroundEngine(int device_type, int device_id,
          method Reload (line 189) | void Reload(String engine_config_json_str) {
          method Unload (line 209) | void Unload() { this->engine_->Unload(); }
          method Reset (line 211) | void Reset() { this->engine_->Reset(); }
          method RunBackgroundLoop (line 213) | void RunBackgroundLoop() { this->engine_->RunBackgroundLoop(); }
          method RunBackgroundStreamBackLoop (line 215) | void RunBackgroundStreamBackLoop() { this->engine_->RunBackgroun...
          method String (line 217) | String GetResponseFromStreamOutput(Array<RequestStreamOutput> de...
        function TVM_FFI_STATIC_INIT_BLOCK (line 299) | TVM_FFI_STATIC_INIT_BLOCK() {

FILE: cpp/json_ffi/json_ffi_engine.h
  function namespace (line 16) | namespace mlc {

FILE: cpp/json_ffi/openai_api_protocol.cc
  type mlc (line 10) | namespace mlc {
    type llm (line 11) | namespace llm {
      type json_ffi (line 12) | namespace json_ffi {

FILE: cpp/json_ffi/openai_api_protocol.h
  function namespace (line 22) | namespace llm {

FILE: cpp/metadata/model.cc
  type mlc (line 7) | namespace mlc {
    type llm (line 8) | namespace llm {
      function ModelMetadata (line 76) | ModelMetadata ModelMetadata::FromJSON(const tvm::ffi::json::Object& ...
      function ModelMetadata (line 139) | ModelMetadata ModelMetadata::FromModule(Module module, const tvm::ff...

FILE: cpp/metadata/model.h
  function namespace (line 18) | namespace llm {

FILE: cpp/multi_gpu/builtin.cc
  type mlc (line 18) | namespace mlc {
    type llm (line 19) | namespace llm {
      type multi_gpu (line 20) | namespace multi_gpu {
        function ObjectRef (line 28) | ObjectRef DispatchFunctionByGroup(tvm::ffi::AnyView vm_arg,
        function ObjectRef (line 59) | ObjectRef SendFromLastGroupToWorker0(Tensor send, Optional<Tensor>...
        function TVM_FFI_STATIC_INIT_BLOCK (line 90) | TVM_FFI_STATIC_INIT_BLOCK() {

FILE: cpp/multi_gpu/multi_gpu_loader.cc
  type mlc (line 29) | namespace mlc {
    type llm (line 30) | namespace llm {
      type multi_gpu (line 31) | namespace multi_gpu {
        class RangeTimer (line 42) | class RangeTimer {
          method RangeTimer (line 44) | explicit RangeTimer(DurationType* result)
        class PreprocessorPool (line 59) | class PreprocessorPool {
          method PreprocessorPool (line 61) | explicit PreprocessorPool(const ModelMetadata& model_metadata, M...
          method Tensor (line 79) | Tensor Apply(Tensor param, const ModelMetadata::Param& param_inf...
        type ParamInfo (line 96) | struct ParamInfo {
        function Tensor (line 101) | Tensor RecvFromGlobalWorker0(Device device, const ModelMetadata::P...
        function Tensor (line 108) | Tensor BroadcastOrShardAndScatter(Tensor param, const ModelMetadat...
        function Tensor (line 127) | Tensor ReceiveBroadcastedOrSharded(Device device, const ModelMetad...
        function FormatDuration (line 143) | std::string FormatDuration(DurationType duration) {
        function LoadMultiGPU (line 150) | Array<Optional<Tensor>> LoadMultiGPU(const std::string& model_path...
        function LoadMultiGPUPresharded (line 250) | Array<Optional<Tensor>> LoadMultiGPUPresharded(const std::string& ...
        function TVM_FFI_STATIC_INIT_BLOCK (line 314) | TVM_FFI_STATIC_INIT_BLOCK() {

FILE: cpp/serve/config.cc
  type mlc (line 18) | namespace mlc {
    type llm (line 19) | namespace llm {
      type serve (line 20) | namespace serve {
        function TVM_FFI_STATIC_INIT_BLOCK (line 22) | TVM_FFI_STATIC_INIT_BLOCK() {
        function TotalDetectGlobalMemory (line 27) | uint64_t TotalDetectGlobalMemory(DLDevice device) {
        function GenerationConfig (line 365) | GenerationConfig GenerationConfig::GetDefaultFromModelConfig(
        function EngineConfig (line 423) | EngineConfig EngineConfig::FromJSONAndInferredConfig(
        function String (line 511) | String EngineConfigNode::AsJSONString() const {
        type ModelConfigLimits (line 550) | struct ModelConfigLimits {
        function BytesToMegabytesString (line 560) | inline std::string BytesToMegabytesString(double bytes) {
        function GetModelConfigLimits (line 570) | Result<ModelConfigLimits> GetModelConfigLimits(
        type MemUsageEstimationResult (line 646) | struct MemUsageEstimationResult {
        function EstimateMemoryUsageOnMode (line 653) | Result<MemUsageEstimationResult> EstimateMemoryUsageOnMode(
        function ModelsUseKVCache (line 1065) | Result<bool> ModelsUseKVCache(const std::vector<tvm::ffi::json::Ob...

FILE: cpp/serve/config.h
  type ResponseFormat (line 34) | struct ResponseFormat {
  type class (line 50) | enum class
  type class (line 55) | enum class
  type class (line 63) | enum class
  function class (line 72) | class DisaggConfig {
  function class (line 94) | class DebugConfig {
  function class (line 117) | class GenerationConfigNode : public Object {
  function class (line 149) | class GenerationConfig : public ObjectRef {
  type class (line 192) | enum class
  type class (line 199) | enum class
  function SpeculativeMode (line 207) | enum class SpeculativeMode : int {

FILE: cpp/serve/data.cc
  type mlc (line 12) | namespace mlc {
    type llm (line 13) | namespace llm {
      type serve (line 14) | namespace serve {
        function TVM_FFI_STATIC_INIT_BLOCK (line 16) | TVM_FFI_STATIC_INIT_BLOCK() {
        function SplitData (line 26) | std::pair<Array<Data>, Array<Data>> SplitData(const Array<Data>& o...
        function ObjectRef (line 78) | ObjectRef TextDataNode::GetEmbedding(Model model, ObjectRef* dst, ...
        function TVM_FFI_STATIC_INIT_BLOCK (line 83) | TVM_FFI_STATIC_INIT_BLOCK() {
        function ObjectRef (line 106) | ObjectRef TokenDataNode::GetEmbedding(Model model, ObjectRef* dst,...
        function TVM_FFI_STATIC_INIT_BLOCK (line 110) | TVM_FFI_STATIC_INIT_BLOCK() {
        function ObjectRef (line 136) | ObjectRef ImageDataNode::GetEmbedding(Model model, ObjectRef* dst,...
        function TVM_FFI_STATIC_INIT_BLOCK (line 140) | TVM_FFI_STATIC_INIT_BLOCK() {
        function TokenToLogProbJSON (line 151) | inline void TokenToLogProbJSON(const Tokenizer& tokenizer, const T...
        function RequestStreamOutput (line 226) | RequestStreamOutput RequestStreamOutput::Usage(String request_id,
        function TVM_FFI_STATIC_INIT_BLOCK (line 234) | TVM_FFI_STATIC_INIT_BLOCK() {

FILE: cpp/serve/data.h
  function namespace (line 23) | namespace mlc {

FILE: cpp/serve/draft_token_workspace_manager.cc
  type mlc (line 10) | namespace mlc {
    type llm (line 11) | namespace llm {
      type serve (line 12) | namespace serve {
        function TVM_FFI_STATIC_INIT_BLOCK (line 14) | TVM_FFI_STATIC_INIT_BLOCK() { DraftTokenWorkspaceManagerObj::Regis...

FILE: cpp/serve/draft_token_workspace_manager.h
  function namespace (line 17) | namespace mlc {

FILE: cpp/serve/engine.cc
  type mlc (line 39) | namespace mlc {
    type llm (line 40) | namespace llm {
      type serve (line 41) | namespace serve {
        class EngineModule (line 47) | class EngineModule
          method Init (line 1043) | void Init(const std::string& engine_config_json_str, Device device,
          method Create (line 1054) | static ffi::Module Create() { return ffi::Module(tvm::ffi::make_...
          method AddRequest (line 1056) | void AddRequest(Request request) { return GetEngine()->AddReques...
          method Abort (line 1058) | void Abort(const String& request_id) { return GetEngine()->Abort...
          method Request (line 1060) | Request CreateRequest(String id, Array<Data> inputs, String gene...
          method Step (line 1067) | void Step() { return GetEngine()->Step(); }
          method FRequestStreamCallback (line 1069) | FRequestStreamCallback GetRequestStreamCallback() {
          method SetRequestStreamCallback (line 1073) | void SetRequestStreamCallback(FRequestStreamCallback request_str...
          method Reset (line 1077) | void Reset() { return GetEngine()->Reset(); }
          method String (line 1080) | String JSONMetrics() { return GetEngine()->JSONMetrics(); }
          method Engine (line 1083) | Engine* GetEngine() {
        function GetTokenizerInfo (line 50) | inline std::optional<TokenizerInfo> GetTokenizerInfo(const tvm::ff...
        function GetEnvSocketHostPort (line 72) | inline std::pair<std::optional<std::string>, int> GetEnvSocketHost...
        function StreamBackErrorImpl (line 86) | void StreamBackErrorImpl(Request request, FRequestStreamCallback r...
        function AbortRequestImpl (line 104) | void AbortRequestImpl(EngineState estate, const Array<Model>& mode...
        class MockEchoEngineImpl (line 158) | class MockEchoEngineImpl : public Engine {
          method Create (line 160) | static Result<EngineCreationOutput> Create(const std::string& en...
          method Reset (line 188) | void Reset() final {}
          method Empty (line 190) | bool Empty() final { return request_map_.empty(); }
          method SetRequestStreamCallback (line 192) | void SetRequestStreamCallback(FRequestStreamCallback request_str...
          method FRequestStreamCallback (line 196) | FRequestStreamCallback GetRequestStreamCallback() final { return...
          method AddRequest (line 198) | void AddRequest(Request request) final {
          method AbortRequest (line 260) | void AbortRequest(const String& request_id) {
          method AbortAllRequests (line 283) | void AbortAllRequests() final {
          method Step (line 294) | void Step() final {
          method String (line 321) | String JSONMetrics() final { return "{}"; }
          method DebugCallFuncOnAllAllWorker (line 324) | void DebugCallFuncOnAllAllWorker(const String& func_name, Option...
          type MockRequestState (line 327) | struct MockRequestState {
        class EngineImpl (line 344) | class EngineImpl : public Engine {
          method Create (line 350) | static Result<EngineCreationOutput> Create(const std::string& en...
          method Reset (line 505) | void Reset() final {
          method Empty (line 513) | bool Empty() final { return estate_->running_queue.empty() && es...
          method String (line 515) | String JSONMetrics() final { return tvm::ffi::json::Stringify(es...
          method FRequestStreamCallback (line 517) | FRequestStreamCallback GetRequestStreamCallback() final {
          method SetRequestStreamCallback (line 521) | void SetRequestStreamCallback(FRequestStreamCallback request_str...
          method StreamBackError (line 526) | void StreamBackError(Request request, String finish_reason) {
          method HandleSpecialRequests (line 532) | void HandleSpecialRequests(Request request) {
          method HandleDisaggRequest (line 550) | bool HandleDisaggRequest(Request request) {
          method AddRequest (line 665) | void AddRequest(Request request) final {
          method AbortRequest (line 727) | void AbortRequest(const String& request_id) final {
          method AbortAllRequests (line 731) | void AbortAllRequests() final {
          method Step (line 746) | void Step() final {
          method CreateDiscoSession (line 769) | std::tuple<Optional<Session>, int, std::vector<int>> CreateDisco...
          method DebugCallFuncOnAllAllWorker (line 884) | void DebugCallFuncOnAllAllWorker(const String& func_name, Option...
          method AutoDecideEngineConfig (line 890) | Result<EngineConfig> AutoDecideEngineConfig(
          method SetThreadMaxConcurrency (line 964) | void SetThreadMaxConcurrency() {
          method GetGrammarFromResponseFormat (line 978) | std::optional<xgrammar::CompiledGrammar> GetGrammarFromResponseF...
        function ClearGlobalMemoryManager (line 1022) | void ClearGlobalMemoryManager() {
        class EngineModule (line 1028) | class EngineModule : public ffi::ModuleObj {
          method Init (line 1043) | void Init(const std::string& engine_config_json_str, Device device,
          method Create (line 1054) | static ffi::Module Create() { return ffi::Module(tvm::ffi::make_...
          method AddRequest (line 1056) | void AddRequest(Request request) { return GetEngine()->AddReques...
          method Abort (line 1058) | void Abort(const String& request_id) { return GetEngine()->Abort...
          method Request (line 1060) | Request CreateRequest(String id, Array<Data> inputs, String gene...
          method Step (line 1067) | void Step() { return GetEngine()->Step(); }
          method FRequestStreamCallback (line 1069) | FRequestStreamCallback GetRequestStreamCallback() {
          method SetRequestStreamCallback (line 1073) | void SetRequestStreamCallback(FRequestStreamCallback request_str...
          method Reset (line 1077) | void Reset() { return GetEngine()->Reset(); }
          method String (line 1080) | String JSONMetrics() { return GetEngine()->JSONMetrics(); }
          method Engine (line 1083) | Engine* GetEngine() {
        function TVM_FFI_STATIC_INIT_BLOCK (line 1092) | TVM_FFI_STATIC_INIT_BLOCK() {

FILE: cpp/serve/engine.h
  function namespace (line 15) | namespace mlc {

FILE: cpp/serve/engine_actions/action.cc
  type mlc (line 8) | namespace mlc {
    type llm (line 9) | namespace llm {
      type serve (line 10) | namespace serve {
        function TVM_FFI_STATIC_INIT_BLOCK (line 12) | TVM_FFI_STATIC_INIT_BLOCK() { EngineActionObj::RegisterReflection(...

FILE: cpp/serve/engine_actions/action.h
  function namespace (line 18) | namespace mlc {

FILE: cpp/serve/engine_actions/action_commons.cc
  type mlc (line 10) | namespace mlc {
    type llm (line 11) | namespace llm {
      type serve (line 12) | namespace serve {
        function CreateEngineActions (line 14) | Array<EngineAction> CreateEngineActions(Array<Model> models, Engin...
        function RemoveRequestFromModel (line 137) | void RemoveRequestFromModel(EngineState estate, int64_t req_intern...
        function RemoveRequestStateEntry (line 151) | void RemoveRequestStateEntry(EngineState estate, const Array<Model...
        function ProcessFinishedRequestStateEntries (line 175) | void ProcessFinishedRequestStateEntries(
        function ActionStepPostProcess (line 238) | void ActionStepPostProcess(Array<Request> requests, EngineState es...
        function RequestStateEntry (line 331) | RequestStateEntry PreemptLastRunningRequestStateEntry(
        function ApplyLogitProcessorAndSample (line 427) | std::pair<Tensor, std::vector<SampleResult>> ApplyLogitProcessorAn...

FILE: cpp/serve/engine_actions/action_commons.h
  function namespace (line 19) | namespace mlc {

FILE: cpp/serve/engine_actions/auto_spec_decode.cc
  type mlc (line 13) | namespace mlc {
    type llm (line 14) | namespace llm {
      type serve (line 15) | namespace serve {
        class AutoSpecDecodeActionObj (line 21) | class AutoSpecDecodeActionObj : public EngineActionObj {
          method AutoSpecDecodeActionObj (line 23) | explicit AutoSpecDecodeActionObj(Array<EngineAction> spec_decode...
          method Step (line 30) | Array<Request> Step(EngineState estate) final {
          method CalculateDraftLength (line 54) | int CalculateDraftLength(EngineState estate, int num_running_rse...
        function EngineAction (line 80) | EngineAction EngineAction::AutoSpecDecode(std::vector<EngineAction...

FILE: cpp/serve/engine_actions/batch_decode.cc
  type mlc (line 17) | namespace mlc {
    type llm (line 18) | namespace llm {
      type serve (line 19) | namespace serve {
        class BatchDecodeActionObj (line 29) | class BatchDecodeActionObj : public EngineActionObj {
          method BatchDecodeActionObj (line 31) | explicit BatchDecodeActionObj(Array<Model> models, Tokenizer tok...
          method Step (line 42) | Array<Request> Step(EngineState estate) final {
          method CanDecode (line 203) | bool CanDecode(int num_rsentries) {
          method RetokenizeWithNewToken (line 215) | std::pair<int, std::vector<int32_t>> RetokenizeWithNewToken(Requ...
          method CommitTokenMayRetokenize (line 254) | void CommitTokenMayRetokenize(RequestStateEntry rsentry, Request...
        function EngineAction (line 316) | EngineAction EngineAction::BatchDecode(Array<Model> models, Tokeni...

FILE: cpp/serve/engine_actions/batch_draft.cc
  type mlc (line 14) | namespace mlc {
    type llm (line 15) | namespace llm {
      type serve (line 16) | namespace serve {
        class BatchDraftActionObj (line 23) | class BatchDraftActionObj : public EngineActionObj {
          method BatchDraftActionObj (line 25) | explicit BatchDraftActionObj(Array<Model> models, LogitProcessor...
          method Step (line 38) | Array<Request> Step(EngineState estate) final {
          method CanDecode (line 304) | bool CanDecode(int num_rsentries) {
          method PrefillLaggedTokensByChunk (line 316) | void PrefillLaggedTokensByChunk(const Array<RequestModelState>& ...
        function EngineAction (line 395) | EngineAction EngineAction::BatchDraft(Array<Model> models, LogitPr...

FILE: cpp/serve/engine_actions/batch_jumpforward.cc
  type mlc (line 18) | namespace mlc {
    type llm (line 19) | namespace llm {
      type serve (line 20) | namespace serve {
        class BatchJumpForwardActionObj (line 27) | class BatchJumpForwardActionObj : public EngineActionObj {
          method BatchJumpForwardActionObj (line 29) | explicit BatchJumpForwardActionObj(Array<Model> models, Tokenize...
          method Step (line 35) | Array<Request> Step(EngineState estate) final {
          method CheckMemForJumpForward (line 103) | bool CheckMemForJumpForward(int num_rsentries) {
          method CanJumpForward (line 111) | bool CanJumpForward(const RequestStateEntry& rsentry) {
          method RetokenizeWithNewString (line 133) | std::tuple<int, std::vector<int32_t>, std::string> RetokenizeWit...
          method HandleRollback (line 188) | void HandleRollback(const RequestStateEntry& rsentry, RequestMod...
        function EngineAction (line 231) | EngineAction EngineAction::BatchJumpForward(Array<Model> models, T...

FILE: cpp/serve/engine_actions/batch_prefill_base.cc
  type mlc (line 12) | namespace mlc {
    type llm (line 13) | namespace llm {
      type serve (line 14) | namespace serve {
        function HasPrefillSpace (line 16) | bool HasPrefillSpace(int num_required_pages, bool sliding_window_e...

FILE: cpp/serve/engine_actions/batch_prefill_base.h
  function namespace (line 13) | namespace mlc {

FILE: cpp/serve/engine_actions/batch_verify.cc
  type mlc (line 19) | namespace mlc {
    type llm (line 20) | namespace llm {
      type serve (line 21) | namespace serve {
        class BatchVerifyActionObj (line 28) | class BatchVerifyActionObj : public EngineActionObj {
          method BatchVerifyActionObj (line 30) | explicit BatchVerifyActionObj(Array<Model> models, LogitProcesso...
          method Step (line 44) | Array<Request> Step(EngineState estate) final {
          type DraftRequestStateEntries (line 277) | struct DraftRequestStateEntries {
          method DraftRequestStateEntries (line 292) | DraftRequestStateEntries GetDraftsToVerify(EngineState estate) {
          method CanVerify (line 337) | bool CanVerify(int num_required_pages) {
        function EngineAction (line 369) | EngineAction EngineAction::BatchVerify(Array<Model> models, LogitP...

FILE: cpp/serve/engine_actions/disagg_prepare_recv.cc
  type mlc (line 12) | namespace mlc {
    type llm (line 13) | namespace llm {
      type serve (line 14) | namespace serve {
        class DisaggPrepareReceiveActionObj (line 21) | class DisaggPrepareReceiveActionObj : public BatchPrefillBaseActio...
          method DisaggPrepareReceiveActionObj (line 23) | explicit DisaggPrepareReceiveActionObj(Array<Model> models, Engi...
          method Step (line 34) | Array<Request> Step(EngineState estate) final {
          method GetRequestStateEntriesToPrefill (line 186) | std::optional<PrefillInput> GetRequestStateEntriesToPrefill(Engi...
          method CanPrefill (line 324) | bool CanPrefill(EngineState estate, int num_prefill_rsentries, i...
          method MatchPrefixCache (line 354) | int MatchPrefixCache(EngineState estate, PrefillInput* input) fi...
        function EngineAction (line 432) | EngineAction EngineAction::DisaggPrepareReceive(Array<Model> model...

FILE: cpp/serve/engine_actions/disagg_remote_send.cc
  type mlc (line 9) | namespace mlc {
    type llm (line 10) | namespace llm {
      type serve (line 11) | namespace serve {
        class DisaggRemoteSendActionObj (line 19) | class DisaggRemoteSendActionObj : public BatchPrefillBaseActionObj {
          method DisaggRemoteSendActionObj (line 21) | explicit DisaggRemoteSendActionObj(Array<Model> models,
          method Step (line 40) | Array<Request> Step(EngineState estate) final {
          method GetRequestStateEntriesToPrefill (line 174) | std::vector<PrefillInput> GetRequestStateEntriesToPrefill(Engine...
          method MatchPrefixCache (line 385) | int MatchPrefixCache(EngineState estate, PrefillInput* input) fi...
        function EngineAction (line 487) | EngineAction EngineAction::DisaggRemoteSend(

FILE: cpp/serve/engine_actions/eagle_batch_draft.cc
  type mlc (line 14) | namespace mlc {
    type llm (line 15) | namespace llm {
      type serve (line 16) | namespace serve {
        class EagleBatchDraftActionObj (line 23) | class EagleBatchDraftActionObj : public EngineActionObj {
          method EagleBatchDraftActionObj (line 25) | explicit EagleBatchDraftActionObj(Array<Model> models, LogitProc...
          method Step (line 38) | Array<Request> Step(EngineState estate) final {
          method CanDecode (line 190) | bool CanDecode(int num_rsentries) {
        function EngineAction (line 220) | EngineAction EngineAction::EagleBatchDraft(Array<Model> models, Lo...

FILE: cpp/serve/engine_actions/eagle_batch_verify.cc
  type mlc (line 19) | namespace mlc {
    type llm (line 20) | namespace llm {
      type serve (line 21) | namespace serve {
        class EagleBatchVerifyActionObj (line 28) | class EagleBatchVerifyActionObj : public EngineActionObj {
          method EagleBatchVerifyActionObj (line 30) | explicit EagleBatchVerifyActionObj(Array<Model> models, LogitPro...
          method Step (line 44) | Array<Request> Step(EngineState estate) final {
          type DraftRequestStateEntries (line 347) | struct DraftRequestStateEntries {
          method DraftRequestStateEntries (line 362) | DraftRequestStateEntries GetDraftsToVerify(EngineState estate) {
          method CanVerify (line 397) | bool CanVerify(int num_required_pages) {
          method UpdateRequestStatesWithDraftProposals (line 402) | void UpdateRequestStatesWithDraftProposals(const Array<RequestMo...
        function EngineAction (line 447) | EngineAction EngineAction::EagleBatchVerify(

FILE: cpp/serve/engine_actions/eagle_new_request_prefill.cc
  type mlc (line 9) | namespace mlc {
    type llm (line 10) | namespace llm {
      type serve (line 11) | namespace serve {
        class EagleNewRequestPrefillActionObj (line 17) | class EagleNewRequestPrefillActionObj : public BatchPrefillBaseAct...
          method EagleNewRequestPrefillActionObj (line 19) | explicit EagleNewRequestPrefillActionObj(Array<Model> models, Lo...
          method Step (line 33) | Array<Request> Step(EngineState estate) final {
          method UpdateRequestStatesWithDraftProposals (line 344) | void UpdateRequestStatesWithDraftProposals(
          method MatchPrefixCache (line 393) | int MatchPrefixCache(EngineState estate, PrefillInput* input) fi...
        function EngineAction (line 485) | EngineAction EngineAction::EagleNewRequestPrefill(

FILE: cpp/serve/engine_actions/new_request_prefill.cc
  type mlc (line 9) | namespace mlc {
    type llm (line 10) | namespace llm {
      type serve (line 11) | namespace serve {
        class NewRequestPrefillActionObj (line 17) | class NewRequestPrefillActionObj : public BatchPrefillBaseActionObj {
          method NewRequestPrefillActionObj (line 19) | explicit NewRequestPrefillActionObj(Array<Model> models, LogitPr...
          method Step (line 30) | Array<Request> Step(EngineState estate) final {
          method MatchPrefixCache (line 280) | int MatchPrefixCache(EngineState estate, PrefillInput* input) fi...
        function EngineAction (line 352) | EngineAction EngineAction::NewRequestPrefill(Array<Model> models, ...

FILE: cpp/serve/engine_state.cc
  type mlc (line 7) | namespace mlc {
    type llm (line 8) | namespace llm {
      type serve (line 9) | namespace serve {
        function TVM_FFI_STATIC_INIT_BLOCK (line 11) | TVM_FFI_STATIC_INIT_BLOCK() { EngineStateObj::RegisterReflection(); }
        function RequestState (line 28) | RequestState EngineStateObj::GetRequestState(Request request) {

FILE: cpp/serve/engine_state.h
  function namespace (line 16) | namespace mlc {

FILE: cpp/serve/event_trace_recorder.cc
  type mlc (line 19) | namespace mlc {
    type llm (line 20) | namespace llm {
      type serve (line 21) | namespace serve {
        type detail (line 25) | namespace detail {
          type PairHash (line 27) | struct PairHash {
        class EventTraceRecorderImpl (line 39) | class EventTraceRecorderImpl : public EventTraceRecorderObj {
          method AddEvent (line 41) | void AddEvent(const String& request_id, const std::string& event...
          method AddEvent (line 52) | void AddEvent(const Array<String>& request_ids, const std::strin...
          method DumpJSON (line 65) | std::string DumpJSON() final {
          method AddEventInternal (line 124) | void AddEventInternal(const std::string& request_id, const std::...
        function EventTraceRecorder (line 146) | EventTraceRecorder EventTraceRecorder::Create() {
        function TVM_FFI_STATIC_INIT_BLOCK (line 150) | TVM_FFI_STATIC_INIT_BLOCK() {

FILE: cpp/serve/event_trace_recorder.h
  function namespace (line 16) | namespace mlc {

FILE: cpp/serve/function_table.cc
  type mlc (line 24) | namespace mlc {
    type llm (line 25) | namespace llm {
      type serve (line 26) | namespace serve {
        function GetDiscoWorkerCPUBinding (line 28) | Optional<IntTuple> GetDiscoWorkerCPUBinding(int num_workers) {
        function Function (line 53) | Function FunctionTable::SessionFuncAsPackedFunc(Session sess, DRef...
        function ObjectRef (line 155) | ObjectRef FunctionTable::LoadParams(const std::string& model_path,...
        function ObjectRef (line 294) | ObjectRef FunctionTable::Empty(Shape shape, DataType dtype, Device...
        function ObjectRef (line 305) | ObjectRef FunctionTable::CopyToWorker0(const Tensor& host_array, S...

FILE: cpp/serve/function_table.h
  function namespace (line 23) | namespace mlc {

FILE: cpp/serve/logit_processor.cc
  type mlc (line 13) | namespace mlc {
    type llm (line 14) | namespace llm {
      type serve (line 15) | namespace serve {
        function CopyArray (line 17) | inline void CopyArray(Tensor src, Tensor dst, TVMStreamHandle copy...
        function SyncCopyStream (line 22) | inline void SyncCopyStream(Device device, TVMStreamHandle compute_...
        function TVM_FFI_STATIC_INIT_BLOCK (line 34) | TVM_FFI_STATIC_INIT_BLOCK() { LogitProcessorObj::RegisterReflectio...
        class LogitProcessorImpl (line 36) | class LogitProcessorImpl : public LogitProcessorObj {
          method LogitProcessorImpl (line 39) | explicit LogitProcessorImpl(int max_num_token, int vocab_size, F...
          method InplaceUpdateLogits (line 99) | void InplaceUpdateLogits(Tensor logits,                         ...
          method Tensor (line 153) | Tensor ComputeProbsFromLogits(Tensor logits, const Array<Generat...
          method UpdateWithLogitBias (line 212) | void UpdateWithLogitBias(Tensor logits, const Array<GenerationCo...
          method UpdateWithPenalty (line 269) | void UpdateWithPenalty(Tensor logits, const Array<GenerationConf...
          method UpdateWithMask (line 371) | void UpdateWithMask(Tensor logits, const Array<RequestModelState...

FILE: cpp/serve/logit_processor.h
  function namespace (line 19) | namespace mlc {

FILE: cpp/serve/metrics.cc
  type mlc (line 12) | namespace mlc {
    type llm (line 13) | namespace llm {
      type serve (line 14) | namespace serve {

FILE: cpp/serve/metrics.h
  function namespace (line 15) | namespace mlc {
  function Reset (line 87) | void Reset() {
  function GetPrefillTime (line 100) | struct RequestMetrics {
  function Reset (line 148) | void Reset() {
  function UpdateDraftTimeByBatchSize (line 168) | struct EngineMetrics {
  function UpdateVerifyTimeByBatchSize (line 227) | void UpdateVerifyTimeByBatchSize(int effective_batch_size, double time) {
  function RequestFinishUpdate (line 237) | void RequestFinishUpdate(const RequestMetrics& request_metrics) {

FILE: cpp/serve/model.cc
  type mlc (line 21) | namespace mlc {
    type llm (line 22) | namespace llm {
      type serve (line 23) | namespace serve {
        function TVM_FFI_STATIC_INIT_BLOCK (line 27) | TVM_FFI_STATIC_INIT_BLOCK() { ModelObj::RegisterReflection(); }
        class ModelImpl (line 29) | class ModelImpl
          method ModelImpl (line 64) | explicit ModelImpl(String reload_lib_path, String model_path, tv...
          method ObjectRef (line 85) | ObjectRef TokenEmbed(IntTuple token_ids, ObjectRef* dst, int off...
          method ObjectRef (line 126) | ObjectRef ImageEmbed(const Tensor& image, ObjectRef* dst, int of...
          method CanGetLogits (line 154) | bool CanGetLogits() final {
          method Tensor (line 158) | Tensor GetLogits(const ObjectRef& hidden_states) final {
          method GetMultiStepLogits (line 184) | Array<Tensor> GetMultiStepLogits(const ObjectRef& hidden_states)...
          method ObjectRef (line 200) | ObjectRef FuseEmbedHidden(const ObjectRef& embeddings, const Obj...
          method Tensor (line 243) | Tensor BatchPrefill(const ObjectRef& embeddings, const std::vect...
          method ObjectRef (line 352) | ObjectRef BatchPrefillToLastHidden(const ObjectRef& embedding_or...
          method Tensor (line 420) | Tensor BatchDecode(const ObjectRef& embeddings, const std::vecto...
          method Tensor (line 488) | Tensor BatchTreeDecode(const ObjectRef& embeddings, const std::v...
          method ObjectRef (line 561) | ObjectRef BatchDecodeToLastHidden(const ObjectRef& hidden_states...
          method Tensor (line 612) | Tensor BatchVerify(const ObjectRef& embeddings, const std::vecto...
          method ObjectRef (line 684) | ObjectRef BatchVerifyToLastHidden(const ObjectRef& embeddings,
          method CreateKVCache (line 752) | void CreateKVCache(int page_size, int max_num_sequence, int64_t ...
          method AddNewSequence (line 783) | void AddNewSequence(int64_t seq_id) final {
          method ForkSequence (line 790) | void ForkSequence(int64_t parent_seq_id, int64_t child_seq_id, i...
          method RemoveSequence (line 798) | void RemoveSequence(int64_t seq_id) final {
          method PopNFromKVCache (line 806) | void PopNFromKVCache(int64_t seq_id, int num_tokens) final {
          method CommitAcceptedTokenTreeNodesToKVCache (line 813) | void CommitAcceptedTokenTreeNodesToKVCache(
          method EnableSlidingWindowForSeq (line 822) | void EnableSlidingWindowForSeq(int64_t seq_id) final {
          method IntTuple (line 832) | IntTuple DisaggPrepareKVRecv(int64_t seq_id, int length) final {
          method DisaggMarkKVSend (line 851) | void DisaggMarkKVSend(int64_t seq_id, int begin_pos, IntTuple co...
          method ModelMetadata (line 866) | ModelMetadata GetMetadata() const final { return ft_.model_metad...
          method GetNumAvailablePages (line 868) | int GetNumAvailablePages() const final {
          method GetCurrentTotalSequenceLength (line 877) | int GetCurrentTotalSequenceLength() const final {
          method LoadParams (line 888) | void LoadParams() final { this->params_ = ft_.LoadParams(model_,...
          method SetMaxNumSequence (line 890) | void SetMaxNumSequence(int max_num_sequence) final {
          method SetPrefillChunkSize (line 896) | void SetPrefillChunkSize(int prefill_chunk_size) final {
          method LogitProcessor (line 913) | LogitProcessor CreateLogitProcessor(int max_num_token,
          method Sampler (line 919) | Sampler CreateSampler(int max_num_sample, int num_models,
          method EstimateHostCPURequirement (line 929) | int EstimateHostCPURequirement() const final {
          method GetSlidingWindowSize (line 934) | int GetSlidingWindowSize() const final { return sliding_window_s...
          method GetAttentionSinkSize (line 936) | int GetAttentionSinkSize() const final { return attention_sink_s...
          method ObjectRef (line 938) | ObjectRef AllocEmbeddingTensor() final {
          method ObjectRef (line 961) | ObjectRef AllocHiddenStatesTensor() final {
          method Reset (line 985) | void Reset() final {
          method DraftTokenWorkspaceManager (line 994) | DraftTokenWorkspaceManager CreateDraftTokenWorkspaceManager(int ...
          method ObjectRef (line 999) | ObjectRef GatherHiddenStates(const ObjectRef& input, const std::...
          method ScatterHiddenStates (line 1018) | void ScatterHiddenStates(const ObjectRef& input, const std::vect...
          method Tensor (line 1028) | Tensor GatherDraftProbs(const Tensor& input, const std::vector<i...
          method ScatterDraftProbs (line 1041) | void ScatterDraftProbs(const Tensor& input, const std::vector<in...
          method GetMedusaLogits (line 1051) | Array<Tensor> GetMedusaLogits(const ObjectRef& hidden_states) {
          method DebugCallFuncOnAllAllWorker (line 1064) | void DebugCallFuncOnAllAllWorker(const String& func_name, Option...
          method LoadModelConfigJSON (line 1070) | void LoadModelConfigJSON(const tvm::ffi::json::Object& config) {
        function Model (line 31) | Model Model::Create(String reload_lib_path, String model_path,
        class ModelImpl (line 58) | class ModelImpl : public ModelObj {
          method ModelImpl (line 64) | explicit ModelImpl(String reload_lib_path, String model_path, tv...
          method ObjectRef (line 85) | ObjectRef TokenEmbed(IntTuple token_ids, ObjectRef* dst, int off...
          method ObjectRef (line 126) | ObjectRef ImageEmbed(const Tensor& image, ObjectRef* dst, int of...
          method CanGetLogits (line 154) | bool CanGetLogits() final {
          method Tensor (line 158) | Tensor GetLogits(const ObjectRef& hidden_states) final {
          method GetMultiStepLogits (line 184) | Array<Tensor> GetMultiStepLogits(const ObjectRef& hidden_states)...
          method ObjectRef (line 200) | ObjectRef FuseEmbedHidden(const ObjectRef& embeddings, const Obj...
          method Tensor (line 243) | Tensor BatchPrefill(const ObjectRef& embeddings, const std::vect...
          method ObjectRef (line 352) | ObjectRef BatchPrefillToLastHidden(const ObjectRef& embedding_or...
          method Tensor (line 420) | Tensor BatchDecode(const ObjectRef& embeddings, const std::vecto...
          method Tensor (line 488) | Tensor BatchTreeDecode(const ObjectRef& embeddings, const std::v...
          method ObjectRef (line 561) | ObjectRef BatchDecodeToLastHidden(const ObjectRef& hidden_states...
          method Tensor (line 612) | Tensor BatchVerify(const ObjectRef& embeddings, const std::vecto...
          method ObjectRef (line 684) | ObjectRef BatchVerifyToLastHidden(const ObjectRef& embeddings,
          method CreateKVCache (line 752) | void CreateKVCache(int page_size, int max_num_sequence, int64_t ...
          method AddNewSequence (line 783) | void AddNewSequence(int64_t seq_id) final {
          method ForkSequence (line 790) | void ForkSequence(int64_t parent_seq_id, int64_t child_seq_id, i...
          method RemoveSequence (line 798) | void RemoveSequence(int64_t seq_id) final {
          method PopNFromKVCache (line 806) | void PopNFromKVCache(int64_t seq_id, int num_tokens) final {
          method CommitAcceptedTokenTreeNodesToKVCache (line 813) | void CommitAcceptedTokenTreeNodesToKVCache(
          method EnableSlidingWindowForSeq (line 822) | void EnableSlidingWindowForSeq(int64_t seq_id) final {
          method IntTuple (line 832) | IntTuple DisaggPrepareKVRecv(int64_t seq_id, int length) final {
          method DisaggMarkKVSend (line 851) | void DisaggMarkKVSend(int64_t seq_id, int begin_pos, IntTuple co...
          method ModelMetadata (line 866) | ModelMetadata GetMetadata() const final { return ft_.model_metad...
          method GetNumAvailablePages (line 868) | int GetNumAvailablePages() const final {
          method GetCurrentTotalSequenceLength (line 877) | int GetCurrentTotalSequenceLength() const final {
          method LoadParams (line 888) | void LoadParams() final { this->params_ = ft_.LoadParams(model_,...
          method SetMaxNumSequence (line 890) | void SetMaxNumSequence(int max_num_sequence) final {
          method SetPrefillChunkSize (line 896) | void SetPrefillChunkSize(int prefill_chunk_size) final {
          method LogitProcessor (line 913) | LogitProcessor CreateLogitProcessor(int max_num_token,
          method Sampler (line 919) | Sampler CreateSampler(int max_num_sample, int num_models,
          method EstimateHostCPURequirement (line 929) | int EstimateHostCPURequirement() const final {
          method GetSlidingWindowSize (line 934) | int GetSlidingWindowSize() const final { return sliding_window_s...
          method GetAttentionSinkSize (line 936) | int GetAttentionSinkSize() const final { return attention_sink_s...
          method ObjectRef (line 938) | ObjectRef AllocEmbeddingTensor() final {
          method ObjectRef (line 961) | ObjectRef AllocHiddenStatesTensor() final {
          method Reset (line 985) | void Reset() final {
          method DraftTokenWorkspaceManager (line 994) | DraftTokenWorkspaceManager CreateDraftTokenWorkspaceManager(int ...
          method ObjectRef (line 999) | ObjectRef GatherHiddenStates(const ObjectRef& input, const std::...
          method ScatterHiddenStates (line 1018) | void ScatterHiddenStates(const ObjectRef& input, const std::vect...
          method Tensor (line 1028) | Tensor GatherDraftProbs(const Tensor& input, const std::vector<i...
          method ScatterDraftProbs (line 1041) | void ScatterDraftProbs(const Tensor& input, const std::vector<in...
          method GetMedusaLogits (line 1051) | Array<Tensor> GetMedusaLogits(const ObjectRef& hidden_states) {
          method DebugCallFuncOnAllAllWorker (line 1064) | void DebugCallFuncOnAllAllWorker(const String& func_name, Option...
          method LoadModelConfigJSON (line 1070) | void LoadModelConfigJSON(const tvm::ffi::json::Object& config) {
        function TVM_FFI_STATIC_INIT_BLOCK (line 1127) | TVM_FFI_STATIC_INIT_BLOCK() {

FILE: cpp/serve/model.h
  type ModelWorkspace (line 39) | struct ModelWorkspace {
  function ObjectRef (line 49) | ObjectRef hidden_states{nullptr};

FILE: cpp/serve/prefix_cache.cc
  type mlc (line 10) | namespace mlc {
    type llm (line 11) | namespace llm {
      type serve (line 12) | namespace serve {
        function TVM_FFI_STATIC_INIT_BLOCK (line 16) | TVM_FFI_STATIC_INIT_BLOCK() { PrefixCacheObj::RegisterReflection(); }
        class PrefixCacheImpl (line 21) | class PrefixCacheImpl : public PrefixCacheObj {
          method PrefixCacheImpl (line 28) | explicit PrefixCacheImpl(size_t max_num_recycling_seqs, PrefixCa...
          method PrefixCacheMatchedResult (line 48) | PrefixCacheMatchedResult InsertSequence(int64_t seq_id, std::vec...
          method ExtendSequence (line 149) | void ExtendSequence(int64_t seq_id, const std::vector<int32_t>& ...
          method CommitSequenceExtention (line 153) | void CommitSequenceExtention() final {
          method RollBackSequence (line 176) | void RollBackSequence(int64_t seq_id, size_t num_tokens) final {
          method RecycleSequence (line 190) | void RecycleSequence(int64_t seq_id, bool lazy = true) final {
          method TryFreeMemory (line 224) | bool TryFreeMemory() final {
          method HasSequence (line 250) | bool HasSequence(int64_t seq_id) final { return radix_tree_->Has...
          method Reset (line 255) | void Reset() final {
          method PrefixCacheMode (line 265) | PrefixCacheMode Mode() final { return PrefixCacheMode::kRadix; }
          method ReuseRecyclingSequence (line 268) | void ReuseRecyclingSequence(int64_t seq_id) {
          type SequenceState (line 280) | enum class SequenceState : int {
        class NoPrefixCache (line 344) | class NoPrefixCache : public PrefixCacheObj {
          method PrefixCacheMatchedResult (line 355) | PrefixCacheMatchedResult InsertSequence(int64_t seq_id, std::vec...
          method ExtendSequence (line 367) | void ExtendSequence(int64_t seq_id, const std::vector<int32_t>& ...
          method CommitSequenceExtention (line 371) | void CommitSequenceExtention() final {
          method RollBackSequence (line 381) | void RollBackSequence(int64_t seq_id, size_t num_tokens) final {
          method RecycleSequence (line 394) | void RecycleSequence(int64_t seq_id, bool lazy = true) final {
          method TryFreeMemory (line 404) | bool TryFreeMemory() final {
          method HasSequence (line 414) | bool HasSequence(int64_t seq_id) final {
          method Reset (line 422) | void Reset() final {}
          method PrefixCacheMode (line 424) | PrefixCacheMode Mode() final { return PrefixCacheMode::kDisable; }
        function PrefixCache (line 427) | PrefixCache PrefixCache::CreateRadixPrefixCache(size_t max_num_rec...
        function PrefixCache (line 434) | PrefixCache PrefixCache::CreateNoPrefixCache() {

FILE: cpp/serve/prefix_cache.h
  function namespace (line 20) | namespace mlc {

FILE: cpp/serve/radix_tree.cc
  type mlc (line 11) | namespace mlc {
    type llm (line 12) | namespace llm {
      type serve (line 13) | namespace serve {
        function TVM_FFI_STATIC_INIT_BLOCK (line 17) | TVM_FFI_STATIC_INIT_BLOCK() { PagedRadixTreeObj::RegisterReflectio...
        type SequenceIDNode (line 22) | struct SequenceIDNode {
        class SequenceIDNodePool (line 35) | class SequenceIDNodePool {
          method SequenceIDNodePool (line 38) | SequenceIDNodePool() {
          method SequenceIDNode (line 50) | SequenceIDNode* Allocate(int64_t seq_id, SequenceIDNode* next) {
          method Free (line 68) | void Free(SequenceIDNode* node) {
          method Reset (line 77) | void Reset() {
          method NewNodeBlock_ (line 107) | void NewNodeBlock_() {
        type RadixPage (line 137) | struct RadixPage {
          method Extend (line 172) | void Extend(const int32_t* suffix, size_t suffix_length) {
          method AddSequence (line 185) | void AddSequence(SequenceIDNodePool* pool, int64_t id) { seq_ids...
          method PopSequence (line 193) | void PopSequence(SequenceIDNodePool* pool, int64_t id) {
          method GetLocalSequence (line 222) | std::vector<int64_t> GetLocalSequence() {
          method FindAnyChildSequence (line 236) | int32_t FindAnyChildSequence() {
          method FindAllChildSequence (line 246) | std::vector<int64_t> FindAllChildSequence() {
          method Iterate (line 263) | void Iterate(CallbackFunc f) {
          method RadixPage (line 274) | RadixPage* GetLastSibling() {
          method RadixPage (line 287) | RadixPage* FindChild(int64_t first_token) {
          method InsertChild (line 297) | void InsertChild(RadixPage* child) {
          method RemoveChild (line 307) | void RemoveChild(RadixPage* child) {
          method Mergeable (line 325) | bool Mergeable() {
          method MatchPrefix (line 341) | size_t MatchPrefix(const int32_t* prefix, size_t prefix_length) {
        class RadixPagePool (line 356) | class RadixPagePool {
          method RadixPagePool (line 359) | RadixPagePool() {
          method RadixPage (line 369) | RadixPage* Allocate() {
          method Free (line 389) | void Free(RadixPage* page) {
          method FreeCapacity (line 400) | size_t FreeCapacity() { return free_page_indices_.size() * kPage...
          method Reset (line 405) | void Reset() {
          method NewPageBlock_ (line 443) | void NewPageBlock_() {
        class PagedRadixTreeImpl (line 460) | class PagedRadixTreeImpl : public PagedRadixTreeObj {
          method PagedRadixTreeImpl (line 471) | explicit PagedRadixTreeImpl() {
          method HasSequence (line 487) | bool HasSequence(int64_t seq_id) { return seq2page.find(seq_id) ...
          method IntTuple (line 495) | IntTuple GetSequence(int64_t seq_id) {
          method MatchPrefix (line 514) | std::pair<size_t, std::vector<int64_t>> MatchPrefix(const std::v...
          method GetSequenceLength (line 528) | size_t GetSequenceLength(int64_t seq_id) {
          method ForkSequence (line 547) | void ForkSequence(int64_t seq_id, int64_t parent_seq_id, size_t ...
          method AddSequence (line 572) | void AddSequence(int64_t seq_id) {
          method ExtendSequence (line 585) | void ExtendSequence(int64_t seq_id, const std::vector<int32_t>& ...
          method RollBackSequence (line 625) | void RollBackSequence(int64_t seq_id, size_t num_tokens) {
          method RemoveSequence (line 672) | void RemoveSequence(int64_t seq_id) {
          method FreeCapacity (line 692) | size_t FreeCapacity() { return radix_page_pool->FreeCapacity(); }
          method Reset (line 694) | void Reset() {
          method MergePage (line 717) | void MergePage(RadixPage* page) {
          method RadixPage (line 743) | RadixPage* SplitPage(RadixPage* page, size_t offset) {
          method MatchSequence (line 779) | std::tuple<RadixPage*, size_t, size_t> MatchSequence(RadixPage* ...
        function PagedRadixTree (line 801) | PagedRadixTree PagedRadixTree::Create() {
        function TVM_FFI_STATIC_INIT_BLOCK (line 805) | TVM_FFI_STATIC_INIT_BLOCK() {

FILE: cpp/serve/radix_tree.h
  function namespace (line 15) | namespace mlc {

FILE: cpp/serve/request.cc
  type mlc (line 13) | namespace mlc {
    type llm (line 14) | namespace llm {
      type serve (line 15) | namespace serve {
        function TVM_FFI_STATIC_INIT_BLOCK (line 19) | TVM_FFI_STATIC_INIT_BLOCK() { RequestNode::RegisterReflection(); }
        function Request (line 47) | Request Request::FromUntokenized(const Request& request, const Tok...
        function TVM_FFI_STATIC_INIT_BLOCK (line 71) | TVM_FFI_STATIC_INIT_BLOCK() {

FILE: cpp/serve/request.h
  function namespace (line 18) | namespace mlc {

FILE: cpp/serve/request_state.cc
  type mlc (line 10) | namespace mlc {
    type llm (line 11) | namespace llm {
      type serve (line 12) | namespace serve {
        function TVM_FFI_STATIC_INIT_BLOCK (line 14) | TVM_FFI_STATIC_INIT_BLOCK() {
        function RequestStreamOutput (line 117) | RequestStreamOutput RequestActionPostProcWorkspace::GetStreamOutpu...

FILE: cpp/serve/request_state.h
  function namespace (line 23) | namespace llm {

FILE: cpp/serve/sampler/cpu_sampler.cc
  type mlc (line 16) | namespace mlc {
    type llm (line 17) | namespace llm {
      type serve (line 18) | namespace serve {
        function TVM_FFI_STATIC_INIT_BLOCK (line 20) | TVM_FFI_STATIC_INIT_BLOCK() { SamplerObj::RegisterReflection(); }
        function TokenProbPair (line 35) | TokenProbPair SampleTopPFromProb(Tensor prob, int unit_offset, int...
        function RenormalizeProbByTopP (line 172) | void RenormalizeProbByTopP(Tensor prob, int unit_offset, double to...
        type detail (line 262) | namespace detail {
          function ComputeTopProbsImpl (line 266) | std::vector<TokenProbPair> ComputeTopProbsImpl(const float* p_pr...
        function ComputeTopProbs (line 302) | inline std::vector<TokenProbPair> ComputeTopProbs(Tensor prob, int...
        class CPUSampler (line 327) | class CPUSampler : public SamplerObj {
          method CPUSampler (line 329) | explicit CPUSampler(Optional<EventTraceRecorder> trace_recorder)
          method Tensor (line 332) | Tensor BatchRenormalizeProbsByTopP(Tensor probs_on_device,      ...
          method BatchSampleTokensWithProbBeforeTopP (line 375) | std::vector<SampleResult> BatchSampleTokensWithProbBeforeTopP(
          method BatchSampleTokensWithProbAfterTopP (line 392) | std::vector<SampleResult> BatchSampleTokensWithProbAfterTopP(
          method BatchVerifyDraftTokensWithProbAfterTopP (line 402) | std::pair<std::vector<std::vector<SampleResult>>, std::vector<int>>
          method BatchSampleTokensImpl (line 506) | std::vector<SampleResult> BatchSampleTokensImpl(Tensor probs_on_...
          method Tensor (line 546) | Tensor CopyProbsToCPU(Tensor probs_on_device) {
        function Sampler (line 582) | Sampler Sampler::CreateCPUSampler(Optional<EventTraceRecorder> tra...

FILE: cpp/serve/sampler/gpu_sampler.cc
  type mlc (line 14) | namespace mlc {
    type llm (line 15) | namespace llm {
      type serve (line 16) | namespace serve {
        function FlashInferSamplingAvailable (line 18) | inline bool FlashInferSamplingAvailable(Device device) {
        function CopyArray (line 32) | inline void CopyArray(Tensor src, Tensor dst, TVMStreamHandle copy...
        function SyncCopyStream (line 37) | inline void SyncCopyStream(Device device, TVMStreamHandle compute_...
        class GPUSampler (line 49) | class GPUSampler : public SamplerObj {
          method GPUSampler (line 51) | explicit GPUSampler(int max_num_sample, int vocab_size, Function...
          method Tensor (line 122) | Tensor BatchRenormalizeProbsByTopP(Tensor probs_on_device,      ...
          method BatchSampleTokensWithProbBeforeTopP (line 177) | std::vector<SampleResult> BatchSampleTokensWithProbBeforeTopP(
          method BatchSampleTokensWithProbAfterTopP (line 188) | std::vector<SampleResult> BatchSampleTokensWithProbAfterTopP(
          method BatchVerifyDraftTokensWithProbAfterTopP (line 199) | std::pair<std::vector<std::vector<SampleResult>>, std::vector<int>>
          method BatchSampleTokensImpl (line 358) | std::vector<SampleResult> BatchSampleTokensImpl(Tensor probs_on_...
          method CollectSampleResult (line 409) | std::vector<SampleResult> CollectSampleResult(const std::vector<...
          method ChunkSampleTokensImpl (line 438) | std::vector<SampleResult> ChunkSampleTokensImpl(Tensor probs_on_...
          method Tensor (line 478) | Tensor GenerateUniformSamples(const std::vector<RandomGenerator*...
          method Tensor (line 491) | Tensor GenerateUniformSamples(const std::vector<RandomGenerator*...
          method Tensor (line 507) | Tensor CopySampleIndicesToGPU(const std::vector<int>& sample_ind...
          method CheckTopP (line 519) | bool CheckTopP(const Array<GenerationConfig>& generation_cfg,
          method CheckProbValues (line 544) | bool CheckProbValues(const Array<GenerationConfig>& generation_cfg,
          method SampleOnGPU (line 565) | std::vector<Tensor> SampleOnGPU(Tensor probs_on_device, Tensor u...
          method CopyArraysToCPU (line 655) | std::vector<Tensor> CopyArraysToCPU(const std::vector<Tensor>& d...
        function Sampler (line 746) | Sampler Sampler::CreateGPUSampler(int max_num_sample, int vocab_si...

FILE: cpp/serve/sampler/sampler.h
  function namespace (line 20) | namespace mlc {

FILE: cpp/serve/threaded_engine.cc
  type mlc (line 22) | namespace mlc {
    type llm (line 23) | namespace llm {
      type serve (line 24) | namespace serve {
        type InstructionKind (line 30) | enum class InstructionKind : int {
        class ThreadedEngineImpl (line 40) | class ThreadedEngineImpl : public ThreadedEngine {
          method InitThreadedEngine (line 42) | void InitThreadedEngine(Device device, Optional<Function> reques...
          method Reload (line 51) | void Reload(String engine_config_json_str) final {
          method Unload (line 73) | void Unload() final {
          method Reset (line 96) | void Reset() final {
          method AddRequest (line 109) | void AddRequest(Request request) final {
          method AbortRequest (line 122) | void AbortRequest(const String& request_id) final {
          method RunBackgroundLoop (line 135) | void RunBackgroundLoop() final {
          method RunBackgroundStreamBackLoop (line 190) | void RunBackgroundStreamBackLoop() final {
          method ExitBackgroundLoop (line 222) | void ExitBackgroundLoop() final {
          method GenerationConfig (line 233) | GenerationConfig GetDefaultGenerationConfig() const final {
          method Request (line 239) | Request CreateRequest(String id, Array<Data> inputs, String gene...
          method EngineConfig (line 246) | EngineConfig GetCompleteEngineConfig() const final {
          method String (line 251) | String GetCompleteEngineConfigJSONString() const {
          method DebugCallFuncOnAllAllWorker (line 255) | void DebugCallFuncOnAllAllWorker(const String& func_name, Option...
          method EngineReloadImpl (line 270) | void EngineReloadImpl(const std::string& engine_config_json_str) {
          method EngineUnloadImpl (line 300) | void EngineUnloadImpl() {
        class ThreadedEngineModule (line 383) | class ThreadedEngineModule : public ThreadedEngineImpl, public ffi...
        function TVM_FFI_STATIC_INIT_BLOCK (line 403) | TVM_FFI_STATIC_INIT_BLOCK() {

FILE: cpp/serve/threaded_engine.h
  function namespace (line 13) | namespace mlc {

FILE: cpp/support/debug_utils.h
  function namespace (line 11) | namespace mlc {

FILE: cpp/support/dynamic_bitset.h
  function namespace (line 15) | namespace mlc {
  function const (line 89) | bool operator[](int index) const {
  function Set (line 98) | void Set() {
  function Reset (line 114) | void Reset() {
  function Reset (line 120) | void Reset(int index) { Set(index, false); }

FILE: cpp/support/encoding.cc
  type mlc (line 11) | namespace mlc {
    type llm (line 12) | namespace llm {
      function PrintAsUTF8 (line 14) | std::string PrintAsUTF8(TCodepoint codepoint) {
      function PrintAsEscaped (line 39) | std::string PrintAsEscaped(
      function PrintAsEscaped (line 68) | std::string PrintAsEscaped(uint8_t raw_char) { return PrintAsEscaped...
      function PrintAsEscaped (line 70) | std::string PrintAsEscaped(std::string raw_str) {
      function HandleUTF8FirstByte (line 79) | std::tuple<bool, int, TCodepoint> HandleUTF8FirstByte(uint8_t byte) {
      function ParseNextUTF8 (line 108) | std::pair<TCodepoint, const char*> ParseNextUTF8(const char* utf8, U...
      function ParseUTF8 (line 133) | std::vector<TCodepoint> ParseUTF8(const char* utf8, UTF8ErrorPolicy ...
      function HexCharToInt (line 146) | inline int HexCharToInt(char c) {
      function ParseNextUTF8OrEscaped (line 158) | std::pair<TCodepoint, const char*> ParseNextUTF8OrEscaped(

FILE: cpp/support/encoding.h
  function TCodepoint (line 62) | enum CharHandlingError : TCodepoint {

FILE: cpp/support/json_parser.h
  function namespace (line 18) | namespace mlc {
  function namespace (line 205) | namespace details {

FILE: cpp/support/load_bytes_from_file.h
  function namespace (line 14) | namespace mlc {

FILE: cpp/support/progress_bar.h
  function namespace (line 12) | namespace mlc {

FILE: cpp/support/random.h
  function namespace (line 12) | namespace mlc {

FILE: cpp/support/result.h
  function namespace (line 14) | namespace mlc {

FILE: cpp/support/utils.h
  function namespace (line 18) | namespace mlc {

FILE: cpp/support/vlm_utils.cc
  type mlc (line 9) | namespace mlc {
    type llm (line 10) | namespace llm {
      function CalculateResizeShape (line 12) | void CalculateResizeShape(tvm::runtime::Tensor image_data, std::stri...
      function CalculatePadShape (line 31) | void CalculatePadShape(tvm::runtime::Tensor image_data, std::string ...
      function CalculateCropShape (line 47) | void CalculateCropShape(tvm::runtime::Tensor image_data, std::string...

FILE: cpp/support/vlm_utils.h
  function namespace (line 13) | namespace mlc {

FILE: cpp/tokenizers/streamer.cc
  type mlc (line 17) | namespace mlc {
    type llm (line 18) | namespace llm {
      function TVM_FFI_STATIC_INIT_BLOCK (line 20) | TVM_FFI_STATIC_INIT_BLOCK() {
      function TVM_FFI_STATIC_INIT_BLOCK (line 146) | TVM_FFI_STATIC_INIT_BLOCK() {
      function CreatePartialMatchTable (line 162) | inline std::vector<int> CreatePartialMatchTable(const String& str) {
      function TVM_FFI_STATIC_INIT_BLOCK (line 269) | TVM_FFI_STATIC_INIT_BLOCK() {

FILE: cpp/tokenizers/streamer.h
  function namespace (line 17) | namespace mlc {

FILE: cpp/tokenizers/tokenizers.cc
  type mlc (line 24) | namespace mlc {
    type llm (line 25) | namespace llm {
      function TVM_FFI_STATIC_INIT_BLOCK (line 27) | TVM_FFI_STATIC_INIT_BLOCK() {
      function String (line 34) | String TokenizerInfoNode::AsJSONString() const {
      function TokenizerInfo (line 42) | TokenizerInfo TokenizerInfo::FromJSONString(String json_string) {
      function DynamicBitset (line 104) | const DynamicBitset& TokenizerObj::GetPrefixTokenMask() {
      function Tokenizer (line 143) | Tokenizer Tokenizer::FromPath(const String& _path, std::optional<Tok...
      function TokenizerInfo (line 197) | TokenizerInfo Tokenizer::DetectTokenizerInfo(const String& path_str) {
      function ByteFallbackDecoder (line 360) | inline std::string ByteFallbackDecoder(const std::string& token) {
      function SpaceReplacerDecoder (line 375) | inline std::string SpaceReplacerDecoder(const std::string& token) {
      function ByteLevelDecoder (line 395) | inline std::string ByteLevelDecoder(const std::string& token) {
      function PostProcessToken (line 440) | inline std::string PostProcessToken(const std::string& token,
      function TVM_FFI_STATIC_INIT_BLOCK (line 478) | TVM_FFI_STATIC_INIT_BLOCK() {
      function TVM_FFI_STATIC_INIT_BLOCK (line 507) | TVM_FFI_STATIC_INIT_BLOCK() {

FILE: cpp/tokenizers/tokenizers.h
  function namespace (line 23) | namespace llm {

FILE: examples/python/microserving/custom_router.py
  class CustomRouter (line 13) | class CustomRouter(Router):
    method translate_request (line 16) | async def translate_request(

FILE: examples/rest/nodejs/sample_langchain.ts
  function print (line 21) | function print(str: string) {

FILE: examples/rest/python/sample_client.py
  class color (line 6) | class color:

FILE: examples/rest/python/sample_langchain.py
  class color (line 30) | class color:
  function llm_chain_example (line 43) | def llm_chain_example():
  function load_qa_chain_example (line 62) | def load_qa_chain_example():
  function retrieval_qa_sotu_example (line 73) | def retrieval_qa_sotu_example():
  function retrieval_qa_mlc_docs_example (line 117) | def retrieval_qa_mlc_docs_example():

FILE: examples/rest/python/sample_openai.py
  class color (line 9) | class color:

FILE: python/mlc_llm/__init__.py
  function _create_socket_session_local_workers (line 14) | def _create_socket_session_local_workers(num_workers):

FILE: python/mlc_llm/__main__.py
  function main (line 11) | def main():

FILE: python/mlc_llm/base.py
  function _load_mlc_llm_lib (line 15) | def _load_mlc_llm_lib():
  function _debug_cuda_profiler_start (line 28) | def _debug_cuda_profiler_start() -> None:
  function _debug_cuda_profiler_stop (line 37) | def _debug_cuda_profiler_stop() -> None:

FILE: python/mlc_llm/bench/__main__.py
  function _parse_num_concurrent_requests (line 34) | def _parse_num_concurrent_requests(num_str: Optional[str]) -> Optional[L...
  function _parse_request_rate (line 43) | def _parse_request_rate(request_rate_str: Optional[str]) -> Optional[Lis...
  function _parse_mlc_engine_config (line 56) | def _parse_mlc_engine_config(config_str: Optional[str]) -> EngineConfig:
  function _launch_mlc_server (line 76) | def _launch_mlc_server(args: argparse.argparse.Namespace):
  function run_pipeline (line 88) | def run_pipeline(
  function query_mlc_server_metrics (line 119) | def query_mlc_server_metrics(host: str, port: int):
  function main (line 129) | def main(args: argparse.argparse.Namespace):

FILE: python/mlc_llm/bench/api_endpoint.py
  class APIEndPoint (line 18) | class APIEndPoint:
    method __init__ (line 23) | def __init__(self, include_server_metrics: bool = False) -> None:
    method __aenter__ (line 26) | async def __aenter__(self) -> Self:
    method __aexit__ (line 29) | async def __aexit__(self, exc_type, exc_value, tb) -> None:
    method __call__ (line 32) | async def __call__(self, request: RequestRecord) -> RequestRecord:
  class OpenAIChatEndPoint (line 36) | class OpenAIChatEndPoint(APIEndPoint):
    method __init__ (line 39) | def __init__(  # pylint: disable=too-many-arguments
    method __aenter__ (line 57) | async def __aenter__(self) -> Self:
    method __aexit__ (line 63) | async def __aexit__(self, exc_type, exc_value, tb) -> None:
    method __call__ (line 66) | async def __call__(  # pylint: disable=too-many-branches,too-many-stat...
  class OpenAIEndPoint (line 186) | class OpenAIEndPoint(APIEndPoint):
    method __init__ (line 189) | def __init__(  # pylint: disable=too-many-arguments
    method __aenter__ (line 212) | async def __aenter__(self) -> Self:
    method __aexit__ (line 218) | async def __aexit__(self, exc_type, exc_value, tb) -> None:
    method __call__ (line 221) | async def __call__(  # pylint: disable=too-many-branches,too-many-stat...
  class TensorRTLLMEndPoint (line 318) | class TensorRTLLMEndPoint(APIEndPoint):
    method __init__ (line 321) | def __init__(  # pylint: disable=too-many-arguments
    method __aenter__ (line 333) | async def __aenter__(self) -> Self:
    method __aexit__ (line 339) | async def __aexit__(self, exc_type, exc_value, tb) -> None:
    method __call__ (line 342) | async def __call__(  # pylint: disable=too-many-branches,too-many-loca...
  function create_api_endpoint (line 448) | def create_api_endpoint(args: argparse.Namespace) -> APIEndPoint:

FILE: python/mlc_llm/bench/dataset.py
  class Dataset (line 22) | class Dataset:  # pylint: disable=too-few-public-methods
    method generate_request_records (line 35) | def generate_request_records(
  class ShareGPTDataset (line 46) | class ShareGPTDataset(Dataset):  # pylint: disable=too-few-public-methods
    method __init__ (line 52) | def __init__(
    method generate_request_records (line 109) | def generate_request_records(
  class LoogleDataset (line 170) | class LoogleDataset(Dataset):  # pylint: disable=too-few-public-methods
    method __init__ (line 183) | def __init__(self, tokenizer: AutoTokenizer, testset_name: str) -> None:
    method generate_request_records (line 210) | def generate_request_records(  # pylint: disable=too-many-locals
  class LLMPerfDataset (line 264) | class LLMPerfDataset(Dataset):  # pylint: disable=too-few-public-methods
    method __init__ (line 267) | def __init__(self, dataset_path: str, num_requests: int, tokenizer: Au...
    method generate_request_records (line 285) | def generate_request_records(  # pylint: disable=too-many-arguments,to...
  class JSONModeEvalDataset (line 345) | class JSONModeEvalDataset(Dataset):  # pylint: disable=too-few-public-me...
    method __init__ (line 348) | def __init__(self, tokenizer: AutoTokenizer) -> None:
    method generate_request_records (line 365) | def generate_request_records(
  class ReActDataset (line 407) | class ReActDataset(Dataset):  # pylint: disable=too-few-public-methods
    method __init__ (line 484) | def __init__(  # pylint: disable=too-many-locals
    method generate_request_records (line 550) | def generate_request_records(
  class WildChatDataset (line 590) | class WildChatDataset(Dataset):  # pylint: disable=too-few-public-methods
    method __init__ (line 595) | def __init__(self, tokenizer: AutoTokenizer, apply_chat_template: bool...
    method generate_request_records (line 650) | def generate_request_records(  # pylint: disable=too-many-locals
  class AzureLLMInferenceDataset (line 711) | class AzureLLMInferenceDataset(Dataset):  # pylint: disable=too-few-publ...
    method __init__ (line 718) | def __init__(self, dataset_path: str, tokenizer: AutoTokenizer) -> None:
    method generate_request_records (line 741) | def generate_request_records(  # pylint: disable=too-many-locals
  function create_dataset (line 817) | def create_dataset(  # pylint: disable=too-many-return-statements,too-ma...

FILE: python/mlc_llm/bench/evaluation/gsm8k.py
  function extract_answer (line 21) | def extract_answer(text: str, regex: re.Pattern, select_index: int) -> str:
  function extract_ground_truth (line 34) | def extract_ground_truth(text: str) -> str:
  function strict_extract_answer (line 39) | def strict_extract_answer(text: str) -> str:
  function flexible_extract_answer (line 44) | def flexible_extract_answer(text: str) -> str:
  function create_few_shot_prompt (line 49) | def create_few_shot_prompt(n_shot: int, use_cot: bool, random_order=Fals...
  function create_prompt (line 157) | def create_prompt(question: str, n_shot: int, use_cot: bool, random_orde...
  function parse_args (line 167) | def parse_args():
  function send_request (line 184) | async def send_request(
  function evaluate (line 209) | async def evaluate(  # pylint: disable=too-many-arguments, too-many-locals

FILE: python/mlc_llm/bench/evaluation/mmlu.py
  function parse_args (line 81) | def parse_args():
  function send_request (line 97) | async def send_request(
  function evaluate (line 128) | async def evaluate(  # pylint: disable=too-many-arguments, too-many-locals

FILE: python/mlc_llm/bench/request_processor.py
  class RequestProcessor (line 30) | class RequestProcessor:  # pylint: disable=too-few-public-methods
    method __call__ (line 36) | def __call__(self, request_records: List[RequestRecord]) -> List[Reque...
  class LogMessage (line 40) | class LogMessage(RequestProcessor):  # pylint: disable=too-few-public-me...
    method __init__ (line 43) | def __init__(self, message: str) -> None:
    method __call__ (line 46) | def __call__(self, request_records: List[RequestRecord]) -> List[Reque...
  class SampleRequests (line 51) | class SampleRequests(RequestProcessor):  # pylint: disable=too-few-publi...
    method __init__ (line 54) | def __init__(self, num_requests: int, take_first_x_requests: bool = Fa...
    method __call__ (line 60) | def __call__(self, request_records: List[RequestRecord]) -> List[Reque...
    method _sample_from_plain_request_records (line 71) | def _sample_from_plain_request_records(
    method _sample_from_grouped_request_records (line 93) | def _sample_from_grouped_request_records(
  class AttachModelName (line 124) | class AttachModelName(RequestProcessor):  # pylint: disable=too-few-publ...
    method __init__ (line 127) | def __init__(self, model: str) -> None:
    method __call__ (line 130) | def __call__(self, request_records: List[RequestRecord]) -> List[Reque...
  class AttachRequestRateTimestamp (line 136) | class AttachRequestRateTimestamp(RequestProcessor):  # pylint: disable=t...
    method __init__ (line 139) | def __init__(self, request_rate: np.float32) -> None:
    method __call__ (line 142) | def __call__(self, request_records: List[RequestRecord]) -> List[Reque...
  class AttachExecutionFeature (line 151) | class AttachExecutionFeature(RequestProcessor):  # pylint: disable=too-f...
    method __init__ (line 154) | def __init__(self, exec_feature: Dict[str, Any]) -> None:
    method __call__ (line 157) | def __call__(self, request_records: List[RequestRecord]) -> List[Reque...
  class AttachStreamFlag (line 164) | class AttachStreamFlag(RequestProcessor):  # pylint: disable=too-few-pub...
    method __init__ (line 167) | def __init__(self, stream: Optional[bool]) -> None:
    method __call__ (line 170) | def __call__(self, request_records: List[RequestRecord]) -> List[Reque...
  class AttachSamplingOptions (line 178) | class AttachSamplingOptions(RequestProcessor):  # pylint: disable=too-fe...
    method __init__ (line 181) | def __init__(self, temperature: float, top_p: float, ignore_eos: bool)...
    method __call__ (line 186) | def __call__(self, request_records: List[RequestRecord]) -> List[Reque...
  class ScaleTimestamp (line 198) | class ScaleTimestamp(RequestProcessor):  # pylint: disable=too-few-publi...
    method __init__ (line 201) | def __init__(self, timestamp_scale: float):
    method __call__ (line 204) | def __call__(self, request_records: List[RequestRecord]) -> List[Reque...
  class MetricAnalyzer (line 214) | class MetricAnalyzer(RequestProcessor):  # pylint: disable=too-few-publi...
    method __init__ (line 217) | def __init__(self, tokenizer: AutoTokenizer) -> None:
    method __call__ (line 220) | def __call__(self, request_records: List[RequestRecord]) -> List[Reque...
  class WarmupAndRun (line 255) | class WarmupAndRun(RequestProcessor):  # pylint: disable=too-few-public-...
    method __init__ (line 258) | def __init__(  # pylint: disable=too-many-arguments
    method generate_fake_warmup_requests (line 272) | def generate_fake_warmup_requests(  # pylint: disable=missing-function...
    method __call__ (line 291) | def __call__(self, request_records: List[RequestRecord]) -> List[Reque...
    method _process_warmup_requests (line 324) | def _process_warmup_requests(self, warmup_requests: List[RequestRecord...
  class SequentialProcessor (line 341) | class SequentialProcessor(RequestProcessor):  # pylint: disable=too-few-...
    method __init__ (line 346) | def __init__(self, *processors: RequestProcessor) -> None:
    method __call__ (line 349) | def __call__(self, request_records: List[RequestRecord]) -> List[Reque...
  class Executor (line 355) | class Executor(RequestProcessor):  # pylint: disable=too-few-public-methods
    method __init__ (line 358) | def __init__(
    method __call__ (line 368) | def __call__(self, request_records: List[RequestRecord]) -> List[Reque...
  class FixedConcurrentRequestExecutor (line 372) | class FixedConcurrentRequestExecutor(Executor):  # pylint: disable=too-f...
    method __init__ (line 375) | def __init__(  # pylint: disable=too-many-arguments
    method __call__ (line 391) | def __call__(self, request_records: List[RequestRecord]) -> List[Reque...
    method _process_task (line 422) | def _process_task(
  class FixTimestampExecutor (line 484) | class FixTimestampExecutor(Executor):  # pylint: disable=too-few-public-...
    method __init__ (line 487) | def __init__(  # pylint: disable=too-many-arguments
    method __call__ (line 503) | def __call__(self, request_records: List[RequestRecord]) -> List[Reque...
    method _process_task (line 540) | def _process_task(
  function create_pipelines (line 603) | def create_pipelines(  # pylint: disable=too-many-branches

FILE: python/mlc_llm/bench/request_record.py
  class ServerMetrics (line 14) | class ServerMetrics(BaseModel):
  class Metrics (line 27) | class Metrics(BaseModel):
  class RequestRecord (line 45) | class RequestRecord(BaseModel):
  class GroupedRequestRecord (line 57) | class GroupedRequestRecord(RequestRecord):
  function generate_metrics_summary (line 67) | def generate_metrics_summary(
  function _compute_metrics_statistics (line 116) | def _compute_metrics_statistics(
  function convert_reports_to_df (line 161) | def convert_reports_to_df(reports: List[Dict[str, Any]]) -> pd.DataFrame:
  function pretty_print_report (line 177) | def pretty_print_report(report: Dict[str, Any]) -> None:  # pylint: disa...

FILE: python/mlc_llm/cli/calibrate.py
  function main (line 10) | def main(argv):

FILE: python/mlc_llm/cli/chat.py
  function main (line 8) | def main(argv):

FILE: python/mlc_llm/cli/check_device.py
  function _check_device (line 10) | def _check_device(device: Device) -> bool:
  function main (line 17) | def main():

FILE: python/mlc_llm/cli/compile.py
  function main (line 27) | def main(argv):

FILE: python/mlc_llm/cli/convert_weight.py
  function main (line 17) | def main(argv):

FILE: python/mlc_llm/cli/delivery.py
  class OverrideConfigs (line 33) | class OverrideConfigs(BaseModel):
  class ModelDeliveryTask (line 46) | class ModelDeliveryTask(BaseModel):
  class ModelDeliveryList (line 71) | class ModelDeliveryList(BaseModel):
    method from_json (line 83) | def from_json(cls: Type[T], json_dict: Dict[str, Any]) -> T:
    method to_json (line 93) | def to_json(self) -> Dict[str, Any]:
  function _clone_repo (line 100) | def _clone_repo(model: Union[str, Path], hf_local_dir: Optional[str]) ->...
  function _run_quantization (line 120) | def _run_quantization(
  function _get_current_log (line 207) | def _get_current_log(log: str) -> ModelDeliveryList:
  function _generate_model_delivery_diff (line 219) | def _generate_model_delivery_diff(  # pylint: disable=too-many-locals
  function _main (line 281) | def _main(  # pylint: disable=too-many-locals, too-many-arguments
  function main (line 369) | def main():

FILE: python/mlc_llm/cli/gen_config.py
  function main (line 14) | def main(argv):

FILE: python/mlc_llm/cli/lib_delivery.py
  class ModelInfo (line 23) | class ModelInfo:  # pylint: disable=too-many-instance-attributes
  class DeferredScope (line 36) | class DeferredScope:
    method __init__ (line 39) | def __init__(self):
    method add (line 42) | def add(self, func: Callable[[], None]):
    method __enter__ (line 46) | def __enter__(self):
    method __exit__ (line 49) | def __exit__(self, exc_type, exc_value, traceback):
    method create_temp_dir (line 54) | def create_temp_dir(self) -> Path:
  function _run_compilation (line 61) | def _run_compilation(model_info: ModelInfo, repo_dir: Path) -> bool:
  function _main (line 122) | def _main(  # pylint: disable=too-many-locals
  function main (line 175) | def main():

FILE: python/mlc_llm/cli/model_metadata.py
  function _extract_metadata (line 19) | def _extract_metadata(model_lib: Path) -> Dict[str, Any]:
  function _report_all (line 29) | def _report_all(metadata: Dict[str, Any]) -> None:
  function _read_dynamic_shape (line 46) | def _read_dynamic_shape(shape: List[Union[int, str]], config: Union[Dict...
  function _compute_memory_usage (line 74) | def _compute_memory_usage(metadata: Dict[str, Any], config: Union[Dict, ...
  function _report_memory_usage (line 91) | def _report_memory_usage(metadata: Dict[str, Any], config: Union[Dict, C...
  function main (line 145) | def main():

FILE: python/mlc_llm/cli/package.py
  function main (line 12) | def main(argv):

FILE: python/mlc_llm/cli/router.py
  function main (line 8) | def main(argv):

FILE: python/mlc_llm/cli/serve.py
  class EngineConfigOverride (line 15) | class EngineConfigOverride:  # pylint: disable=too-many-instance-attributes
    method __repr__ (line 36) | def __repr__(self) -> str:
    method from_str (line 65) | def from_str(source: str) -> "EngineConfigOverride":
  function main (line 106) | def main(argv):

FILE: python/mlc_llm/cli/worker.py
  function main (line 32) | def main():

FILE: python/mlc_llm/compiler_pass/attach_cuda_graph_alloc_init_func.py
  class AttachCUDAGraphAllocInitFunc (line 8) | class AttachCUDAGraphAllocInitFunc:  # pylint: disable=too-few-public-me...
    method __init__ (line 11) | def __init__(self):
    method transform_module (line 14) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...

FILE: python/mlc_llm/compiler_pass/attach_embedding_allocator.py
  class AttachAllocEmbeddingTensorFunc (line 10) | class AttachAllocEmbeddingTensorFunc:  # pylint: disable=too-few-public-...
    method __init__ (line 13) | def __init__(self, metadata: Dict[str, Any]):
    method transform_module (line 16) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...

FILE: python/mlc_llm/compiler_pass/attach_logit_processor.py
  class AttachLogitProcessFunc (line 14) | class AttachLogitProcessFunc:  # pylint: disable=too-few-public-methods
    method __init__ (line 17) | def __init__(self, target: tvm.target.Target):
    method transform_module (line 27) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
  function _get_apply_logit_bias_inplace_cpu (line 41) | def _get_apply_logit_bias_inplace_cpu():
  function _get_apply_logit_bias_inplace (line 72) | def _get_apply_logit_bias_inplace(target: tvm.target.Target):
  function _get_apply_penalty_inplace_cpu (line 112) | def _get_apply_penalty_inplace_cpu():
  function _get_apply_penalty_inplace (line 156) | def _get_apply_penalty_inplace(target: tvm.target.Target):
  function _get_apply_bitmask_inplace_cpu (line 210) | def _get_apply_bitmask_inplace_cpu():
  function _get_apply_bitmask_inplace (line 246) | def _get_apply_bitmask_inplace(target: tvm.target.Target):

FILE: python/mlc_llm/compiler_pass/attach_sampler.py
  class AttachGPUSamplingFunc (line 15) | class AttachGPUSamplingFunc:  # pylint: disable=too-few-public-methods
    method __init__ (line 18) | def __init__(self, target: tvm.target.Target, variable_bounds: Dict[st...
    method transform_module (line 29) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
  function _attach_multinomial_sampling_func (line 68) | def _attach_multinomial_sampling_func(bb: relax.BlockBuilder):
  function _attach_argsort_func (line 119) | def _attach_argsort_func(bb: relax.BlockBuilder):
  function full (line 142) | def full(var_result: T.handle, value: T.int32):
  function _attach_sample_with_top_p (line 152) | def _attach_sample_with_top_p(bb: relax.BlockBuilder):  # pylint: disabl...
  function _attach_renormalize_by_top_p (line 236) | def _attach_renormalize_by_top_p(bb: relax.BlockBuilder, target: tvm.tar...
  function _attach_take_probs_func (line 267) | def _attach_take_probs_func(bb: relax.BlockBuilder):
  function _attach_batch_verifier (line 343) | def _attach_batch_verifier(bb: relax.BlockBuilder):

FILE: python/mlc_llm/compiler_pass/attach_softmax_with_temperature.py
  class AttachSoftmaxWithTemperature (line 15) | class AttachSoftmaxWithTemperature:  # pylint: disable=too-few-public-me...
    method __init__ (line 18) | def __init__(
    method transform_module (line 24) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
  class _Rewriter (line 30) | class _Rewriter(PyExprMutator):  # pylint: disable=abstract-method
    method __init__ (line 31) | def __init__(
    method transform (line 44) | def transform(self) -> IRModule:
  function _get_lse_and_softmax_func (line 99) | def _get_lse_and_softmax_func(  # pylint: disable=too-many-locals,too-ma...

FILE: python/mlc_llm/compiler_pass/attach_spec_decode_aux_funcs.py
  class AttachSpecDecodeAuxFuncs (line 10) | class AttachSpecDecodeAuxFuncs:  # pylint: disable=too-few-public-methods
    method __init__ (line 15) | def __init__(self, tensor_parallel_shards: int):
    method transform_module (line 18) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
  function _get_scatter_2d_inplace (line 40) | def _get_scatter_2d_inplace(dtype: str, global_symbol: str):
  function _get_gather_2d_inplace (line 58) | def _get_gather_2d_inplace(dtype: str, global_symbol: str):
  function _add_scatter_hidden_states (line 76) | def _add_scatter_hidden_states(bb: BlockBuilder, tensor_parallel_shards:...
  function _add_gather_hidden_states (line 102) | def _add_gather_hidden_states(bb: BlockBuilder, tensor_parallel_shards: ...

FILE: python/mlc_llm/compiler_pass/attach_support_info.py
  class AttachVariableBounds (line 13) | class AttachVariableBounds:  # pylint: disable=too-few-public-methods
    method __init__ (line 16) | def __init__(self, variable_bounds: Dict[str, int]):
    method transform_module (line 21) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
  class AttachAdditionalPrimFuncs (line 32) | class AttachAdditionalPrimFuncs:  # pylint: disable=too-few-public-methods
    method __init__ (line 35) | def __init__(self, functions: Dict[str, tir.PrimFunc]):
    method transform_module (line 38) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
  class AttachMemoryPlanAttr (line 46) | class AttachMemoryPlanAttr:  # pylint: disable=too-few-public-methods
    method transform_module (line 49) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
  class AttachCUDAGraphSymbolicCaptureHints (line 58) | class AttachCUDAGraphSymbolicCaptureHints:  # pylint: disable=too-few-pu...
    method __init__ (line 61) | def __init__(self, hints: Dict[str, List[str]]):
    method transform_module (line 64) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
  class AttachPipelineParallelStages (line 79) | class AttachPipelineParallelStages:  # pylint: disable=too-few-public-me...
    method __init__ (line 82) | def __init__(self, pipeline_parallel_shards: int):
    method transform_module (line 85) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
  class AttachSequenceLengthPaddingFactor (line 108) | class AttachSequenceLengthPaddingFactor:  # pylint: disable=too-few-publ...
    method __init__ (line 111) | def __init__(self, target: tvm.target.Target, metadata: Dict[str, Any]):
    method transform_module (line 115) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...

FILE: python/mlc_llm/compiler_pass/blas_dispatch.py
  class BLASDispatch (line 17) | class BLASDispatch:  # pylint: disable=too-few-public-methods,broad-exce...
    method __init__ (line 20) | def __init__(self, target: tvm.target.Target) -> None:
    method transform_module (line 34) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...

FILE: python/mlc_llm/compiler_pass/clean_up_tir_attrs.py
  class CleanUpTIRAttrs (line 10) | class CleanUpTIRAttrs:  # pylint: disable=too-few-public-methods
    method __init__ (line 13) | def __init__(self, attrs: List[str]):
    method transform_module (line 16) | def transform_module(

FILE: python/mlc_llm/compiler_pass/dispatch_kv_cache_creation.py
  function extract_creation_args (line 16) | def extract_creation_args(func: relax.Function) -> Dict[str, Any]:
  class DispatchKVCacheCreation (line 79) | class DispatchKVCacheCreation:  # pylint: disable=too-many-instance-attr...
    method __init__ (line 82) | def __init__(
    method transform_module (line 104) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
    method attach_kv_cache_metadata (line 135) | def attach_kv_cache_metadata(self, kwargs: Dict[str, Any]):
    method create_tir_paged_kv_cache (line 144) | def create_tir_paged_kv_cache(
    method create_flashinfer_paged_kv_cache (line 182) | def create_flashinfer_paged_kv_cache(

FILE: python/mlc_llm/compiler_pass/dispatch_triton_kernel.py
  class _Rewriter (line 21) | class _Rewriter(PyExprMutator):  # pylint: disable=abstract-method
    method __init__ (line 22) | def __init__(self, mod: IRModule, target: tvm.target.Target) -> None:
    method transform (line 28) | def transform(self) -> tvm.IRModule:  # pylint: disable=too-many-locals
    method visit_call_ (line 44) | def visit_call_(self, call: relax.Call) -> relax.Expr:  # pylint: disa...
    method w8a8_block_fp8_matmul (line 62) | def w8a8_block_fp8_matmul(  # pylint: disable=too-many-locals
    method w8a8_block_fp8_group_matmul (line 106) | def w8a8_block_fp8_group_matmul(  # pylint: disable=too-many-locals
  class DispatchTritonKernel (line 158) | class DispatchTritonKernel:  # pylint: disable=too-many-instance-attribu...
    method __init__ (line 161) | def __init__(self, target: tvm.target.Target) -> None:
    method transform_module (line 169) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...

FILE: python/mlc_llm/compiler_pass/estimate_memory_usage.py
  class AttachMetadataWithMemoryUsage (line 17) | class AttachMetadataWithMemoryUsage:  # pylint: disable=too-few-public-m...
    method __init__ (line 20) | def __init__(self, metadata: Dict[str, Any]):
    method transform_module (line 23) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
  class _MemoryEstimator (line 40) | class _MemoryEstimator(PyExprVisitor):
    method __init__ (line 43) | def __init__(self) -> None:
    method run (line 49) | def run(self, mod: IRModule) -> Dict[str, int]:
    method visit_call_ (line 65) | def visit_call_(self, call: relax.Call) -> None:  # pylint: disable=ar...
    method _builtin_tensor_alloc (line 72) | def _builtin_tensor_alloc(self, shape: relax.Expr, dtype_str: str) -> ...
    method _storage_alloc (line 83) | def _storage_alloc(self, size: relax.Expr) -> None:

FILE: python/mlc_llm/compiler_pass/fuse_add_norm.py
  function _get_add_rms_norm_decode (line 16) | def _get_add_rms_norm_decode(hidden_size: int, eps: float, TX: int, in_d...
  function _get_add_rms_norm_prefill (line 87) | def _get_add_rms_norm_prefill(hidden_size: int, eps: float, TX: int, in_...
  class FuseAddRMSNorm (line 156) | class FuseAddRMSNorm:  # pylint: disable=too-few-public-methods
    method __init__ (line 159) | def __init__(self, target: tvm.target.Target) -> None:
    method transform_module (line 169) | def transform_module(self, mod: tvm.IRModule, _ctx: tvm.transform.Pass...
  class _FuseAddRMSNormRewriter (line 175) | class _FuseAddRMSNormRewriter(PyExprMutator):  # pylint: disable=abstrac...
    method __init__ (line 176) | def __init__(self, mod: tvm.IRModule, target: tvm.target.Target):
    method transform (line 183) | def transform(self) -> tvm.IRModule:  # pylint: disable=too-many-locals
    method visit_call_ (line 193) | def visit_call_(self, call: relax.Call) -> relax.Expr:  # pylint: disa...

FILE: python/mlc_llm/compiler_pass/fuse_dequantize_matmul_ewise.py
  class FuseDequantizeMatmulEwise (line 9) | class FuseDequantizeMatmulEwise:  # pylint: disable=too-few-public-methods
    method transform_module (line 12) | def transform_module(
  function _pattern (line 37) | def _pattern(match_ewise: int, n_aux_tensor: int):

FILE: python/mlc_llm/compiler_pass/fuse_dequantize_take.py
  class FuseDequantizeTake (line 15) | class FuseDequantizeTake:  # pylint: disable=too-few-public-methods
    method transform_module (line 18) | def transform_module(  # pylint: disable=too-many-locals
  function _pattern (line 52) | def _pattern(n_aux_tensor: int, match_tir_vars: bool):

FILE: python/mlc_llm/compiler_pass/fuse_dequantize_transpose.py
  class FuseDequantizeTranspose (line 11) | class FuseDequantizeTranspose:  # pylint: disable=too-few-public-methods
    method transform_module (line 14) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
  class _DequantizeTransposeFuser (line 20) | class _DequantizeTransposeFuser(PyExprMutator):  # pylint: disable=abstr...
    method __init__ (line 21) | def __init__(
    method transform (line 28) | def transform(self) -> IRModule:
    method visit_call_ (line 37) | def visit_call_(  # pylint: disable=arguments-renamed

FILE: python/mlc_llm/compiler_pass/fuse_ft_dequantize_matmul_epilogue.py
  class FuseFTDequantizeEpilogue (line 13) | class FuseFTDequantizeEpilogue:  # pylint: disable=too-few-public-methods
    method transform_module (line 16) | def transform_module(
  function fuse_bias (line 32) | def fuse_bias(func: relax.Function) -> relax.Function:
  function fuse_activation (line 98) | def fuse_activation(func: relax.Function) -> relax.Function:
  function fuse_residual_binary (line 188) | def fuse_residual_binary(func: relax.Function) -> relax.Function:
  function fuse_residual_unary (line 267) | def fuse_residual_unary(func: relax.Function) -> relax.Function:

FILE: python/mlc_llm/compiler_pass/fuse_transpose_matmul.py
  class FuseTransposeMatmul (line 10) | class FuseTransposeMatmul:  # pylint: disable=too-few-public-methods
    method transform_module (line 13) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
  function _pattern (line 31) | def _pattern():
  class _TransposeMatmulFuser (line 59) | class _TransposeMatmulFuser(PyExprMutator):  # pylint: disable=abstract-...
    method __init__ (line 60) | def __init__(self, mod):
    method visit_call_ (line 63) | def visit_call_(  # pylint: disable=arguments-renamed

FILE: python/mlc_llm/compiler_pass/lift_global_buffer_alloc.py
  class LiftTIRGlobalBufferAlloc (line 13) | class LiftTIRGlobalBufferAlloc:  # pylint: disable=too-few-public-methods
    method transform_module (line 16) | def transform_module(
  class _TIRGlobalAllocRewriter (line 26) | class _TIRGlobalAllocRewriter(PyExprMutator):  # pylint: disable=abstrac...
    method __init__ (line 27) | def __init__(self, mod: IRModule):
    method transform (line 35) | def transform(self) -> IRModule:
    method visit_call_ (line 54) | def visit_call_(self, call: relax.Call):  # pylint: disable=arguments-...
  function remove_global_buf_alloc (line 93) | def remove_global_buf_alloc(
  function _has_symbolic_var (line 148) | def _has_symbolic_var(tensor_sinfo: relax.TensorStructInfo) -> bool:
  function _resolve_tir_var_mapping (line 156) | def _resolve_tir_var_mapping(  # pylint: disable=too-many-locals

FILE: python/mlc_llm/compiler_pass/low_batch_specialization.py
  class LowBatchGemvSpecialize (line 12) | class LowBatchGemvSpecialize:  # pylint: disable=too-few-public-methods
    method transform_module (line 15) | def transform_module(

FILE: python/mlc_llm/compiler_pass/pipeline.py
  class _LogProgress (line 49) | class _LogProgress:  # pylint: disable=too-few-public-methods
    method __init__ (line 52) | def __init__(self, *args):
    method transform_module (line 55) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
  class _DebugDump (line 62) | class _DebugDump:  # pylint: disable=too-few-public-methods
    method __init__ (line 66) | def __init__(self, file_name: str, file_path: Optional[Path], show_met...
    method transform_module (line 71) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
  function _mlc_llm_pipeline (line 82) | def _mlc_llm_pipeline(  # pylint: disable=too-many-arguments

FILE: python/mlc_llm/compiler_pass/pipeline_parallel_rewrite.py
  class PipelineParallelRewrite (line 12) | class PipelineParallelRewrite:  # pylint: disable=too-few-public-methods
    method transform_module (line 15) | def transform_module(
  class _PipelineParallelRewriter (line 25) | class _PipelineParallelRewriter(PyExprMutator):  # pylint: disable=abstr...
    method __init__ (line 26) | def __init__(self, mod: IRModule):
    method transform (line 35) | def transform(self) -> IRModule:  # pylint: disable=too-many-locals
    method _create_stage_func (line 105) | def _create_stage_func(  # pylint: disable=too-many-arguments,too-many...
    method visit_var_binding_ (line 202) | def visit_var_binding_(self, binding: relax.VarBinding) -> None:
    method visit_call_ (line 240) | def visit_call_(self, call: relax.Call) -> relax.Call:  # pylint: disa...
    method _prepare_stage_func_params_and_args (line 249) | def _prepare_stage_func_params_and_args(
    method _update_struct_info (line 261) | def _update_struct_info(
    method _copy_undefined_var (line 291) | def _copy_undefined_var(
    method _update_shape (line 301) | def _update_shape(
  function _extract_pipeline_stages (line 311) | def _extract_pipeline_stages(
  function _analyze_required_func_params (line 363) | def _analyze_required_func_params(
  class _RequiredFuncParamAnalyzer (line 376) | class _RequiredFuncParamAnalyzer(PyExprVisitor):
    method __init__ (line 379) | def __init__(self, func_params: List[relax.Var]) -> None:
    method run (line 383) | def run(self, stage_bindings: List[relax.Binding]) -> List[relax.Var]:
    method visit_var_ (line 390) | def visit_var_(self, var: relax.Var) -> None:  # pylint: disable=argum...

FILE: python/mlc_llm/compiler_pass/scatter_tuple_get_item.py
  class ScatterTupleGetItem (line 14) | class ScatterTupleGetItem:  # pylint: disable=too-few-public-methods
    method transform_module (line 17) | def transform_module(self, mod: IRModule, _ctx: tvm.transform.PassCont...
  class _Scatter (line 23) | class _Scatter(PyExprMutator):  # pylint: disable=abstract-method
    method __init__ (line 24) | def __init__(self, mod: IRModule) -> None:
    method transform (line 29) | def transform(self) -> IRModule:
    method visit_var_binding_ (line 38) | def visit_var_binding_(self, binding: relax.VarBinding):
    method visit_dataflow_var_ (line 43) | def visit_dataflow_var_(  # pylint: disable=arguments-renamed

FILE: python/mlc_llm/contrib/embeddings/embeddings.py
  function _extract_metadata (line 20) | def _extract_metadata(mod: Module):
  function _load_params (line 24) | def _load_params(
  function _get_tvm_module (line 37) | def _get_tvm_module(
  class DefaultDebugInstrument (line 52) | class DefaultDebugInstrument:
    method __init__ (line 61) | def __init__(self, debug_out: Path):
    method reset (line 75) | def reset(self, debug_out: Path):
    method __call__ (line 89) | def __call__(self, func, name, before_run, ret_val, *args):
  class MLCEmbeddings (line 111) | class MLCEmbeddings:  # pylint: disable=too-few-public-methods
    method __init__ (line 137) | def __init__(  # pylint: disable=too-many-arguments
    method embed (line 153) | def embed(self, queries: List[str]) -> tvm.runtime.Tensor:
    method _tokenize_queries (line 173) | def _tokenize_queries(self, queries: List[str]) -> Tuple[np.ndarray, n...

FILE: python/mlc_llm/contrib/embeddings/openai.py
  class MLCEmbeddings (line 18) | class MLCEmbeddings(OpenAIEmbeddings):
    method _chunk_tokens (line 19) | def _chunk_tokens(self, texts: Sequence[str]) -> Tuple[List[List], Lis...
    method _batch_embed (line 59) | def _batch_embed(
    method _abatch_embed (line 82) | async def _abatch_embed(
    method _get_len_safe_embeddings (line 107) | def _get_len_safe_embeddings(  # pylint: disable=too-many-locals,unuse...
    method _aget_len_safe_embeddings (line 142) | async def _aget_len_safe_embeddings(  # pylint: disable=too-many-local...
    method embed_documents (line 178) | def embed_documents(
    method aembed_documents (line 202) | async def aembed_documents(
    method embed_query (line 224) | def embed_query(self, text: str) -> List[float]:
    method aembed_query (line 235) | async def aembed_query(self, text: str) -> List[float]:

FILE: python/mlc_llm/conversation_template/registry.py
  class ConvTemplateRegistry (line 8) | class ConvTemplateRegistry:
    method register_conv_template (line 14) | def register_conv_template(conv_template: Conversation, override: bool...
    method get_conv_template (line 30) | def get_conv_template(name: str) -> Optional[Conversation]:

FILE: python/mlc_llm/interface/calibrate.py
  class CalibrationObserver (line 17) | class CalibrationObserver:
    method get (line 25) | def get():
    method callback (line 33) | def callback(
    method save_params (line 51) | def save_params(self, output: str):
  function sample_requests (line 63) | def sample_requests(
  function send_calibration_requests (line 106) | async def send_calibration_requests(
  function calibrate (line 131) | def calibrate(

FILE: python/mlc_llm/interface/chat.py
  function _print_help_str (line 18) | def _print_help_str():
  function _set_up_key_bindings (line 33) | def _set_up_key_bindings():
  class ChatCompletionOverride (line 48) | class ChatCompletionOverride(ConfigOverrideBase):  # pylint: disable=too...
    method from_str (line 60) | def from_str(source: str) -> "ChatCompletionOverride":
  class ModelConfigOverride (line 83) | class ModelConfigOverride(ConfigOverrideBase):  # pylint: disable=too-ma...
    method from_str (line 95) | def from_str(source: str) -> "ModelConfigOverride":
  class ChatState (line 118) | class ChatState:
    method __init__ (line 156) | def __init__(self, engine: Union[JSONFFIEngine, MLCEngine]):
    method slide_history (line 165) | def slide_history(self):
    method process_system_prompts (line 171) | def process_system_prompts(self):
    method generate (line 183) | def generate(self, prompt: str):
    method stats (line 222) | def stats(self):
    method metrics (line 240) | def metrics(self):
    method reset (line 244) | def reset(self):
    method chat (line 249) | def chat(self):
  function chat (line 285) | def chat(

FILE: python/mlc_llm/interface/compile.py
  class CompileArgs (line 28) | class CompileArgs:  # pylint: disable=too-many-instance-attributes
    method __post_init__ (line 42) | def __post_init__(self) -> None:
    method display (line 45) | def display(self) -> None:
  function _apply_preproc_to_params_and_check_pipeline (line 62) | def _apply_preproc_to_params_and_check_pipeline(
  function _infer_kv_state_kind (line 98) | def _infer_kv_state_kind(model_type) -> str:
  function _compile (line 106) | def _compile(args: CompileArgs, model_config: ConfigBase):
  function compile (line 226) | def compile(  # pylint: disable=too-many-arguments,redefined-builtin

FILE: python/mlc_llm/interface/compiler_flags.py
  class IPCAllReduceStrategyType (line 14) | class IPCAllReduceStrategyType(enum.IntEnum):
  class OptimizationFlags (line 24) | class OptimizationFlags:
    method __repr__ (line 34) | def __repr__(self) -> str:
    method from_str (line 49) | def from_str(source: str) -> "OptimizationFlags":
    method update (line 84) | def update(self, target, quantization) -> None:
  class ModelConfigOverride (line 141) | class ModelConfigOverride(ConfigOverrideBase):  # pylint: disable=too-ma...
    method __repr__ (line 153) | def __repr__(self) -> str:
    method from_str (line 170) | def from_str(source: str) -> "ModelConfigOverride":

FILE: python/mlc_llm/interface/convert_weight.py
  class ConversionArgs (line 30) | class ConversionArgs:  # pylint: disable=too-many-instance-attributes
    method display (line 42) | def display(self) -> None:
  function _resolve_base_model_dir (line 62) | def _resolve_base_model_dir(source: Path) -> Path:
  function _merge_lora_adapter_with_base_model (line 67) | def _merge_lora_adapter_with_base_model(base_source: Path, lora_adapter:...
  function _convert_args (line 102) | def _convert_args(args: ConversionArgs) -> None:  # pylint: disable=too-...
  function convert_weight (line 215) | def convert_weight(  # pylint: disable=too-many-arguments

FILE: python/mlc_llm/interface/gen_config.py
  function apply_system_defaults_for_missing_fields (line 29) | def apply_system_defaults_for_missing_fields(mlc_chat_config: MLCChatCon...
  function check_string (line 36) | def check_string(s: str) -> bool:
  function txt2rwkv_tokenizer (line 48) | def txt2rwkv_tokenizer(vocab: Path, out: Path) -> None:
  function json2rwkv_tokenizer (line 73) | def json2rwkv_tokenizer(vocab: Path, out: Path) -> None:
  function gen_config (line 90) | def gen_config(  # pylint: disable=too-many-locals,too-many-arguments,to...

FILE: python/mlc_llm/interface/jit.py
  class JITResult (line 34) | class JITResult:
  function log_jit_policy (line 41) | def log_jit_policy():
  function jit (line 50) | def jit(  # pylint: disable=too-many-locals,too-many-statements

FILE: python/mlc_llm/interface/package.py
  function build_model_library (line 21) | def build_model_library(  # pylint: disable=too-many-branches,too-many-l...
  function validate_model_lib (line 162) | def validate_model_lib(  # pylint: disable=too-many-locals,too-many-stat...
  function build_android_binding (line 264) | def build_android_binding(mlc_llm_source_dir: Path, output: Path) -> None:
  function build_iphone_binding (line 308) | def build_iphone_binding(mlc_llm_source_dir: Path, output: Path) -> None:
  function build_macabi_binding (line 325) | def build_macabi_binding(mlc_llm_source_dir: Path, output: Path) -> None:
  function package (line 349) | def package(

FILE: python/mlc_llm/interface/router.py
  function serve (line 17) | def serve(

FILE: python/mlc_llm/interface/serve.py
  function serve (line 24) | def serve(

FILE: python/mlc_llm/json_ffi/engine.py
  class EngineState (line 24) | class EngineState:
    method get_request_stream_callback (line 27) | def get_request_stream_callback(self) -> Callable[[str], None]:
    method _sync_request_stream_callback (line 35) | def _sync_request_stream_callback(self, chat_completion_stream_respons...
    method handle_chat_completion (line 39) | def handle_chat_completion(
  class BackgroundLoops (line 76) | class BackgroundLoops:
    method __init__ (line 79) | def __init__(self, ffi: dict):
    method __del__ (line 94) | def __del__(self):
    method terminate (line 97) | def terminate(self):
  class Completions (line 106) | class Completions:
    method __init__ (line 113) | def __init__(self, ffi: dict, state: EngineState, background_loops: Ba...
    method create (line 118) | def create(  # pylint: disable=too-many-arguments,too-many-locals
  class Chat (line 201) | class Chat:
    method __init__ (line 206) | def __init__(self, ffi: dict, state: EngineState, background_loops: Ba...
  class JSONFFIEngine (line 210) | class JSONFFIEngine:
    method __init__ (line 213) | def __init__(  # pylint: disable=too-many-arguments,too-many-locals
    method metrics (line 273) | def metrics(self) -> EngineMetrics:
    method _raw_chat_completion (line 277) | def _raw_chat_completion(
    method terminate (line 285) | def terminate(self):
    method _test_reload (line 289) | def _test_reload(self):
    method _test_reset (line 292) | def _test_reset(self):
    method _test_unload (line 295) | def _test_unload(self):

FILE: python/mlc_llm/libinfo.py
  function get_env_paths (line 11) | def get_env_paths(env_var, splitter):
  function get_dll_directories (line 18) | def get_dll_directories():
  function find_lib_path (line 40) | def find_lib_path(name, optional=False):

FILE: python/mlc_llm/loader/huggingface_loader.py
  class HuggingFaceLoader (line 25) | class HuggingFaceLoader:  # pylint: disable=too-few-public-methods
    method __init__ (line 55) | def __init__(
    method load (line 101) | def load(
    method _load_mlc_param (line 135) | def _load_mlc_param(self, mlc_name: str, device: Optional[Device]) -> ...
    method _load_or_quantize (line 160) | def _load_or_quantize(self, mlc_name, param, device: Device):
    method _load_file (line 184) | def _load_file(self, path: Path) -> None:
    method _unload_file (line 196) | def _unload_file(self, path: Path) -> None:
  function _loading_order (line 205) | def _loading_order(param_map: ExternMapping, torch_to_path: Dict[str, Pa...

FILE: python/mlc_llm/loader/mapping.py
  class ExternMapping (line 19) | class ExternMapping:
    method add_mapping (line 48) | def add_mapping(
    method add_unused (line 58) | def add_unused(self, name: str):
  class QuantizeMapping (line 64) | class QuantizeMapping:

FILE: python/mlc_llm/loader/standard_loader.py
  function _default_export_spec (line 18) | def _default_export_spec(model: nn.Module) -> object:
  function make_standard_hf_loader (line 22) | def make_standard_hf_loader(  # pylint: disable=too-many-arguments,too-m...

FILE: python/mlc_llm/loader/stats.py
  class Stats (line 14) | class Stats:
    method timer (line 51) | def timer(self, attr):
    method mem_add (line 63) | def mem_add(self, nbytes: int):
    method mem_rm (line 70) | def mem_rm(self, nbytes: int):
    method log_time_info (line 75) | def log_time_info(self, weight_format: str):
    method log_mem_usage (line 89) | def log_mem_usage(self):

FILE: python/mlc_llm/loader/utils.py
  function check_parameter_usage (line 20) | def check_parameter_usage(param_map: "ExternMapping", extern_weights: Se...
  function load_torch_shard (line 39) | def load_torch_shard(path: Path) -> Iterator[Tuple[str, np.ndarray]]:
  function load_safetensor_shard (line 55) | def load_safetensor_shard(path: Path) -> Iterator[Tuple[str, np.ndarray]]:

FILE: python/mlc_llm/model/baichuan/baichuan_model.py
  class BaichuanConfig (line 23) | class BaichuanConfig(ConfigBase):  # pylint: disable=too-many-instance-a...
    method __post_init__ (line 45) | def __post_init__(self):
  class BaichuanAttention (line 86) | class BaichuanAttention(nn.Module):  # pylint: disable=too-many-instance...
    method __init__ (line 87) | def __init__(self, config: BaichuanConfig):
    method forward (line 99) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class BaichuanMLP (line 114) | class BaichuanMLP(nn.Module):
    method __init__ (line 115) | def __init__(self, config: BaichuanConfig):
    method forward (line 129) | def forward(self, x):
  class BaichuanDecoderLayer (line 135) | class BaichuanDecoderLayer(nn.Module):
    method __init__ (line 136) | def __init__(self, config: BaichuanConfig):
    method forward (line 169) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 176) | def _apply_residual(self, out, residual):
  class BaichuanModel (line 182) | class BaichuanModel(nn.Module):
    method __init__ (line 183) | def __init__(self, config: BaichuanConfig):
    method forward (line 191) | def forward(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
  class BaichuanForCausalLM (line 199) | class BaichuanForCausalLM(nn.Module):  # pylint: disable=too-many-instan...
    method __init__ (line 200) | def __init__(self, config: BaichuanConfig):
    method to (line 213) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 218) | def batch_forward(
    method embed (line 234) | def embed(self, input_ids: Tensor):
    method prefill (line 239) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 253) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 262) | def batch_prefill(
    method batch_decode (line 273) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 277) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 281) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 307) | def get_default_spec(self):

FILE: python/mlc_llm/model/bert/bert_loader.py
  function huggingface (line 17) | def huggingface(
  function huggingface_bge (line 107) | def huggingface_bge(model_config: BertConfig, quantization: Quantization...

FILE: python/mlc_llm/model/bert/bert_model.py
  class BertConfig (line 22) | class BertConfig(ConfigBase):  # pylint: disable=too-many-instance-attri...
    method __post_init__ (line 42) | def __post_init__(self):
  class BertSelfAttention (line 87) | class BertSelfAttention(nn.Module):  # pylint: disable=too-many-instance...
    method __init__ (line 88) | def __init__(self, config: BertConfig):
    method forward (line 103) | def forward(self, hidden_states: Tensor, attention_mask: Tensor):
  class BertSelfOutput (line 116) | class BertSelfOutput(nn.Module):
    method __init__ (line 117) | def __init__(self, config: BertConfig):
    method forward (line 121) | def forward(self, hidden_states: Tensor, input_tensor: Tensor):
  class BertAttention (line 127) | class BertAttention(nn.Module):
    method __init__ (line 128) | def __init__(self, config: BertConfig):
    method forward (line 132) | def forward(self, hidden_states: Tensor, attention_mask: Tensor):
  class BertIntermediate (line 147) | class BertIntermediate(nn.Module):
    method __init__ (line 148) | def __init__(self, config: BertConfig):
    method forward (line 152) | def forward(self, hidden_states: Tensor):
  class BertOutput (line 158) | class BertOutput(nn.Module):
    method __init__ (line 159) | def __init__(self, config: BertConfig):
    method forward (line 163) | def forward(self, hidden_states: Tensor, input_tensor: Tensor):
  class BertLayer (line 169) | class BertLayer(nn.Module):
    method __init__ (line 170) | def __init__(self, config: BertConfig):
    method forward (line 175) | def forward(self, hidden_states: Tensor, attention_mask: Tensor):
  class BertEncoder (line 182) | class BertEncoder(nn.Module):
    method __init__ (line 183) | def __init__(self, config: BertConfig):
    method forward (line 186) | def forward(self, hidden_states: Tensor, attention_mask: Tensor):
  class BertEmbeddings (line 192) | class BertEmbeddings(nn.Module):
    method __init__ (line 193) | def __init__(self, config: BertConfig):
    method forward (line 203) | def forward(self, input_ids: Tensor, token_type_ids: Tensor, position_...
  class BertModel (line 213) | class BertModel(nn.Module):
    method __init__ (line 214) | def __init__(self, config: BertConfig):
    method to (line 219) | def to(self, dtype: Optional[str] = None):
    method forward (line 224) | def forward(self, inputs: Tensor, attention_mask: Tensor):
    method prefill (line 245) | def prefill(self, inputs: Tensor, attention_mask: Tensor):
    method get_default_spec (line 265) | def get_default_spec(self):

FILE: python/mlc_llm/model/chatglm3/chatglm3_loader.py
  function huggingface (line 14) | def huggingface(model_config: GLMConfig, quantization: Quantization) -> ...

FILE: python/mlc_llm/model/chatglm3/chatglm3_model.py
  class GLMConfig (line 23) | class GLMConfig(ConfigBase):  # pylint: disable=too-many-instance-attrib...
    method __post_init__ (line 47) | def __post_init__(self):
  class GLMAttention (line 92) | class GLMAttention(nn.Module):  # pylint: disable=too-many-instance-attr...
    method __init__ (line 93) | def __init__(self, config: GLMConfig):
    method forward (line 119) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class GLMMLP (line 134) | class GLMMLP(nn.Module):
    method __init__ (line 135) | def __init__(self, config: GLMConfig):
    method forward (line 160) | def forward(self, x):
  class GLMBlock (line 167) | class GLMBlock(nn.Module):
    method __init__ (line 168) | def __init__(self, config: GLMConfig):
    method forward (line 226) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 233) | def _apply_residual(self, out, residual):
  class GLMTransformer (line 239) | class GLMTransformer(nn.Module):
    method __init__ (line 242) | def __init__(self, config: GLMConfig):
    method forward (line 259) | def forward(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
  class ChatGLMModel (line 267) | class ChatGLMModel(nn.Module):
    method __init__ (line 268) | def __init__(self, config: GLMConfig):
    method forward (line 273) | def forward(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
  class ChatGLMForCausalLM (line 279) | class ChatGLMForCausalLM(nn.Module):  # pylint: disable=too-many-instanc...
    method __init__ (line 280) | def __init__(self, config: GLMConfig):
    method to (line 296) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 301) | def batch_forward(
    method embed (line 317) | def embed(self, input_ids: Tensor):
    method prefill (line 322) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 336) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 345) | def batch_prefill(
    method batch_decode (line 356) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 360) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 364) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 390) | def get_default_spec(self):

FILE: python/mlc_llm/model/cohere/cohere_loader.py
  function _cohere_name_transform (line 19) | def _cohere_name_transform(name: str) -> str:
  function awq (line 33) | def awq(model_config: CohereConfig, quantization: Quantization) -> Exter...

FILE: python/mlc_llm/model/cohere/cohere_model.py
  class CohereConfig (line 23) | class CohereConfig(ConfigBase):  # pylint: disable=too-many-instance-att...
    method __post_init__ (line 42) | def __post_init__(self):
  class CohereMLP (line 92) | class CohereMLP(nn.Module):
    method __init__ (line 93) | def __init__(self, config: CohereConfig):
    method forward (line 106) | def forward(self, x):
  class CohereAttention (line 114) | class CohereAttention(nn.Module):
    method __init__ (line 115) | def __init__(self, config: CohereConfig):
    method forward (line 135) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class CohereDecoderLayer (line 151) | class CohereDecoderLayer(nn.Module):
    method __init__ (line 152) | def __init__(self, config: CohereConfig):
    method forward (line 182) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_parallel_residual (line 190) | def _apply_parallel_residual(self, mlp_out, residual):
  class CohereNorm (line 196) | class CohereNorm(nn.Module):
    method __init__ (line 197) | def __init__(
    method forward (line 205) | def forward(self, x: Tensor) -> Tensor:
  class CohereEmbedding (line 215) | class CohereEmbedding(nn.Embedding):
    method lm_head_forward (line 216) | def lm_head_forward(self, x: nn.Tensor):
  class CohereModel (line 224) | class CohereModel(nn.Module):
    method __init__ (line 225) | def __init__(self, config: CohereConfig):
    method forward (line 233) | def forward(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
  class CohereForCausalLM (line 241) | class CohereForCausalLM(nn.Module):
    method __init__ (line 243) | def __init__(self, config: CohereConfig) -> None:
    method to (line 256) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 261) | def batch_forward(
    method prefill (line 277) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 294) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 303) | def batch_prefill(
    method batch_decode (line 314) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 318) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method embed (line 322) | def embed(self, input_ids: Tensor):
    method create_paged_kv_cache (line 328) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 354) | def get_default_spec(self):

FILE: python/mlc_llm/model/deepseek/deepseek_loader.py
  function huggingface (line 16) | def huggingface(model_config: DeepseekConfig, quantization: Quantization...

FILE: python/mlc_llm/model/deepseek/deepseek_model.py
  class DeepseekConfig (line 25) | class DeepseekConfig(ConfigBase):  # pylint: disable=too-many-instance-a...
    method __post_init__ (line 56) | def __post_init__(self):
  class DeepseekAttention (line 97) | class DeepseekAttention(nn.Module):  # pylint: disable=too-many-instance...
    method __init__ (line 98) | def __init__(self, config: DeepseekConfig):
    method forward (line 125) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class DeepseekMLP (line 149) | class DeepseekMLP(nn.Module):
    method __init__ (line 150) | def __init__(self, config: DeepseekConfig, intermediate_size=None):
    method forward (line 165) | def forward(self, x: Tensor):
  class DeepseekMoE (line 171) | class DeepseekMoE(nn.Module):  # pylint: disable=too-many-instance-attri...
    method __init__ (line 172) | def __init__(self, config: DeepseekConfig):
    method forward (line 196) | def forward(self, x: Tensor):  # pylint: disable=too-many-locals
  class DeepseekDecoderLayer (line 245) | class DeepseekDecoderLayer(nn.Module):  # pylint: disable=too-many-insta...
    method __init__ (line 246) | def __init__(self, config: DeepseekConfig, layer_idx: int):
    method forward (line 315) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 324) | def _apply_residual(self, out, residual):
  class DeepseekModel (line 330) | class DeepseekModel(nn.Module):
    method __init__ (line 331) | def __init__(self, config: DeepseekConfig):
    method forward (line 342) | def forward(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
  class DeepseekForCausalLM (line 350) | class DeepseekForCausalLM(nn.Module):  # pylint: disable=too-many-instan...
    method __init__ (line 351) | def __init__(self, config: DeepseekConfig):
    method to (line 365) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 370) | def batch_forward(
    method embed (line 386) | def embed(self, input_ids: Tensor):
    method prefill (line 391) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 406) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 415) | def batch_prefill(
    method batch_decode (line 426) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 430) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 434) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 460) | def get_default_spec(self):

FILE: python/mlc_llm/model/deepseek_v2/deepseek_v2_loader.py
  function huggingface (line 17) | def huggingface(  # pylint: disable=too-many-locals,too-many-statements

FILE: python/mlc_llm/model/deepseek_v2/deepseek_v2_model.py
  class DeepseekV2Config (line 27) | class DeepseekV2Config(ConfigBase):  # pylint: disable=too-many-instance...
    method __post_init__ (line 65) | def __post_init__(self):
  class DeepseekV2MLP (line 128) | class DeepseekV2MLP(nn.Module):
    method __init__ (line 129) | def __init__(self, config: DeepseekV2Config, hidden_size=None, interme...
    method forward (line 145) | def forward(self, x: Tensor) -> Tensor:
  function yarn_get_mscale (line 151) | def yarn_get_mscale(scale=1, mscale=1):
  class DeepseekV2YarnRotaryEmbedding (line 157) | class DeepseekV2YarnRotaryEmbedding(nn.Module):
    method __init__ (line 158) | def __init__(self, config: DeepseekV2Config):
    method forward (line 163) | def forward(
  class DeepseekV2Attention (line 212) | class DeepseekV2Attention(nn.Module):  # pylint: disable=too-many-instan...
    method __init__ (line 213) | def __init__(self, config: DeepseekV2Config):
    method forward (line 272) | def forward(  # pylint: disable=too-many-arguments
    method self_attn (line 318) | def self_attn(  # pylint: disable=too-many-arguments
    method cross_attn (line 341) | def cross_attn(
  class DeepseekV2MoE (line 390) | class DeepseekV2MoE(nn.Module):  # pylint: disable=too-many-instance-att...
    method __init__ (line 391) | def __init__(self, config: DeepseekV2Config):
    method forward (line 434) | def forward(self, x: Tensor):
    method to (line 519) | def to(self, dtype: Optional[str] = None):
  class DeepseekV2DecoderLayer (line 526) | class DeepseekV2DecoderLayer(nn.Module):
    method __init__ (line 527) | def __init__(self, config: DeepseekV2Config, layer_idx: int):
    method forward (line 607) | def forward(  # pylint: disable=too-many-arguments
    method _apply_residual (line 625) | def _apply_residual(self, out, residual):
  class DeepseekV2Model (line 631) | class DeepseekV2Model(nn.Module):
    method __init__ (line 632) | def __init__(self, config: DeepseekV2Config):
    method forward (line 642) | def forward(
  class DeepseekV2ForCausalLM (line 658) | class DeepseekV2ForCausalLM(nn.Module):  # pylint: disable=too-many-inst...
    method __init__ (line 659) | def __init__(self, config: DeepseekV2Config):
    method to (line 678) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 683) | def batch_forward(
    method embed (line 700) | def embed(self, input_ids: Tensor):
    method prefill (line 705) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method extend (line 719) | def extend(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 733) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 742) | def batch_prefill(
    method batch_extend (line 755) | def batch_extend(
    method batch_decode (line 768) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 772) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 776) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 804) | def get_default_spec(self):

FILE: python/mlc_llm/model/eagle/eagle_loader.py
  function awq (line 26) | def awq(model_config: EagleConfig, quantization: Quantization) -> Extern...

FILE: python/mlc_llm/model/eagle/eagle_model.py
  class EagleConfig (line 22) | class EagleConfig(LlamaConfig):
  class EagleDecoderLayer (line 31) | class EagleDecoderLayer(nn.Module):
    method __init__ (line 32) | def __init__(self, config: EagleConfig, index: int):
    method forward (line 64) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 73) | def _apply_residual(self, out, residual):
  class EagleForCausalLM (line 79) | class EagleForCausalLM(nn.Module):  # pylint: disable=too-many-instance-...
    method __init__ (line 80) | def __init__(self, config: EagleConfig):
    method fuse_embed_hidden_states (line 103) | def fuse_embed_hidden_states(self, input_embed: Tensor, hidden_states:...
    method forward_to_last_hidden_states (line 108) | def forward_to_last_hidden_states(self, hidden_states: Tensor, paged_k...
    method forward (line 113) | def forward(self, input_embed: Tensor, hidden_states: Tensor, paged_kv...
    method to (line 118) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 123) | def batch_forward(
    method embed (line 136) | def embed(self, input_ids: Tensor):
    method prefill_to_last_hidden_states (line 141) | def prefill_to_last_hidden_states(self, hidden_states: Tensor, paged_k...
    method decode_to_last_hidden_states (line 147) | def decode_to_last_hidden_states(self, hidden_states: Tensor, paged_kv...
    method batch_prefill_to_last_hidden_states (line 153) | def batch_prefill_to_last_hidden_states(
    method batch_decode_to_last_hidden_states (line 161) | def batch_decode_to_last_hidden_states(
    method create_paged_kv_cache (line 167) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 193) | def get_default_spec(self):

FILE: python/mlc_llm/model/gemma/gemma_loader.py
  function huggingface (line 15) | def huggingface(model_config: GemmaConfig, quantization: Quantization) -...

FILE: python/mlc_llm/model/gemma/gemma_model.py
  class GemmaConfig (line 21) | class GemmaConfig(ConfigBase):  # pylint: disable=too-many-instance-attr...
    method __post_init__ (line 41) | def __post_init__(self):
  class GemmaEmbedding (line 91) | class GemmaEmbedding(nn.Embedding):
    method lm_head_forward (line 96) | def lm_head_forward(self, x: nn.Tensor):
  class GemmaMLP (line 104) | class GemmaMLP(nn.Module):
    method __init__ (line 105) | def __init__(self, config: GemmaConfig):
    method forward (line 120) | def forward(self, x: Tensor):
  class GemmaAttention (line 126) | class GemmaAttention(nn.Module):  # pylint: disable=too-many-instance-at...
    method __init__ (line 127) | def __init__(self, config: GemmaConfig):
    method forward (line 148) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class GemmaDecoderLayer (line 164) | class GemmaDecoderLayer(nn.Module):
    method __init__ (line 165) | def __init__(self, config: GemmaConfig):
    method forward (line 196) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 203) | def _apply_residual(self, out, residual):
  class GemmaModel (line 209) | class GemmaModel(nn.Module):
    method __init__ (line 210) | def __init__(self, config: GemmaConfig):
    method forward (line 219) | def forward(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
  class GemmaForCausalLM (line 228) | class GemmaForCausalLM(nn.Module):  # pylint: disable=too-many-instance-...
    method __init__ (line 229) | def __init__(self, config: GemmaConfig):
    method to (line 241) | def to(self, dtype: Optional[str] = None):
    method get_logits (line 246) | def get_logits(self, hidden_states: Tensor):
    method batch_forward (line 252) | def batch_forward(
    method embed (line 266) | def embed(self, input_ids: Tensor):
    method prefill (line 271) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 283) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 290) | def batch_prefill(
    method batch_decode (line 301) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 305) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 309) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 335) | def get_default_spec(self):

FILE: python/mlc_llm/model/gemma2/gemma2_loader.py
  function huggingface (line 15) | def huggingface(model_config: Gemma2Config, quantization: Quantization) ...

FILE: python/mlc_llm/model/gemma2/gemma2_model.py
  class Gemma2Config (line 23) | class Gemma2Config(GemmaConfig):
    method __post_init__ (line 35) | def __post_init__(self):
  class Gemma2Attention (line 45) | class Gemma2Attention(GemmaAttention):
    method __init__ (line 46) | def __init__(self, config: Gemma2Config):
  class Gemma2DecoderLayer (line 51) | class Gemma2DecoderLayer(nn.Module):
    method __init__ (line 52) | def __init__(self, config: Gemma2Config):
    method forward (line 89) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_post_matmul_norm (line 101) | def _apply_post_matmul_norm(self, out: Tensor, norm: nn.Tensor):
  class Gemma2Model (line 107) | class Gemma2Model(GemmaModel):
    method __init__ (line 108) | def __init__(self, config: Gemma2Config):
  class Gemma2ForCausalLM (line 115) | class Gemma2ForCausalLM(GemmaForCausalLM):  # pylint: disable=too-many-i...
    method __init__ (line 116) | def __init__(self, config: Gemma2Config):
    method get_logits (line 121) | def get_logits(self, hidden_states: Tensor):

FILE: python/mlc_llm/model/gemma3/gemma3_loader.py
  function huggingface (line 15) | def huggingface(model_config: Gemma3Config, quantization: Quantization) ...

FILE: python/mlc_llm/model/gemma3/gemma3_model.py
  class Gemma3TextConfig (line 22) | class Gemma3TextConfig(ConfigBase):  # pylint: disable=too-many-instance...
    method __post_init__ (line 46) | def __post_init__(self):
  class Gemma3Config (line 96) | class Gemma3Config(ConfigBase):  # pylint: disable=too-many-instance-att...
    method __post_init__ (line 109) | def __post_init__(self):
  class Gemma3MLP (line 134) | class Gemma3MLP(nn.Module):
    method __init__ (line 135) | def __init__(self, config: Gemma3Config):
    method forward (line 154) | def forward(self, x: Tensor):
  class Gemma3Attention (line 160) | class Gemma3Attention(nn.Module):  # pylint: disable=too-many-instance-a...
    method __init__ (line 161) | def __init__(self, config: Gemma3Config):
    method forward (line 201) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class Gemma3DecoderLayer (line 224) | class Gemma3DecoderLayer(nn.Module):
    method __init__ (line 225) | def __init__(self, config: Gemma3Config):
    method forward (line 263) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_post_matmul_norm (line 275) | def _apply_post_matmul_norm(self, out: Tensor, norm: nn.Tensor):
  class Gemma3TextModel (line 281) | class Gemma3TextModel(nn.Module):
    method __init__ (line 282) | def __init__(self, config: Gemma3Config):
    method forward (line 296) | def forward(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
  class Gemma3LanguageModel (line 305) | class Gemma3LanguageModel(nn.Module):  # pylint: disable=too-many-instan...
    method __init__ (line 306) | def __init__(self, config: Gemma3Config):
    method to (line 320) | def to(self, dtype: Optional[str] = None):
    method get_logits (line 325) | def get_logits(self, hidden_states: Tensor):
    method batch_forward (line 331) | def batch_forward(
    method embed (line 345) | def embed(self, input_ids: Tensor):
    method prefill (line 350) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 362) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 369) | def batch_prefill(
    method batch_decode (line 380) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 384) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 388) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 425) | def get_default_spec(self):
  class Gemma3ForCausalLM (line 490) | class Gemma3ForCausalLM(nn.Module):  # pylint: disable=too-many-instance...
    method __init__ (line 491) | def __init__(self, config: Gemma3Config):
    method to (line 499) | def to(self, dtype: Optional[str] = None):
    method get_logits (line 505) | def get_logits(self, hidden_states: Tensor):
    method batch_forward (line 511) | def batch_forward(
    method embed (line 525) | def embed(self, input_ids: Tensor):
    method prefill (line 530) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 542) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 549) | def batch_prefill(
    method batch_decode (line 560) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 564) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 568) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 607) | def get_default_spec(self):

FILE: python/mlc_llm/model/gpt2/gpt2_loader.py
  function huggingface (line 14) | def huggingface(model_config: GPT2Config, quantization: Quantization) ->...

FILE: python/mlc_llm/model/gpt2/gpt2_model.py
  class GPT2Config (line 23) | class GPT2Config(ConfigBase):  # pylint: disable=too-many-instance-attri...
    method __post_init__ (line 40) | def __post_init__(self):
  class GPT2Attention (line 83) | class GPT2Attention(nn.Module):  # pylint: disable=too-many-instance-att...
    method __init__ (line 84) | def __init__(self, config: GPT2Config):
    method forward (line 102) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class GPT2MLP (line 127) | class GPT2MLP(nn.Module):
    method __init__ (line 128) | def __init__(self, config: GPT2Config):
    method forward (line 139) | def forward(self, hidden_states: Tensor):
  class GPT2Block (line 146) | class GPT2Block(nn.Module):
    method __init__ (line 147) | def __init__(self, config: GPT2Config):
    method forward (line 179) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 192) | def _apply_residual(self, out, residual):
  class GPT2Model (line 198) | class GPT2Model(nn.Module):
    method __init__ (line 199) | def __init__(self, config: GPT2Config):
    method forward (line 206) | def forward(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
  class GPT2LMHeadModel (line 221) | class GPT2LMHeadModel(nn.Module):  # pylint: disable=too-many-instance-a...
    method __init__ (line 222) | def __init__(self, config: GPT2Config):
    method to (line 232) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 237) | def batch_forward(
    method embed (line 253) | def embed(self, input_ids: Tensor):
    method prefill (line 258) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 272) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 281) | def batch_prefill(
    method batch_decode (line 292) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 296) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 300) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 326) | def get_default_spec(self):

FILE: python/mlc_llm/model/gpt_bigcode/gpt_bigcode_model.py
  class GPTBigCodeConfig (line 23) | class GPTBigCodeConfig(ConfigBase):  # pylint: disable=too-many-instance...
    method __post_init__ (line 39) | def __post_init__(self):
  class GPTBigCodeMLP (line 75) | class GPTBigCodeMLP(nn.Module):
    method __init__ (line 76) | def __init__(self, config: GPTBigCodeConfig):
    method forward (line 82) | def forward(self, x: Tensor):
  class GPTBigCodeAttention (line 89) | class GPTBigCodeAttention(nn.Module):  # pylint: disable=too-many-instan...
    method __init__ (line 90) | def __init__(self, config: GPTBigCodeConfig):
    method forward (line 109) | def forward(
  class GPTBigCodeBlock (line 131) | class GPTBigCodeBlock(nn.Module):
    method __init__ (line 132) | def __init__(self, config: GPTBigCodeConfig):
    method forward (line 157) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class GPTBigCodeModel (line 165) | class GPTBigCodeModel(nn.Module):
    method __init__ (line 166) | def __init__(self, config: GPTBigCodeConfig):
    method forward (line 173) | def forward(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
  class GPTBigCodeForCausalLM (line 188) | class GPTBigCodeForCausalLM(nn.Module):  # pylint: disable=too-many-inst...
    method __init__ (line 189) | def __init__(self, config: GPTBigCodeConfig):
    method to (line 200) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 205) | def batch_forward(
    method embed (line 221) | def embed(self, input_ids: Tensor):
    method prefill (line 226) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 240) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 249) | def batch_prefill(
    method batch_decode (line 260) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 264) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 268) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 294) | def get_default_spec(self):

FILE: python/mlc_llm/model/gpt_j/gpt_j_model.py
  class GPTJConfig (line 25) | class GPTJConfig(ConfigBase):  # pylint: disable=too-many-instance-attri...
    method __post_init__ (line 44) | def __post_init__(self):
  class GPTJAttention (line 85) | class GPTJAttention(nn.Module):  # pylint: disable=too-many-instance-att...
    method __init__ (line 86) | def __init__(self, config: GPTJConfig):
    method forward (line 100) | def forward(  # pylint: disable=too-many-locals
  class GPTJMLP (line 129) | class GPTJMLP(nn.Module):
    method __init__ (line 130) | def __init__(self, config: GPTJConfig):  # in MLP: intermediate_size= ...
    method forward (line 137) | def forward(self, hidden_states: Tensor):
  class GPTJBlock (line 144) | class GPTJBlock(nn.Module):
    method __init__ (line 145) | def __init__(self, config: GPTJConfig):
    method forward (line 172) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 180) | def _apply_residual(self, out, residual):
  class GPTJModel (line 186) | class GPTJModel(nn.Module):
    method __init__ (line 187) | def __init__(self, config: GPTJConfig):
    method forward (line 194) | def forward(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
  class GPTJForCausalLM (line 202) | class GPTJForCausalLM(nn.Module):  # pylint: disable=too-many-instance-a...
    method __init__ (line 203) | def __init__(self, config: GPTJConfig):
    method to (line 218) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 223) | def batch_forward(
    method embed (line 239) | def embed(self, input_ids: Tensor):
    method prefill (line 244) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 258) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 267) | def batch_prefill(
    method batch_decode (line 278) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 282) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 286) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 314) | def get_default_spec(self):

FILE: python/mlc_llm/model/gpt_neox/gpt_neox_loader.py
  function huggingface (line 16) | def huggingface(model_config: GPTNeoXConfig, quantization: Quantization)...

FILE: python/mlc_llm/model/gpt_neox/gpt_neox_model.py
  class GPTNeoXConfig (line 23) | class GPTNeoXConfig(ConfigBase):  # pylint: disable=too-many-instance-at...
    method __post_init__ (line 43) | def __post_init__(self):
  class GPTNeoXAttention (line 90) | class GPTNeoXAttention(nn.Module):  # pylint: disable=too-many-instance-...
    method __init__ (line 93) | def __init__(self, config: GPTNeoXConfig):
    method forward (line 112) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class GPTNeoXMLP (line 131) | class GPTNeoXMLP(nn.Module):
    method __init__ (line 132) | def __init__(self, config: GPTNeoXConfig):
    method forward (line 152) | def forward(self, hidden_states: Tensor):
  class GPTNeoXLayer (line 166) | class GPTNeoXLayer(nn.Module):
    method __init__ (line 167) | def __init__(self, config: GPTNeoXConfig):
    method forward (line 205) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 226) | def _apply_residual(self, out, residual):
  class GPTNeoXModel (line 232) | class GPTNeoXModel(nn.Module):
    method __init__ (line 233) | def __init__(self, config: GPTNeoXConfig):
    method forward (line 238) | def forward(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
  class GPTNeoXForCausalLM (line 247) | class GPTNeoXForCausalLM(nn.Module):  # pylint: disable=too-many-instanc...
    method __init__ (line 248) | def __init__(self, config: GPTNeoXConfig):
    method to (line 266) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 271) | def batch_forward(
    method embed (line 287) | def embed(self, input_ids: Tensor):
    method prefill (line 292) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 306) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 315) | def batch_prefill(
    method batch_decode (line 326) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 330) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 334) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 361) | def get_default_spec(self):

FILE: python/mlc_llm/model/internlm/internlm_model.py
  class InternLMConfig (line 23) | class InternLMConfig(ConfigBase):  # pylint: disable=too-many-instance-a...
    method __post_init__ (line 44) | def __post_init__(self):
  class InternLMAttention (line 85) | class InternLMAttention(nn.Module):  # pylint: disable=too-many-instance...
    method __init__ (line 86) | def __init__(self, config: InternLMConfig):
    method forward (line 102) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class InternLMMLP (line 117) | class InternLMMLP(nn.Module):
    method __init__ (line 118) | def __init__(self, config: InternLMConfig):
    method forward (line 133) | def forward(self, x):
  class InternLMDecoderLayer (line 139) | class InternLMDecoderLayer(nn.Module):
    method __init__ (line 140) | def __init__(self, config: InternLMConfig):
    method forward (line 187) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 194) | def _apply_residual(self, out, residual):
  class InternLMModel (line 200) | class InternLMModel(nn.Module):
    method __init__ (line 201) | def __init__(self, config: InternLMConfig):
    method forward (line 208) | def forward(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
  class InternLMForCausalLM (line 216) | class InternLMForCausalLM(nn.Module):  # pylint: disable=too-many-instan...
    method __init__ (line 217) | def __init__(self, config: InternLMConfig):
    method to (line 230) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 235) | def batch_forward(
    method embed (line 251) | def embed(self, input_ids: Tensor):
    method prefill (line 256) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 270) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 279) | def batch_prefill(
    method batch_decode (line 290) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 294) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 298) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 324) | def get_default_spec(self):

FILE: python/mlc_llm/model/internlm2/internlm2_loader.py
  function huggingface (line 17) | def huggingface(model_config: InternLM2ForCausalLM, quantization: Quanti...

FILE: python/mlc_llm/model/internlm2/internlm2_model.py
  class InternLM2Config (line 23) | class InternLM2Config(ConfigBase):  # pylint: disable=too-many-instance-...
    method __post_init__ (line 46) | def __post_init__(self):
  class InternLM2Attention (line 87) | class InternLM2Attention(nn.Module):  # pylint: disable=too-many-instanc...
    method __init__ (line 88) | def __init__(self, config: InternLM2Config):
    method forward (line 108) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class InternLM2MLP (line 123) | class InternLM2MLP(nn.Module):
    method __init__ (line 124) | def __init__(self, config: InternLM2Config):
    method forward (line 138) | def forward(self, x: Tensor):
  class InternLM2DecoderLayer (line 144) | class InternLM2DecoderLayer(nn.Module):
    method __init__ (line 145) | def __init__(self, config: InternLM2Config):
    method forward (line 179) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 190) | def _apply_residual(self, out, residual):
  class InternLM2Model (line 196) | class InternLM2Model(nn.Module):
    method __init__ (line 197) | def __init__(self, config: InternLM2Config):
    method forward (line 205) | def forward(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
  class InternLM2ForCausalLM (line 213) | class InternLM2ForCausalLM(nn.Module):  # pylint: disable=R0902
    method __init__ (line 214) | def __init__(self, config: InternLM2Config):
    method to (line 227) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 232) | def batch_forward(
    method embed (line 248) | def embed(self, input_ids: Tensor):
    method prefill (line 253) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 267) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 276) | def batch_prefill(
    method batch_decode (line 287) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 291) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 295) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 321) | def get_default_spec(self):

FILE: python/mlc_llm/model/llama/llama_loader.py
  function awq (line 25) | def awq(model_config: LlamaConfig, quantization: Quantization) -> Extern...

FILE: python/mlc_llm/model/llama/llama_model.py
  class LlamaConfig (line 23) | class LlamaConfig(ConfigBase):  # pylint: disable=too-many-instance-attr...
    method __post_init__ (line 45) | def __post_init__(self):  # pylint: disable=too-many-branches
  class LlamaFFN (line 108) | class LlamaFFN(nn.Module):
    method __init__ (line 109) | def __init__(self, config: LlamaConfig):
    method forward (line 124) | def forward(self, x: Tensor):
  class LlamaEmbedding (line 130) | class LlamaEmbedding(nn.Embedding):
    method lm_head_forward (line 133) | def lm_head_forward(self, x: nn.Tensor):
  class LlamaAttention (line 141) | class LlamaAttention(nn.Module):  # pylint: disable=too-many-instance-at...
    method __init__ (line 142) | def __init__(self, config: LlamaConfig):
    method forward (line 159) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class LlamaDecoderLayer (line 175) | class LlamaDecoderLayer(nn.Module):
    method __init__ (line 176) | def __init__(self, config: LlamaConfig):
    method forward (line 206) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 213) | def _apply_residual(self, out, residual):
  class LlamaModel (line 219) | class LlamaModel(nn.Module):
    method __init__ (line 220) | def __init__(self, config: LlamaConfig):
    method forward (line 239) | def forward(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
  class LlamaForCausalLM (line 249) | class LlamaForCausalLM(nn.Module):  # pylint: disable=too-many-instance-...
    method __init__ (line 250) | def __init__(self, config: LlamaConfig):
    method to (line 284) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 289) | def batch_forward(
    method batch_forward_to_last_hidden_states (line 304) | def batch_forward_to_last_hidden_states(
    method embed (line 314) | def embed(self, input_ids: Tensor):
    method get_logits (line 319) | def get_logits(self, hidden_states: Tensor):
    method batch_select_last_hidden_states (line 329) | def batch_select_last_hidden_states(self, hidden_states: Tensor, logit...
    method prefill (line 336) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 348) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method prefill_to_last_hidden_states (line 355) | def prefill_to_last_hidden_states(self, input_embed: Tensor, paged_kv_...
    method decode_to_last_hidden_states (line 361) | def decode_to_last_hidden_states(self, input_embed: Tensor, paged_kv_c...
    method batch_prefill (line 367) | def batch_prefill(
    method batch_decode (line 376) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 380) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_prefill_to_last_hidden_states (line 384) | def batch_prefill_to_last_hidden_states(
    method batch_decode_to_last_hidden_states (line 390) | def batch_decode_to_last_hidden_states(
    method batch_verify_to_last_hidden_states (line 396) | def batch_verify_to_last_hidden_states(
    method create_paged_kv_cache (line 402) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 431) | def get_default_spec(self):

FILE: python/mlc_llm/model/llama4/llama4_loader.py
  function huggingface (line 16) | def huggingface(model_config: Llama4Config, quantization: Quantization) ...

FILE: python/mlc_llm/model/llama4/llama4_model.py
  class Llama4TextConfig (line 26) | class Llama4TextConfig(ConfigBase):  # pylint: disable=too-many-instance...
    method __post_init__ (line 56) | def __post_init__(self):  # pylint: disable=too-many-branches
  class Llama4Config (line 96) | class Llama4Config(ConfigBase):  # pylint: disable=too-many-instance-att...
    method __post_init__ (line 111) | def __post_init__(self) -> None:
  class Llama4TextMLP (line 161) | class Llama4TextMLP(nn.Module):
    method __init__ (line 162) | def __init__(self, config: Llama4Config):
    method forward (line 181) | def forward(self, x: Tensor):
  class LlamaEmbedding (line 189) | class LlamaEmbedding(nn.Embedding):
    method lm_head_forward (line 192) | def lm_head_forward(self, x: nn.Tensor):
  class Llama4TextL2Norm (line 200) | class Llama4TextL2Norm(nn.Module):
    method __init__ (line 201) | def __init__(self, eps, hidden_size):
    method forward (line 205) | def forward(self, x):
  class Llama4TextAttention (line 210) | class Llama4TextAttention(nn.Module):  # pylint: disable=too-many-instan...
    method __init__ (line 211) | def __init__(self, config: Llama4Config, layer_idx):
    method forward (line 264) | def forward(  # pylint: disable=too-many-locals
  class Llama4TextExperts (line 338) | class Llama4TextExperts(nn.Module):
    method __init__ (line 339) | def __init__(self, config: Llama4Config):
    method forward (line 353) | def forward(self, hidden_states):
  class Llama4Router (line 362) | class Llama4Router(nn.Module):
    method __init__ (line 363) | def __init__(self, config: Llama4Config):
    method forward (line 373) | def forward(self, hidden_states):
  class Llama4TextMoe (line 390) | class Llama4TextMoe(nn.Module):
    method __init__ (line 391) | def __init__(self, config: Llama4Config):
    method forward (line 399) | def forward(self, hidden_states):
  class Llama4TextDecoderLayer (line 419) | class Llama4TextDecoderLayer(nn.Module):
    method __init__ (line 420) | def __init__(self, config: Llama4Config, layer_idx):
    method forward (line 488) | def forward(
    method _apply_residual (line 510) | def _apply_residual(self, out, residual):
  class Llama4TextModel (line 516) | class Llama4TextModel(nn.Module):
    method __init__ (line 517) | def __init__(self, config: Llama4Config):
    method forward (line 533) | def forward(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
  class Llama4ForCausalLM (line 545) | class Llama4ForCausalLM(nn.Module):  # pylint: disable=too-many-instance...
    method __init__ (line 546) | def __init__(self, config: Llama4Config):
    method to (line 564) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 569) | def batch_forward(
    method batch_forward_to_last_hidden_states (line 584) | def batch_forward_to_last_hidden_states(
    method embed (line 594) | def embed(self, input_ids: Tensor):
    method get_logits (line 599) | def get_logits(self, hidden_states: Tensor):
    method batch_select_last_hidden_states (line 609) | def batch_select_last_hidden_states(self, hidden_states: Tensor, logit...
    method prefill (line 616) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 628) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method prefill_to_last_hidden_states (line 635) | def prefill_to_last_hidden_states(self, input_embed: Tensor, paged_kv_...
    method decode_to_last_hidden_states (line 641) | def decode_to_last_hidden_states(self, input_embed: Tensor, paged_kv_c...
    method batch_prefill (line 647) | def batch_prefill(
    method batch_decode (line 656) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 660) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_prefill_to_last_hidden_states (line 664) | def batch_prefill_to_last_hidden_states(
    method batch_decode_to_last_hidden_states (line 670) | def batch_decode_to_last_hidden_states(
    method batch_verify_to_last_hidden_states (line 676) | def batch_verify_to_last_hidden_states(
    method create_paged_kv_cache (line 682) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 710) | def get_default_spec(self):

FILE: python/mlc_llm/model/llava/llava_loader.py
  function _num_layers (line 19) | def _num_layers(config: object) -> int:
  function awq (line 31) | def awq(model_config: LlavaConfig, quantization: Quantization) -> Extern...

FILE: python/mlc_llm/model/llava/llava_model.py
  class LlavaConfig (line 36) | class LlavaConfig(ConfigBase):  # pylint: disable=too-many-instance-attr...
    method __post_init__ (line 53) | def __post_init__(self) -> None:
    method get_hf_config (line 90) | def get_hf_config(self, text_config_dict: Dict[str, Any]) -> Dict[str,...
  class LlavaMultiModalProjector (line 121) | class LlavaMultiModalProjector(nn.Module):
    method __init__ (line 122) | def __init__(self, config: LlavaConfig):
    method forward (line 133) | def forward(self, image_features: Tensor) -> Tensor:
  class LlavaForCausalLM (line 140) | class LlavaForCausalLM(Module):
    method __init__ (line 141) | def __init__(self, config: LlavaConfig):
    method to (line 151) | def to(self, dtype: Optional[str] = None):
    method embed (line 157) | def embed(self, input_ids: Tensor) -> Tensor:
    method image_preprocess (line 160) | def image_preprocess(self, pixel_values: Tensor) -> Tensor:
    method image_embed (line 179) | def image_embed(self, pixel_values: Tensor) -> Tensor:
    method batch_forward (line 196) | def batch_forward(
    method prefill (line 206) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 211) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 216) | def batch_prefill(
    method batch_decode (line 224) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 227) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 230) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 258) | def get_default_spec(self):

FILE: python/mlc_llm/model/medusa/medusa_model.py
  class MedusaConfig (line 15) | class MedusaConfig(ConfigBase):  # pylint: disable=too-many-instance-att...
  class ResBlock (line 35) | class ResBlock(nn.Module):
    method __init__ (line 38) | def __init__(self, hidden_size):
    method forward (line 43) | def forward(self, x):
  class MedusaModel (line 47) | class MedusaModel(nn.Module):
    method __init__ (line 50) | def __init__(self, config: MedusaConfig):
    method get_default_spec (line 63) | def get_default_spec(self):
    method get_logits (line 75) | def get_logits(self, hidden_states: nn.Tensor):
    method to (line 81) | def to(self, dtype: Optional[str] = None):

FILE: python/mlc_llm/model/minicpm/minicpm_loader.py
  function huggingface (line 16) | def huggingface(model_config: MiniCPMConfig, quantization: Quantization)...

FILE: python/mlc_llm/model/minicpm/minicpm_model.py
  class MiniCPMConfig (line 26) | class MiniCPMConfig(ConfigBase):  # pylint: disable=too-many-instance-at...
    method __post_init__ (line 54) | def __post_init__(self):
  class MiniCPMAttention (line 95) | class MiniCPMAttention(nn.Module):  # pylint: disable=too-many-instance-...
    method __init__ (line 96) | def __init__(self, config: MiniCPMConfig):
    method forward (line 120) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class MiniCPMEmbedding (line 144) | class MiniCPMEmbedding(nn.Embedding):
    method lm_head_forward (line 149) | def lm_head_forward(self, x: nn.Tensor):
  class MiniCPMMLP (line 157) | class MiniCPMMLP(nn.Module):
    method __init__ (line 158) | def __init__(self, config: MiniCPMConfig):
    method forward (line 171) | def forward(self, x: Tensor):
  class MiniCPMMoE (line 177) | class MiniCPMMoE(nn.Module):
    method __init__ (line 178) | def __init__(self, config: MiniCPMConfig):
    method forward (line 197) | def forward(self, x: Tensor):  # pylint: disable=too-many-locals
  class MiniCPMDecoderLayer (line 255) | class MiniCPMDecoderLayer(nn.Module):  # pylint: disable=too-many-instan...
    method __init__ (line 256) | def __init__(self, config: MiniCPMConfig):
    method forward (line 304) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 321) | def _apply_residual(self, out, residual):
  class MiniCPMModel (line 327) | class MiniCPMModel(nn.Module):
    method __init__ (line 328) | def __init__(self, config: MiniCPMConfig):
    method forward (line 336) | def forward(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
  class MiniCPMForCausalLM (line 344) | class MiniCPMForCausalLM(nn.Module):  # pylint: disable=too-many-instanc...
    method __init__ (line 345) | def __init__(self, config: MiniCPMConfig):
    method to (line 363) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 368) | def batch_forward(
    method embed (line 387) | def embed(self, input_ids: Tensor):
    method prefill (line 392) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 409) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 421) | def batch_prefill(
    method batch_decode (line 432) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 436) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 440) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 466) | def get_default_spec(self):

FILE: python/mlc_llm/model/ministral3/ministral3_loader.py
  function _dequantize_block_scale_weight (line 17) | def _dequantize_block_scale_weight(  # pylint: disable=too-many-locals
  function huggingface (line 44) | def huggingface(  # pylint: disable=too-many-locals,too-many-statements

FILE: python/mlc_llm/model/ministral3/ministral3_model.py
  class Ministral3Config (line 25) | class Ministral3Config(ConfigBase):  # pylint: disable=too-many-instance...
    method from_dict (line 52) | def from_dict(  # type: ignore[override]
    method __post_init__ (line 68) | def __post_init__(self):  # pylint: disable=too-many-branches,too-many...
  class Ministral3Embedding (line 178) | class Ministral3Embedding(nn.Embedding):
    method lm_head_forward (line 183) | def lm_head_forward(self, x: nn.Tensor):
  class Ministral3MLP (line 194) | class Ministral3MLP(nn.Module):
    method __init__ (line 197) | def __init__(self, config: Ministral3Config):
    method forward (line 213) | def forward(self, x: Tensor):
  function yarn_get_sm_scale (line 219) | def yarn_get_sm_scale(scale=1, mscale=1):
  class Ministral3Attention (line 225) | class Ministral3Attention(nn.Module):  # pylint: disable=too-many-instan...
    method __init__ (line 228) | def __init__(self, config: Ministral3Config):
    method forward (line 252) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class Ministral3DecoderLayer (line 268) | class Ministral3DecoderLayer(nn.Module):
    method __init__ (line 271) | def __init__(self, config: Ministral3Config):
    method forward (line 301) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 308) | def _apply_residual(self, out, residual):
  class Ministral3Model (line 314) | class Ministral3Model(nn.Module):
    method __init__ (line 317) | def __init__(self, config: Ministral3Config):
    method forward (line 327) | def forward(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
  class Mistral3ForConditionalGeneration (line 335) | class Mistral3ForConditionalGeneration(nn.Module):  # pylint: disable=to...
    method __init__ (line 336) | def __init__(self, config: Ministral3Config):
    method _mark_modules_no_quant (line 357) | def _mark_modules_no_quant(self, modules: Tuple[str, ...]):
    method to (line 371) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 376) | def batch_forward(
    method embed (line 396) | def embed(self, input_ids: Tensor):
    method prefill (line 401) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 419) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 432) | def batch_prefill(
    method batch_decode (line 443) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 447) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 451) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 478) | def get_default_spec(self):

FILE: python/mlc_llm/model/mistral/mistral_loader.py
  function awq (line 25) | def awq(model_config: MistralConfig, quantization: Quantization) -> Exte...

FILE: python/mlc_llm/model/mistral/mistral_model.py
  class MistralConfig (line 23) | class MistralConfig(ConfigBase):  # pylint: disable=too-many-instance-at...
    method __post_init__ (line 43) | def __post_init__(self):  # pylint: disable=too-many-branches
  class MistralMLP (line 98) | class MistralMLP(nn.Module):
    method __init__ (line 101) | def __init__(self, config: MistralConfig):
    method forward (line 116) | def forward(self, x: Tensor):
  class MistralAttention (line 122) | class MistralAttention(nn.Module):  # pylint: disable=too-many-instance-...
    method __init__ (line 125) | def __init__(self, config: MistralConfig):
    method forward (line 141) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class MistralDecoderLayer (line 157) | class MistralDecoderLayer(nn.Module):
    method __init__ (line 160) | def __init__(self, config: MistralConfig):
    method forward (line 190) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 197) | def _apply_residual(self, out, residual):
  class MistralModel (line 203) | class MistralModel(nn.Module):
    method __init__ (line 206) | def __init__(self, config: MistralConfig):
    method forward (line 215) | def forward(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
  class MistralForCausalLM (line 223) | class MistralForCausalLM(nn.Module):  # pylint: disable=too-many-instanc...
    method __init__ (line 226) | def __init__(self, config: MistralConfig):
    method to (line 240) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 245) | def batch_forward(
    method embed (line 261) | def embed(self, input_ids: Tensor):
    method prefill (line 266) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 280) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 289) | def batch_prefill(
    method batch_decode (line 300) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 304) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 308) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 334) | def get_default_spec(self):

FILE: python/mlc_llm/model/mixtral/mixtral_loader.py
  function huggingface (line 16) | def huggingface(model_config: MixtralConfig, quantization: Quantization)...

FILE: python/mlc_llm/model/mixtral/mixtral_model.py
  class MixtralConfig (line 25) | class MixtralConfig(LlamaConfig):  # pylint: disable=too-many-instance-a...
  class MixtralMoE (line 35) | class MixtralMoE(nn.Module):
    method __init__ (line 38) | def __init__(self, config: MixtralConfig):
    method forward (line 67) | def forward(self, x: Tensor):
  class MixtralDecoderLayer (line 125) | class MixtralDecoderLayer(nn.Module):
    method __init__ (line 128) | def __init__(self, config: MixtralConfig):
    method forward (line 155) | def forward(self, hidden_states: Tensor, attention_mask: Tensor, total...
    method batch_forward (line 163) | def batch_forward(self, hidden_states: Tensor, paged_kv_cache: PagedKV...
    method _apply_residual (line 170) | def _apply_residual(self, out, residual):
  class MixtralModel (line 176) | class MixtralModel(LlamaModel):
    method __init__ (line 179) | def __init__(self, config: MixtralConfig):
  class MixtralForCausalLM (line 186) | class MixtralForCausalLM(LlamaForCausalLM):
    method __init__ (line 189) | def __init__(self, config: MixtralConfig):

FILE: python/mlc_llm/model/model.py
  class EmbeddingMetadata (line 65) | class EmbeddingMetadata:
  class Model (line 86) | class Model:
    method __post_init__ (line 123) | def __post_init__(self):

FILE: python/mlc_llm/model/nemotron/nemotron_model.py
  class NemotronConfig (line 23) | class NemotronConfig(ConfigBase):  # pylint: disable=too-many-instance-a...
    method __post_init__ (line 48) | def __post_init__(self):  # pylint: disable=too-many-branches
  class NemotronMLP (line 75) | class NemotronMLP(nn.Module):
    method __init__ (line 78) | def __init__(self, config: NemotronConfig):
    method forward (line 88) | def forward(self, x: Tensor) -> Tensor:
  class NemotronEmbedding (line 96) | class NemotronEmbedding(nn.Embedding):
    method lm_head_forward (line 99) | def lm_head_forward(self, x: Tensor):
  class NemotronLayerNorm1P (line 107) | class NemotronLayerNorm1P(nn.LayerNorm):
    method __init__ (line 110) | def __init__(self, normalized_shape: int, eps: float = 1e-5, elementwi...
    method forward (line 113) | def forward(self, x: Tensor) -> Tensor:
  class NemotronAttention (line 124) | class NemotronAttention(nn.Module):  # pylint: disable=too-many-instance...
    method __init__ (line 125) | def __init__(self, config: NemotronConfig):
    method forward (line 142) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class NemotronDecoderLayer (line 158) | class NemotronDecoderLayer(nn.Module):
    method __init__ (line 159) | def __init__(self, config: NemotronConfig):
    method forward (line 184) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 191) | def _apply_residual(self, out, residual):
  class NemotronModel (line 197) | class NemotronModel(nn.Module):
    method __init__ (line 198) | def __init__(self, config: NemotronConfig):
    method forward (line 217) | def forward(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
  class NemotronForCausalLM (line 227) | class NemotronForCausalLM(nn.Module):  # pylint: disable=too-many-instan...
    method __init__ (line 228) | def __init__(self, config: NemotronConfig):
    method to (line 263) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 268) | def batch_forward(
    method batch_forward_to_last_hidden_states (line 283) | def batch_forward_to_last_hidden_states(
    method embed (line 293) | def embed(self, input_ids: Tensor):
    method get_logits (line 298) | def get_logits(self, hidden_states: Tensor):
    method batch_select_last_hidden_states (line 308) | def batch_select_last_hidden_states(self, hidden_states: Tensor, logit...
    method prefill (line 315) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 327) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method prefill_to_last_hidden_states (line 334) | def prefill_to_last_hidden_states(self, input_embed: Tensor, paged_kv_...
    method decode_to_last_hidden_states (line 340) | def decode_to_last_hidden_states(self, input_embed: Tensor, paged_kv_c...
    method batch_prefill (line 346) | def batch_prefill(
    method batch_decode (line 355) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 359) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_prefill_to_last_hidden_states (line 363) | def batch_prefill_to_last_hidden_states(
    method batch_decode_to_last_hidden_states (line 369) | def batch_decode_to_last_hidden_states(
    method batch_verify_to_last_hidden_states (line 375) | def batch_verify_to_last_hidden_states(
    method create_paged_kv_cache (line 381) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 411) | def get_default_spec(self):

FILE: python/mlc_llm/model/olmo/olmo_loader.py
  function awq (line 25) | def awq(model_config: OLMoConfig, quantization: Quantization) -> ExternM...

FILE: python/mlc_llm/model/olmo/olmo_model.py
  class OLMoConfig (line 25) | class OLMoConfig(ConfigBase):  # pylint: disable=too-many-instance-attri...
    method __post_init__ (line 47) | def __post_init__(self):  # pylint: disable=too-many-branches
  class OLMoEmbedding (line 107) | class OLMoEmbedding(nn.Embedding):
    method lm_head_forward (line 110) | def lm_head_forward(self, x: nn.Tensor):
  class OLMoAttention (line 118) | class OLMoAttention(nn.Module):  # pylint: disable=missing-class-docstring
    method __init__ (line 119) | def __init__(self, config: OLMoConfig):
    method forward (line 141) | def forward(  # pylint: disable=missing-function-docstring
  class OLMoFFN (line 175) | class OLMoFFN(nn.Module):  # pylint: disable=missing-class-docstring
    method __init__ (line 176) | def __init__(self, config: OLMoConfig):
    method forward (line 196) | def forward(self, x: Tensor):  # pylint: disable=missing-function-docs...
  class OLMoDecoderLayer (line 205) | class OLMoDecoderLayer(nn.Module):  # pylint: disable=missing-class-docs...
    method __init__ (line 206) | def __init__(self, config: OLMoConfig):
    method _apply_residual (line 243) | def _apply_residual(self, out, residual):
    method forward (line 248) | def forward(  # pylint: disable=missing-function-docstring
  class OLMoModel (line 258) | class OLMoModel(nn.Module):  # pylint: disable=missing-class-docstring
    method __init__ (line 259) | def __init__(self, config: OLMoConfig):
    method forward (line 282) | def forward(  # pylint: disable=missing-function-docstring
  class OLMoForCausalLM (line 294) | class OLMoForCausalLM(  # pylint: disable=missing-class-docstring,too-ma...
    method __init__ (line 297) | def __init__(self, config: OLMoConfig):
    method to (line 329) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 334) | def batch_forward(  # pylint: disable=missing-function-docstring
    method batch_forward_to_last_hidden_states (line 348) | def batch_forward_to_last_hidden_states(  # pylint: disable=missing-fu...
    method embed (line 357) | def embed(self, input_ids: Tensor):  # pylint: disable=missing-functio...
    method get_logits (line 362) | def get_logits(self, hidden_states: Tensor):  # pylint: disable=missin...
    method batch_select_last_hidden_states (line 372) | def batch_select_last_hidden_states(  # pylint: disable=missing-functi...
    method prefill (line 381) | def prefill(  # pylint: disable=missing-function-docstring
    method decode (line 397) | def decode(  # pylint: disable=missing-function-docstring
    method prefill_to_last_hidden_states (line 405) | def prefill_to_last_hidden_states(  # pylint: disable=missing-function...
    method decode_to_last_hidden_states (line 412) | def decode_to_last_hidden_states(  # pylint: disable=missing-function-...
    method batch_prefill (line 419) | def batch_prefill(  # pylint: disable=missing-function-docstring
    method batch_decode (line 428) | def batch_decode(  # pylint: disable=missing-function-docstring
    method batch_verify (line 434) | def batch_verify(  # pylint: disable=missing-function-docstring
    method batch_prefill_to_last_hidden_states (line 440) | def batch_prefill_to_last_hidden_states(  # pylint: disable=missing-fu...
    method batch_decode_to_last_hidden_states (line 446) | def batch_decode_to_last_hidden_states(  # pylint: disable=missing-fun...
    method batch_verify_to_last_hidden_states (line 452) | def batch_verify_to_last_hidden_states(  # pylint: disable=missing-fun...
    method create_paged_kv_cache (line 458) | def create_paged_kv_cache(  # pylint: disable=missing-function-docstri...
    method get_default_spec (line 486) | def get_default_spec(self):  # pylint: disable=missing-function-docstring

FILE: python/mlc_llm/model/orion/orion_model.py
  class OrionConfig (line 23) | class OrionConfig(ConfigBase):  # pylint: disable=too-many-instance-attr...
    method __post_init__ (line 41) | def __post_init__(self):
  class OrionFFN (line 90) | class OrionFFN(nn.Module):
    method __init__ (line 91) | def __init__(self, config: OrionConfig):
    method forward (line 106) | def forward(self, x: Tensor):
  class OrionAttention (line 112) | class OrionAttention(nn.Module):  # pylint: disable=too-many-instance-at...
    method __init__ (line 113) | def __init__(self, config: OrionConfig):
    method forward (line 130) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class OrionDecoderLayer (line 146) | class OrionDecoderLayer(nn.Module):
    method __init__ (line 147) | def __init__(self, config: OrionConfig):
    method forward (line 177) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 184) | def _apply_residual(self, out, residual):
  class OrionModel (line 190) | class OrionModel(nn.Module):
    method __init__ (line 191) | def __init__(self, config: OrionConfig):
    method forward (line 200) | def forward(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
  class OrionForCausalLM (line 208) | class OrionForCausalLM(nn.Module):  # pylint: disable=too-many-instance-...
    method __init__ (line 209) | def __init__(self, config: OrionConfig):
    method to (line 222) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 227) | def batch_forward(
    method embed (line 243) | def embed(self, input_ids: Tensor):
    method prefill (line 248) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 262) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 271) | def batch_prefill(
    method batch_decode (line 282) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 286) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 290) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 316) | def get_default_spec(self):

FILE: python/mlc_llm/model/phi/phi_loader.py
  function huggingface (line 16) | def huggingface(model_config: PhiConfig, quantization: Quantization) -> ...
  function phi1_huggingface (line 87) | def phi1_huggingface(model_config: Phi1Config, quantization: Quantizatio...

FILE: python/mlc_llm/model/phi/phi_model.py
  class Phi1Config (line 23) | class Phi1Config(ConfigBase):  # pylint: disable=too-many-instance-attri...
    method __post_init__ (line 42) | def __post_init__(self):
  class PhiConfig (line 91) | class PhiConfig(ConfigBase):  # pylint: disable=too-many-instance-attrib...
    method __post_init__ (line 111) | def __post_init__(self):
    method from_phi1 (line 149) | def from_phi1(config: Phi1Config) -> "PhiConfig":
  class PhiMLP (line 174) | class PhiMLP(nn.Module):
    method __init__ (line 175) | def __init__(self, config: PhiConfig):
    method forward (line 186) | def forward(self, hidden_states: Tensor):
  class PhiMHA (line 194) | class PhiMHA(nn.Module):  # pylint: disable=too-many-instance-attributes
    method __init__ (line 195) | def __init__(self, config: PhiConfig):
    method forward (line 211) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class PhiParallelBlock (line 227) | class PhiParallelBlock(nn.Module):
    method __init__ (line 228) | def __init__(self, config: PhiConfig):
    method forward (line 259) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_parallel_residual (line 276) | def _apply_parallel_residual(self, attn_out, mlp_out, residual):
  class PhiCausalLMHead (line 284) | class PhiCausalLMHead(nn.Module):
    method __init__ (line 285) | def __init__(self, config: PhiConfig) -> None:
    method forward (line 291) | def forward(self, hidden_states: Tensor):
  class PhiModel (line 300) | class PhiModel(nn.Module):
    method __init__ (line 301) | def __init__(self, config: PhiConfig) -> None:
    method forward (line 306) | def forward(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
  class PhiForCausalLM (line 314) | class PhiForCausalLM(nn.Module):
    method __init__ (line 316) | def __init__(self, config: Union[PhiConfig, Phi1Config]) -> None:
    method to (line 335) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 340) | def batch_forward(
    method prefill (line 356) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 372) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 381) | def batch_prefill(
    method batch_decode (line 392) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 396) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method embed (line 400) | def embed(self, input_ids: Tensor):
    method create_paged_kv_cache (line 406) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 433) | def get_default_spec(self):

FILE: python/mlc_llm/model/phi3/phi3_loader.py
  function phi3_huggingface (line 14) | def phi3_huggingface(model_config: Phi3Config, quantization: Quantizatio...

FILE: python/mlc_llm/model/phi3/phi3_model.py
  class Phi3Config (line 23) | class Phi3Config(ConfigBase):  # pylint: disable=too-many-instance-attri...
    method __post_init__ (line 47) | def __post_init__(self):
  class Phi3Embedding (line 102) | class Phi3Embedding(nn.Embedding):
    method lm_head_forward (line 105) | def lm_head_forward(self, x: nn.Tensor):
  class Phi3MLP (line 113) | class Phi3MLP(nn.Module):
    method __init__ (line 114) | def __init__(self, config: Phi3Config):
    method forward (line 125) | def forward(self, hidden_states: Tensor):
  class PhiMHA (line 132) | class PhiMHA(nn.Module):  # pylint: disable=too-many-instance-attributes
    method __init__ (line 133) | def __init__(self, config: Phi3Config):
    method forward (line 153) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class Phi3ParallelBlock (line 169) | class Phi3ParallelBlock(nn.Module):
    method __init__ (line 170) | def __init__(self, config: Phi3Config):
    method forward (line 204) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_parallel_residual (line 211) | def _apply_parallel_residual(self, mlp_out, residual):
  class Phi3Model (line 217) | class Phi3Model(nn.Module):
    method __init__ (line 218) | def __init__(self, config: Phi3Config) -> None:
    method forward (line 224) | def forward(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
  class Phi3ForCausalLM (line 232) | class Phi3ForCausalLM(nn.Module):
    method __init__ (line 234) | def __init__(self, config: Phi3Config) -> None:
    method to (line 258) | def to(self, dtype: Optional[str] = None):
    method get_logits (line 263) | def get_logits(self, hidden_states: Tensor):
    method batch_forward (line 273) | def batch_forward(
    method prefill (line 286) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 298) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 305) | def batch_prefill(
    method batch_decode (line 316) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 320) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method embed (line 324) | def embed(self, input_ids: Tensor):
    method create_paged_kv_cache (line 330) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 359) | def get_default_spec(self):

FILE: python/mlc_llm/model/phi3v/phi3v_image.py
  class ImageProjection (line 16) | class ImageProjection(Module):  # pylint: disable=too-many-instance-attr...
    method __init__ (line 17) | def __init__(self, config: ConfigBase):
    method forward (line 25) | def forward(self, image_features: Tensor) -> Tensor:
  class Phi3ImageEmbedding (line 55) | class Phi3ImageEmbedding(Module):
    method __init__ (line 56) | def __init__(self, config: ConfigBase):
    method apply_schedule (line 69) | def apply_schedule(self, sch, block, bdx=32, tile=[32, 32]):
    method dyn_repeat_4d_tensor (line 80) | def dyn_repeat_4d_tensor(self, input_tensor, r0, r1, r2, r3) -> Tensor:
    method dyn_concate_dim_2 (line 119) | def dyn_concate_dim_2(self, input_1, input_2) -> Tensor:
    method dyn_concate_dim_1 (line 158) | def dyn_concate_dim_1(self, input_1, input_2) -> Tensor:
    method get_img_features (line 192) | def get_img_features(self, img_embeds: Tensor) -> Tensor:
    method reshape_hd_patches_2x2merge (line 197) | def reshape_hd_patches_2x2merge(self, image_features, h_crop, w_crop):
    method add_image_newline (line 267) | def add_image_newline(self, image_features_hd):
    method forward (line 283) | def forward(self, pixel_values: Tensor, h_crop, w_crop) -> Tensor:

FILE: python/mlc_llm/model/phi3v/phi3v_loader.py
  function huggingface (line 15) | def huggingface(model_config: Phi3VConfig, quantization: Quantization) -...

FILE: python/mlc_llm/model/phi3v/phi3v_model.py
  class Phi3VConfig (line 38) | class Phi3VConfig(ConfigBase):  # pylint: disable=too-many-instance-attr...
    method __post_init__ (line 63) | def __post_init__(self):
  class Phi3VForCausalLM (line 130) | class Phi3VForCausalLM(nn.Module):
    method __init__ (line 132) | def __init__(self, config: Phi3VConfig) -> None:
    method to (line 161) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 166) | def batch_forward(
    method prefill (line 182) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 198) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 207) | def batch_prefill(
    method batch_decode (line 218) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 222) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method embed (line 226) | def embed(self, input_ids: Tensor):
    method image_preprocess (line 233) | def image_preprocess(
    method image_embed (line 283) | def image_embed(  # pylint: disable=too-many-arguments
    method create_paged_kv_cache (line 296) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 324) | def get_default_spec(self):

FILE: python/mlc_llm/model/qwen/qwen_model.py
  class QWenConfig (line 23) | class QWenConfig(ConfigBase):  # pylint: disable=too-many-instance-attri...
    method __post_init__ (line 42) | def __post_init__(self):
  class QWenAttention (line 83) | class QWenAttention(nn.Module):  # pylint: disable=too-many-instance-att...
    method __init__ (line 84) | def __init__(self, config: QWenConfig):
    method forward (line 98) | def forward(  # pylint: disable=too-many-locals
  class QWenMLP (line 118) | class QWenMLP(nn.Module):
    method __init__ (line 119) | def __init__(self, config: QWenConfig):
    method forward (line 133) | def forward(self, x: Tensor):
  class QWenBlock (line 139) | class QWenBlock(nn.Module):
    method __init__ (line 140) | def __init__(self, config: QWenConfig):
    method forward (line 174) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 181) | def _apply_residual(self, out, residual):
  class QWenModel (line 187) | class QWenModel(nn.Module):
    method __init__ (line 188) | def __init__(self, config: QWenConfig):
    method forward (line 194) | def forward(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
  class QWenLMHeadModel (line 202) | class QWenLMHeadModel(nn.Module):  # pylint: disable=too-many-instance-a...
    method __init__ (line 203) | def __init__(self, config: QWenConfig):
    method to (line 215) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 220) | def batch_forward(
    method embed (line 235) | def embed(self, input_ids: Tensor):
    method prefill (line 240) | def prefill(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 258) | def decode(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 267) | def batch_prefill(self, inputs: Tensor, logit_positions: Tensor, paged...
    method batch_decode (line 273) | def batch_decode(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
    method batch_verify (line 277) | def batch_verify(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
    method create_paged_kv_cache (line 281) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 307) | def get_default_spec(self):

FILE: python/mlc_llm/model/qwen2/qwen2_model.py
  class QWen2Config (line 24) | class QWen2Config(ConfigBase):  # pylint: disable=too-many-instance-attr...
    method __post_init__ (line 45) | def __post_init__(self):
  class QWen2Attention (line 86) | class QWen2Attention(nn.Module):  # pylint: disable=too-many-instance-at...
    method __init__ (line 87) | def __init__(self, config: QWen2Config):
    method forward (line 107) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
  class Qwen2Embedding (line 131) | class Qwen2Embedding(nn.Embedding):
    method lm_head_forward (line 136) | def lm_head_forward(self, x: nn.Tensor):
  class QWen2MLP (line 144) | class QWen2MLP(nn.Module):
    method __init__ (line 145) | def __init__(self, config: QWen2Config):
    method forward (line 156) | def forward(self, x: Tensor):
  class QWen2DecoderLayer (line 162) | class QWen2DecoderLayer(nn.Module):
    method __init__ (line 163) | def __init__(self, config: QWen2Config):
    method forward (line 198) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 207) | def _apply_residual(self, out, residual):
  class QWen2Model (line 213) | class QWen2Model(nn.Module):
    method __init__ (line 214) | def __init__(self, config: QWen2Config):
    method forward (line 221) | def forward(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
  class QWen2LMHeadModel (line 229) | class QWen2LMHeadModel(nn.Module):  # pylint: disable=too-many-instance-...
    method __init__ (line 230) | def __init__(self, config: QWen2Config):
    method to (line 247) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 252) | def batch_forward(
    method embed (line 272) | def embed(self, input_ids: Tensor):
    method prefill (line 277) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 294) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 306) | def batch_prefill(
    method batch_decode (line 317) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 321) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 325) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 351) | def get_default_spec(self):

FILE: python/mlc_llm/model/qwen2_5_vl/qwen2_5_vl_model.py
  class Qwen25VLVisionTokenConfig (line 47) | class Qwen25VLVisionTokenConfig:
  class Qwen25VLVisionGridConfig (line 57) | class Qwen25VLVisionGridConfig:
  class Qwen25VLAttentionState (line 66) | class Qwen25VLAttentionState:
  class Qwen25VLConfig (line 77) | class Qwen25VLConfig(ConfigBase):  # pylint: disable=too-many-instance-a...
    method __post_init__ (line 106) | def __post_init__(self):  # pylint: disable=too-many-branches
    method image_token_id (line 160) | def image_token_id(self) -> int:
    method video_token_id (line 164) | def video_token_id(self) -> int:
    method vision_start_token_id (line 168) | def vision_start_token_id(self) -> int:
    method vision_end_token_id (line 172) | def vision_end_token_id(self) -> int:
    method spatial_merge_size (line 176) | def spatial_merge_size(self) -> int:
    method temporal_patch_size (line 180) | def temporal_patch_size(self) -> int:
    method tokens_per_second (line 184) | def tokens_per_second(self) -> float:
    method vision_metadata (line 188) | def vision_metadata(self) -> VisionPositionMetadata:
  class Qwen25VLEmbedding (line 198) | class Qwen25VLEmbedding(nn.Embedding):
    method lm_head_forward (line 201) | def lm_head_forward(self, x: Tensor):
  class Qwen25VLAttention (line 206) | class Qwen25VLAttention(nn.Module):
    method __init__ (line 207) | def __init__(self, config: Qwen25VLConfig):
    method head_dim (line 240) | def head_dim(self) -> int:
    method num_attention_heads (line 244) | def num_attention_heads(self) -> int:
    method num_key_value_heads (line 248) | def num_key_value_heads(self) -> int:
    method forward (line 251) | def forward(  # pylint: disable=too-many-locals
  class Qwen25VLMLP (line 274) | class Qwen25VLMLP(nn.Module):
    method __init__ (line 275) | def __init__(self, config: Qwen25VLConfig):
    method forward (line 286) | def forward(self, x: Tensor):
  class Qwen25VLDecoderLayer (line 292) | class Qwen25VLDecoderLayer(nn.Module):
    method __init__ (line 293) | def __init__(self, config: Qwen25VLConfig):
    method _set_tp (line 304) | def _set_tp(self, config: Qwen25VLConfig):
    method forward (line 328) | def forward(
    method _apply_residual (line 343) | def _apply_residual(self, out: Tensor, residual: Tensor) -> Tensor:
  class Qwen25VLModel (line 349) | class Qwen25VLModel(nn.Module):
    method __init__ (line 350) | def __init__(self, config: Qwen25VLConfig):
    method forward (line 364) | def forward(
  class Qwen25VLLMHeadModel (line 377) | class Qwen25VLLMHeadModel(nn.Module):
    method __init__ (line 378) | def __init__(self, config: Qwen25VLConfig):
    method to (line 386) | def to(self, dtype: Optional[str] = None):
    method _apply_lm_head (line 391) | def _apply_lm_head(self, hidden_states: Tensor):
    method _set_mrope_delta (line 400) | def _set_mrope_delta(self, paged_kv_cache: PagedKVCache, deltas: Tensor):
    method _get_mrope_delta (line 404) | def _get_mrope_delta(self, paged_kv_cache: PagedKVCache, batch: int) -...
    method _build_decode_position_ids (line 411) | def _build_decode_position_ids(
    method prefill (line 425) | def prefill(
    method decode (line 444) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 452) | def batch_prefill(  # pylint: disable=too-many-arguments
    method batch_forward (line 467) | def batch_forward(  # pylint: disable=too-many-arguments
    method batch_decode (line 482) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 490) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method embed (line 493) | def embed(self, input_ids: Tensor):
    method create_paged_kv_cache (line 498) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 526) | def get_default_spec(self):

FILE: python/mlc_llm/model/qwen2_moe/qwen2_moe_loader.py
  function huggingface (line 16) | def huggingface(model_config: Qwen2MoeConfig, quantization: Quantization...

FILE: python/mlc_llm/model/qwen2_moe/qwen2_moe_model.py
  class Qwen2MoeConfig (line 23) | class Qwen2MoeConfig(QWen2Config):  # pylint: disable=too-many-instance-...
  class Qwen2MoeMLP (line 37) | class Qwen2MoeMLP(nn.Module):
    method __init__ (line 38) | def __init__(self, config: Qwen2MoeConfig, intermediate_size: Optional...
    method forward (line 50) | def forward(self, x: Tensor):
  class Qwen2MoeSparseMoeBlock (line 56) | class Qwen2MoeSparseMoeBlock(nn.Module):  # pylint: disable=too-many-ins...
    method __init__ (line 59) | def __init__(self, config: Qwen2MoeConfig):
    method forward (line 90) | def forward(self, x: Tensor):
  class Qwen2MoeDecoderLayer (line 141) | class Qwen2MoeDecoderLayer(nn.Module):
    method __init__ (line 142) | def __init__(self, config: Qwen2MoeConfig):
    method forward (line 193) | def forward(self, hidden_states: Tensor, paged_kv_cache: PagedKVCache,...
    method _apply_residual (line 202) | def _apply_residual(self, out, residual):
  class Qwen2MoeModel (line 208) | class Qwen2MoeModel(nn.Module):
    method __init__ (line 209) | def __init__(self, config: Qwen2MoeConfig):
    method forward (line 216) | def forward(self, inputs: Tensor, paged_kv_cache: PagedKVCache):
  class Qwen2MoeForCausalLM (line 224) | class Qwen2MoeForCausalLM(nn.Module):  # pylint: disable=too-many-instan...
    method __init__ (line 225) | def __init__(self, config: Qwen2MoeConfig):
    method to (line 240) | def to(self, dtype: Optional[str] = None):
    method batch_forward (line 245) | def batch_forward(
    method embed (line 261) | def embed(self, input_ids: Tensor):
    method prefill (line 266) | def prefill(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method decode (line 280) | def decode(self, input_embed: Tensor, paged_kv_cache: PagedKVCache):
    method batch_prefill (line 289) | def batch_prefill(
    method batch_decode (line 300) | def batch_decode(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method batch_verify (line 304) | def batch_verify(self, input_embeds: Tensor, paged_kv_cache: PagedKVCa...
    method create_paged_kv_cache (line 308) | def create_paged_kv_cache(  # pylint: disable=too-many-arguments
    method get_default_spec (line 334) | def get_default_spec(self):

FILE: python/mlc_llm/model/qwen3/qwen3_loader.py
  function huggingface (line 17) | def huggingface(
  function huggingface_embedding (line 150) | def huggingface_embedding(model_config: Qwen3Config, quantization: Quant...

FILE: python/mlc_llm/model/qwen3/qwen3_model.py
  class Qwen3C

Download .json

Condensed preview — 661 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (4,380K chars).

[
  {
    "path": ".clang-format",
    "chars": 292,
    "preview": "# Run the following command to reformat a file:\n# clang-format -i -style=Google <file>\n# Or use clang-format-diff to onl"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug-report.md",
    "chars": 1091,
    "preview": "---\nname: \"🐛 Bug Report\"\nabout: Submit a bug report to help us improve MLC-LLM\ntitle: '[Bug] '\nlabels: ['bug']\nassignees"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/config.yml",
    "chars": 323,
    "preview": "blank_issues_enabled: false\n\ncontact_links:\n  - name: Check the MLC-LLM Documentation\n    url: https://llm.mlc.ai/docs/\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/documentation.md",
    "chars": 345,
    "preview": "---\nname: \"\\U0001F4DA Documentation\"\nabout: Report an issue related to https://llm.mlc.ai/docs/\ntitle: '[Doc] '\nlabels: "
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature-request.md",
    "chars": 649,
    "preview": "---\nname: \"\\U0001F680 Feature Request\"\nabout: Submit a proposal/request for a new MLC-LLM feature, or an enhancement on "
  },
  {
    "path": ".github/ISSUE_TEMPLATE/general.md",
    "chars": 200,
    "preview": "---\nname: \"❓ General Questions\"\nabout: General questions you have about MLC-LLM.\ntitle: '[Question] '\nlabels: ['question"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/model-request.md",
    "chars": 527,
    "preview": "---\nname: \"️️⚙️  Model Request\"\nabout: Request a new model in MLC-LLM\ntitle: '[Model Request] '\nlabels: ['new-models']\na"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/speed-report.md",
    "chars": 743,
    "preview": "---\nname: \" 🏎️  Speed Report\"\nabout: Submit a speed report of an model running in MLC-LLM\ntitle: '[Speed] '\nlabels: ['pe"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/tracking.md",
    "chars": 1053,
    "preview": "---\nname: \"Tracking\"\nabout: A tracking issue that tracks ongoing item in the project\ntitle: '[Tracking] '\nlabels: ['stat"
  },
  {
    "path": ".github/workflows/documentation.yaml",
    "chars": 962,
    "preview": "name: Build Docs\n\non:\n  push:\n    branches:\n      - main\n\njobs:\n  test_linux:\n    name: Deploy Docs\n    runs-on: ubuntu-"
  },
  {
    "path": ".github/workflows/update-relax.yaml",
    "chars": 777,
    "preview": "name: 'Relax Submodule Sync'\n\non:\n  workflow_dispatch:\n\njobs:\n  sync:\n    name: 'Relax Submodule Sync'\n    runs-on: ubun"
  },
  {
    "path": ".github/workflows/windows-build.yaml",
    "chars": 852,
    "preview": "# GH actions.\n# We use it to cover windows builds\n# Jenkins is still the primary CI\nname: Windows CI\n\non:\n  push:\n    br"
  },
  {
    "path": ".gitignore",
    "chars": 3934,
    "preview": "tmp/\ndist/\nparams/\ndebug/\n*.bak\n# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n.DS_Store\n\n*."
  },
  {
    "path": ".gitmodules",
    "chars": 615,
    "preview": "[submodule \"3rdparty/argparse\"]\n\tpath = 3rdparty/argparse\n\turl = https://github.com/p-ranav/argparse\n[submodule \"3rdpart"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 1453,
    "preview": "# To use:\n#\n#     pre-commit run -a\n#\n# Or:\n#\n#     pre-commit install  # (runs every time you commit in git)\n#\n# To upd"
  },
  {
    "path": ".pylintrc",
    "chars": 72,
    "preview": "[MESSAGES CONTROL]\ndisable=too-many-positional-arguments,duplicate-code\n"
  },
  {
    "path": "CMakeLists.txt",
    "chars": 7709,
    "preview": "cmake_minimum_required(VERSION 3.18)\nproject(mlc_llm C CXX)\n\ninclude(CheckCXXCompilerFlag)\nif(MSVC)\n  set(CMAKE_CXX_FLAG"
  },
  {
    "path": "CONTRIBUTORS.md",
    "chars": 153,
    "preview": "MLC LLM Contributors\n====================\n\n\n## List of Contributors\n- [Full List of Contributors](https://github.com/mlc"
  },
  {
    "path": "LICENSE",
    "chars": 11357,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "NOTICE",
    "chars": 57,
    "preview": "MLC LLM\n\nCopyright (c) 2023-2025 by MLC LLM Contributors\n"
  },
  {
    "path": "README.md",
    "chars": 5179,
    "preview": "<div align=\"center\">\n\n# MLC LLM\n\n[![Installation](https://img.shields.io/badge/docs-latest-green)](https://llm.mlc.ai/do"
  },
  {
    "path": "android/.gitignore",
    "chars": 286,
    "preview": "app/src/main/jni/*.h\napp/src/main/jni/*.cc\napp/src/main/obj\n\n*.iml\n.gradle\n/local.properties\n/.idea/caches\n/.idea/librar"
  },
  {
    "path": "android/MLCChat/README.md",
    "chars": 202,
    "preview": "# MLC-LLM Android\n\nCheckout [Documentation page](https://llm.mlc.ai/docs/deploy/android.html) for more information.\n\n- r"
  },
  {
    "path": "android/MLCChat/app/.gitignore",
    "chars": 22,
    "preview": "/build\n/src/main/libs\n"
  },
  {
    "path": "android/MLCChat/app/build.gradle",
    "chars": 2507,
    "preview": "plugins {\n    id 'com.android.application'\n    id 'org.jetbrains.kotlin.android'\n}\n\nandroid {\n    namespace 'ai.mlc.mlcc"
  },
  {
    "path": "android/MLCChat/app/proguard-rules.pro",
    "chars": 751,
    "preview": "# Add project specific ProGuard rules here.\n# You can control the set of applied configuration files using the\n# proguar"
  },
  {
    "path": "android/MLCChat/app/src/main/AndroidManifest.xml",
    "chars": 1604,
    "preview": "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<manifest xmlns:android=\"http://schemas.android.com/apk/res/android\"\n    xmlns:to"
  },
  {
    "path": "android/MLCChat/app/src/main/java/ai/mlc/mlcchat/AppViewModel.kt",
    "chars": 31167,
    "preview": "package ai.mlc.mlcchat\n\nimport ai.mlc.mlcllm.MLCEngine\nimport ai.mlc.mlcllm.OpenAIProtocol\nimport android.app.Applicatio"
  },
  {
    "path": "android/MLCChat/app/src/main/java/ai/mlc/mlcchat/ChatView.kt",
    "chars": 13526,
    "preview": "package ai.mlc.mlcchat\n\nimport android.app.Activity\nimport android.graphics.Bitmap\nimport android.graphics.BitmapFactory"
  },
  {
    "path": "android/MLCChat/app/src/main/java/ai/mlc/mlcchat/MainActivity.kt",
    "chars": 5371,
    "preview": "package ai.mlc.mlcchat\n\nimport android.Manifest\nimport android.content.ContentValues\nimport android.content.pm.PackageMa"
  },
  {
    "path": "android/MLCChat/app/src/main/java/ai/mlc/mlcchat/NavView.kt",
    "chars": 756,
    "preview": "package ai.mlc.mlcchat\n\nimport android.app.Activity\nimport androidx.compose.material3.ExperimentalMaterial3Api\nimport an"
  },
  {
    "path": "android/MLCChat/app/src/main/java/ai/mlc/mlcchat/StartView.kt",
    "chars": 9297,
    "preview": "package ai.mlc.mlcchat\n\nimport androidx.compose.foundation.gestures.detectTapGestures\nimport androidx.compose.foundation"
  },
  {
    "path": "android/MLCChat/app/src/main/java/ai/mlc/mlcchat/ui/theme/Color.kt",
    "chars": 1216,
    "preview": "package ai.mlc.mlcchat.ui.theme\n\nimport androidx.compose.ui.graphics.Color\n\nval Blue10 = Color(0xFF000F5E)\nval Blue20 = "
  },
  {
    "path": "android/MLCChat/app/src/main/java/ai/mlc/mlcchat/ui/theme/Theme.kt",
    "chars": 3297,
    "preview": "package ai.mlc.mlcchat.ui.theme\n\nimport android.app.Activity\nimport android.os.Build\nimport androidx.compose.foundation."
  },
  {
    "path": "android/MLCChat/app/src/main/java/ai/mlc/mlcchat/ui/theme/Type.kt",
    "chars": 984,
    "preview": "package ai.mlc.mlcchat.ui.theme\n\nimport androidx.compose.material3.Typography\nimport androidx.compose.ui.text.TextStyle\n"
  },
  {
    "path": "android/MLCChat/app/src/main/res/drawable/ic_android_black_24dp.xml",
    "chars": 579,
    "preview": "<vector android:height=\"24dp\" android:tint=\"#000000\"\n    android:viewportHeight=\"24\" android:viewportWidth=\"24\"\n    andr"
  },
  {
    "path": "android/MLCChat/app/src/main/res/drawable/mlc_logo_108.xml",
    "chars": 5909,
    "preview": "<vector xmlns:android=\"http://schemas.android.com/apk/res/android\"\n    android:width=\"108dp\"\n    android:height=\"108dp\"\n"
  },
  {
    "path": "android/MLCChat/app/src/main/res/values/colors.xml",
    "chars": 379,
    "preview": "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<resources>\n    <color name=\"purple_200\">#FFBB86FC</color>\n    <color name=\"purpl"
  },
  {
    "path": "android/MLCChat/app/src/main/res/values/strings.xml",
    "chars": 70,
    "preview": "<resources>\n    <string name=\"app_name\">MLCChat</string>\n</resources>\n"
  },
  {
    "path": "android/MLCChat/app/src/main/res/values/themes.xml",
    "chars": 139,
    "preview": "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<resources>\n\n    <style name=\"Theme.MLCChat\" parent=\"android:Theme.Material.Light"
  },
  {
    "path": "android/MLCChat/app/src/main/res/xml/backup_rules.xml",
    "chars": 479,
    "preview": "<?xml version=\"1.0\" encoding=\"utf-8\"?><!--\n   Sample backup rules file; uncomment and customize as necessary.\n   See htt"
  },
  {
    "path": "android/MLCChat/app/src/main/res/xml/data_extraction_rules.xml",
    "chars": 552,
    "preview": "<?xml version=\"1.0\" encoding=\"utf-8\"?><!--\n   Sample data extraction rules file; uncomment and customize as necessary.\n "
  },
  {
    "path": "android/MLCChat/build.gradle",
    "chars": 197,
    "preview": "plugins {\n    id 'com.android.application' version '8.2.0' apply false\n    id 'com.android.library' version '8.2.0' appl"
  },
  {
    "path": "android/MLCChat/bundle_weight.py",
    "chars": 2462,
    "preview": "import argparse\nimport os\nimport subprocess\nfrom pathlib import Path\n\nfrom mlc_llm.support import logging\n\nlogging.enabl"
  },
  {
    "path": "android/MLCChat/gradle/wrapper/gradle-wrapper.properties",
    "chars": 230,
    "preview": "#Thu Jan 25 10:19:50 EST 2024\ndistributionBase=GRADLE_USER_HOME\ndistributionPath=wrapper/dists\ndistributionUrl=https\\://"
  },
  {
    "path": "android/MLCChat/gradle.properties",
    "chars": 1359,
    "preview": "# Project-wide Gradle settings.\n# IDE (e.g. Android Studio) users:\n# Gradle settings configured through the IDE *will ov"
  },
  {
    "path": "android/MLCChat/gradlew",
    "chars": 5766,
    "preview": "#!/usr/bin/env sh\n\n#\n# Copyright 2015 the original author or authors.\n#\n# Licensed under the Apache License, Version 2.0"
  },
  {
    "path": "android/MLCChat/gradlew.bat",
    "chars": 2674,
    "preview": "@rem\n@rem Copyright 2015 the original author or authors.\n@rem\n@rem Licensed under the Apache License, Version 2.0 (the \""
  },
  {
    "path": "android/MLCChat/mlc-package-config.json",
    "chars": 1660,
    "preview": "{\n    \"device\": \"android\",\n    \"model_list\": [\n        {\n            \"model\": \"HF://mlc-ai/Phi-3.5-mini-instruct-q4f16_0"
  },
  {
    "path": "android/MLCChat/settings.gradle",
    "chars": 466,
    "preview": "pluginManagement {\n    repositories {\n        google()\n        mavenCentral()\n        gradlePluginPortal()\n    }\n}\ndepen"
  },
  {
    "path": "android/MLCEngineExample/README.md",
    "chars": 211,
    "preview": "# MLC-LLM Android\n\nCheckout [Documentation page](https://llm.mlc.ai/docs/deploy/android.html) for more information.\n\n- r"
  },
  {
    "path": "android/MLCEngineExample/app/.gitignore",
    "chars": 22,
    "preview": "/build\n/src/main/libs\n"
  },
  {
    "path": "android/MLCEngineExample/app/build.gradle",
    "chars": 2459,
    "preview": "plugins {\n    id 'com.android.application'\n    id 'org.jetbrains.kotlin.android'\n}\n\nandroid {\n    namespace 'ai.mlc.mlce"
  },
  {
    "path": "android/MLCEngineExample/app/proguard-rules.pro",
    "chars": 751,
    "preview": "# Add project specific ProGuard rules here.\n# You can control the set of applied configuration files using the\n# proguar"
  },
  {
    "path": "android/MLCEngineExample/app/src/main/AndroidManifest.xml",
    "chars": 1546,
    "preview": "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<manifest xmlns:android=\"http://schemas.android.com/apk/res/android\"\n    xmlns:to"
  },
  {
    "path": "android/MLCEngineExample/app/src/main/java/ai/mlc/mlcengineexample/MainActivity.kt",
    "chars": 3036,
    "preview": "package ai.mlc.mlcengineexample\n\nimport ai.mlc.mlcengineexample.ui.theme.MLCEngineExampleTheme\nimport ai.mlc.mlcllm.MLCE"
  },
  {
    "path": "android/MLCEngineExample/app/src/main/java/ai/mlc/mlcengineexample/ui/theme/Color.kt",
    "chars": 1225,
    "preview": "package ai.mlc.mlcengineexample.ui.theme\n\nimport androidx.compose.ui.graphics.Color\n\nval Blue10 = Color(0xFF000F5E)\nval "
  },
  {
    "path": "android/MLCEngineExample/app/src/main/java/ai/mlc/mlcengineexample/ui/theme/Theme.kt",
    "chars": 3315,
    "preview": "package ai.mlc.mlcengineexample.ui.theme\n\nimport android.app.Activity\nimport android.os.Build\nimport androidx.compose.fo"
  },
  {
    "path": "android/MLCEngineExample/app/src/main/java/ai/mlc/mlcengineexample/ui/theme/Type.kt",
    "chars": 993,
    "preview": "package ai.mlc.mlcengineexample.ui.theme\n\nimport androidx.compose.material3.Typography\nimport androidx.compose.ui.text.T"
  },
  {
    "path": "android/MLCEngineExample/app/src/main/res/drawable/ic_android_black_24dp.xml",
    "chars": 579,
    "preview": "<vector android:height=\"24dp\" android:tint=\"#000000\"\n    android:viewportHeight=\"24\" android:viewportWidth=\"24\"\n    andr"
  },
  {
    "path": "android/MLCEngineExample/app/src/main/res/drawable/mlc_logo_108.xml",
    "chars": 5909,
    "preview": "<vector xmlns:android=\"http://schemas.android.com/apk/res/android\"\n    android:width=\"108dp\"\n    android:height=\"108dp\"\n"
  },
  {
    "path": "android/MLCEngineExample/app/src/main/res/values/colors.xml",
    "chars": 379,
    "preview": "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<resources>\n    <color name=\"purple_200\">#FFBB86FC</color>\n    <color name=\"purpl"
  },
  {
    "path": "android/MLCEngineExample/app/src/main/res/values/strings.xml",
    "chars": 79,
    "preview": "<resources>\n    <string name=\"app_name\">MLCEngineExample</string>\n</resources>\n"
  },
  {
    "path": "android/MLCEngineExample/app/src/main/res/values/themes.xml",
    "chars": 148,
    "preview": "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<resources>\n\n    <style name=\"Theme.MLCEngineExample\" parent=\"android:Theme.Mater"
  },
  {
    "path": "android/MLCEngineExample/app/src/main/res/xml/backup_rules.xml",
    "chars": 479,
    "preview": "<?xml version=\"1.0\" encoding=\"utf-8\"?><!--\n   Sample backup rules file; uncomment and customize as necessary.\n   See htt"
  },
  {
    "path": "android/MLCEngineExample/app/src/main/res/xml/data_extraction_rules.xml",
    "chars": 552,
    "preview": "<?xml version=\"1.0\" encoding=\"utf-8\"?><!--\n   Sample data extraction rules file; uncomment and customize as necessary.\n "
  },
  {
    "path": "android/MLCEngineExample/build.gradle",
    "chars": 197,
    "preview": "plugins {\n    id 'com.android.application' version '8.2.0' apply false\n    id 'com.android.library' version '8.2.0' appl"
  },
  {
    "path": "android/MLCEngineExample/bundle_weight.py",
    "chars": 2489,
    "preview": "import argparse\nimport os\nimport subprocess\nfrom pathlib import Path\n\nfrom mlc_llm.support import logging\n\nlogging.enabl"
  },
  {
    "path": "android/MLCEngineExample/gradle/wrapper/gradle-wrapper.properties",
    "chars": 230,
    "preview": "#Thu Jan 25 10:19:50 EST 2024\ndistributionBase=GRADLE_USER_HOME\ndistributionPath=wrapper/dists\ndistributionUrl=https\\://"
  },
  {
    "path": "android/MLCEngineExample/gradle.properties",
    "chars": 1359,
    "preview": "# Project-wide Gradle settings.\n# IDE (e.g. Android Studio) users:\n# Gradle settings configured through the IDE *will ov"
  },
  {
    "path": "android/MLCEngineExample/gradlew",
    "chars": 5766,
    "preview": "#!/usr/bin/env sh\n\n#\n# Copyright 2015 the original author or authors.\n#\n# Licensed under the Apache License, Version 2.0"
  },
  {
    "path": "android/MLCEngineExample/gradlew.bat",
    "chars": 2674,
    "preview": "@rem\n@rem Copyright 2015 the original author or authors.\n@rem\n@rem Licensed under the Apache License, Version 2.0 (the \""
  },
  {
    "path": "android/MLCEngineExample/mlc-package-config.json",
    "chars": 306,
    "preview": "{\n    \"device\": \"android\",\n    \"model_list\": [\n        {\n            \"model\": \"HF://mlc-ai/phi-2-q4f16_1-MLC\",\n         "
  },
  {
    "path": "android/MLCEngineExample/settings.gradle",
    "chars": 404,
    "preview": "pluginManagement {\n    repositories {\n        google()\n        mavenCentral()\n        gradlePluginPortal()\n    }\n}\ndepen"
  },
  {
    "path": "android/README.md",
    "chars": 85,
    "preview": "# MLC-LLM Android\n\n[Documentation page](https://llm.mlc.ai/docs/deploy/android.html)\n"
  },
  {
    "path": "android/mlc4j/.gitignore",
    "chars": 7,
    "preview": "/build\n"
  },
  {
    "path": "android/mlc4j/CMakeLists.txt",
    "chars": 2916,
    "preview": "cmake_minimum_required(VERSION 3.18)\n\nproject(mlc-chat C CXX)\n\nset(ANDROID_DIR ${CMAKE_CURRENT_LIST_DIR})\nset(ANDROID_BI"
  },
  {
    "path": "android/mlc4j/build.gradle",
    "chars": 852,
    "preview": "plugins {\n    id 'com.android.library'\n    id 'org.jetbrains.kotlin.android'\n    id 'org.jetbrains.kotlin.plugin.seriali"
  },
  {
    "path": "android/mlc4j/prepare_libs.py",
    "chars": 3882,
    "preview": "\"\"\"The build script for mlc4j (MLC LLM and tvm4j)\"\"\"\n\nimport argparse\nimport json\nimport os\nimport subprocess\nimport sys"
  },
  {
    "path": "android/mlc4j/src/cpp/tvm_runtime.h",
    "chars": 2251,
    "preview": "#define TVM_USE_LIBBACKTRACE 0\n\n#include <android/log.h>\n#include <dlfcn.h>\n#include <tvm/runtime/logging.h>\n\n#include <"
  },
  {
    "path": "android/mlc4j/src/main/AndroidManifest.xml",
    "chars": 122,
    "preview": "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<manifest xmlns:android=\"http://schemas.android.com/apk/res/android\">\n\n</manifest"
  },
  {
    "path": "android/mlc4j/src/main/java/ai/mlc/mlcllm/JSONFFIEngine.java",
    "chars": 3012,
    "preview": "package ai.mlc.mlcllm;\n\nimport org.apache.tvm.Device;\nimport org.apache.tvm.Function;\nimport org.apache.tvm.Module;\nimpo"
  },
  {
    "path": "android/mlc4j/src/main/java/ai/mlc/mlcllm/MLCEngine.kt",
    "chars": 6071,
    "preview": "package ai.mlc.mlcllm\n\nimport ai.mlc.mlcllm.OpenAIProtocol.*\nimport kotlinx.coroutines.GlobalScope\nimport kotlinx.corout"
  },
  {
    "path": "android/mlc4j/src/main/java/ai/mlc/mlcllm/OpenAIProtocol.kt",
    "chars": 7384,
    "preview": "package ai.mlc.mlcllm\n\nimport kotlinx.serialization.KSerializer\nimport kotlinx.serialization.Serializable\nimport kotlinx"
  },
  {
    "path": "ci/bash.sh",
    "chars": 3263,
    "preview": "#!/usr/bin/env bash\n\nif [ \"$#\" -lt 1 ]; then\n    echo \"Usage: ci/bash.sh <CONTAINER_NAME> -e key value -v key value [COM"
  },
  {
    "path": "ci/build-environment.yaml",
    "chars": 207,
    "preview": "name: mlc-llm-build\n\nchannels:\n  - conda-forge\n\ndependencies:\n  - conda-build\n  - anaconda-client\n  - libvulkan-headers\n"
  },
  {
    "path": "ci/jenkinsfile.groovy",
    "chars": 10745,
    "preview": "// Licensed to the Apache Software Foundation (ASF) under one\n// or more contributor license agreements.  See the NOTICE"
  },
  {
    "path": "ci/task/black.sh",
    "chars": 212,
    "preview": "#!/bin/bash\nset -eo pipefail\nset -x\n: ${NUM_THREADS:=$(nproc)}\n: ${WORKSPACE_CWD:=$(pwd)}\n: ${GPU:=\"cpu\"}\n\nblack --diff "
  },
  {
    "path": "ci/task/build_clean.sh",
    "chars": 167,
    "preview": "#!/bin/bash\nset -eo pipefail\nset -x\n: ${NUM_THREADS:=$(nproc)}\n: ${WORKSPACE_CWD:=$(pwd)}\n: ${GPU:=\"cpu\"}\n\nrm -rf ${WORK"
  },
  {
    "path": "ci/task/build_lib.sh",
    "chars": 2523,
    "preview": "#!/bin/bash\nset -eo pipefail\nset -x\n: ${NUM_THREADS:=$(nproc)}\n: ${WORKSPACE_CWD:=$(pwd)}\n: ${GPU:=\"cpu\"}\nexport CCACHE_"
  },
  {
    "path": "ci/task/build_win.bat",
    "chars": 138,
    "preview": "cd mlc-llm\nrd /s /q build\nmkdir build\n\necho set(USE_VULKAN ON) >> config.cmake\n\npip install . -v\n\nif %errorlevel% neq 0 "
  },
  {
    "path": "ci/task/clang-format.sh",
    "chars": 1821,
    "preview": "#!/bin/bash\nset -eo pipefail\nset -x\n: ${NUM_THREADS:=$(nproc)}\n: ${WORKSPACE_CWD:=$(pwd)}\n: ${GPU:=\"cpu\"}\n\nINPLACE_FORMA"
  },
  {
    "path": "ci/task/isort.sh",
    "chars": 220,
    "preview": "#!/bin/bash\nset -eo pipefail\nset -x\n: ${NUM_THREADS:=$(nproc)}\n: ${WORKSPACE_CWD:=$(pwd)}\n: ${GPU:=\"cpu\"}\n\nisort --check"
  },
  {
    "path": "ci/task/mypy.sh",
    "chars": 167,
    "preview": "#!/bin/bash\n: ${NUM_THREADS:=$(nproc)}\n: ${WORKSPACE_CWD:=$(pwd)}\n: ${GPU:=\"cpu\"}\n\nmypy --install-types --non-interactiv"
  },
  {
    "path": "ci/task/pylint.sh",
    "chars": 618,
    "preview": "#!/bin/bash\nset -eo pipefail\nset -x\n: ${NUM_THREADS:=$(nproc)}\n: ${WORKSPACE_CWD:=$(pwd)}\n: ${GPU:=\"cpu\"}\nexport PYTHONP"
  },
  {
    "path": "ci/task/test_model_compile.sh",
    "chars": 1624,
    "preview": "#!/bin/bash\nset -eo pipefail\nset -x\n: ${NUM_THREADS:=$(nproc)}\n: ${WORKSPACE_CWD:=$(pwd)}\n: ${GPU:=\"cpu\"}\n\npip install -"
  },
  {
    "path": "ci/task/test_unittest.sh",
    "chars": 653,
    "preview": "#!/bin/bash\nset -eo pipefail\nset -x\n\n# this scripts only triggers in CI_ENV where these environment variable are passed\n"
  },
  {
    "path": "cmake/gen_cmake_config.py",
    "chars": 2383,
    "preview": "from collections import namedtuple\n\nBackend = namedtuple(\"Backend\", [\"name\", \"cmake_config_name\", \"prompt_str\", \"parent\""
  },
  {
    "path": "cpp/base.h",
    "chars": 302,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file base.h\n */\n\n#ifndef MLC_LLM_DLL\n#ifdef _WIN32\n#ifdef MLC_LLM_EX"
  },
  {
    "path": "cpp/json_ffi/conv_template.cc",
    "chars": 23452,
    "preview": "#include \"conv_template.h\"\n\n#include <tvm/ffi/function.h>\n\n#include \"../support/json_parser.h\"\n#include \"image_utils.h\"\n"
  },
  {
    "path": "cpp/json_ffi/conv_template.h",
    "chars": 5055,
    "preview": "#ifndef MLC_LLM_JSON_FFI_CONV_TEMPLATE_H\n#define MLC_LLM_JSON_FFI_CONV_TEMPLATE_H\n\n#include <tvm/ffi/extra/json.h>\n\n#inc"
  },
  {
    "path": "cpp/json_ffi/image_utils.cc",
    "chars": 5514,
    "preview": "#include \"image_utils.h\"\n\n#include <tvm/support/io.h>\n\n#include \"../../3rdparty/tvm/src/support/base64.h\"\n#define STB_IM"
  },
  {
    "path": "cpp/json_ffi/image_utils.h",
    "chars": 919,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file json_ffi/image_utils.h\n * \\brief The header of Image utils for "
  },
  {
    "path": "cpp/json_ffi/json_ffi_engine.cc",
    "chars": 12714,
    "preview": "#include \"json_ffi_engine.h\"\n\n#include <tvm/ffi/extra/json.h>\n#include <tvm/ffi/function.h>\n#include <tvm/ffi/reflection"
  },
  {
    "path": "cpp/json_ffi/json_ffi_engine.h",
    "chars": 1759,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file json_ffi/json_ffi_engine.h\n * \\brief The header of JSON FFI eng"
  },
  {
    "path": "cpp/json_ffi/openai_api_protocol.cc",
    "chars": 19296,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file json_ffi/openai_api_protocol.cc\n * \\brief The implementation of"
  },
  {
    "path": "cpp/json_ffi/openai_api_protocol.h",
    "chars": 6126,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file json_ffi/openai_api_protocol.h\n * \\brief The header of OpenAI A"
  },
  {
    "path": "cpp/metadata/model.cc",
    "chars": 7459,
    "preview": "#include \"./model.h\"\n\n#include <unordered_map>\n\n#include \"../support/json_parser.h\"\n\nnamespace mlc {\nnamespace llm {\n\nus"
  },
  {
    "path": "cpp/metadata/model.h",
    "chars": 3138,
    "preview": "/*!\n * \\file model.h\n * \\brief Metadata stored in model lib\n */\n#ifndef MLC_LLM_CPP_MODEL_METADATA_H_\n#define MLC_LLM_CP"
  },
  {
    "path": "cpp/multi_gpu/builtin.cc",
    "chars": 3792,
    "preview": "/*!\n * \\file builtin.cc\n * \\brief Multi-GPU builtin functions in MLC LLM.\n */\n#ifndef MLC_SINGLE_GPU_ONLY\n\n#include <tvm"
  },
  {
    "path": "cpp/multi_gpu/multi_gpu_loader.cc",
    "chars": 13779,
    "preview": "/*!\n * \\file multi_gpu_loader.cc\n * \\brief Implementation of a multi-GPU loader with loading-time sharding.\n */\n#ifndef "
  },
  {
    "path": "cpp/serve/config.cc",
    "chars": 50975,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/config.cc\n */\n#include \"config.h\"\n\n#include <tvm/ffi/func"
  },
  {
    "path": "cpp/serve/config.h",
    "chars": 16224,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/config.h\n */\n#ifndef MLC_LLM_SERVE_CONFIG_H_\n#define MLC_"
  },
  {
    "path": "cpp/serve/data.cc",
    "chars": 9855,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/data.cc\n */\n#include \"data.h\"\n\n#include <tvm/ffi/function"
  },
  {
    "path": "cpp/serve/data.h",
    "chars": 8396,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/data.h\n */\n#ifndef MLC_LLM_SERVE_DATA_H_\n#define MLC_LLM_"
  },
  {
    "path": "cpp/serve/draft_token_workspace_manager.cc",
    "chars": 2849,
    "preview": "/*!\n * Copyright (c) 2023-2025 by Contributors\n * \\file serve/draft_token_workspace_manager.cc\n */\n\n#include \"draft_toke"
  },
  {
    "path": "cpp/serve/draft_token_workspace_manager.h",
    "chars": 4049,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/draft_token_workspace_manager.h\n */\n\n#ifndef MLC_LLM_SERV"
  },
  {
    "path": "cpp/serve/engine.cc",
    "chars": 49206,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine.cc\n * \\brief The implementation for runtime module"
  },
  {
    "path": "cpp/serve/engine.h",
    "chars": 4595,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine.h\n * \\brief The header of serving engine in MLC LL"
  },
  {
    "path": "cpp/serve/engine_actions/action.cc",
    "chars": 299,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/action.cc\n */\n\n#include \"action.h\"\n\nnamesp"
  },
  {
    "path": "cpp/serve/engine_actions/action.h",
    "chars": 13550,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/action.h\n * \\brief The abstraction of acti"
  },
  {
    "path": "cpp/serve/engine_actions/action_commons.cc",
    "chars": 23329,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/action_commons.cc\n */\n\n#include \"action_co"
  },
  {
    "path": "cpp/serve/engine_actions/action_commons.h",
    "chars": 5747,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/action_commons.h\n * \\brief Common function"
  },
  {
    "path": "cpp/serve/engine_actions/auto_spec_decode.cc",
    "chars": 3213,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/auto_spec_decode.cc\n */\n\n#include <tvm/run"
  },
  {
    "path": "cpp/serve/engine_actions/batch_decode.cc",
    "chars": 13774,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/batch_decode.cc\n */\n\n#include <tvm/runtime"
  },
  {
    "path": "cpp/serve/engine_actions/batch_draft.cc",
    "chars": 19836,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/batch_draft.cc\n */\n\n#include <numeric>\n\n#i"
  },
  {
    "path": "cpp/serve/engine_actions/batch_jumpforward.cc",
    "chars": 9199,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/batch_verify.cc\n */\n\n#include <tvm/runtime"
  },
  {
    "path": "cpp/serve/engine_actions/batch_prefill_base.cc",
    "chars": 24246,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/batch_prefill_base.h\n */\n\n#include \"batch_"
  },
  {
    "path": "cpp/serve/engine_actions/batch_prefill_base.h",
    "chars": 6337,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/batch_prefill_base.h\n */\n\n#include <tvm/ru"
  },
  {
    "path": "cpp/serve/engine_actions/batch_verify.cc",
    "chars": 19183,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/batch_verify.cc\n */\n\n#include <tvm/runtime"
  },
  {
    "path": "cpp/serve/engine_actions/disagg_prepare_recv.cc",
    "chars": 20558,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/new_request_prefill.cc\n */\n\n#include <opti"
  },
  {
    "path": "cpp/serve/engine_actions/disagg_remote_send.cc",
    "chars": 24612,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/new_request_prefill.cc\n */\n\n#include \"../s"
  },
  {
    "path": "cpp/serve/engine_actions/eagle_batch_draft.cc",
    "chars": 10962,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/eagle_batch_draft.cc\n */\n\n#include <numeri"
  },
  {
    "path": "cpp/serve/engine_actions/eagle_batch_verify.cc",
    "chars": 23426,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/eagle_batch_verify.cc\n */\n\n#include <tvm/r"
  },
  {
    "path": "cpp/serve/engine_actions/eagle_new_request_prefill.cc",
    "chars": 25653,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/eagle_new_request_prefill.cc\n */\n\n#include"
  },
  {
    "path": "cpp/serve/engine_actions/new_request_prefill.cc",
    "chars": 17293,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_actions/new_request_prefill.cc\n */\n\n#include \"../s"
  },
  {
    "path": "cpp/serve/engine_state.cc",
    "chars": 1660,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_state.cc\n */\n#include \"engine_state.h\"\n\nnamespace "
  },
  {
    "path": "cpp/serve/engine_state.h",
    "chars": 3927,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/engine_state.h\n */\n#ifndef MLC_LLM_SERVE_ENGINE_STATE_H_\n"
  },
  {
    "path": "cpp/serve/event_trace_recorder.cc",
    "chars": 5796,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/event_trace_recorder.cc\n */\n#include \"event_trace_recorde"
  },
  {
    "path": "cpp/serve/event_trace_recorder.h",
    "chars": 2665,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/event_trace_recorder.h\n * \\brief The event trace recorder"
  },
  {
    "path": "cpp/serve/function_table.cc",
    "chars": 17774,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/function_table.cc\n * \\brief The implementation of functio"
  },
  {
    "path": "cpp/serve/function_table.h",
    "chars": 5393,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/function_table.h\n * \\brief The header for function table "
  },
  {
    "path": "cpp/serve/logit_processor.cc",
    "chars": 23147,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/logit_processor.cc\n * \\brief The implementation of logit "
  },
  {
    "path": "cpp/serve/logit_processor.h",
    "chars": 4191,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/logit_processor.h\n * \\brief The header for logit processo"
  },
  {
    "path": "cpp/serve/metrics.cc",
    "chars": 6614,
    "preview": "\n/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/metrics.cc\n */\n#include \"metrics.h\"\n\n#include <tvm/runti"
  },
  {
    "path": "cpp/serve/metrics.h",
    "chars": 9438,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/metric.h\n * \\brief Metrics of serving engine/requests.\n *"
  },
  {
    "path": "cpp/serve/model.cc",
    "chars": 49460,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/model.cc\n * \\brief The implementation of runtime module o"
  },
  {
    "path": "cpp/serve/model.h",
    "chars": 17827,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/model.h\n * \\brief The header for runtime module of LLM fu"
  },
  {
    "path": "cpp/serve/prefix_cache.cc",
    "chars": 17950,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/prefix_cache.cc\n */\n#include \"prefix_cache.h\"\n\n#include <"
  },
  {
    "path": "cpp/serve/prefix_cache.h",
    "chars": 5165,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/prefix_cache.h\n */\n#ifndef MLC_LLM_SERVE_PREFIX_CACHE_H_\n"
  },
  {
    "path": "cpp/serve/radix_tree.cc",
    "chars": 30835,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/radix_tree.cc\n */\n#include \"radix_tree.h\"\n\n#include <tvm/"
  },
  {
    "path": "cpp/serve/radix_tree.h",
    "chars": 4075,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/radix_tree.h\n */\n#ifndef MLC_LLM_SERVE_RADIX_TREE_H_\n#def"
  },
  {
    "path": "cpp/serve/request.cc",
    "chars": 2521,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/request.cc\n */\n\n#include \"request.h\"\n\n#include <tvm/ffi/f"
  },
  {
    "path": "cpp/serve/request.h",
    "chars": 2691,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/request.h\n * \\brief Implementation of llm chat.\n */\n#ifnd"
  },
  {
    "path": "cpp/serve/request_state.cc",
    "chars": 12429,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/request_state.cc\n */\n\n#include \"request_state.h\"\n\n#includ"
  },
  {
    "path": "cpp/serve/request_state.h",
    "chars": 12756,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/request_state.h\n * \\brief The data structure maintaining "
  },
  {
    "path": "cpp/serve/sampler/cpu_sampler.cc",
    "chars": 23650,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/sampler/cpu_sampler.cc\n * \\brief The implementation for C"
  },
  {
    "path": "cpp/serve/sampler/gpu_sampler.cc",
    "chars": 38441,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/sampler/gpu_sampler.cc\n * \\brief The implementation for G"
  },
  {
    "path": "cpp/serve/sampler/sampler.h",
    "chars": 7638,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/sampler/sampler.h\n * \\brief The header for runtime module"
  },
  {
    "path": "cpp/serve/threaded_engine.cc",
    "chars": 16139,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/threaded_engine.cc\n * \\brief The implementation for threa"
  },
  {
    "path": "cpp/serve/threaded_engine.h",
    "chars": 2983,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file serve/threaded_engine.h\n * \\brief The header of threaded servin"
  },
  {
    "path": "cpp/support/debug_utils.h",
    "chars": 988,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file support/debug_utils.h\n * \\brief Tools for debug purposes.\n */\n#"
  },
  {
    "path": "cpp/support/dynamic_bitset.h",
    "chars": 4510,
    "preview": "/*!\n * Copyright (c) 2023-2025 by Contributors\n * \\file support/dynamic_bitset.h\n * \\brief The header for utilities used"
  },
  {
    "path": "cpp/support/encoding.cc",
    "chars": 7264,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file support/encoding.cc\n */\n#include \"encoding.h\"\n\n#include <tvm/ru"
  },
  {
    "path": "cpp/support/encoding.h",
    "chars": 4000,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file support/encoding.h\n * \\brief Encoding and decoding from/to UTF-"
  },
  {
    "path": "cpp/support/json_parser.h",
    "chars": 10405,
    "preview": "/*!\n * \\file support/json_parser.h\n * \\brief Helps to parse JSON strings and objects.\n */\n#ifndef MLC_LLM_SUPPORT_JSON_P"
  },
  {
    "path": "cpp/support/load_bytes_from_file.h",
    "chars": 808,
    "preview": "/*!\n * Copyright (c) 2023-2025 by Contributors\n * \\file support/load_bytes_from_file.h\n * \\brief Utility methods to load"
  },
  {
    "path": "cpp/support/progress_bar.h",
    "chars": 1260,
    "preview": "/*!\n * Copyright (c) 2023-2025 by Contributors\n * \\file support/progress_bar.h\n * \\brief A simple progress bar in C++.\n "
  },
  {
    "path": "cpp/support/random.h",
    "chars": 780,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file support/random.h\n * \\brief Header of random number generator.\n "
  },
  {
    "path": "cpp/support/result.h",
    "chars": 2406,
    "preview": "/*!\n * Copyright (c) 2023-2025 by Contributors\n * \\file support/result.h\n * \\brief The header for the Result class in ML"
  },
  {
    "path": "cpp/support/utils.h",
    "chars": 2108,
    "preview": "/*!\n * Copyright (c) 2023-2025 by Contributors\n * \\file support/utils.h\n * \\brief Utility functions.\n */\n#ifndef MLC_LLM"
  },
  {
    "path": "cpp/support/vlm_utils.cc",
    "chars": 2016,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file support/image_utils.cc\n */\n#include \"vlm_utils.h\"\n\n#include <cm"
  },
  {
    "path": "cpp/support/vlm_utils.h",
    "chars": 2194,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file support/vlm_utils.h\n * \\brief Tools for debug purposes.\n */\n#if"
  },
  {
    "path": "cpp/tokenizers/streamer.cc",
    "chars": 11522,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file streamer.cc\n */\n\n#include \"streamer.h\"\n\n#include <tvm/ffi/funct"
  },
  {
    "path": "cpp/tokenizers/streamer.h",
    "chars": 5218,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file streamer.h\n * \\brief Header of streamers in MLC LLM.\n */\n\n#ifnd"
  },
  {
    "path": "cpp/tokenizers/tokenizers.cc",
    "chars": 22774,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file tokenizer.cc\n */\n\n#include \"tokenizers.h\"\n\n#include <tokenizers"
  },
  {
    "path": "cpp/tokenizers/tokenizers.h",
    "chars": 6294,
    "preview": "/*!\n *  Copyright (c) 2023-2025 by Contributors\n * \\file tokenizers.h\n * \\brief Header of tokenizer related functions.\n "
  },
  {
    "path": "docs/.gitignore",
    "chars": 8,
    "preview": "_build/\n"
  },
  {
    "path": "docs/Makefile",
    "chars": 638,
    "preview": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line, and also\n# from the "
  },
  {
    "path": "docs/README.md",
    "chars": 652,
    "preview": "# MLC-LLM Documentation\n\nThe documentation was built upon [Sphinx](https://www.sphinx-doc.org/en/master/).\n\n## Dependenc"
  },
  {
    "path": "docs/community/faq.rst",
    "chars": 865,
    "preview": ".. _FAQ:\n\nFrequently Asked Questions\n==========================\n\nThis is a list of Frequently Asked Questions (FAQ) abou"
  },
  {
    "path": "docs/community/guideline.rst",
    "chars": 6639,
    "preview": ".. _community_guide:\n\nCommunity Guideline\n===================\n\n.. contents::\n  :depth: 2\n  :local:\n\nWelcome to the MLC-L"
  },
  {
    "path": "docs/compilation/compile_models.rst",
    "chars": 54771,
    "preview": ".. _compile-model-libraries:\n\nCompile Model Libraries\n=======================\n\nTo run a model with MLC LLM in any platfo"
  },
  {
    "path": "docs/compilation/configure_quantization.rst",
    "chars": 3505,
    "preview": "Configure Quantization\n======================\n\nQuantization Algorithm\n----------------------\n\nThe default quantization a"
  },
  {
    "path": "docs/compilation/convert_weights.rst",
    "chars": 6008,
    "preview": ".. _convert-weights-via-MLC:\n\nConvert Model Weights\n=====================\n\nTo run a model with MLC LLM,\nwe need to conve"
  },
  {
    "path": "docs/compilation/define_new_models.rst",
    "chars": 1174,
    "preview": "Define New Model Architectures\n==============================\n\nThis page guides you how to add a new model architecture "
  },
  {
    "path": "docs/compilation/package_libraries_and_weights.rst",
    "chars": 8785,
    "preview": ".. _package-libraries-and-weights:\n\nPackage Libraries and Weights\n=============================\n\nWhen we want to build L"
  },
  {
    "path": "docs/conf.py",
    "chars": 2476,
    "preview": "# -*- coding: utf-8 -*-\nimport os\nimport sys\n\nimport tlcpack_sphinx_addon\n\n# -- General configuration ------------------"
  },
  {
    "path": "docs/deploy/android.rst",
    "chars": 15487,
    "preview": ".. _deploy-android:\n\nAndroid SDK\n===========\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nDemo App\n--------"
  },
  {
    "path": "docs/deploy/cli.rst",
    "chars": 3922,
    "preview": ".. _deploy-cli:\n\nCLI\n===============\n\nMLC Chat CLI is the command line tool to run MLC-compiled LLMs out of the box inte"
  },
  {
    "path": "docs/deploy/ide_integration.rst",
    "chars": 7209,
    "preview": ".. _deploy-ide-integration:\n\nIDE Integration\n===============\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nM"
  },
  {
    "path": "docs/deploy/ios.rst",
    "chars": 15208,
    "preview": ".. _deploy-ios:\n\niOS Swift SDK\n=============\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nThe MLC LLM iOS a"
  },
  {
    "path": "docs/deploy/mlc_chat_config.rst",
    "chars": 7287,
    "preview": ".. _configure-mlc-chat-json:\n\nCustomize MLC Chat Config\n=========================\n\n``mlc-chat-config.json`` is required "
  },
  {
    "path": "docs/deploy/python_engine.rst",
    "chars": 9921,
    "preview": ".. _deploy-python-engine:\n\nPython API\n==========\n\n.. note::\n  This page introduces the Python API with MLCEngine in MLC "
  },
  {
    "path": "docs/deploy/rest.rst",
    "chars": 20412,
    "preview": ".. _deploy-rest-api:\n\nREST API\n========\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nWe provide `REST API <"
  },
  {
    "path": "docs/deploy/webllm.rst",
    "chars": 15005,
    "preview": ".. _webllm-runtime:\n\nWebLLM Javascript SDK\n=====================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: "
  },
  {
    "path": "docs/get_started/introduction.rst",
    "chars": 14543,
    "preview": ".. _introduction-to-mlc-llm:\n\nIntroduction to MLC LLM\n=======================\n\n.. contents:: Table of Contents\n    :loca"
  },
  {
    "path": "docs/get_started/quick_start.rst",
    "chars": 6525,
    "preview": ".. _quick-start:\n\nQuick Start\n===========\n\nExamples\n--------\n\nTo begin with, try out MLC LLM support for int4-quantized "
  },
  {
    "path": "docs/index.rst",
    "chars": 1773,
    "preview": "👋 Welcome to MLC LLM\n=====================\n\n`Discord <https://discord.gg/9Xpy2HGBuD>`_ | `GitHub <https://github.com/mlc"
  },
  {
    "path": "docs/install/conda.rst",
    "chars": 2748,
    "preview": "Install Conda\n=============\n\nMLC LLM does not depend on, but generally recommends conda as a generic dependency manager,"
  }
]

// ... and 461 more files (download for full content)

About this extraction

This page contains the full source code of the mlc-ai/mlc-llm GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 661 files (4.0 MB), approximately 1.1M tokens, and a symbol index with 3167 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo