Full Code of OpenGVLab/InternVL for AI

7.5M tokens

2771 symbols

Repository: OpenGVLab/InternVL
Branch: main
Commit: 2410d1dbf208
Files: 864
Total size: 28.3 MB

Directory structure:
gitextract_i3i5r_p7/
├── .flake8
├── .github/
│   ├── CONTRIBUTING.md
│   └── ISSUE_TEMPLATE/
│       ├── 1-bug-report.yml
│       ├── 2-feature-request.yml
│       └── 3-documentation.yml
├── .gitignore
├── .isort.cfg
├── .pre-commit-config.yaml
├── INSTALLATION.md
├── LICENSE
├── README.md
├── README_zh.md
├── classification/
│   ├── README.md
│   ├── config.py
│   ├── configs/
│   │   ├── attn_pooling_probing/
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_a.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_r.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_real.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_sketch.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenetv2.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_a.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_r.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_real.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_sketch.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenetv2.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_a.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_r.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_real.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_sketch.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenetv2.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_a.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_r.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_real.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_sketch.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenetv2.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_a.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_r.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_real.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_sketch.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenetv2.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_a.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_r.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_real.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_sketch.yaml
│   │   │   └── attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenetv2.yaml
│   │   ├── intern_vit_6b_1k_224.yaml
│   │   ├── intern_vit_6b_1k_224_test_imagenet_a.yaml
│   │   ├── intern_vit_6b_1k_224_test_imagenet_r.yaml
│   │   ├── intern_vit_6b_1k_224_test_imagenet_real.yaml
│   │   ├── intern_vit_6b_1k_224_test_imagenet_sketch.yaml
│   │   ├── intern_vit_6b_1k_224_test_imagenetv2.yaml
│   │   └── linear_probing/
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224_64gpu.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_a.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_r.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_real.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_sketch.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenetv2.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_a.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_r.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_real.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_sketch.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenetv2.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_a.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_r.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_real.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_sketch.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenetv2.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_a.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_r.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_real.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_sketch.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenetv2.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_a.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_r.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_real.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_sketch.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenetv2.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_a.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_r.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_real.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_sketch.yaml
│   │       └── linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenetv2.yaml
│   ├── dataset/
│   │   ├── __init__.py
│   │   ├── build.py
│   │   ├── cached_image_folder.py
│   │   ├── imagenet_a_r_indices.py
│   │   ├── imagenet_real.py
│   │   ├── imagenetv2.py
│   │   ├── samplers.py
│   │   └── zipreader.py
│   ├── ddp_hooks.py
│   ├── gflops.py
│   ├── hf2pytorch.py
│   ├── logger.py
│   ├── lr_scheduler.py
│   ├── main.py
│   ├── meta_data/
│   │   ├── 22k_class_to_idx.json
│   │   ├── imagenet_classes.json
│   │   ├── map22kto1k.txt
│   │   └── real.json
│   ├── models/
│   │   ├── __init__.py
│   │   ├── build.py
│   │   ├── clip_vit.py
│   │   ├── flash_attention.py
│   │   └── intern_vit_6b.py
│   ├── optimizer.py
│   ├── train_in1k.sh
│   ├── utils.py
│   └── work_dirs/
│       └── intern_vit_6b_1k_224/
│           └── log_rank0.txt
├── clip_benchmark/
│   ├── AUTHORS.rst
│   ├── CONTRIBUTING.rst
│   ├── HISTORY.rst
│   ├── LICENSE
│   ├── MANIFEST.in
│   ├── Makefile
│   ├── README.md
│   ├── benchmark/
│   │   ├── README.md
│   │   ├── benchmark.csv
│   │   ├── dataset_type.csv
│   │   ├── datasets.txt
│   │   ├── datasets_multilingual.txt
│   │   ├── models.txt
│   │   ├── results.ipynb
│   │   └── webdatasets.txt
│   ├── clip_benchmark/
│   │   ├── __init__.py
│   │   ├── cli.py
│   │   ├── datasets/
│   │   │   ├── __init__.py
│   │   │   ├── ar_classnames.json
│   │   │   ├── ar_zeroshot_classification_templates.json
│   │   │   ├── birdsnap.py
│   │   │   ├── builder.py
│   │   │   ├── caltech101.py
│   │   │   ├── cn_classnames.json
│   │   │   ├── cn_zeroshot_classification_templates.json
│   │   │   ├── cupl_prompts.json
│   │   │   ├── en_classnames.json
│   │   │   ├── en_zeroshot_classification_templates.json
│   │   │   ├── flickr.py
│   │   │   ├── imagenetv2.py
│   │   │   ├── it_classnames.json
│   │   │   ├── it_zeroshot_classification_templates.json
│   │   │   ├── jp_classnames.json
│   │   │   ├── jp_zeroshot_classification_templates.json
│   │   │   ├── kitti.py
│   │   │   ├── multilingual_mscoco.py
│   │   │   ├── objectnet.py
│   │   │   ├── tfds.py
│   │   │   ├── tools.py
│   │   │   └── voc2007.py
│   │   ├── metrics/
│   │   │   ├── __init__.py
│   │   │   ├── linear_probe.py
│   │   │   ├── mscoco_generative.py
│   │   │   ├── zeroshot_classification.py
│   │   │   └── zeroshot_retrieval.py
│   │   ├── model_collection.py
│   │   ├── models/
│   │   │   ├── __init__.py
│   │   │   ├── intern_vit_6b/
│   │   │   │   ├── configuration_intern_vit.py
│   │   │   │   ├── flash_attention.py
│   │   │   │   └── modeling_intern_vit.py
│   │   │   ├── internvl.py
│   │   │   ├── internvl_c_pytorch/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── chinese_alpaca_lora_7b/
│   │   │   │   │   ├── config.json
│   │   │   │   │   ├── generation_config.json
│   │   │   │   │   ├── pytorch_model.bin.index.json
│   │   │   │   │   ├── special_tokens_map.json
│   │   │   │   │   ├── tokenizer.model
│   │   │   │   │   └── tokenizer_config.json
│   │   │   │   ├── flash_attention.py
│   │   │   │   └── internvl_c.py
│   │   │   ├── internvl_huggingface/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── configuration_intern_vit.py
│   │   │   │   ├── configuration_internvl.py
│   │   │   │   ├── flash_attention.py
│   │   │   │   ├── modeling_intern_vit.py
│   │   │   │   ├── modeling_internvl.py
│   │   │   │   └── modeling_qllama.py
│   │   │   ├── japanese_clip.py
│   │   │   └── open_clip.py
│   │   └── webdataset_builder.py
│   ├── data/
│   │   ├── birdsnap/
│   │   │   └── test_images_valid.txt
│   │   ├── flickr30k/
│   │   │   └── flickr30k_cn_test.txt
│   │   └── mscoco_captions/
│   │       └── coco-cn_test.json
│   ├── probe_benchmark/
│   │   ├── PROBES.md
│   │   ├── build_df_scaling_experiments.py
│   │   ├── clip_table_2.csv
│   │   ├── generate_table.py
│   │   ├── laion5b_fewshot_experiments.py
│   │   ├── openclip_results.csv
│   │   ├── process_vtab.py
│   │   ├── scaling_experiment_data2.json
│   │   ├── scaling_experiment_data_vtab.json
│   │   ├── scaling_experiments.py
│   │   └── scaling_plot.ipynb
│   ├── requirements-test.txt
│   ├── requirements.txt
│   ├── setup.cfg
│   ├── setup.py
│   ├── test_internvl_c_classification.sh
│   ├── test_internvl_c_imagenet.sh
│   ├── test_internvl_c_retrieval.sh
│   ├── test_internvl_c_xtd.sh
│   ├── test_internvl_g_classification.sh
│   ├── test_internvl_g_imagenet.sh
│   ├── test_internvl_g_retrieval.sh
│   ├── test_internvl_g_retrieval_finetune.sh
│   ├── test_internvl_g_xtd.sh
│   ├── tests/
│   │   └── test_clip_benchmark.py
│   └── tox.ini
├── internvl_chat/
│   ├── README.md
│   ├── eval/
│   │   ├── README.md
│   │   ├── caption/
│   │   │   ├── README.md
│   │   │   └── evaluate_caption.py
│   │   ├── domain_specific/
│   │   │   ├── drivelm/
│   │   │   │   └── evaluate.py
│   │   │   ├── mme_rw/
│   │   │   │   └── evaluate.py
│   │   │   ├── rs_det/
│   │   │   │   ├── caculate.py
│   │   │   │   └── evaluate.py
│   │   │   └── rs_vqa/
│   │   │       ├── evaluate.py
│   │   │       └── score.py
│   │   ├── llava_bench/
│   │   │   ├── README.md
│   │   │   ├── eval_gpt_review_bench.py
│   │   │   ├── evaluate_llava_bench.py
│   │   │   ├── rule.json
│   │   │   └── summarize_gpt_review.py
│   │   ├── mantis_eval/
│   │   │   ├── README.md
│   │   │   └── evaluate_mantis.py
│   │   ├── mathvista/
│   │   │   ├── README.md
│   │   │   ├── calculate_score.py
│   │   │   ├── evaluate_mathvista.py
│   │   │   ├── extract_answer.py
│   │   │   ├── prompts/
│   │   │   │   └── ext_ans.py
│   │   │   └── utilities.py
│   │   ├── mirb/
│   │   │   ├── README.md
│   │   │   └── evaluate_mirb.py
│   │   ├── mmbench/
│   │   │   ├── README.md
│   │   │   └── evaluate_mmbench.py
│   │   ├── mme/
│   │   │   ├── README.md
│   │   │   ├── Your_Results/
│   │   │   │   ├── OCR.txt
│   │   │   │   ├── artwork.txt
│   │   │   │   ├── celebrity.txt
│   │   │   │   ├── code_reasoning.txt
│   │   │   │   ├── color.txt
│   │   │   │   ├── commonsense_reasoning.txt
│   │   │   │   ├── count.txt
│   │   │   │   ├── existence.txt
│   │   │   │   ├── landmark.txt
│   │   │   │   ├── numerical_calculation.txt
│   │   │   │   ├── position.txt
│   │   │   │   ├── posters.txt
│   │   │   │   ├── scene.txt
│   │   │   │   └── text_translation.txt
│   │   │   ├── calculation.py
│   │   │   └── eval.py
│   │   ├── mmhal/
│   │   │   ├── README.md
│   │   │   ├── eval_gpt_mmhal.py
│   │   │   └── evaluate_mmhal.py
│   │   ├── mmiu/
│   │   │   ├── README.md
│   │   │   ├── evaluate_mmiu.py
│   │   │   └── mmiu.jsonl
│   │   ├── mmmu/
│   │   │   ├── README.md
│   │   │   ├── answer_dict_val.json
│   │   │   ├── data_utils.py
│   │   │   ├── eval_utils.py
│   │   │   ├── evaluate_mmmu.py
│   │   │   └── main_eval_only.py
│   │   ├── mmmu_pro/
│   │   │   ├── README.md
│   │   │   ├── evaluate.py
│   │   │   ├── evaluate_mmmu_pro.py
│   │   │   └── prompts.yaml
│   │   ├── mmvet/
│   │   │   ├── README.md
│   │   │   └── evaluate_mmvet.py
│   │   ├── mmvetv2/
│   │   │   ├── README.md
│   │   │   └── evaluate_mmvet_v2.py
│   │   ├── mmvp/
│   │   │   ├── README.md
│   │   │   └── evaluate_mmvp.py
│   │   ├── mpdocvqa/
│   │   │   ├── README.md
│   │   │   ├── evaluate_vqa.py
│   │   │   └── infographicsvqa_eval.py
│   │   ├── mvbench/
│   │   │   ├── README.md
│   │   │   └── evaluate_mvbench.py
│   │   ├── pope/
│   │   │   ├── README.md
│   │   │   ├── eval_pope.py
│   │   │   └── evaluate_pope.py
│   │   ├── refcoco/
│   │   │   ├── README.md
│   │   │   └── evaluate_grounding.py
│   │   ├── scienceqa/
│   │   │   ├── README.md
│   │   │   └── evaluate_scienceqa.py
│   │   ├── seed/
│   │   │   ├── README.md
│   │   │   ├── calculation.py
│   │   │   └── evaluate_seed.py
│   │   ├── tiny_lvlm/
│   │   │   ├── README.md
│   │   │   ├── calculate_score.py
│   │   │   ├── evaluate_lvlm.py
│   │   │   └── tools.py
│   │   └── vqa/
│   │       ├── README.md
│   │       ├── convert_gqa_for_eval.py
│   │       ├── evaluate_vqa.py
│   │       ├── infographicsvqa_eval.py
│   │       └── textvqa_eval.py
│   ├── evaluate.sh
│   ├── internvl/
│   │   ├── conversation.py
│   │   ├── dist_utils.py
│   │   ├── model/
│   │   │   ├── __init__.py
│   │   │   ├── internlm2/
│   │   │   │   ├── configuration_internlm2.py
│   │   │   │   ├── modeling_internlm2.py
│   │   │   │   ├── tokenization_internlm2.py
│   │   │   │   └── tokenization_internlm2_fast.py
│   │   │   ├── internvl_chat/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── configuration_intern_vit.py
│   │   │   │   ├── configuration_internvl_chat.py
│   │   │   │   ├── modeling_intern_vit.py
│   │   │   │   └── modeling_internvl_chat.py
│   │   │   └── phi3/
│   │   │       ├── configuration_phi3.py
│   │   │       └── modeling_phi3.py
│   │   ├── patch/
│   │   │   ├── __init__.py
│   │   │   ├── internlm2_packed_training_patch.py
│   │   │   ├── internvit_liger_monkey_patch.py
│   │   │   ├── llama2_flash_attn_monkey_patch.py
│   │   │   ├── llama_flash_attn_monkey_patch.py
│   │   │   ├── llama_packed_training_patch.py
│   │   │   ├── llama_rmsnorm_monkey_patch.py
│   │   │   ├── pad_data_collator.py
│   │   │   ├── phi3_packed_training_patch.py
│   │   │   ├── qwen2_packed_training_patch.py
│   │   │   ├── train_dataloader_patch.py
│   │   │   └── train_sampler_patch.py
│   │   └── train/
│   │       ├── __init__.py
│   │       ├── constants.py
│   │       ├── dataset.py
│   │       ├── dataset_packed.py
│   │       ├── internvl_chat_finetune.py
│   │       ├── internvl_chat_mpo.py
│   │       ├── internvl_chat_pretrain.py
│   │       └── trainer_dpo.py
│   ├── pyproject.toml
│   ├── shell/
│   │   ├── data/
│   │   │   ├── coco_caption.json
│   │   │   ├── internvl_1_2_finetune.json
│   │   │   └── internvl_1_2_finetune_custom.json
│   │   ├── internvl1.2/
│   │   │   ├── 2nd_finetune/
│   │   │   │   ├── internvl_chat_v1_2_hermes2_yi34b_448_res_2nd_finetune_full.sh
│   │   │   │   └── internvl_chat_v1_2_hermes2_yi34b_448_res_2nd_finetune_lora.sh
│   │   │   └── hermes2_yi34b/
│   │   │       └── internvl_chat_v1_2_hermes2_yi34b_448_res_finetune.sh
│   │   ├── internvl1.5/
│   │   │   ├── 2nd_finetune/
│   │   │   │   ├── internvl_chat_v1_5_internlm2_1_8b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl_chat_v1_5_internlm2_1_8b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl_chat_v1_5_internlm2_20b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl_chat_v1_5_internlm2_20b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl_chat_v1_5_phi3_3_8b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   └── internvl_chat_v1_5_phi3_3_8b_dynamic_res_2nd_finetune_lora.sh
│   │   │   ├── hermes2_yi34b/
│   │   │   │   ├── internvl_chat_v1_5_hermes2_yi34b_dynamic_res_finetune.sh
│   │   │   │   └── internvl_chat_v1_5_hermes2_yi34b_dynamic_res_pretrain.sh
│   │   │   ├── internlm2_1_8b/
│   │   │   │   ├── internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune.sh
│   │   │   │   └── internvl_chat_v1_5_internlm2_1_8b_dynamic_res_pretrain.sh
│   │   │   ├── internlm2_20b/
│   │   │   │   ├── internvl_chat_v1_5_internlm2_20b_dynamic_res_finetune.sh
│   │   │   │   └── internvl_chat_v1_5_internlm2_20b_dynamic_res_pretrain.sh
│   │   │   └── phi3_3_8b/
│   │   │       ├── internvl_chat_v1_5_phi3_3_8b_dynamic_res_finetune.sh
│   │   │       └── internvl_chat_v1_5_phi3_3_8b_dynamic_res_pretrain.sh
│   │   ├── internvl2.0/
│   │   │   └── 2nd_finetune/
│   │   │       ├── internvl2_1b_qwen2_0_5b_dynamic_res_2nd_finetune_full.sh
│   │   │       ├── internvl2_1b_qwen2_0_5b_dynamic_res_2nd_finetune_lora.sh
│   │   │       ├── internvl2_26b_internlm2_20b_dynamic_res_2nd_finetune_full.sh
│   │   │       ├── internvl2_26b_internlm2_20b_dynamic_res_2nd_finetune_lora.sh
│   │   │       ├── internvl2_2b_internlm2_1_8b_dynamic_res_2nd_finetune_full.sh
│   │   │       ├── internvl2_2b_internlm2_1_8b_dynamic_res_2nd_finetune_lora.sh
│   │   │       ├── internvl2_2b_internlm2_1_8b_dynamic_res_2nd_finetune_lora_coco.sh
│   │   │       ├── internvl2_40b_hermes2_yi_34b_dynamic_res_2nd_finetune_full.sh
│   │   │       ├── internvl2_40b_hermes2_yi_34b_dynamic_res_2nd_finetune_lora.sh
│   │   │       ├── internvl2_4b_phi3_3_8b_dynamic_res_2nd_finetune_full.sh
│   │   │       ├── internvl2_4b_phi3_3_8b_dynamic_res_2nd_finetune_lora.sh
│   │   │       ├── internvl2_76b_hermes2_llama3_70b_dynamic_res_2nd_finetune_full.sh
│   │   │       ├── internvl2_76b_hermes2_llama3_70b_dynamic_res_2nd_finetune_lora.sh
│   │   │       ├── internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_full.sh
│   │   │       └── internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_lora.sh
│   │   ├── internvl2.0_mpo/
│   │   │   ├── README.md
│   │   │   └── preference_optimization/
│   │   │       └── internvl2_8b_internlm2_7b_dynamic_res_mpo_full.sh
│   │   ├── internvl2.5/
│   │   │   ├── 2nd_finetune/
│   │   │   │   ├── internvl2_5_1b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl2_5_1b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl2_5_26b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl2_5_26b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl2_5_2b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl2_5_2b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl2_5_2b_dynamic_res_2nd_finetune_lora_coco.sh
│   │   │   │   ├── internvl2_5_38b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl2_5_38b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl2_5_4b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl2_5_4b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl2_5_78b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl2_5_78b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl2_5_8b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   └── internvl2_5_8b_dynamic_res_2nd_finetune_lora.sh
│   │   │   ├── stage1/
│   │   │   │   ├── internvl2_5_1b_qwen2_5_0_5b_dynamic_res_stage1.sh
│   │   │   │   ├── internvl2_5_26b_internlm2_5_20b_dynamic_res_stage1.sh
│   │   │   │   ├── internvl2_5_2b_internlm2_5_1_8b_dynamic_res_stage1.sh
│   │   │   │   ├── internvl2_5_38b_qwen2_5_32b_dynamic_res_stage1.sh
│   │   │   │   ├── internvl2_5_4b_qwen2_5_3b_dynamic_res_stage1.sh
│   │   │   │   ├── internvl2_5_78b_qwen2_5_72b_dynamic_res_stage1.sh
│   │   │   │   └── internvl2_5_8b_internlm2_5_7b_dynamic_res_stage1.sh
│   │   │   ├── stage1.5/
│   │   │   │   ├── internvl2_5_26b_internlm2_5_20b_dynamic_res_stage1_5.sh
│   │   │   │   └── internvl2_5_8b_internlm2_5_7b_dynamic_res_stage1_5.sh
│   │   │   └── stage2/
│   │   │       ├── internvl2_5_1b_qwen2_5_0_5b_dynamic_res_stage2.sh
│   │   │       ├── internvl2_5_26b_internlm2_5_20b_dynamic_res_stage2.sh
│   │   │       ├── internvl2_5_2b_internlm2_5_1_8b_dynamic_res_stage2.sh
│   │   │       ├── internvl2_5_38b_qwen2_5_32b_dynamic_res_stage2.sh
│   │   │       ├── internvl2_5_4b_qwen2_5_3b_dynamic_res_stage2.sh
│   │   │       ├── internvl2_5_78b_qwen2_5_72b_dynamic_res_stage2.sh
│   │   │       └── internvl2_5_8b_internlm2_5_7b_dynamic_res_stage2.sh
│   │   ├── internvl2.5_mpo/
│   │   │   └── preference_optimization/
│   │   │       ├── internvl2_5_1b_qwen2_5_0_5b_dynamic_res_mpo.sh
│   │   │       ├── internvl2_5_26b_internlm2_5_20b_dynamic_res_mpo.sh
│   │   │       ├── internvl2_5_2b_internlm2_5_1_8b_dynamic_res_mpo.sh
│   │   │       ├── internvl2_5_38b_qwen2_5_32b_dynamic_res_mpo.sh
│   │   │       ├── internvl2_5_4b_qwen2_5_3b_dynamic_res_mpo.sh
│   │   │       ├── internvl2_5_78b_qwen2_5_72b_dynamic_res_mpo.sh
│   │   │       └── internvl2_5_8b_internlm2_5_7b_dynamic_res_mpo.sh
│   │   ├── internvl3.0/
│   │   │   ├── 2nd_finetune/
│   │   │   │   ├── internvl3_14b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl3_1b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl3_2b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl3_38b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl3_78b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl3_8b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   └── internvl3_9b_dynamic_res_2nd_finetune_full.sh
│   │   │   ├── mpo/
│   │   │   │   ├── internvl3_14b_mpo.sh
│   │   │   │   ├── internvl3_1b_mpo.sh
│   │   │   │   ├── internvl3_2b_mpo.sh
│   │   │   │   ├── internvl3_38b_mpo.sh
│   │   │   │   ├── internvl3_78b_mpo.sh
│   │   │   │   ├── internvl3_8b_mpo.sh
│   │   │   │   └── internvl3_9b_mpo.sh
│   │   │   ├── mpo_data_construction/
│   │   │   │   ├── correctness_build_data.sh
│   │   │   │   └── correctness_mmpr_8b.sh
│   │   │   └── visualprm_data_construction/
│   │   │       ├── visualprm_build_data.sh
│   │   │       └── visualprm_mmpr_8b.sh
│   │   └── mini_internvl/
│   │       ├── README.md
│   │       └── domain_adaptation/
│   │           ├── internvl2_1b_qwen2_0_5b_dynamic_res_finetune_bdd.sh
│   │           ├── internvl2_1b_qwen2_0_5b_dynamic_res_finetune_drivelm.sh
│   │           ├── internvl2_1b_qwen2_0_5b_dynamic_res_finetune_medical.sh
│   │           ├── internvl2_1b_qwen2_0_5b_dynamic_res_finetune_remote.sh
│   │           ├── internvl2_2b_internlm2_1_8b_dynamic_res_finetune_bdd.sh
│   │           ├── internvl2_2b_internlm2_1_8b_dynamic_res_finetune_drivelm.sh
│   │           ├── internvl2_2b_internlm2_1_8b_dynamic_res_finetune_medical.sh
│   │           ├── internvl2_2b_internlm2_1_8b_dynamic_res_finetune_remote.sh
│   │           ├── internvl2_4b_phi3_3_8b_dynamic_res_finetune_bdd.sh
│   │           ├── internvl2_4b_phi3_3_8b_dynamic_res_finetune_drivelm.sh
│   │           ├── internvl2_4b_phi3_3_8b_dynamic_res_finetune_medical.sh
│   │           └── internvl2_4b_phi3_3_8b_dynamic_res_finetune_remote.sh
│   ├── tools/
│   │   ├── README.md
│   │   ├── convert_to_int8.py
│   │   ├── extract_mlp.py
│   │   ├── extract_video_frames.py
│   │   ├── extract_vit.py
│   │   ├── images_stitching.py
│   │   ├── internvl_custom2hf.py
│   │   ├── internvl_hf2custom.py
│   │   ├── json2jsonl.py
│   │   ├── jsonl2jsonl.py
│   │   ├── merge_lora.py
│   │   ├── reasoning_data_pipeline/
│   │   │   ├── mmpr_data_pipeline_correctness.py
│   │   │   ├── mmpr_data_pipeline_correctness_postprocess.py
│   │   │   ├── mmpr_data_pipeline_dropout_ntp.py
│   │   │   ├── utils/
│   │   │   │   ├── accuracy_reward.py
│   │   │   │   ├── constants.py
│   │   │   │   └── utils.py
│   │   │   ├── visualprm_data_pieline.py
│   │   │   └── visualprm_data_pipeline_postprocess.py
│   │   ├── replace_llm.py
│   │   └── resize_pos_embed.py
│   ├── zero_stage1_config.json
│   ├── zero_stage2_config.json
│   ├── zero_stage3_config.json
│   ├── zero_stage3_config_100b.json
│   ├── zero_stage3_config_100b_1e7_offload.json
│   ├── zero_stage3_config_100b_1e8.json
│   ├── zero_stage3_config_34b.json
│   └── zero_stage3_config_70b.json
├── internvl_chat_gpt_oss/
│   ├── README.md
│   ├── internvl/
│   │   ├── dist_utils.py
│   │   ├── model/
│   │   │   └── internvl_chat/
│   │   │       ├── __init__.py
│   │   │       ├── configuration_intern_vit.py
│   │   │       ├── configuration_internvl_chat.py
│   │   │       ├── conversation.py
│   │   │       ├── modeling_intern_vit.py
│   │   │       └── modeling_internvl_chat.py
│   │   ├── patch/
│   │   │   ├── __init__.py
│   │   │   ├── flash_sink_attn/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── flash_attn_with_sink.py
│   │   │   │   ├── flash_sink_attn.py
│   │   │   │   ├── flash_sink_attn_gpt_oss.py
│   │   │   │   ├── flash_sink_varlen_attn_gpt_oss.py
│   │   │   │   └── sliding_cache.py
│   │   │   ├── flash_sink_attn_monkey_patch.py
│   │   │   ├── pad_data_collator.py
│   │   │   ├── qwen3_flash_monkey_patch.py
│   │   │   └── train_dataloader_patch.py
│   │   ├── train/
│   │   │   ├── constants.py
│   │   │   ├── dataset.py
│   │   │   ├── dataset_packed.py
│   │   │   ├── internvl_chat_finetune.py
│   │   │   ├── internvl_chat_mpo.py
│   │   │   └── trainer_dpo.py
│   │   └── utils/
│   │       ├── s3_config.py
│   │       ├── s3_exception.py
│   │       └── s3_fileio.py
│   ├── requirements.txt
│   ├── shell/
│   │   ├── data/
│   │   │   ├── debug_mpo.json
│   │   │   └── debug_sft.json
│   │   ├── internvl3_5_gpt_oss/
│   │   │   ├── internvl3_5_gpt_oss_20b_stage0_mlp_warmup.sh
│   │   │   ├── internvl3_5_gpt_oss_20b_stage1_pretrain.sh
│   │   │   ├── internvl3_5_gpt_oss_20b_stage2_sft.sh
│   │   │   └── internvl3_5_gpt_oss_20b_stage3_mpo.sh
│   │   └── internvl3_5_qwen3/
│   │       ├── internvl3_5_14b_mpo.sh
│   │       ├── internvl3_5_14b_sft.sh
│   │       ├── internvl3_5_1b_mpo.sh
│   │       ├── internvl3_5_1b_sft.sh
│   │       ├── internvl3_5_241b_mpo.sh
│   │       ├── internvl3_5_241b_sft.sh
│   │       ├── internvl3_5_2b_mpo.sh
│   │       ├── internvl3_5_2b_sft.sh
│   │       ├── internvl3_5_30b_mpo.sh
│   │       ├── internvl3_5_30b_sft.sh
│   │       ├── internvl3_5_38b_mpo.sh
│   │       ├── internvl3_5_38b_sft.sh
│   │       ├── internvl3_5_4b_mpo.sh
│   │       ├── internvl3_5_4b_sft.sh
│   │       ├── internvl3_5_8b_mpo.sh
│   │       └── internvl3_5_8b_sft.sh
│   ├── zero_stage1_config.json
│   └── zero_stage3_config.json
├── internvl_chat_llava/
│   ├── LICENSE
│   ├── README.md
│   ├── docs/
│   │   ├── Customize_Component.md
│   │   ├── Data.md
│   │   ├── Evaluation.md
│   │   ├── LLaVA_Bench.md
│   │   ├── LLaVA_from_LLaMA2.md
│   │   ├── LoRA.md
│   │   ├── MODEL_ZOO.md
│   │   └── ScienceQA.md
│   ├── llava/
│   │   ├── __init__.py
│   │   ├── constants.py
│   │   ├── conversation.py
│   │   ├── eval/
│   │   │   ├── eval_gpt_review.py
│   │   │   ├── eval_gpt_review_bench.py
│   │   │   ├── eval_gpt_review_visual.py
│   │   │   ├── eval_pope.py
│   │   │   ├── eval_science_qa.py
│   │   │   ├── eval_science_qa_gpt4.py
│   │   │   ├── eval_science_qa_gpt4_requery.py
│   │   │   ├── eval_textvqa.py
│   │   │   ├── generate_webpage_data_from_table.py
│   │   │   ├── m4c_evaluator.py
│   │   │   ├── model_qa.py
│   │   │   ├── model_vqa.py
│   │   │   ├── model_vqa_loader.py
│   │   │   ├── model_vqa_mmbench.py
│   │   │   ├── model_vqa_science.py
│   │   │   ├── qa_baseline_gpt35.py
│   │   │   ├── run_llava.py
│   │   │   ├── summarize_gpt_review.py
│   │   │   ├── table/
│   │   │   │   ├── answer/
│   │   │   │   │   ├── answer_alpaca-13b.jsonl
│   │   │   │   │   ├── answer_bard.jsonl
│   │   │   │   │   ├── answer_gpt35.jsonl
│   │   │   │   │   ├── answer_llama-13b.jsonl
│   │   │   │   │   └── answer_vicuna-13b.jsonl
│   │   │   │   ├── caps_boxes_coco2014_val_80.jsonl
│   │   │   │   ├── model.jsonl
│   │   │   │   ├── prompt.jsonl
│   │   │   │   ├── question.jsonl
│   │   │   │   ├── review/
│   │   │   │   │   ├── review_alpaca-13b_vicuna-13b.jsonl
│   │   │   │   │   ├── review_bard_vicuna-13b.jsonl
│   │   │   │   │   ├── review_gpt35_vicuna-13b.jsonl
│   │   │   │   │   └── review_llama-13b_vicuna-13b.jsonl
│   │   │   │   ├── reviewer.jsonl
│   │   │   │   └── rule.json
│   │   │   └── webpage/
│   │   │       ├── index.html
│   │   │       ├── script.js
│   │   │       └── styles.css
│   │   ├── mm_utils.py
│   │   ├── model/
│   │   │   ├── __init__.py
│   │   │   ├── apply_delta.py
│   │   │   ├── builder.py
│   │   │   ├── consolidate.py
│   │   │   ├── language_model/
│   │   │   │   ├── llava_llama.py
│   │   │   │   ├── llava_mpt.py
│   │   │   │   └── mpt/
│   │   │   │       ├── adapt_tokenizer.py
│   │   │   │       ├── attention.py
│   │   │   │       ├── blocks.py
│   │   │   │       ├── configuration_mpt.py
│   │   │   │       ├── custom_embedding.py
│   │   │   │       ├── flash_attn_triton.py
│   │   │   │       ├── hf_prefixlm_converter.py
│   │   │   │       ├── meta_init_context.py
│   │   │   │       ├── modeling_mpt.py
│   │   │   │       ├── norm.py
│   │   │   │       └── param_init_fns.py
│   │   │   ├── llava_arch.py
│   │   │   ├── make_delta.py
│   │   │   ├── multimodal_encoder/
│   │   │   │   ├── builder.py
│   │   │   │   ├── clip_encoder.py
│   │   │   │   ├── eva_clip/
│   │   │   │   │   ├── configuration_evaclip.py
│   │   │   │   │   └── modeling_evaclip.py
│   │   │   │   ├── intern_vit_6b/
│   │   │   │   │   ├── configuration_intern_vit.py
│   │   │   │   │   ├── flash_attention.py
│   │   │   │   │   └── modeling_intern_vit.py
│   │   │   │   └── internvl_14b/
│   │   │   │       ├── __init__.py
│   │   │   │       ├── configuration_intern_vit.py
│   │   │   │       ├── configuration_internvl.py
│   │   │   │       ├── flash_attention.py
│   │   │   │       ├── modeling_intern_vit.py
│   │   │   │       ├── modeling_internvl.py
│   │   │   │       └── modeling_qllama.py
│   │   │   ├── multimodal_projector/
│   │   │   │   └── builder.py
│   │   │   └── utils.py
│   │   ├── serve/
│   │   │   ├── __init__.py
│   │   │   ├── cli.py
│   │   │   ├── controller.py
│   │   │   ├── gradio_web_server.py
│   │   │   ├── model_worker.py
│   │   │   ├── register_worker.py
│   │   │   └── test_message.py
│   │   ├── train/
│   │   │   ├── dist_utils.py
│   │   │   ├── llama_flash_attn_monkey_patch.py
│   │   │   ├── llava_trainer.py
│   │   │   ├── train.py
│   │   │   ├── train_custom.py
│   │   │   ├── train_mem.py
│   │   │   └── train_mem_custom.py
│   │   └── utils.py
│   ├── pyproject.toml
│   ├── scripts/
│   │   ├── convert_gqa_for_eval.py
│   │   ├── convert_mmbench_for_submission.py
│   │   ├── convert_mmvet_for_eval.py
│   │   ├── convert_seed_for_submission.py
│   │   ├── convert_sqa_to_llava.py
│   │   ├── convert_sqa_to_llava_base_prompt.py
│   │   ├── convert_vizwiz_for_submission.py
│   │   ├── convert_vqav2_for_submission.py
│   │   ├── finetune.sh
│   │   ├── finetune_full_schedule.sh
│   │   ├── finetune_lora.sh
│   │   ├── finetune_qlora.sh
│   │   ├── finetune_sqa.sh
│   │   ├── merge_lora_weights.py
│   │   ├── pretrain.sh
│   │   ├── sqa_eval_batch.sh
│   │   ├── sqa_eval_gather.sh
│   │   ├── v1_5/
│   │   │   ├── eval/
│   │   │   │   ├── gqa.sh
│   │   │   │   ├── llavabench.sh
│   │   │   │   ├── mmbench.sh
│   │   │   │   ├── mmbench_cn.sh
│   │   │   │   ├── mme.sh
│   │   │   │   ├── mmvet.sh
│   │   │   │   ├── pope.sh
│   │   │   │   ├── seed.sh
│   │   │   │   ├── sqa.sh
│   │   │   │   ├── textvqa.sh
│   │   │   │   ├── vizwiz.sh
│   │   │   │   └── vqav2.sh
│   │   │   ├── finetune.sh
│   │   │   └── pretrain.sh
│   │   ├── zero1.json
│   │   ├── zero2.json
│   │   ├── zero3.json
│   │   └── zero3_offload.json
│   └── scripts_internvl/
│       ├── eval/
│       │   ├── gqa.sh
│       │   ├── llavabench.sh
│       │   ├── mmbench.sh
│       │   ├── mme.sh
│       │   ├── mmvet.sh
│       │   ├── pope.sh
│       │   ├── sqa.sh
│       │   ├── textvqa.sh
│       │   ├── vizwiz.sh
│       │   └── vqav2.sh
│       ├── finetune_internvit6b_224to336_vicuna13b.sh
│       ├── finetune_internvit6b_224to336_vicuna13b_custom_data.sh
│       ├── finetune_internvit6b_224to336_vicuna7b.sh
│       ├── finetune_internvit6b_448_v1_2_vicuna13b.sh
│       ├── finetune_internvit6b_448_v1_5_vicuna13b.sh
│       ├── finetune_internvit6b_448_vicuna13b.sh
│       ├── finetune_internvit6b_448_vicuna7b.sh
│       ├── meta/
│       │   └── custom_data.json
│       ├── pretrain_internvit6b_224to336_vicuna13b.sh
│       ├── pretrain_internvit6b_224to336_vicuna7b.sh
│       ├── pretrain_internvit6b_448_v1_2_vicuna13b.sh
│       ├── pretrain_internvit6b_448_v1_5_vicuna13b.sh
│       ├── pretrain_internvit6b_448_vicuna13b.sh
│       └── pretrain_internvit6b_448_vicuna7b.sh
├── internvl_g/
│   ├── README.md
│   ├── eval/
│   │   └── evaluate_caption.py
│   ├── evaluate.sh
│   ├── internvl/
│   │   ├── dist_utils.py
│   │   ├── model/
│   │   │   ├── __init__.py
│   │   │   ├── internvl_stage2/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── configuration_intern_vit.py
│   │   │   │   ├── configuration_internvl.py
│   │   │   │   ├── flash_attention.py
│   │   │   │   ├── modeling_intern_vit.py
│   │   │   │   ├── modeling_internvl.py
│   │   │   │   └── modeling_qllama.py
│   │   │   └── internvl_stage2_retrieval/
│   │   │       ├── __init__.py
│   │   │       ├── configuration_intern_vit.py
│   │   │       ├── configuration_internvl.py
│   │   │       ├── flash_attention.py
│   │   │       ├── modeling_intern_vit.py
│   │   │       ├── modeling_internvl.py
│   │   │       └── modeling_qllama.py
│   │   └── train/
│   │       ├── __init__.py
│   │       ├── dataset.py
│   │       ├── internvl_stage2_finetune.py
│   │       └── trainer_monkey_patch.py
│   ├── shell/
│   │   ├── finetune/
│   │   │   ├── internvl_stage2_finetune_coco_364_bs1024_ep5.sh
│   │   │   ├── internvl_stage2_finetune_flickr_364_bs1024_ep10.sh
│   │   │   └── internvl_stage2_finetune_flickrcn_364_bs1024_ep10.sh
│   │   ├── head_finetune/
│   │   │   ├── internvl_stage2_finetune_coco_224_bs1024_ep5_head_4gpu.sh
│   │   │   ├── internvl_stage2_finetune_flickr_224_bs1024_ep10_head_4gpu.sh
│   │   │   └── internvl_stage2_finetune_flickrcn_224_bs1024_ep10_head_4gpu.sh
│   │   └── lora_finetune/
│   │       ├── internvl_stage2_finetune_coco_224_bs1024_ep5_lora16_4gpu.sh
│   │       ├── internvl_stage2_finetune_flickr_224_bs1024_ep10_lora16_4gpu.sh
│   │       └── internvl_stage2_finetune_flickrcn_224_bs1024_ep10_lora16_4gpu.sh
│   ├── zero_stage1_config.json
│   ├── zero_stage2_config.json
│   └── zero_stage3_config.json
├── requirements/
│   ├── classification.txt
│   ├── clip_benchmark.txt
│   ├── internvl_chat.txt
│   ├── segmentation.txt
│   └── streamlit_demo.txt
├── requirements.txt
├── segmentation/
│   ├── README.md
│   ├── configs/
│   │   ├── _base_/
│   │   │   ├── datasets/
│   │   │   │   ├── ade20k.py
│   │   │   │   ├── ade20k_504x504.py
│   │   │   │   ├── ade20k_504x504_1of16.py
│   │   │   │   ├── ade20k_504x504_1of2.py
│   │   │   │   ├── ade20k_504x504_1of4.py
│   │   │   │   ├── ade20k_504x504_1of8.py
│   │   │   │   ├── ade20k_640x640.py
│   │   │   │   ├── ade20k_896x896.py
│   │   │   │   ├── chase_db1.py
│   │   │   │   ├── cityscapes.py
│   │   │   │   ├── cityscapes_1024x1024.py
│   │   │   │   ├── cityscapes_768x768.py
│   │   │   │   ├── cityscapes_769x769.py
│   │   │   │   ├── cityscapes_832x832.py
│   │   │   │   ├── coco-stuff10k.py
│   │   │   │   ├── coco-stuff164k.py
│   │   │   │   ├── coco-stuff164k_896x896.py
│   │   │   │   ├── drive.py
│   │   │   │   ├── hrf.py
│   │   │   │   ├── isaid.py
│   │   │   │   ├── loveda.py
│   │   │   │   ├── pascal_context.py
│   │   │   │   ├── pascal_context_59.py
│   │   │   │   ├── pascal_voc12.py
│   │   │   │   ├── pascal_voc12_aug.py
│   │   │   │   ├── potsdam.py
│   │   │   │   ├── stare.py
│   │   │   │   └── vaihingen.py
│   │   │   ├── default_runtime.py
│   │   │   ├── models/
│   │   │   │   ├── ann_r50-d8.py
│   │   │   │   ├── apcnet_r50-d8.py
│   │   │   │   ├── bisenetv1_r18-d32.py
│   │   │   │   ├── bisenetv2.py
│   │   │   │   ├── ccnet_r50-d8.py
│   │   │   │   ├── cgnet.py
│   │   │   │   ├── danet_r50-d8.py
│   │   │   │   ├── deeplabv3_r50-d8.py
│   │   │   │   ├── deeplabv3_unet_s5-d16.py
│   │   │   │   ├── deeplabv3plus_r50-d8.py
│   │   │   │   ├── dmnet_r50-d8.py
│   │   │   │   ├── dnl_r50-d8.py
│   │   │   │   ├── dpt_vit-b16.py
│   │   │   │   ├── emanet_r50-d8.py
│   │   │   │   ├── encnet_r50-d8.py
│   │   │   │   ├── erfnet_fcn.py
│   │   │   │   ├── fast_scnn.py
│   │   │   │   ├── fastfcn_r50-d32_jpu_psp.py
│   │   │   │   ├── fcn_hr18.py
│   │   │   │   ├── fcn_r50-d8.py
│   │   │   │   ├── fcn_unet_s5-d16.py
│   │   │   │   ├── fpn_r50.py
│   │   │   │   ├── gcnet_r50-d8.py
│   │   │   │   ├── icnet_r50-d8.py
│   │   │   │   ├── isanet_r50-d8.py
│   │   │   │   ├── lraspp_m-v3-d8.py
│   │   │   │   ├── mask2former_beit.py
│   │   │   │   ├── nonlocal_r50-d8.py
│   │   │   │   ├── ocrnet_hr18.py
│   │   │   │   ├── ocrnet_r50-d8.py
│   │   │   │   ├── pointrend_r50.py
│   │   │   │   ├── psanet_r50-d8.py
│   │   │   │   ├── pspnet_r50-d8.py
│   │   │   │   ├── pspnet_unet_s5-d16.py
│   │   │   │   ├── segformer_mit-b0.py
│   │   │   │   ├── segmenter_vit-b16_mask.py
│   │   │   │   ├── setr_mla.py
│   │   │   │   ├── setr_naive.py
│   │   │   │   ├── setr_pup.py
│   │   │   │   ├── stdc.py
│   │   │   │   ├── twins_pcpvt-s_fpn.py
│   │   │   │   ├── twins_pcpvt-s_upernet.py
│   │   │   │   ├── upernet_beit.py
│   │   │   │   ├── upernet_convnext.py
│   │   │   │   ├── upernet_mae.py
│   │   │   │   ├── upernet_r50.py
│   │   │   │   ├── upernet_swin.py
│   │   │   │   └── upernet_vit-b16_ln_mln.py
│   │   │   └── schedules/
│   │   │       ├── schedule_10k.py
│   │   │       ├── schedule_160k.py
│   │   │       ├── schedule_20k.py
│   │   │       ├── schedule_320k.py
│   │   │       ├── schedule_40k.py
│   │   │       ├── schedule_5k.py
│   │   │       └── schedule_80k.py
│   │   └── intern_vit_6b/
│   │       ├── few_shot/
│   │       │   ├── linear_intern_vit_6b_504_10k_ade20k_bs16_lr4e-5_1of8.py
│   │       │   ├── linear_intern_vit_6b_504_20k_ade20k_bs16_lr4e-5_1of4.py
│   │       │   ├── linear_intern_vit_6b_504_40k_ade20k_bs16_lr4e-5_1of2.py
│   │       │   ├── linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py
│   │       │   └── linear_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_1of1.py
│   │       ├── full_tuning/
│   │       │   └── upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.py
│   │       ├── head_tuning/
│   │       │   └── upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.py
│   │       └── linear_probing/
│   │           └── linear_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.py
│   ├── dist_test.sh
│   ├── dist_train.sh
│   ├── mmcv_custom/
│   │   ├── __init__.py
│   │   ├── ddp_hooks.py
│   │   └── layer_decay_optimizer_constructor.py
│   ├── mmseg_custom/
│   │   ├── __init__.py
│   │   ├── datasets/
│   │   │   ├── __init__.py
│   │   │   ├── ade.py
│   │   │   └── pipelines/
│   │   │       ├── __init__.py
│   │   │       └── transform.py
│   │   └── models/
│   │       ├── __init__.py
│   │       ├── backbones/
│   │       │   ├── __init__.py
│   │       │   ├── flash_attention.py
│   │       │   └── intern_vit_6b.py
│   │       └── decode_heads/
│   │           ├── __init__.py
│   │           └── fcn_head.py
│   ├── release.py
│   ├── slurm_test.sh
│   ├── slurm_train.sh
│   ├── test.py
│   ├── train.py
│   └── zero_configs/
│       ├── adam_fp16.json
│       ├── adam_zero1_amp.json
│       ├── adam_zero1_bf16.json
│       ├── adam_zero1_fp16.json
│       ├── adam_zero2_bf16.json
│       ├── adam_zero2_fp16.json
│       └── adam_zero3_fp16.json
├── streamlit_demo/
│   ├── .streamlit/
│   │   └── config.toml
│   ├── api.py
│   ├── app.py
│   ├── constants.py
│   ├── controller.py
│   ├── library.py
│   ├── model_worker.py
│   ├── sd_worker.py
│   └── utils.py
└── video_retrieval/
    └── test_msrvtt.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .flake8
================================================
[flake8]
ignore = E501, F403, C901, W504, W605, E251, E122, E126, E127, E722, W503, E128, E741, E731, E701
select = E1, E3, E502, E7, E9, W1, W5, W6
max-line-length = 180
exclude=*.egg/*,build,dist,detection/configs/*


================================================
FILE: .github/CONTRIBUTING.md
================================================
## Contributing to InternLM

Welcome to the InternLM community. All kinds of contributions are welcome, including but not limited to:

**Fix bug**

You can directly post a Pull Request to fix typos in code or documents.

The steps to fix the bug of code implementation are as follows.

1. If the modification involves significant changes, you should create an issue first and describe the error information and how to trigger the bug. Other developers will discuss it with you and propose a proper solution.

2. Post a pull request after fixing the bug and adding the corresponding unit test.

**New Feature or Enhancement**

1. If the modification involves significant changes, you should create an issue to discuss it with our developers and propose a proper design.
2. Post a Pull Request after implementing the new feature or enhancement and adding the corresponding unit test.

**Document**

You can directly post a pull request to fix documents. If you want to add a document, you should first create an issue to check if it is reasonable.

### Pull Request Workflow

If you're not familiar with Pull Request, don't worry! The following guidance will tell you how to create a Pull Request step by step. If you want to dive into the develop mode of Pull Request, you can refer to the [official documents](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests)

#### 1. Fork and clone

If you are posting a pull request for the first time, you should fork the OpenMMLab repositories by clicking the **Fork** button in the top right corner of the GitHub page, and the forked repositories will appear under your GitHub profile.

<img src="https://user-images.githubusercontent.com/57566630/167305749-43c7f4e9-449b-4e98-ade5-0c9276d5c9ce.png" width="1200">

Then, you can clone the repositories to local:

```shell
git clone git@github.com:{username}/lmdeploy.git
```

After that, you should add the official repository as the upstream repository:

```bash
git remote add upstream git@github.com:InternLM/lmdeploy.git
```

Check whether remote repository has been added successfully by `git remote -v`

```bash
origin	git@github.com:{username}/lmdeploy.git (fetch)
origin	git@github.com:{username}/lmdeploy.git (push)
upstream	git@github.com:InternLM/lmdeploy.git (fetch)
upstream	git@github.com:InternLM/lmdeploy.git (push)
```

> Here's a brief introduction to origin and upstream. When we use "git clone", we create an "origin" remote by default, which points to the repository cloned from. As for "upstream", we add it ourselves to point to the target repository. Of course, if you don't like the name "upstream", you could name it as you wish. Usually, we'll push the code to "origin". If the pushed code conflicts with the latest code in official("upstream"), we should pull the latest code from upstream to resolve the conflicts, and then push to "origin" again. The posted Pull Request will be updated automatically.

#### 2. Configure pre-commit

You should configure [pre-commit](https://pre-commit.com/#intro) in the local development environment to make sure the code style matches that of InternLM. **Note**: The following code should be executed under the lmdeploy directory.

```shell
pip install -U pre-commit
pre-commit install
```

Check that pre-commit is configured successfully, and install the hooks defined in `.pre-commit-config.yaml`.

```shell
pre-commit run --all-files
```

<img src="https://user-images.githubusercontent.com/57566630/173660750-3df20a63-cb66-4d33-a986-1f643f1d8aaf.png" width="1200">

<img src="https://user-images.githubusercontent.com/57566630/202368856-0465a90d-8fce-4345-918e-67b8b9c82614.png" width="1200">

If the installation process is interrupted, you can repeatedly run `pre-commit run ... ` to continue the installation.

If the code does not conform to the code style specification, pre-commit will raise a warning and fix some of the errors automatically.

<img src="https://user-images.githubusercontent.com/57566630/202369176-67642454-0025-4023-a095-263529107aa3.png" width="1200">

If we want to commit our code bypassing the pre-commit hook, we can use the `--no-verify` option (**only for temporary commits**).

```shell
git commit -m "xxx" --no-verify
```

#### 3. Create a development branch

After configuring the pre-commit, we should create a branch based on the master branch to develop the new feature or fix the bug. The proposed branch name is `username/pr_name`

```shell
git checkout -b yhc/refactor_contributing_doc
```

In subsequent development, if the master branch of the local repository is behind the master branch of "upstream", we need to pull the upstream for synchronization, and then execute the above command:

```shell
git pull upstream master
```

#### 4. Commit the code and pass the unit test

- lmdeploy introduces mypy to do static type checking to increase the robustness of the code. Therefore, we need to add Type Hints to our code and pass the mypy check. If you are not familiar with Type Hints, you can refer to [this tutorial](https://docs.python.org/3/library/typing.html).

- The committed code should pass the unit tests

  ```shell
  # Pass all unit tests
  pytest tests

  # Pass the unit test of runner
  pytest tests/test_runner/test_runner.py
  ```

  If the unit test fails for lack of dependencies, you can install the dependencies referring to the [guidance](#unit-test)

- If the documents are modified/added, we should check the rendering result referring to [guidance](#document-rendering)

#### 5. Push the code to remote

We could push the local commits to remote after passing through the check of unit test and pre-commit. You can associate the local branch with remote branch by adding `-u` option.

```shell
git push -u origin {branch_name}
```

This will allow you to use the `git push` command to push code directly next time, without having to specify a branch or the remote repository.

#### 6. Create a Pull Request

(1) Create a pull request in GitHub's Pull request interface

<img src="https://user-images.githubusercontent.com/57566630/201533288-516f7ac4-0b14-4dc8-afbd-912475c368b5.png" width="1200">

(2) Modify the PR description according to the guidelines so that other developers can better understand your changes

<img src="https://user-images.githubusercontent.com/57566630/202242953-c91a18ff-e388-4ff9-8591-5fae0ead6c1e.png" width="1200">

Find more details about Pull Request description in [pull request guidelines](#pr-specs).

**note**

(a) The Pull Request description should contain the reason for the change, the content of the change, and the impact of the change, and be associated with the relevant Issue (see [documentation](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue))

(b) If it is your first contribution, please sign the CLA

<img src="https://user-images.githubusercontent.com/57566630/167307569-a794b967-6e28-4eac-a942-00deb657815f.png" width="1200">

(c) Check whether the Pull Request passes the CI

<img src="https://user-images.githubusercontent.com/57566630/167307490-f9ebf9fa-63c0-4d83-8ba1-081ea169eb3a.png" width="1200">

InternLM will run unit tests for the posted Pull Request on different platforms (Linux, Windows, Mac), based on different versions of Python, PyTorch, and CUDA, to make sure the code is correct. We can see the specific test information by clicking `Details` in the above image so that we can modify the code.

(3) If the Pull Request passes the CI, then you can wait for the review from other developers. You'll modify the code based on the reviewer's comments, and repeat the steps [4](#4-commit-the-code-and-pass-the-unit-test)-[5](#5-push-the-code-to-remote) until all reviewers approve it. Then, we will merge it ASAP.

<img src="https://user-images.githubusercontent.com/57566630/202145400-cc2cd8c4-10b0-472f-ba37-07e6f50acc67.png" width="1200">

#### 7. Resolve conflicts

If your local branch conflicts with the latest master branch of "upstream", you'll need to resolve them. There are two ways to do this:

```shell
git fetch --all --prune
git rebase upstream/master
```

or

```shell
git fetch --all --prune
git merge upstream/master
```

If you are very good at handling conflicts, then you can use rebase to resolve conflicts, as this will keep your commit logs tidy. If you are not familiar with `rebase`, then you can use `merge` to resolve conflicts.

### Guidance

#### Document rendering

If the documents are modified/added, we should check the rendering result. We could install the dependencies and run the following command to render the documents and check the results:

```shell
pip install -r requirements/docs.txt
cd docs/zh_cn/
# or docs/en
make html
# check file in ./docs/zh_cn/_build/html/index.html
```

### Code style

#### Python

We adopt [PEP8](https://www.python.org/dev/peps/pep-0008/) as the preferred code style.

We use the following tools for linting and formatting:

- [flake8](https://github.com/PyCQA/flake8): A wrapper around some linter tools.
- [isort](https://github.com/timothycrosley/isort): A Python utility to sort imports.
- [yapf](https://github.com/google/yapf): A formatter for Python files.
- [codespell](https://github.com/codespell-project/codespell): A Python utility to fix common misspellings in text files.
- [mdformat](https://github.com/executablebooks/mdformat): Mdformat is an opinionated Markdown formatter that can be used to enforce a consistent style in Markdown files.
- [docformatter](https://github.com/myint/docformatter): A formatter to format docstring.

We use a [pre-commit hook](https://pre-commit.com/) that checks and formats for `flake8`, `yapf`, `isort`, `trailing whitespaces`, `markdown files`,
fixes `end-of-files`, `double-quoted-strings`, `python-encoding-pragma`, `mixed-line-ending`, and sorts `requirements.txt` automatically on every commit.
The config for a pre-commit hook is stored in [.pre-commit-config](../.pre-commit-config.yaml).

#### C++ and CUDA

The clang-format config is stored in [.clang-format](../.clang-format). And it's recommended to use clang-format version **11**. Please do not use older or newer versions as they will result in differences after formatting, which can cause the [lint](https://github.com/InternLM/lmdeploy/blob/main/.github/workflows/lint.yml#L25) to fail.

### PR Specs

1. Use [pre-commit](https://pre-commit.com) hook to avoid issues of code style

2. One short-time branch should be matched with only one PR

3. Accomplish a detailed change in one PR. Avoid large PR

   - Bad: Support Faster R-CNN
   - Acceptable: Add a box head to Faster R-CNN
   - Good: Add a parameter to box head to support custom conv-layer number

4. Provide clear and meaningful commit messages

5. Provide clear and meaningful PR description

   - Task name should be clarified in title. The general format is: \[Prefix\] Short description of the PR (Suffix)
   - Prefix: add new feature \[Feature\], fix bug \[Fix\], related to documents \[Docs\], in developing \[WIP\] (which will not be reviewed temporarily)
   - Introduce main changes, results and influences on other modules in short description
   - Associate related issues and pull requests with a milestone
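
The title convention in item 5 can be checked mechanically. The following is a minimal sketch and not part of this repository: the function name `check_pr_title` and the regular expression are assumptions made for illustration, and only the four prefixes listed above are accepted.

```python
import re

# Prefixes listed in the PR specs above.
ALLOWED_PREFIXES = ('Feature', 'Fix', 'Docs', 'WIP')

# Hypothetical pattern: "[Prefix] Short description" with an optional "(Suffix)".
TITLE_PATTERN = re.compile(
    r'^\[(?P<prefix>[A-Za-z]+)\]\s+(?P<description>.+?)(?:\s+\((?P<suffix>[^()]+)\))?$'
)


def check_pr_title(title: str) -> bool:
    """Return True if the title follows '[Prefix] Short description (Suffix)'."""
    match = TITLE_PATTERN.match(title.strip())
    if match is None:
        return False
    # The bracketed prefix must be one of the documented ones.
    return match.group('prefix') in ALLOWED_PREFIXES
```

A check like this could run locally before opening a PR; whether the project enforces the format automatically is not stated here.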


================================================
FILE: .github/ISSUE_TEMPLATE/1-bug-report.yml
================================================
name: 🐞 Bug report
description: Create a report to help us reproduce and fix the bug
title: "[Bug] "
labels: ['Bug']

body:
- type: checkboxes
  attributes:
    label: Checklist
    options:
    - label: 1. I have searched related issues but cannot get the expected help.
    - label: 2. The bug has not been fixed in the latest version.
    - label: 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- type: textarea
  attributes:
    label: Describe the bug
    description: A clear and concise description of what the bug is.
  validations:
    required: true
- type: textarea
  attributes:
    label: Reproduction
    description: |
      1. What command or script did you run?
    placeholder: |
      A placeholder for the command.
  validations:
    required: true
- type: textarea
  attributes:
    label: Environment
    description: |
      1. Please run `lmdeploy check_env` to collect necessary environment information and paste it here.
      2. You may add additional information that may be helpful for locating the problem, such as
         - Which **model** are you using?
         - How you installed PyTorch \[e.g., pip, conda, source\]
         - Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.)
    placeholder: Environment here.
    render: Shell
  validations:
    required: true
- type: textarea
  attributes:
    label: Error traceback
    description: |
      If applicable, paste the error traceback here.
    placeholder: Logs and traceback here.
    render: Shell
- type: markdown
  attributes:
    value: >
     If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

     Thanks for your bug report. We appreciate it a lot.


================================================
FILE: .github/ISSUE_TEMPLATE/2-feature-request.yml
================================================
name: 🚀 Feature request
description: Suggest an idea for this project
title: "[Feature] "

body:
- type: markdown
  attributes:
    value: |
      We strongly appreciate you creating a PR to implement this feature [here](https://github.com/OpenGVLab/InternVL/pulls)!
      If you need our help, please fill in as much of the following form as you're able to.

      **The less clear the description, the longer it will take to solve it.**
- type: textarea
  attributes:
    label: Motivation
    description: |
      A clear and concise description of the motivation of the feature.
      Ex1. It is inconvenient when \[....\].
  validations:
    required: true
- type: textarea
  attributes:
    label: Related resources
    description: |
      If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.
- type: textarea
  attributes:
    label: Additional context
    description: |
      Add any other context or screenshots about the feature request here.
      If you would like to implement the feature and create a PR, please leave a comment here and that would be much appreciated.


================================================
FILE: .github/ISSUE_TEMPLATE/3-documentation.yml
================================================
name: 📚 Documentation
description: Report an issue related to the documentation.
labels: "kind/doc,status/unconfirmed"
title: "[Docs] "

body:
- type: textarea
  attributes:
    label: 📚 The doc issue
    description: >
      A clear and concise description of the issue.
  validations:
    required: true

- type: textarea
  attributes:
    label: Suggest a potential alternative/fix
    description: >
      Tell us how we could improve the documentation in this regard.
- type: markdown
  attributes:
    value: >
      Thanks for contributing 🎉!


================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

.idea/

.DS_Store
data_process/
internvl_chat/work_dirs/
internvl_chat/unittest/
internvl_chat/data/
Husky2/*
*distillation*

batchscript-*
results/


================================================
FILE: .isort.cfg
================================================
[isort]
line-length = 180
multi_line_output = 0
extra_standard_library = setuptools
known_third_party = PIL,asynctest,cityscapesscripts,cv2,gather_models,matplotlib,mmcv,numpy,onnx,onnxruntime,pycocotools,pytest,pytorch_sphinx_theme,requests,scipy,seaborn,six,terminaltables,torch,ts,yaml
no_lines_before = STDLIB,LOCALFOLDER
default_section = THIRDPARTY

[yapf]
BASED_ON_STYLE = pep8
BLANK_LINE_BEFORE_NESTED_CLASS_OR_DEF = true
SPLIT_BEFORE_EXPRESSION_AFTER_OPENING_PAREN = true

[codespell]
skip = *.ipynb
quiet-level = 3
ignore-words-list = patten,nd,ty,mot,hist,formating,winn,gool,datas,wan,confids,TOOD,tood


================================================
FILE: .pre-commit-config.yaml
================================================
exclude: ^internvl_chat_llava/
repos:
  - repo: https://github.com/PyCQA/flake8
    rev: 5.0.4
    hooks:
      - id: flake8
  - repo: https://github.com/PyCQA/isort
    rev: 5.11.5
    hooks:
      - id: isort
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
      - id: trailing-whitespace
      - id: check-yaml
      - id: end-of-file-fixer
      - id: requirements-txt-fixer
      - id: double-quote-string-fixer
      - id: check-merge-conflict
      - id: fix-encoding-pragma
        args: ["--remove"]
      - id: mixed-line-ending
        args: ["--fix=lf"]
  - repo: https://github.com/executablebooks/mdformat
    rev: 0.7.9
    hooks:
      - id: mdformat
        args: ["--number"]
        additional_dependencies:
          - mdformat-openmmlab
          - mdformat_frontmatter
          - linkify-it-py


================================================
FILE: INSTALLATION.md
================================================
## 🛠️ Installation

- Clone this repository:

  ```bash
  git clone https://github.com/OpenGVLab/InternVL.git
  ```

- Create a conda virtual environment and activate it:

  ```bash
  conda create -n internvl python=3.9 -y
  conda activate internvl
  ```

- Install dependencies using `requirements.txt`:

  ```bash
  pip install -r requirements.txt
  ```

  By default, our `requirements.txt` file includes the following dependencies:

  - `-r requirements/internvl_chat.txt`
  - `-r requirements/streamlit_demo.txt`
  - `-r requirements/classification.txt`
  - `-r requirements/segmentation.txt`

  The `clip_benchmark.txt` is **not** included in the default installation. If you require the `clip_benchmark` functionality, please install it manually by running the following command:

  ```bash
  pip install -r requirements/clip_benchmark.txt
  ```
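
The top-level `requirements.txt` is only a list of `-r` includes, so the packages that actually get installed are spread over the files it points to. As a rough illustration (not part of the repository), the sketch below flattens such a file into plain package lines. The helper name `expand_requirements` is hypothetical, and it assumes the include lines use the `-r <file>` or `--requirement <file>` form with a space, resolved relative to the including file.

```python
import re
from pathlib import Path

# Comments start at the beginning of a line or after whitespace.
COMMENT_RE = re.compile(r'(^|\s+)#.*$')


def expand_requirements(path):
    """Return the package lines of `path`, following nested `-r` includes."""
    path = Path(path)
    packages = []
    for raw in path.read_text().splitlines():
        line = COMMENT_RE.sub('', raw).strip()
        if not line:
            continue
        if line.startswith(('-r ', '--requirement ')):
            # Nested files are looked up relative to the including file.
            nested = line.split(None, 1)[1]
            packages.extend(expand_requirements(path.parent / nested))
        else:
            packages.append(line)
    return packages
```

Run from the repository root, `expand_requirements('requirements.txt')` should list the packages pulled in by the four included files, which makes it easy to confirm that nothing from `requirements/clip_benchmark.txt` is part of the default set.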

### Additional Instructions

- Install `flash-attn==2.3.6`:

  ```bash
  pip install flash-attn==2.3.6 --no-build-isolation
  ```

  Alternatively, you can compile it from source:

  ```bash
  git clone https://github.com/Dao-AILab/flash-attention.git
  cd flash-attention
  git checkout v2.3.6
  python setup.py install
  ```

- Install `mmcv-full==1.6.2` (optional, for `segmentation`):

  ```bash
  pip install -U openmim
  mim install mmcv-full==1.6.2
  ```

- Install `apex` (optional, for `segmentation`):

  ```bash
  git clone https://github.com/NVIDIA/apex.git
  cd apex  # the checkout and install below must run inside the cloned repository
  git checkout 2386a912164b0c5cfcd8be7a2b890fbac5607c82  # https://github.com/NVIDIA/apex/issues/1735
  pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
  ```

  If you encounter `ModuleNotFoundError: No module named 'fused_layer_norm_cuda'`, apex's CUDA extensions were not installed successfully. You can uninstall apex, in which case the code falls back to the PyTorch implementation of RMSNorm. Alternatively, if you prefer to use apex, try adding a few lines to `setup.py` and then recompiling.

  <img src=https://github.com/OpenGVLab/InternVL/assets/23737120/c04a989c-8024-49fa-b62c-2da623e63729 width=50%>
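
The fallback mentioned above is typically implemented as a guarded import. The following is a minimal, dependency-free sketch of that pattern rather than the repository's actual code: it assumes `apex.normalization.FusedRMSNorm` as the fused implementation, the function names `select_norm_backend` and `rms_norm` are hypothetical, and the plain-Python `rms_norm` only stands in for the PyTorch module to show what is being computed.

```python
import math


def select_norm_backend():
    """Report which RMSNorm implementation a guarded import would pick."""
    try:
        # Assumed import path of the fused layer provided by apex.
        from apex.normalization import FusedRMSNorm  # noqa: F401
        return 'apex'
    except Exception:
        # Any failure while importing falls back to the PyTorch path.
        return 'pytorch'


def rms_norm(values, weight, eps=1e-6):
    """Reference RMSNorm for one vector: x / sqrt(mean(x^2) + eps) * weight."""
    mean_square = sum(v * v for v in values) / len(values)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale * w for v, w in zip(values, weight)]
```

Calling `select_norm_backend()` after uninstalling apex should return `'pytorch'`, which is a quick way to check which path an environment would take.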


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2023 OpenGVLab

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
<div align="center">

# InternVL Family: Closing the Gap to Commercial Multimodal Models with Open-Source Suites —— A Pioneering Open-Source Alternative to GPT-5

<div align="center">
  <img width="500" alt="image" src="https://github.com/user-attachments/assets/930e6814-8a9f-43e1-a284-118a5732daa4">
  <br>
</div>

[\[🆕 Blog\]](https://internvl.github.io/blog/)
[\[🤔 FAQs\]](https://internvl.readthedocs.io/en/latest/tutorials/faqs.html)
[\[🗨️ Chat Demo\]](https://chat.intern-ai.org.cn/)
[\[📖 Document\]](https://internvl.readthedocs.io/en/latest/)
[\[🌐 API\]](https://internlm.intern-ai.org.cn/api/document)
[\[🚀 Quick Start\]](#quick-start-with-huggingface)

[\[🔥 InternVL3.5 Report\]](https://huggingface.co/papers/2508.18265)
[\[📜 InternVL3.0 Report\]](https://huggingface.co/papers/2504.10479)
[\[📜 InternVL2.5 MPO\]](https://huggingface.co/papers/2411.10442)
[\[📜 InternVL2.5 Report\]](https://huggingface.co/papers/2412.05271)

[\[📜 Mini-InternVL Paper\]](https://arxiv.org/abs/2410.16261)
[\[📜 InternVL2 Blog\]](https://internvl.github.io/blog/2024-07-02-InternVL-2.0/)
[\[📜 InternVL 1.5 Paper\]](https://huggingface.co/papers/2404.16821)
[\[📜 InternVL 1.0 Paper\]](https://huggingface.co/papers/2312.14238)

[\[📖 2.0 中文解读\]](https://zhuanlan.zhihu.com/p/706547971)
[\[📖 1.5 中文解读\]](https://zhuanlan.zhihu.com/p/699439759)
[\[📖 1.0 中文解读\]](https://zhuanlan.zhihu.com/p/702946079)

[Switch to the Chinese version (切换至中文版)](/README_zh.md)

<a href="https://trendshift.io/repositories/9803" target="_blank"><img src="https://trendshift.io/api/badge/repositories/9803" alt="OpenGVLab%2FInternVL | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
<img height="55" alt="image" src="https://github.com/user-attachments/assets/bd62ab46-f0ea-40c6-ab10-7fde671716cc">

![image/jpg](https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B/resolve/main/images/performance.jpg)

</div>

## News 🚀🚀🚀

- `2025/08/30`: 🔥 We open-source the training code of [InternVL3_5-GPT-OSS-20B-A4B](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat_gpt_oss) and CascadeRL, which consists of an [offline RL stage](https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat_gpt_oss/shell/internvl3_5_gpt_oss/internvl3_5_gpt_oss_20b_stage3_mpo.sh) and an [online RL stage](https://github.com/Weiyun1025/verl-internvl). The training data for these two stages ([MMPR-v1.2](https://huggingface.co/datasets/OpenGVLab/MMPR-v1.2) and [MMPR-Tiny](https://huggingface.co/datasets/OpenGVLab/MMPR-Tiny)) are also open-sourced.
- `2025/08/26`: 🚀 We introduce [InternVL3.5](https://huggingface.co/papers/2508.18265), a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency along the InternVL series. Our largest model, i.e., [InternVL3.5-241B-A28B](https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B), attains state-of-the-art results among open-source MLLMs across general multimodal, reasoning, text, and agentic tasks. We also provide a 20B-A4B version (i.e., [InternVL3_5-GPT-OSS-20B-A4B](https://huggingface.co/OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview)), which is built upon GPT-OSS-20B-A4B. Notably, we provide two model formats: [the GitHub format](https://huggingface.co/OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview#github-format), consistent with prior releases, and [the HF format](https://huggingface.co/OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview#huggingface-format), aligned with the official `transformers` standard.
- `2025/04/17`: We open-source the [data construction pipeline](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/tools/reasoning_data_pipeline) and [training scripts](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/internvl3.0/mpo) of [MPO](https://huggingface.co/papers/2411.10442) and [VisualPRM](https://huggingface.co/papers/2503.10291). Additionally, the data construction scripts for [MPO](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/internvl3.0/mpo_data_construction) and [VisualPRM](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/internvl3.0/visualprm_data_construction) are also released for reference.
- `2025/04/11`: We introduce [InternVL3](https://huggingface.co/collections/OpenGVLab/internvl3-67f7f690be79c2fe9d74fe9d), an advanced multimodal large language model (MLLM) series that demonstrates superior overall performance. InternVL3-78B achieves SoTA performance in both [perception](https://rank.opencompass.org.cn/leaderboard-multimodal/?m=REALTIME) and [reasoning](https://rank.opencompass.org.cn/leaderboard-multimodal-reasoning/?m=REALTIME) among open-source MLLMs. The key designs of InternVL3-78B include [Variable Visual Position Encoding](https://huggingface.co/papers/2412.09616), [Native Multimodal Pre-Training](https://huggingface.co/papers/2504.10479), [Mixed Preference Optimization](https://huggingface.co/papers/2411.10442), and [Multimodal Test-Time Scaling](https://huggingface.co/papers/2503.10291).
- `2025/03/13`: We introduce [VisualPRM](https://huggingface.co/OpenGVLab/VisualPRM-8B), an advanced multimodal Process Reward Model (PRM) with 8B parameters, which improves the overall reasoning performance of InternVL2.5-8B and InternVL2.5-78B by 8.4 and 5.9 points, respectively. The training data for this model, termed [VisualPRM400K](https://huggingface.co/datasets/OpenGVLab/VisualPRM400K), is also open-sourced. Please refer to our [paper](https://huggingface.co/papers/2503.10291) and [project page](https://internvl.github.io/blog/2025-03-13-VisualPRM/) for more details.
- `2024/12/20`: We release the [InternVL2.5-MPO](https://internvl.github.io/blog/2024-12-20-InternVL-2.5-MPO/), which is finetuned with [Mixed Preference Optimization](https://huggingface.co/papers/2411.10442) on [MMPR-v1.1](https://huggingface.co/datasets/OpenGVLab/MMPR-v1.1). **The resulting models outperform their counterparts without MPO by an average of 2 points across all model scales on the OpenCompass leaderboard.** These models are available at [HF link](https://huggingface.co/collections/OpenGVLab/internvl25-mpo-6753fed98cd828219b12f849).
- `2024/12/17`: [InternVL2/2.5](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/internvl2) is supported in [PaddleMIX](https://github.com/PaddlePaddle/PaddleMIX) by Paddle Team.
- `2024/12/05`: We release [InternVL2.5](https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c), an advanced multimodal large language model (MLLM) series with parameter coverage ranging from 1B to 78B. [InternVL2_5-78B](https://huggingface.co/OpenGVLab/InternVL2_5-78B) is the first open-source MLLM to achieve over **70%** on the **MMMU benchmark**, matching the performance of leading closed-source commercial models like GPT-4o. These models are available at [HF link](https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c).
- `2024/11/14`: We introduce [MMPR](https://huggingface.co/datasets/OpenGVLab/MMPR), a high-quality, large-scale multimodal reasoning preference dataset, and [MPO](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/internvl2.0_mpo), an effective preference optimization algorithm. The resulting model, [InternVL2-8B-MPO](https://huggingface.co/OpenGVLab/InternVL2-8B-MPO), achieves an accuracy of 67.0 on MathVista. Please refer to our [paper](https://arxiv.org/abs/2411.10442), [project page](https://internvl.github.io/blog/2024-11-14-InternVL-2.0-MPO/) and [document](https://internvl.readthedocs.io/en/latest/internvl2.0/preference_optimization.html) for more details.

<details>
<summary>More News</summary>


- `2024/10/21`: We release the Mini-InternVL series. These models achieve impressive performance with minimal size: the 4B model achieves 90% of the performance with just 5% of the model size. For more details, please check our [project page](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/mini_internvl) and [document](https://internvl.readthedocs.io/en/latest/internvl2.0/domain_adaptation.html).
- `2024/08/01`: The [Chartmimic](https://chartmimic.github.io/) team evaluated the InternVL2 series models on their benchmark. The InternVL2-26B and 76B models achieved the top two performances among open-source models, with the InternVL2 76B model surpassing GeminiProVision and exhibiting comparable results to Claude-3-opus.
- `2024/08/01`: InternVL2-Pro achieved the SOTA performance among open-source models on the [CharXiv](https://charxiv.github.io/#leaderboard) dataset, surpassing many closed-source models such as GPT-4V, Gemini 1.5 Flash, and Claude 3 Sonnet.
- `2024/07/24`: The [MLVU](https://github.com/JUNJIE99/MLVU) team evaluated InternVL-1.5 on their benchmark. The average performance on the multiple-choice task was 50.4%, while the performance on the generative tasks was 4.02. The performance on the multiple-choice task ranked #1 among all open-source MLLMs.
- `2024/07/04`: We release the [InternVL2 series](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e). InternVL2-Pro achieved a 62.0% accuracy on the MMMU benchmark, matching the performance of leading closed-source commercial models like GPT-4o.
- `2024/07/18`: InternVL2-40B achieved SOTA performance among open-source models on the [Video-MME](https://github.com/BradyFU/Video-MME) dataset, scoring 61.2 when inputting 16 frames and 64.4 when inputting 32 frames. It significantly outperforms other open-source models and is the closest open-source model to GPT-4o mini.
- `2024/07/18`: InternVL2-Pro achieved the SOTA performance on the [DocVQA](https://rrc.cvc.uab.es/?ch=17&com=evaluation&task=1) and [InfoVQA](https://rrc.cvc.uab.es/?ch=17&com=evaluation&task=3) benchmarks.
- `2024/06/19`: We propose Needle In A Multimodal Haystack ([MM-NIAH](https://github.com/OpenGVLab/MM-NIAH)), the first benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.
- `2024/05/30`: We release [ShareGPT-4o](https://sharegpt4o.github.io/), a large-scale dataset that we plan to open-source with 200K images, 10K videos, and 10K audios with detailed descriptions.
- `2024/05/28`: Thanks to the [lmdeploy](https://github.com/InternLM/lmdeploy) team for providing AWQ quantization support. The 4-bit model is available at [OpenGVLab/InternVL-Chat-V1-5-AWQ](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5-AWQ).
- `2024/05/13`: InternVL 1.0 can now be used as the [text encoder](https://huggingface.co/OpenGVLab/InternVL-14B-224px) for diffusion models to support multilingual generation natively in over 110 languages worldwide. See [MuLan](https://github.com/mulanai/MuLan) for more details.
- `2024/04/18`: InternVL-Chat-V1-5 has been released at [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5), approaching the performance of GPT-4V and Gemini Pro on various benchmarks like MMMU, DocVQA, ChartQA, MathVista, etc.
- `2024/02/27`: InternVL is accepted by CVPR 2024 (Oral)! 🎉
- `2024/02/21`: [InternVL-Chat-V1-2-Plus](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus) achieved SOTA performance on MathVista (59.9), MMBench (83.8), and MMVP (58.7). See our [blog](https://internvl.github.io/blog/2024-02-21-InternVL-1.2/) for more details.
- `2024/02/12`: InternVL-Chat-V1-2 has been released. It achieves 51.6 on MMMU val and 82.3 on MMBench test. For more details, please refer to our [blog](https://internvl.github.io/blog/2024-02-21-InternVL-1.2/) and [SFT data](./internvl_chat#prepare-training-datasets). The model is now available on [HuggingFace](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2), and both training / evaluation data and scripts are open-sourced.
- `2024/01/24`: InternVL-Chat-V1-1 is released. It supports Chinese and has stronger OCR capability; see [here](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1).
- `2024/01/16`: We release our [customized mmcv/mmsegmentation/mmdetection code](https://github.com/OpenGVLab/InternVL-MMDetSeg), integrated with DeepSpeed, which can be used for training large-scale detection and segmentation models.

</details>

## Documents

### 🌟 **Get Started**

- **Installation**: 🌱 [Installation Guide](https://internvl.readthedocs.io/en/latest/get_started/installation.html) | 📄 [requirements.txt](./requirements.txt)
- **Chat Data Format**: 📝 [Meta File](https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html#meta-file) | ✏️ [Text](https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html#pure-text-data) | 🖼️ [Single-Image](https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html#single-image-data) | 🖼️🖼️ [Multi-Image](https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html#multi-image-data) | 🎥 [Video](https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html#video-data)
- **Local Chat Demo**: 🤖 [Streamlit Demo](https://internvl.readthedocs.io/en/latest/get_started/local_chat_demo.html#streamlit-demo)
- **InternVL-Chat API**: 🌐 [InternVL2.5 API](https://internlm.intern-ai.org.cn/api/document)
- **Tutorials**: 🚀 [Enhancing InternVL2 on COCO Caption Using LoRA Fine-Tuning](https://internvl.readthedocs.io/en/latest/tutorials/coco_caption_finetune.html)

### 🏆 **InternVL Family**

- **InternVL 3.0**: 📖 [Intro](https://internvl.readthedocs.io/en/latest/internvl3.0/introduction.html) | ⚡ [Quick Start](https://internvl.readthedocs.io/en/latest/internvl3.0/quick_start.html) | ✨ [Finetune](https://internvl.readthedocs.io/en/latest/internvl3.0/finetune.html) | 📊 [Evaluate](https://internvl.readthedocs.io/en/latest/internvl3.0/evaluation.html) | 📦 [Deploy](https://internvl.readthedocs.io/en/latest/internvl3.0/deployment.html) | 🎯 [MPO](https://internvl.readthedocs.io/en/latest/internvl3.0/preference_optimization.html)
- **InternVL 2.5**: 📖 [Intro](https://internvl.readthedocs.io/en/latest/internvl2.5/introduction.html) | ⚡ [Quick Start](https://internvl.readthedocs.io/en/latest/internvl2.5/quick_start.html) | ✨ [Finetune](https://internvl.readthedocs.io/en/latest/internvl2.5/finetune.html) | 📊 [Evaluate](https://internvl.readthedocs.io/en/latest/internvl2.5/evaluation.html) | 📦 [Deploy](https://internvl.readthedocs.io/en/latest/internvl2.5/deployment.html) | 🎯 [MPO](https://internvl.readthedocs.io/en/latest/internvl2.5/preference_optimization.html)
- **InternVL 2.0**: 📖 [Intro](https://internvl.readthedocs.io/en/latest/internvl2.0/introduction.html) | ⚡ [Quick Start](https://internvl.readthedocs.io/en/latest/internvl2.0/quick_start.html) | ✨ [Finetune](https://internvl.readthedocs.io/en/latest/internvl2.0/finetune.html) | 📊 [Evaluate](https://internvl.readthedocs.io/en/latest/internvl2.0/evaluation.html) | 📦 [Deploy](https://internvl.readthedocs.io/en/latest/internvl2.0/deployment.html) | 🎯 [MPO](https://internvl.readthedocs.io/en/latest/internvl2.0/preference_optimization.html)
- **InternVL 1.5**: 📖 [Intro](https://internvl.readthedocs.io/en/latest/internvl1.5/introduction.html) | ⚡ [Quick Start](https://internvl.readthedocs.io/en/latest/internvl1.5/quick_start.html) | ✨ [Finetune](https://internvl.readthedocs.io/en/latest/internvl1.5/finetune.html) | 📊 [Evaluate](https://internvl.readthedocs.io/en/latest/internvl1.5/evaluation.html) | 📦 [Deploy](https://internvl.readthedocs.io/en/latest/internvl1.5/deployment.html)
- **InternVL 1.2**: 📖 [Intro](https://internvl.readthedocs.io/en/latest/internvl1.2/introduction.html) | ⚡ [Quick Start](https://internvl.readthedocs.io/en/latest/internvl1.2/quick_start.html) | ✨ [Finetune](https://internvl.readthedocs.io/en/latest/internvl1.2/finetune.html) | 📊 [Evaluate](https://internvl.readthedocs.io/en/latest/internvl1.2/evaluation.html)
- **InternVL 1.1**: 📖 [Intro](https://internvl.readthedocs.io/en/latest/internvl1.1/introduction.html) | ⚡ [Quick Start](https://internvl.readthedocs.io/en/latest/internvl1.1/quick_start.html) | 📊 [Evaluation](https://internvl.readthedocs.io/en/latest/internvl1.1/evaluation.html)
- **InternVL 1.0**: 🖼️ [Classification](https://internvl.readthedocs.io/en/latest/internvl1.0/classification.html) | 📊 [CLIP-Benchmark](https://internvl.readthedocs.io/en/latest/internvl1.0/clip_benchmark.html) | 🎨 [Segmentation](https://internvl.readthedocs.io/en/latest/internvl1.0/segmentation.html) | 💬 [Chat-LLaVA](https://internvl.readthedocs.io/en/latest/internvl1.0/internvl_chat_llava.html) | ✨ [InternVL-G](https://internvl.readthedocs.io/en/latest/internvl1.0/internvl_g.html)

## Model Zoo

#### Multimodal Large Language Model (InternVL 3.5)

To maintain consistency with earlier generations, we provide two model formats: [the GitHub format](https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B), consistent with prior releases, and [the HF format](https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B-HF), aligned with the official Transformers standard.

> If you want to convert the checkpoint between these two formats, please refer to the scripts about [custom2hf](https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat/tools/internvl_custom2hf.py) and [hf2custom](https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat/tools/internvl_hf2custom.py).

**GitHub Format**

| Model                 | #Vision Param | #Language Param | #Total Param | HF Link                                                                        | ModelScope Link                                                                          |
| --------------------- | ------------- | --------------- | ------------ | ------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------- |
| InternVL3.5-1B        | 0.3B          | 0.8B            | 1.1B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-1B)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-1B)                      |
| InternVL3.5-2B        | 0.3B          | 2.0B            | 2.3B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-2B)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-2B)                      |
| InternVL3.5-4B        | 0.3B          | 4.4B            | 4.7B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-4B)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-4B)                      |
| InternVL3.5-8B        | 0.3B          | 8.2B            | 8.5B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-8B)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-8B)                      |
| InternVL3.5-14B       | 0.3B          | 14.8B           | 15.1B        | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-14B)                     | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-14B)                     |
| InternVL3.5-38B       | 5.5B          | 32.8B           | 38.4B        | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-38B)                     | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-38B)                     |
| InternVL3.5-20B-A4B   | 0.3B          | 20.9B           | 21.2B-A4B    | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview) | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview) |
| InternVL3.5-30B-A3B   | 0.3B          | 30.5B           | 30.8B-A3B    | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-30B-A3B)                 | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-30B-A3B)                 |
| InternVL3.5-241B-A28B | 5.5B          | 235.1B          | 240.7B-A28B  | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B)               | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-241B-A28B)               |

**HuggingFace Format**

| Model                    | #Vision Param | #Language Param | #Total Param | HF Link                                                                           | ModelScope Link                                                                             |
| ------------------------ | ------------- | --------------- | ------------ | --------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- |
| InternVL3.5-1B-HF        | 0.3B          | 0.8B            | 1.1B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-1B-HF)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-1B-HF)                      |
| InternVL3.5-2B-HF        | 0.3B          | 2.0B            | 2.3B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-2B-HF)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-2B-HF)                      |
| InternVL3.5-4B-HF        | 0.3B          | 4.4B            | 4.7B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-4B-HF)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-4B-HF)                      |
| InternVL3.5-8B-HF        | 0.3B          | 8.2B            | 8.5B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-8B-HF)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-8B-HF)                      |
| InternVL3.5-14B-HF       | 0.3B          | 14.8B           | 15.1B        | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-14B-HF)                     | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-14B-HF)                     |
| InternVL3.5-38B-HF       | 5.5B          | 32.8B           | 38.4B        | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-38B-HF)                     | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-38B-HF)                     |
| InternVL3.5-20B-A4B-HF   | 0.3B          | 20.9B           | 21.2B-A4B    | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF) | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF) |
| InternVL3.5-30B-A3B-HF   | 0.3B          | 30.5B           | 30.8B-A3B    | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-30B-A3B-HF)                 | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-30B-A3B-HF)                 |
| InternVL3.5-241B-A28B-HF | 5.5B          | 235.1B          | 240.7B-A28B  | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B-HF)               | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-241B-A28B-HF)               |
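
The two tables follow a regular naming pattern: the repository name replaces `InternVL3.5` with `InternVL3_5`, and the HF-format checkpoint appends `-HF`. The sketch below encodes that pattern as read off the links above. The helper `repo_id` and the `IRREGULAR` mapping are hypothetical conveniences, not part of the codebase.

```python
ORG = 'OpenGVLab'

# Display names whose repository name does not follow the regular pattern,
# taken from the links in the tables above.
IRREGULAR = {'InternVL3.5-20B-A4B': 'InternVL3_5-GPT-OSS-20B-A4B-Preview'}


def repo_id(model, hf_format=False):
    """Map a table display name (without `-HF`) to its repository id."""
    repo = IRREGULAR.get(model, model.replace('InternVL3.5', 'InternVL3_5'))
    suffix = '-HF' if hf_format else ''
    return f'{ORG}/{repo}{suffix}'
```

For example, `repo_id('InternVL3.5-8B', hf_format=True)` gives `OpenGVLab/InternVL3_5-8B-HF`, matching the corresponding row of the HuggingFace Format table.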


#### Multimodal Large Language Model (InternVL 3.0)

<table>
  <tr>
    <th>Model Name</th>
    <th>Vision Part</th>
    <th>Language Part</th>
    <th>HF&nbsp;Link</th>
    <th>MS&nbsp;Link</th>
  </tr>
  <tr>
    <td>InternVL3-1B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT&#8209;300M&#8209;448px&#8209;V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-0.5B">Qwen2.5&#8209;0.5B</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL3-1B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL3-1B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL3-2B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-1.5B">Qwen2.5-1.5B</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL3-2B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL3-2B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL3-8B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-7B">Qwen2.5-7B</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL3-8B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL3-8B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL3-9B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/internlm/internlm3-8b-instruct">internlm3-8b-instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL3-9B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL3-9B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL3-14B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-14B">Qwen2.5-14B</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL3-14B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL3-14B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL3-38B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">InternViT-6B-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-32B">Qwen2.5-32B</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL3-38B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL3-38B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL3-78B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">InternViT-6B-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-72B">Qwen2.5-72B</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL3-78B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL3-78B">🤖 link</a></td>
  </tr>
</table>

#### Multimodal Large Language Model (InternVL 2.5)

<table>
  <tr>
    <th>Model Name</th>
    <th>Vision Part</th>
    <th>Language Part</th>
    <th>HF&nbsp;Link</th>
    <th>MS&nbsp;Link</th>
  </tr>
  <tr>
    <td>InternVL2_5-1B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT&#8209;300M&#8209;448px&#8209;V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct">Qwen2.5&#8209;0.5B&#8209;Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-1B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-1B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-2B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2_5-1_8b-chat">internlm2_5-1_8b-chat</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-2B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-2B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-4B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-3B-Instruct">Qwen2.5-3B-Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-4B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-4B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-8B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2_5-7b-chat">internlm2_5-7b-chat</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-8B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-8B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-26B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">InternViT-6B-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2_5-20b-chat">internlm2_5-20b-chat</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-26B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-26B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-38B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">InternViT-6B-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-32B-Instruct">Qwen2.5-32B-Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-38B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-38B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-78B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">InternViT-6B-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-72B-Instruct">Qwen2.5-72B-Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-78B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-78B">🤖 link</a></td>
  </tr>
</table>

<table>
  <tr>
    <th>Model Name</th>
    <th>Vision Part</th>
    <th>Language Part</th>
    <th>HF&nbsp;Link</th>
    <th>MS&nbsp;Link</th>
  </tr>
  <tr>
    <td>InternVL2_5-1B-MPO</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT&#8209;300M&#8209;448px&#8209;V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct">Qwen2.5&#8209;0.5B&#8209;Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-1B-MPO">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-1B-MPO">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-2B-MPO</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2_5-1_8b-chat">internlm2_5-1_8b-chat</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-2B-MPO">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-2B-MPO">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-4B-MPO</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-3B-Instruct">Qwen2.5-3B-Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-4B-MPO">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-4B-MPO">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-8B-MPO</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2_5-7b-chat">internlm2_5-7b-chat</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-8B-MPO">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-8B-MPO">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-26B-MPO</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">InternViT-6B-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2_5-20b-chat">internlm2_5-20b-chat</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-26B-MPO">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-26B-MPO">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-38B-MPO</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">InternViT-6B-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-32B-Instruct">Qwen2.5-32B-Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-38B-MPO">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-38B-MPO">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-78B-MPO</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">InternViT-6B-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-72B-Instruct">Qwen2.5-72B-Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-78B-MPO">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-78B-MPO">🤖 link</a></td>
  </tr>
</table>

#### Multimodal Large Language Model (InternVL 2.0)

<table>
  <tr>
    <th>Model Name</th>
    <th>Vision Part</th>
    <th>Language Part</th>
    <th>HF&nbsp;Link</th>
    <th>MS&nbsp;Link</th>
  </tr>
  <tr>
    <td>InternVL2-1B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px">InternViT-300M-448px</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2-0.5B-Instruct">Qwen2-0.5B-Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2-1B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2-1B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2-2B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px">InternViT-300M-448px</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2-chat-1_8b">internlm2-chat-1_8b</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2-2B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2-2B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2-4B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px">InternViT-300M-448px</a></td>
    <td><a href="https://huggingface.co/microsoft/Phi-3-mini-128k-instruct">Phi&#8209;3&#8209;mini&#8209;128k&#8209;instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2-4B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2-4B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2-8B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px">InternViT-300M-448px</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2_5-7b-chat">internlm2_5-7b-chat</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2-8B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2-8B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2-26B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5">InternViT-6B-448px-V1-5</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2-chat-20b">internlm2-chat-20b</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2-26B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2-26B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2-40B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5">InternViT&#8209;6B&#8209;448px&#8209;V1&#8209;5</a></td>
    <td><a href="https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B">Nous&#8209;Hermes&#8209;2&#8209;Yi&#8209;34B</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2-40B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2-40B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2&#8209;Llama3-76B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5">InternViT-6B-448px-V1-5</a></td>
    <td><a href="https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-70B">Hermes‑2‑Theta‑<br>Llama‑3‑70B</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2-Llama3-76B">🤖 link</a></td>
  </tr>
</table>

#### Multimodal Large Language Model (InternVL 1.0-1.5)

<table>
  <tr>
    <th>Model</th>
    <th>Date</th>
    <th>HF&nbsp;Link</th>
    <th>MS&nbsp;Link</th>
    <th>Note</th>
  </tr>
  <tr>
    <td>Mini&#8209;InternVL&#8209;Chat&#8209;4B&#8209;V1&#8209;5</td>
    <td>2024.05.28</td>
    <td><a href="https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-4B-V1-5">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-4B-V1-5">🤖 link</a></td>
    <td>🚀🚀 16% of the model size, 90% of the performance</td>
  </tr>
  <tr>
    <td>Mini-InternVL-Chat-2B-V1-5</td>
    <td>2024.05.19</td>
    <td><a href="https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-2B-V1-5">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-2B-V1-5">🤖 link</a></td>
    <td>🚀 8% of the model size, 80% of the performance</td>
  </tr>
  <tr>
    <td>InternVL-Chat-V1-5</td>
    <td>2024.04.18</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL-Chat-V1-5">🤖 link</a></td>
    <td>supports 4K images; very strong OCR; approaches the performance of GPT-4V and Gemini Pro on benchmarks such as MMMU, DocVQA, ChartQA, and MathVista</td>
  </tr>
  <tr>
    <td>InternVL-Chat-V1-2-Plus</td>
    <td>2024.02.21</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL-Chat-V1-2-Plus">🤖 link</a></td>
    <td>stronger model trained with more SFT data</td>
  </tr>
  <tr>
    <td>InternVL-Chat-V1-2</td>
    <td>2024.02.11</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL-Chat-V1-2">🤖 link</a></td>
    <td>scaling up LLM to 34B</td>
  </tr>
  <tr>
    <td>InternVL-Chat-V1-1</td>
    <td>2024.01.24</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL-Chat-V1-1">🤖 link</a></td>
    <td>supports Chinese and has stronger OCR</td>
  </tr>
  <tr>
    <td>InternVL-Chat-19B</td>
    <td>2023.12.25</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-13B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-13B">🤖 link</a></td>
    <td>English multimodal dialogue</td>
  </tr>
  <tr>
    <td>InternVL-Chat-13B</td>
    <td>2023.12.25</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-7B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-7B">🤖 link</a></td>
    <td>English multimodal dialogue</td>
  </tr>
</table>

#### CLIP-like Model (InternVL 1.0-2.5)

<table>
  <tr>
    <th>Model</th>
    <th>Date</th>
    <th>HF&nbsp;Link</th>
    <th>MS&nbsp;Link</th>
    <th>Note</th>
  </tr>
  <tr>
    <td>InternViT-300M-448px-V2_5</td>
    <td>2024.12.05</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternViT-300M-448px-V2_5">🤖 link</a></td>
    <td>🚀🚀 A more powerful lightweight visual encoder. (🔥new)</td>
  </tr>
  <tr>
    <td>InternViT-6B-448px-V2_5</td>
    <td>2024.12.05</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternViT-6B-448px-V2_5">🤖 link</a></td>
    <td>🚀🚀 A stronger visual encoder to extract visual features. (🔥new)</td>
  </tr>
  <tr>
    <td>InternViT-300M-448px</td>
    <td>2024.05.25</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternViT-300M-448px">🤖 link</a></td>
    <td>distilled small vision foundation model with 300M parameters</td>
  </tr>
  <tr>
    <td>InternViT&#8209;6B&#8209;448px&#8209;V1&#8209;5</td>
    <td>2024.04.20</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternViT-6B-448px-V1-5">🤖 link</a></td>
    <td>supports dynamic resolution and strong OCR feature extraction through incremental pre-training</td>
  </tr>
  <tr>
    <td>InternViT-6B-448px-V1-2</td>
    <td>2024.02.11</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternViT-6B-448px-V1-2">🤖 link</a></td>
    <td>supports 448px resolution through incremental pre-training</td>
  </tr>
  <tr>
    <td>InternViT-6B-448px-V1-0</td>
    <td>2024.01.30</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternViT-6B-448px-V1-0">🤖 link</a></td>
    <td>supports 448px resolution through incremental pre-training</td>
  </tr>
  <tr>
    <td>InternViT-6B-224px</td>
    <td>2023.12.22</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-224px">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternViT-6B-224px">🤖 link</a></td>
    <td>the first version of InternViT-6B, extracted from InternVL‑14B‑224px</td>
  </tr>
</table>

#### Vision-Language Foundation Model (InternVL 1.0)

<table>
  <tr>
    <th>Model</th>
    <th>Date</th>
    <th>HF&nbsp;Link</th>
    <th>MS&nbsp;Link</th>
    <th>Note</th>
  </tr>
  <tr>
    <td>InternVL&#8209;14B&#8209;224px</td>
    <td>2023.12.22</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL-14B-224px">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL-14B-224px">🤖 link</a></td>
    <td>vision-language foundation model, InternViT-6B + QLLaMA, can be used for image-text retrieval like CLIP</td>
  </tr>
</table>

## TODO List

- [x] Release training / evaluation code for InternVL2.5 series
- [x] Support liger kernels to save GPU memory
- [x] Release the code, model, and data of MPO
- [x] Support multimodal packed dataset
- [ ] Support vLLM and Ollama
- [ ] Support video and PDF input in online demo
- [ ] Release InternVL2 with VisionLLMv2 integration
- [x] Rebuild documents using readthedocs
- [x] Support fine-tuning different LLMs with LoRA
- [x] Release `requirements.txt` for InternVL2
- [x] Release training / evaluation code for InternVL2 series
- [x] Release Streamlit web UI for InternVL1.5 and InternVL2

## What can InternVL do?

<details>
  <summary>Visual Perception (click to expand)</summary>

- Linear-Probe Image Classification [\[see details\]](./classification#-evaluation)

  \*ViT-22B uses the private JFT-3B dataset.

  | method              | #param | IN-1K | IN-ReaL | IN-V2 | IN-A  | IN-R  | IN-Sketch |
  | ------------------- | :----: | :---: | :-----: | :---: | :---: | :---: | :-------: |
  | OpenCLIP-G          |  1.8B  | 86.2  |  89.4   | 77.2  | 63.8  | 87.8  |   66.4    |
  | DINOv2-g            |  1.1B  | 86.5  |  89.6   | 78.4  | 75.9  | 78.8  |   62.5    |
  | EVA-01-CLIP-g       |  1.1B  | 86.5  |  89.3   | 77.4  | 70.5  | 87.7  |   63.1    |
  | MAWS-ViT-6.5B       |  6.5B  | 87.8  |    -    |   -   |   -   |   -   |     -     |
  | ViT-22B\*           | 21.7B  | 89.5  |  90.9   | 83.2  | 83.8  | 87.4  |     -     |
  | InternViT-6B (ours) |  5.9B  | 88.2  |  90.4   | 79.9  | 77.5  | 89.8  |   69.1    |

- Semantic Segmentation [\[see details\]](./segmentation#-evaluation)

  Numbers in parentheses are mIoU gains over ViT-22B in the same setting.

  | method                | decoder | #param (train/total) | crop size | mIoU         |
  | --------------------- | :-----: | :------------------: | :-------: | ------------ |
  | OpenCLIP-G (frozen)   | Linear  |     0.3M / 1.8B      |    512    | 39.3         |
  | ViT-22B (frozen)      | Linear  |     0.9M / 21.7B     |    504    | 34.6         |
  | InternViT-6B (frozen) | Linear  |     0.5M / 5.9B      |    504    | 47.2 (+12.6) |
  | ViT-22B (frozen)      | UperNet |     0.8B / 22.5B     |    504    | 52.7         |
  | InternViT-6B (frozen) | UperNet |     0.4B / 6.3B      |    504    | 54.9 (+2.2)  |
  | ViT-22B               | UperNet |    22.5B / 22.5B     |    504    | 55.3         |
  | InternViT-6B          | UperNet |     6.3B / 6.3B      |    504    | 58.9 (+3.6)  |

- Zero-Shot Image Classification [\[see details\]](./clip_benchmark#imagenet-variants-and-objectnet)

  | method            | IN-1K | IN-A  | IN-R  | IN-V2 | IN-Sketch | ObjectNet |
  | ----------------- | :---: | :---: | :---: | :---: | :-------: | :-------: |
  | OpenCLIP-G        | 80.1  | 69.3  | 92.1  | 73.6  |   68.9    |   73.0    |
  | EVA-02-CLIP-E+    | 82.0  | 82.1  | 94.5  | 75.7  |   71.6    |   79.6    |
  | ViT-22B\*         | 85.9  | 90.1  | 96.0  | 80.9  |     -     |   87.6    |
  | InternVL-C (ours) | 83.2  | 83.8  | 95.5  | 77.3  |   73.9    |   80.6    |

- Multilingual Zero-Shot Image Classification [\[see details\]](./clip_benchmark#multilingual-imagenet-1k)

  EN: English, ZH: Chinese, JP: Japanese, AR: Arabic, IT: Italian

  | method            | IN-1K (EN) | IN-1K (ZH) | IN-1K (JP) | IN-1K (AR) | IN-1K (IT) |
  | ----------------- | :--------: | :--------: | :--------: | :--------: | :--------: |
  | Taiyi-CLIP-ViT-H  |     -      |    54.4    |     -      |     -      |     -      |
  | WuKong-ViT-L-G    |     -      |    57.5    |     -      |     -      |     -      |
  | CN-CLIP-ViT-H     |     -      |    59.6    |     -      |     -      |     -      |
  | AltCLIP-ViT-L     |    74.5    |    59.6    |     -      |     -      |     -      |
  | EVA-02-CLIP-E+    |    82.0    |     -      |     -      |     -      |    41.2    |
  | OpenCLIP-XLM-R-H  |    77.0    |    55.7    |    53.1    |    37.0    |    56.8    |
  | InternVL-C (ours) |    83.2    |    64.5    |    61.5    |    44.9    |    65.7    |

- Zero-Shot Video Classification

  | method            | #frame | K400  | K600  | K700  |
  | ----------------- | :----: | :---: | :---: | :---: |
  | OpenCLIP-G        |   1    | 65.9  | 66.1  | 59.2  |
  | EVA-02-CLIP-E+    |   1    | 69.8  | 69.3  | 63.4  |
  | InternVL-C (ours) |   1    | 71.0  | 71.3  | 65.7  |
  | ViCLIP            |   8    | 75.7  | 73.5  | 66.4  |
  | InternVL-C (ours) |   8    | 79.4  | 78.8  | 71.5  |

</details>

<details>
  <summary>Cross-Modal Retrieval (click to expand)</summary>

- English Zero-Shot Image-Text Retrieval [\[see details\]](./clip_benchmark#flickr30k--coco)

  <table>
    <tr align=center>
        <td rowspan="3" align=left><b>model</b></td>
        <td colspan="6" align=center><b>Flickr30K</b></td>
        <td colspan="6" align=center><b>COCO</b></td>
        <td rowspan="3" align=center><b>avg</b></td>
    </tr>
     <tr align=center>
        <td colspan="3" align=center><b>image-to-text</b></td>
        <td colspan="3" align=center><b>text-to-image</b></td>
         <td colspan="3" align=center><b>image-to-text</b></td>
        <td colspan="3" align=center><b>text-to-image</b></td>
     </tr>
     <tr>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
     </tr>
  <tr align=center>
        <td align=left>OpenCLIP-G</td>
        <td>92.9</td>
        <td>99.3</td>
        <td>99.8</td>
        <td>79.5</td>
        <td>95.0</td>
        <td>97.1</td>
        <td>67.3</td>
        <td>86.9</td>
        <td>92.6</td>
        <td>51.4</td>
        <td>74.9</td>
        <td>83.0</td>
        <td>85.0</td>
     </tr>
  <tr align=center>
        <td align=left>EVA-02-CLIP-E+</td>
        <td>93.9</td>
        <td>99.4</td>
        <td>99.8</td>
        <td>78.8</td>
        <td>94.2</td>
        <td>96.8</td>
        <td>68.8</td>
        <td>87.8</td>
        <td>92.8</td>
        <td>51.1</td>
        <td>75.0</td>
        <td>82.7</td>
        <td>85.1</td>
     </tr>
    <tr align=center>
        <td align=left>EVA-CLIP-8B</td>
        <td>95.6</td>
        <td>99.6</td>
        <td>99.9</td>
        <td>80.8</td>
        <td>95.5</td>
        <td>97.6</td>
        <td>70.3</td>
        <td>89.3</td>
        <td>93.9</td>
        <td>53.0</td>
        <td>76.0</td>
        <td>83.4</td>
        <td>86.2</td>
     </tr>
  <tr align=center>
        <td align=left>InternVL-C (ours)</td>
        <td>94.7</td>
        <td>99.6</td>
        <td>99.9</td>
        <td>81.7</td>
        <td>96.0</td>
        <td>98.2</td>
        <td>70.6</td>
        <td>89.0</td>
        <td>93.5</td>
        <td>54.1</td>
        <td>77.3</td>
        <td>84.6</td>
        <td>86.6</td>
     </tr>
  <tr align=center>
        <td align=left>InternVL-G (ours)</td>
        <td>95.7</td>
        <td>99.7</td>
        <td>99.9</td>
        <td>85.0</td>
        <td>97.0</td>
        <td>98.6</td>
        <td>74.9</td>
        <td>91.3</td>
        <td>95.2</td>
        <td>58.6</td>
        <td>81.3</td>
        <td>88.0</td>
        <td>88.8</td>
     </tr>

  </table>

- Chinese Zero-Shot Image-Text Retrieval [\[see details\]](./clip_benchmark#flickr30k-cn--coco-cn)

  <table>
    <tr  align=center>
        <td rowspan="3" align=left><b>model</b></td>
        <td colspan="6" align=center><b>Flickr30K-CN</b></td>
        <td colspan="6" align=center><b>COCO-CN</b></td>
        <td rowspan="3" align=center><b>avg</b></td>

  </tr>
     <tr  align=center>
        <td colspan="3" align=center><b>image-to-text</b></td>
        <td colspan="3" align=center><b>text-to-image</b></td>
         <td colspan="3" align=center><b>image-to-text</b></td>
        <td colspan="3" align=center><b>text-to-image</b></td>
     </tr>
     <tr>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
     </tr>

  <tr align=center>
        <td align=left>CN-CLIP-ViT-H</td>
        <td>81.6</td>
        <td>97.5</td>
        <td>98.8</td>
        <td>71.2</td>
        <td>91.4</td>
        <td>95.5</td>
        <td>63.0</td>
        <td>86.6</td>
        <td>92.9</td>
        <td>69.2</td>
        <td>89.9</td>
        <td>96.1</td>
        <td>86.1</td>
     </tr>

  <tr align=center>
        <td align=left>OpenCLIP-XLM-R-H</td>
        <td>86.1</td>
        <td>97.5</td>
        <td>99.2</td>
        <td>71.0</td>
        <td>90.5</td>
        <td>94.9</td>
        <td>70.0</td>
        <td>91.5</td>
        <td>97.0</td>
        <td>66.1</td>
        <td>90.8</td>
        <td>96.0</td>
        <td>87.6</td>
     </tr>

  <tr align=center>
        <td align=left>InternVL-C (ours)</td>
        <td>90.3</td>
        <td>98.8</td>
        <td>99.7</td>
        <td>75.1</td>
        <td>92.9</td>
        <td>96.4</td>
        <td>68.8</td>
        <td>92.0</td>
        <td>96.7</td>
        <td>68.9</td>
        <td>91.9</td>
        <td>96.5</td>
        <td>89.0</td>
     </tr>
  <tr align=center>
        <td align=left>InternVL-G (ours)</td>
        <td>92.9</td>
        <td>99.4</td>
        <td>99.8</td>
        <td>77.7</td>
        <td>94.8</td>
        <td>97.3</td>
        <td>71.4</td>
        <td>93.9</td>
        <td>97.7</td>
        <td>73.8</td>
        <td>94.4</td>
        <td>98.1</td>
        <td>90.9</td>
     </tr>

  </table>

- Multilingual Zero-Shot Image-Text Retrieval on XTD [\[see details\]](./clip_benchmark#xtd)

  | method            |  EN   |  ES   |  FR   |  ZH   |  IT   |  KO   |  RU   |  JP   | average |
  | ----------------- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :-----: |
  | AltCLIP           | 95.4  | 94.1  | 92.9  | 95.1  | 94.2  | 94.4  | 91.8  | 91.7  |  93.7   |
  | OpenCLIP-XLM-R-H  | 97.3  | 96.1  | 94.5  | 94.7  | 96.0  | 90.2  | 93.9  | 94.0  |  94.6   |
  | InternVL-C (ours) | 97.3  | 95.7  | 95.1  | 95.6  | 96.0  | 92.2  | 93.3  | 95.5  |  95.1   |
  | InternVL-G (ours) | 98.6  | 97.7  | 96.5  | 96.7  | 96.9  | 95.1  | 94.8  | 96.1  |  96.6   |

</details>
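
The `avg` column in the two retrieval tables above appears to be the plain mean of the twelve recall values in each row. A minimal check in Python, using two rows copied from the English table (the helper name `average_recall` is illustrative, not part of the repository):

```python
from statistics import mean

# Recall values copied from the English retrieval table, in column order:
# Flickr30K image-to-text R@1/5/10, Flickr30K text-to-image R@1/5/10,
# COCO image-to-text R@1/5/10, COCO text-to-image R@1/5/10.
recalls = {
    "OpenCLIP-G": [92.9, 99.3, 99.8, 79.5, 95.0, 97.1,
                   67.3, 86.9, 92.6, 51.4, 74.9, 83.0],
    "InternVL-C (ours)": [94.7, 99.6, 99.9, 81.7, 96.0, 98.2,
                          70.6, 89.0, 93.5, 54.1, 77.3, 84.6],
}


def average_recall(values):
    """Mean of the twelve recall values, rounded to one decimal as in the table."""
    return round(mean(values), 1)


for name, values in recalls.items():
    print(name, average_recall(values))
```

Both rows reproduce the `avg` values reported in the table (85.0 and 86.6).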

<details>
  <summary>Multimodal Dialogue</summary>

</details>

## Quick Start with HuggingFace

<details>
  <summary>using InternViT-6B for visual feature extraction (click to expand)</summary>

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

model = AutoModel.from_pretrained(
    'OpenGVLab/InternViT-6B-448px-V2_5',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

image = Image.open('./examples/image1.jpg').convert('RGB')

image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternViT-6B-448px-V2_5')

pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

outputs = model(pixel_values)
```

</details>

<details>
  <summary>using InternVL-C(ontrastive) and InternVL-G(enerative) for cross-modal retrieval (click to expand)</summary>

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor
from transformers import AutoTokenizer


model = AutoModel.from_pretrained(
    'OpenGVLab/InternVL-14B-224px',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternVL-14B-224px')

tokenizer = AutoTokenizer.from_pretrained(
    'OpenGVLab/InternVL-14B-224px', use_fast=False, add_eos_token=True)
tokenizer.pad_token_id = 0  # set pad_token_id to 0

images = [
    Image.open('./examples/image1.jpg').convert('RGB'),
    Image.open('./examples/image2.jpg').convert('RGB'),
    Image.open('./examples/image3.jpg').convert('RGB')
]
prefix = 'summarize:'
texts = [
    prefix + 'a photo of a red panda',  # English
    prefix + '一张熊猫的照片',  # Chinese
    prefix + '二匹の猫の写真'  # Japanese
]

pixel_values = image_processor(images=images, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()
input_ids = tokenizer(texts, return_tensors='pt', max_length=80,
                      truncation=True, padding='max_length').input_ids.cuda()

# InternVL-C
logits_per_image, logits_per_text = model(
    image=pixel_values, text=input_ids, mode='InternVL-C')
probs = logits_per_image.softmax(dim=-1)
# tensor([[9.9609e-01, 5.2185e-03, 6.0070e-08],
#         [2.2949e-02, 9.7656e-01, 5.9903e-06],
#         [3.2932e-06, 7.4863e-05, 1.0000e+00]], device='cuda:0',
#        dtype=torch.bfloat16, grad_fn=<SoftmaxBackward0>)

# InternVL-G
logits_per_image, logits_per_text = model(
    image=pixel_values, text=input_ids, mode='InternVL-G')
probs = logits_per_image.softmax(dim=-1)
# tensor([[9.9609e-01, 3.1738e-03, 3.6322e-08],
#         [8.6060e-03, 9.9219e-01, 2.8759e-06],
#         [1.7583e-06, 3.1233e-05, 1.0000e+00]], device='cuda:0',
#        dtype=torch.bfloat16, grad_fn=<SoftmaxBackward0>)

# please set add_eos_token to False for generation
tokenizer.add_eos_token = False
image = Image.open('./examples/image1.jpg').convert('RGB')
pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

tokenized = tokenizer("English caption:", return_tensors='pt')
pred = model.generate(
    pixel_values=pixel_values,
    input_ids=tokenized.input_ids.cuda(),
    attention_mask=tokenized.attention_mask.cuda(),
    num_beams=5,
    min_new_tokens=8,
)
caption = tokenizer.decode(pred[0].cpu(), skip_special_tokens=True).strip()
# English caption: a red panda sitting on top of a wooden platform
```

</details>
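
To read the retrieval example above as a ranking, pick the highest-probability text for each image. A minimal sketch in plain Python, using the InternVL-C probabilities printed in the comments of that example; `best_text_per_image` is a hypothetical helper, not part of the InternVL API:

```python
# Probabilities copied from the InternVL-C output shown in the example
# (rows: images 1-3, columns: the three text prompts).
probs = [
    [9.9609e-01, 5.2185e-03, 6.0070e-08],
    [2.2949e-02, 9.7656e-01, 5.9903e-06],
    [3.2932e-06, 7.4863e-05, 1.0000e+00],
]


def best_text_per_image(prob_rows):
    """Return, for each image (row), the index of the highest-scoring text."""
    return [max(range(len(row)), key=row.__getitem__) for row in prob_rows]


matches = best_text_per_image(probs)
print(matches)
```

Each image is matched to the prompt with the same index. On the tensors from the example, `probs.argmax(dim=-1)` should give the same indices.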

<details>
  <summary>using InternVL 2.5 for multimodal chat (click to expand)</summary>

Here, we take the smaller `OpenGVLab/InternVL2_5-8B` as an example:

```python
import numpy as np
import torch
import torchvision.transforms as T
from decord import VideoReader, cpu
from PIL import Image
from torchvision.transforms.functional import InterpolationMode
from transformers import AutoModel, AutoTokenizer

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def build_transform(input_size):
    MEAN, STD = IMAGENET_MEAN, IMAGENET_STD
    transform = T.Compose([
        T.Lambda(lambda img: img.convert('RGB') if img.mode != 'RGB' else img),
        T.Resize((input_size, input_size), interpolation=InterpolationMode.BICUBIC),
        T.ToTensor(),
        T.Normalize(mean=MEAN, std=STD)
    ])
    return transform

def find_closest_aspect_ratio(aspect_ratio, target_ratios, width, height, image_size):
    best_ratio_diff = float('inf')
    best_ratio = (1, 1)
    area = width * height
    for ratio in target_ratios:
        target_aspect_ratio = ratio[0] / ratio[1]
        ratio_diff = abs(aspect_ratio - target_aspect_ratio)
        if ratio_diff < best_ratio_diff:
            best_ratio_diff = ratio_diff
            best_ratio = ratio
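        # tie: switch to this later candidate only if the original image
        # covers more than half of the candidate grid's pixel area
        elif ratio_diff == best_ratio_diff: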
            if area > 0.5 * image_size * image_size * ratio[0] * ratio[1]:
                best_ratio = ratio
    return best_ratio

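def dynamic_preprocess(image, min_num=1, max_num=12, image_size=448, use_thumbnail=False):
    """Split `image` into square tiles of side `image_size`.

    The image is resized to the (columns, rows) grid whose aspect ratio is
    closest to the original, using between `min_num` and `max_num` tiles.
    If `use_thumbnail` is set and more than one tile was produced, a resized
    copy of the whole image is appended. Returns a list of PIL images.
    """
    orig_width, orig_height = image.size
    aspect_ratio = orig_width / orig_height

    # enumerate candidate (columns, rows) grids with min_num <= columns * rows <= max_num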
    target_ratios = set(
        (i, j) for n in range(min_num, max_num + 1) for i in range(1, n + 1) for j in range(1, n + 1) if
        i * j <= max_num and i * j >= min_num)
    target_ratios = sorted(target_ratios, key=lambda x: x[0] * x[1])

    # find the closest aspect ratio to the target
    target_aspect_ratio = find_closest_aspect_ratio(
        aspect_ratio, target_ratios, orig_width, orig_height, image_size)

    # calculate the target width and height
    target_width = image_size * target_aspect_ratio[0]
    target_height = image_size * target_aspect_ratio[1]
    blocks = target_aspect_ratio[0] * target_aspect_ratio[1]

    # resize the image
    resized_img = image.resize((target_width, target_height))
    processed_images = []
    for i in range(blocks):
        box = (
            (i % (target_width // image_size)) * image_size,
            (i // (target_width // image_size)) * image_size,
            ((i % (target_width // image_size)) + 1) * image_size,
            ((i // (target_width // image_size)) + 1) * image_size
        )
        # split the image
        split_img = resized_img.crop(box)
        processed_images.append(split_img)
    assert len(processed_images) == blocks
    if use_thumbnail and len(processed_images) != 1:
        thumbnail_img = image.resize((image_size, image_size))
        processed_images.append(thumbnail_img)
    return processed_images

def load_image(image_file, input_size=448, max_num=12):
    image = Image.open(image_file).convert('RGB')
    transform = build_transform(input_size=input_size)
    images = dynamic_preprocess(image, image_size=input_size, use_thumbnail=True, max_num=max_num)
    pixel_values = [transform(image) for image in images]
    pixel_values = torch.stack(pixel_values)
    return pixel_values

# If you have an 80G A100 GPU, you can put the entire model on a single GPU.
# Otherwise, load the model across multiple GPUs; see the `Multiple GPUs` section.
path = 'OpenGVLab/InternVL2_5-8B'
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

# set the max number of tiles in `max_num`
pixel_values = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
generation_config = dict(max_new_tokens=1024, do_sample=False)

# pure-text conversation
question = 'Hello, who are you?'
response, history = model.chat(tokenizer, None, question, generation_config, history=None, return_history=True)
print(f'User: {question}\nAssistant: {response}')

question = 'Can you tell me a story?'
response, history = model.chat(tokenizer, None, question, generation_config, history=history, return_history=True)
print(f'User: {question}\nAssistant: {response}')

# single-image single-round conversation
question = '<image>\nPlease describe the image briefly.'
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(f'User: {question}\nAssistant: {response}')

# single-image multi-round conversation
question = '<image>\nPlease describe the image in detail.'
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
print(f'User: {question}\nAssistant: {response}')

question = 'Please write a poem according to the image.'
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=history, return_history=True)
print(f'User: {question}\nAssistant: {response}')

# multi-image multi-round conversation, combined images
pixel_values1 = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=12).to(torch.bfloat16).cuda()
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)

question = '<image>\nDescribe the two images in detail.'
response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                               history=None, return_history=True)
print(f'User: {question}\nAssistant: {response}')

question = 'What are the similarities and differences between these two images.'
response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                               history=history, return_history=True)
print(f'User: {question}\nAssistant: {response}')

# multi-image multi-round conversation, separate images
pixel_values1 = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=12).to(torch.bfloat16).cuda()
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)
num_patches_list = [pixel_values1.size(0), pixel_values2.size(0)]

question = 'Image-1: <image>\nImage-2: <image>\nDescribe the two images in detail.'
response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                               num_patches_list=num_patches_list,
                               history=None, return_history=True)
print(f'User: {question}\nAssistant: {response}')

question = 'What are the similarities and differences between these two images.'
response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                               num_patches_list=num_patches_list,
                               history=history, return_history=True)
print(f'User: {question}\nAssistant: {response}')

# batch inference, single image per sample
pixel_values1 = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=12).to(torch.bfloat16).cuda()
num_patches_list = [pixel_values1.size(0), pixel_values2.size(0)]
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)

questions = ['<image>\nDescribe the image in detail.'] * len(num_patches_list)
responses = model.batch_chat(tokenizer, pixel_values,
                             num_patches_list=num_patches_list,
                             questions=questions,
                             generation_config=generation_config)
for question, response in zip(questions, responses):
    print(f'User: {question}\nAssistant: {response}')

# video multi-round conversation
def get_index(bound, fps, max_frame, first_idx=0, num_segments=32):
    if bound:
        start, end = bound[0], bound[1]
    else:
        start, end = -100000, 100000
    start_idx = max(first_idx, round(start * fps))
    end_idx = min(round(end * fps), max_frame)
    seg_size = float(end_idx - start_idx) / num_segments
    frame_indices = np.array([
        int(start_idx + (seg_size / 2) + np.round(seg_size * idx))
        for idx in range(num_segments)
    ])
    return frame_indices

def load_video(video_path, bound=None, input_size=448, max_num=1, num_segments=32):
    vr = VideoReader(video_path, ctx=cpu(0), num_threads=1)
    max_frame = len(vr) - 1
    fps = float(vr.get_avg_fps())

    pixel_values_list, num_patches_list = [], []
    transform = build_transform(input_size=input_size)
    frame_indices = get_index(bound, fps, max_frame, first_idx=0, num_segments=num_segments)
    for frame_index in frame_indices:
        img = Image.fromarray(vr[frame_index].asnumpy()).convert('RGB')
        img = dynamic_preprocess(img, image_size=input_size, use_thumbnail=True, max_num=max_num)
        pixel_values = [transform(tile) for tile in img]
        pixel_values = torch.stack(pixel_values)
        num_patches_list.append(pixel_values.shape[0])
        pixel_values_list.append(pixel_values)
    pixel_values = torch.cat(pixel_values_list)
    return pixel_values, num_patches_list

video_path = './examples/red-panda.mp4'
pixel_values, num_patches_list = load_video(video_path, num_segments=8, max_num=1)
pixel_values = pixel_values.to(torch.bfloat16).cuda()
video_prefix = ''.join([f'Frame-{i+1}: <image>\n' for i in range(len(num_patches_list))])
question = video_prefix + 'What is the red panda doing?'
# Frame-1: <image>\nFrame-2: <image>\n...\nFrame-8: <image>\n{question}
response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                               num_patches_list=num_patches_list, history=None, return_history=True)
print(f'User: {question}\nAssistant: {response}')

question = 'Describe this video in detail.'
response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                               num_patches_list=num_patches_list, history=history, return_history=True)
print(f'User: {question}\nAssistant: {response}')
```
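
The frame-sampling rule in `get_index` above can be checked in isolation, without a video file or a GPU. The sketch below restates it in plain Python and builds the `Frame-i: <image>` prefix used in the video prompt. It swaps `np.round` for the built-in `round` (both round halves to even) and returns a list instead of a NumPy array. The helper names `sample_frame_indices` and `build_video_prefix`, and the 100-frame, 25 fps clip, are assumptions for illustration and are not part of the InternVL API.

```python
def sample_frame_indices(bound, fps, max_frame, first_idx=0, num_segments=32):
    """Pick one frame index near the middle of each of `num_segments` equal segments."""
    if bound:
        start, end = bound[0], bound[1]
    else:
        # No time bound given: use a very wide window that is clamped below.
        start, end = -100000, 100000
    start_idx = max(first_idx, round(start * fps))
    end_idx = min(round(end * fps), max_frame)
    seg_size = float(end_idx - start_idx) / num_segments
    return [
        int(start_idx + (seg_size / 2) + round(seg_size * idx))
        for idx in range(num_segments)
    ]


def build_video_prefix(num_frames):
    """Build one 'Frame-i: <image>' line per sampled frame."""
    return ''.join(f'Frame-{i + 1}: <image>\n' for i in range(num_frames))


# Hypothetical clip: 100 frames (max_frame=99) at 25 fps, split into 8 segments.
indices = sample_frame_indices(bound=None, fps=25.0, max_frame=99, num_segments=8)
prefix = build_video_prefix(len(indices))
print(indices)
print(prefix + 'What is the red panda doing?')
```

For this hypothetical clip the sampled indices are `[6, 18, 31, 43, 56, 68, 80, 93]`, roughly one frame from the middle of each 12-frame segment. With `max_num=1`, each sampled frame contributes a single tile, so `num_patches_list` has one entry per `<image>` placeholder in the prefix.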

</details>

## License

This project is released under the [MIT license](LICENSE). Parts of this project contain code and models from other sources, which are subject to their respective licenses.

## Citation

If you find this project useful in your research, please consider citing:

```BibTeX
@article{wang2025internvl3_5,
  title={InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency},
  author={Wang, Weiyun and Gao, Zhangwei and Gu, Lixin and Pu, Hengjun and Cui, Long and Wei, Xingguang and Liu, Zhaoyang and Jing, Linglin and Ye, Shenglong and Shao, Jie and others},
  journal={arXiv preprint arXiv:2508.18265},
  year={2025}
}
@article{zhu2025internvl3,
  title={Internvl3: Exploring advanced training and test-time recipes for open-source multimodal models},
  author={Zhu, Jinguo and Wang, Weiyun and Chen, Zhe and Liu, Zhaoyang and Ye, Shenglong and Gu, Lixin and Tian, Hao and Duan, Yuchen and Su, Weijie and Shao, Jie and others},
  journal={arXiv preprint arXiv:2504.10479},
  year={2025}
}
@article{chen2024expanding,
  title={Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling},
  author={Chen, Zhe and Wang, Weiyun and Cao, Yue and Liu, Yangzhou and Gao, Zhangwei and Cui, Erfei and Zhu, Jinguo and Ye, Shenglong and Tian, Hao and Liu, Zhaoyang and others},
  journal={arXiv preprint arXiv:2412.05271},
  year={2024}
}
@article{wang2024mpo,
  title={Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization},
  author={Wang, Weiyun and Chen, Zhe and Wang, Wenhai and Cao, Yue and Liu, Yangzhou and Gao, Zhangwei and Zhu, Jinguo and Zhu, Xizhou and Lu, Lewei and Qiao, Yu and Dai, Jifeng},
  journal={arXiv preprint arXiv:2411.10442},
  year={2024}
}
@article{gao2024mini,
  title={Mini-InternVL: a flexible-transfer pocket multi-modal model with 5\% parameters and 90\% performance},
  author={Gao, Zhangwei and Chen, Zhe and Cui, Erfei and Ren, Yiming and Wang, Weiyun and Zhu, Jinguo and Tian, Hao and Ye, Shenglong and He, Junjun and Zhu, Xizhou and others},
  journal={Visual Intelligence},
  volume={2},
  number={1},
  pages={1--17},
  year={2024},
  publisher={Springer}
}
@article{chen2024far,
  title={How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites},
  author={Chen, Zhe and Wang, Weiyun and Tian, Hao and Ye, Shenglong and Gao, Zhangwei and Cui, Erfei and Tong, Wenwen and Hu, Kongzhi and Luo, Jiapeng and Ma, Zheng and others},
  journal={Science China Information Sciences},
  volume={67},
  number={12},
  pages={220101},
  year={2024},
  publisher={Springer}
}
@inproceedings{chen2024internvl,
  title={Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks},
  author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={24185--24198},
  year={2024}
}
```

## Acknowledgement

InternVL is built with reference to the code of the following projects: [OpenAI CLIP](https://github.com/openai/CLIP), [Open CLIP](https://github.com/mlfoundations/open_clip), [CLIP Benchmark](https://github.com/LAION-AI/CLIP_benchmark), [EVA](https://github.com/baaivision/EVA/tree/master), [InternImage](https://github.com/OpenGVLab/InternImage), [ViT-Adapter](https://github.com/czczup/ViT-Adapter), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation), [Transformers](https://github.com/huggingface/transformers), [DINOv2](https://github.com/facebookresearch/dinov2), [BLIP-2](https://github.com/salesforce/LAVIS/tree/main/projects/blip2), [Qwen-VL](https://github.com/QwenLM/Qwen-VL/tree/master/eval_mm), and [LLaVA-1.5](https://github.com/haotian-liu/LLaVA). Thanks for their awesome work!

______________________________________________________________________

Scan the following QR code to join our WeChat group.

<p align="center"><img width="300" alt="image" src="https://github.com/user-attachments/assets/f776df09-ebba-4fd5-80c2-fec4ff1518be"></p>


================================================
FILE: README_zh.md
================================================
<div align="center">

# InternVL Family: Closing the Gap to Commercial Multimodal Models with Open-Source Suites (An Open-Source Alternative to GPT-5)

<div align="center">
  <img width="500" alt="image" src="https://github.com/user-attachments/assets/930e6814-8a9f-43e1-a284-118a5732daa4">
  <br>
</div>

[\[🆕 Blog\]](https://internvl.github.io/blog/)
[\[🤔 FAQs\]](https://internvl.readthedocs.io/en/latest/tutorials/faqs.html)
[\[🗨️ Chat Demo\]](https://chat.intern-ai.org.cn/)
[\[📖 Document\]](https://internvl.readthedocs.io/en/latest/)
[\[🌐 API\]](https://internlm.intern-ai.org.cn/api/document)
[\[🚀 Quick Start\]](#使用-huggingface-快速开始)

[\[🔥 InternVL3.5 Report\]](https://huggingface.co/papers/2508.18265)
[\[📜 InternVL3.0 Report\]](https://huggingface.co/papers/2504.10479)
[\[📜 InternVL2.5 MPO\]](https://huggingface.co/papers/2411.10442)
[\[📜 InternVL 2.5 Report\]](https://huggingface.co/papers/2412.05271)

[\[📜 Mini-InternVL Paper\]](https://arxiv.org/abs/2410.16261)
[\[📜 InternVL2 Blog\]](https://internvl.github.io/blog/2024-07-02-InternVL-2.0/)
[\[📜 InternVL 1.5 Paper\]](https://huggingface.co/papers/2404.16821)
[\[📜 InternVL 1.0 Paper\]](https://huggingface.co/papers/2312.14238)

[\[📖 2.0 Chinese Explanation\]](https://zhuanlan.zhihu.com/p/706547971)  [\[📖 1.5 Chinese Explanation\]](https://zhuanlan.zhihu.com/p/699439759)  [\[📖 1.0 Chinese Explanation\]](https://zhuanlan.zhihu.com/p/702946079)

[Switch to the English version](/README.md)

<a href="https://trendshift.io/repositories/9803" target="_blank"><img src="https://trendshift.io/api/badge/repositories/9803" alt="OpenGVLab%2FInternVL | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
<img height="55" alt="image" src="https://github.com/user-attachments/assets/bd62ab46-f0ea-40c6-ab10-7fde671716cc">

![image/jpg](https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B/resolve/main/images/performance.jpg)

</div>

## News 🚀🚀🚀


- `2025/08/30`: 🔥 We open-sourced the training code for [InternVL3_5-GPT-OSS-20B-A4B](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat_gpt_oss) and for CascadeRL, which consists of two stages: [offline reinforcement learning](https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat_gpt_oss/shell/internvl3_5_gpt_oss/internvl3_5_gpt_oss_20b_stage3_mpo.sh) and [online reinforcement learning](https://github.com/Weiyun1025/verl-internvl). The training data for these two stages ([MMPR-v1.2](https://huggingface.co/datasets/OpenGVLab/MMPR-v1.2) and [MMPR-Tiny](https://huggingface.co/datasets/OpenGVLab/MMPR-Tiny)) has also been open-sourced.
- `2025/08/26`: 🚀 We released [InternVL3.5](https://huggingface.co/papers/2508.18265), an open-source multimodal model series with across-the-board improvements in versatility, reasoning capability, and inference efficiency. The largest model ([InternVL3.5-241B-A28B](https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B)) achieves the best multimodal perception, reasoning, language, and agentic performance among open-source multimodal large language models. We also released a 20B-A4B version ([InternVL3_5-GPT-OSS-20B-A4B](https://huggingface.co/OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview)) built on OpenAI's open-source GPT-OSS-20B-A4B. Note that we provide the model weights in two formats: the [GitHub format](https://huggingface.co/OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview#github-format), which matches the weight format of previous generations, and the [HuggingFace format](https://huggingface.co/OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview#huggingface-format), which matches the format of the `transformers` library.
- `2025/04/17`: We open-sourced the [data construction pipeline](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/tools/reasoning_data_pipeline) and [training scripts](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/internvl3.0/mpo) for [MPO](https://huggingface.co/papers/2411.10442) and [VisualPRM](https://huggingface.co/papers/2503.10291). The data construction scripts for [MPO](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/internvl3.0/mpo_data_construction) and [VisualPRM](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/internvl3.0/visualprm_data_construction) have also been open-sourced.
- `2025/04/11`: We released [InternVL3](https://huggingface.co/collections/OpenGVLab/internvl3-67f7f690be79c2fe9d74fe9d), a powerful open-source multimodal large model. InternVL3-78B ranks first among open-source models in both [perception](https://rank.opencompass.org.cn/leaderboard-multimodal/?m=REALTIME) and [reasoning](https://rank.opencompass.org.cn/leaderboard-multimodal-reasoning/?m=REALTIME). The core techniques of InternVL3-78B include [Variable Visual Position Encoding](https://huggingface.co/papers/2412.09616), [Native Multimodal Pre-Training](https://huggingface.co/papers/2504.10479), [Mixed Preference Optimization](https://huggingface.co/papers/2411.10442), and [Multimodal Test-Time Scaling](https://huggingface.co/papers/2503.10291).
- `2025/03/13`: We released [VisualPRM](https://huggingface.co/OpenGVLab/VisualPRM-8B), a multimodal process reward model (PRM) with 8B parameters. Under the Best-of-8 evaluation setting, it improves the overall performance of InternVL2.5-8B and InternVL2.5-78B across seven multimodal reasoning benchmarks by 8.4 and 5.9 points, respectively. Its training data, [VisualPRM400K](https://huggingface.co/datasets/OpenGVLab/VisualPRM400K), has also been open-sourced. Please refer to our [paper](https://huggingface.co/papers/2503.10291) and [project page](https://internvl.github.io/blog/2025-03-13-VisualPRM/) for more details.
- `2024/12/20`: We released the [InternVL2.5-MPO series](https://internvl.github.io/blog/2024-12-20-InternVL-2.5-MPO/). These models are fine-tuned with the [Mixed Preference Optimization](https://huggingface.co/papers/2411.10442) algorithm on the [MMPR-v1.1](https://huggingface.co/datasets/OpenGVLab/MMPR-v1.1) dataset. **On the OpenCompass leaderboard, their overall performance is two percentage points higher than before MPO training.** The models can be downloaded from the [HF link](https://huggingface.co/collections/OpenGVLab/internvl25-mpo-6753fed98cd828219b12f849).
- `2024/12/17`: The Paddle team has adapted [InternVL2/2.5](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/internvl2) to the [PaddleMIX](https://github.com/PaddlePaddle/PaddleMIX) framework.
- `2024/12/05`: We released the InternVL2.5 series, covering multimodal large language models from 1B to 78B parameters. [InternVL2_5-78B](https://huggingface.co/OpenGVLab/InternVL2_5-78B) is the first open-source model to score above 70 on the MMMU benchmark. The models can be downloaded from the [HF link](https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c).
- `2024/11/14`: We released [MMPR](https://huggingface.co/datasets/OpenGVLab/MMPR), a high-quality, large-scale multimodal reasoning preference dataset, and [MPO](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/internvl2.0_mpo), an efficient preference optimization algorithm. The resulting model, [InternVL2-8B-MPO](https://huggingface.co/OpenGVLab/InternVL2-8B-MPO), achieves 67.0 accuracy on MathVista. For more details, please see our [paper](https://arxiv.org/abs/2411.10442), [project page](https://internvl.github.io/blog/2024-11-14-InternVL-2.0-MPO/), and [documentation](https://internvl.readthedocs.io/en/latest/internvl2.0/preference_optimization.html).


<details>
<summary>More</summary>


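- `2024/10/21`: We released the Mini-InternVL series. These models deliver strong performance at a very small size: the 4B model achieves 90% of the performance with only 5% of the model size. For more details, please see our [project page](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/mini_internvl) and [documentation](https://internvl.readthedocs.io/en/latest/internvl2.0/domain_adaptation.html).
- `2024/08/01`: The [Chartmimic](https://chartmimic.github.io/) team evaluated the InternVL2 series on their benchmark. The InternVL2-26B and 76B models took the top two places among open-source models, with InternVL2-Llama3-76B surpassing GeminiProVision and showing results comparable to Claude-3-opus.
- `2024/08/01`: InternVL2-Pro achieved SOTA performance among open-source models on the [CharXiv](https://charxiv.github.io/#leaderboard) dataset, and also outperformed some well-known closed-source models such as GPT-4V, Gemini 1.5 Flash, and Claude 3 Sonnet.
- `2024/07/24`: The [MLVU](https://github.com/JUNJIE99/MLVU) team evaluated InternVL-1.5 on their benchmark. Its average performance is 50.4% on the multiple-choice tasks and 4.02 on the generation tasks. Its multiple-choice performance ranks first among all open-source multimodal large language models.
- `2024/07/04`: We released the InternVL2 series. InternVL2-Pro reaches 62.0% accuracy on the MMMU benchmark, matching the performance of leading closed-source commercial models such as GPT-4o. The model weights can be downloaded from the [HF link](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e).
- `2024/07/18`: InternVL2-40B achieved SOTA performance among open-source models on the [Video-MME](https://github.com/BradyFU/Video-MME) dataset, scoring 61.2 with 16 input frames and 64.4 with 32 input frames. It leads other open-source models by a wide margin and is the open-source model closest to GPT-4o mini.
- `2024/07/18`: InternVL2-Pro achieved SOTA performance on the [DocVQA](https://rrc.cvc.uab.es/?ch=17&com=evaluation&task=1) and [InfoVQA](https://rrc.cvc.uab.es/?ch=17&com=evaluation&task=3) benchmarks.
- `2024/06/19`: We proposed Needle In A Multimodal Haystack ([MM-NIAH](https://github.com/OpenGVLab/MM-NIAH)), the first benchmark designed to evaluate a model's ability to understand long multimodal documents.
- `2024/05/30`: We released [ShareGPT-4o](https://sharegpt4o.github.io/), a large-scale, high-quality multimodal dataset. We plan to open-source a batch of data carefully annotated with GPT-4o, including 200K detailed image captions, 10K detailed video captions, and 10K detailed audio captions.
- `2024/05/29`: We open-sourced the Mini-InternVL series, which includes two chat models: [Mini-InternVL-Chat-2B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-2B-V1-5) and [Mini-InternVL-Chat-4B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-4B-V1-5). These models achieve impressive performance at a very small size: the 2B model delivers 80% of the performance with 8% of the model size, and the 4B model delivers 90% of the performance with 16% of the model size. For more details, please see our [blog](https://internvl.github.io/blog/2024-05-25-Mini-InternVL-1.5/).
- `2024/05/13`: InternVL 1.0 can now be used as the [text encoder](https://huggingface.co/OpenGVLab/InternVL-14B-224px) for diffusion models, supporting multilingual generation in over 110 languages. See [MuLan](https://github.com/mulanai/MuLan) for details.
- `2024/04/18`: InternVL-Chat-V1-5 has been released on [HuggingFace](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5), approaching the performance of GPT-4V and Gemini Pro on benchmarks such as MMMU, DocVQA, ChartQA, and MathVista.
- `2024/02/27`: InternVL has been accepted by CVPR 2024 (Oral)! 🎉
- `2024/02/21`: [InternVL-Chat-V1-2-Plus](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus) achieved SOTA performance on MathVista (59.9), MMBench (83.8), and MMVP (58.7). See our [blog](https://internvl.github.io/blog/2024-02-21-InternVL-1.2/) for details.
- `2024/02/12`: InternVL-Chat-V1-2 has been released. It reaches 51.6 on the MMMU validation set and 82.3 on the MMBench test set. For more information, please refer to our [blog](https://internvl.github.io/blog/2024-02-21-InternVL-1.2/) and [SFT data](./internvl_chat#prepare-training-datasets). The model is available on [HuggingFace](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2), and the training and evaluation data and scripts are all open-sourced.
- `2024/01/24`: InternVL-Chat-V1-1 has been released. It supports Chinese conversation and has strong OCR capability. See [here](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1) for details.
- `2024/01/16`: We released a [customized mmcv/mmsegmentation/mmdetection codebase](https://github.com/OpenGVLab/InternVL-MMDetSeg), integrated with DeepSpeed, which can be used to train large-scale detection and segmentation models.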

</details>

## Documents

### 🌟 **Get Started**

- **Installation**: 🌱 [Installation Guide](https://internvl.readthedocs.io/en/latest/get_started/installation.html) | 📄 [requirements.txt](./requirements.txt)
- **Chat Data Format**: 📝 [Meta File](https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html#meta-file) | ✏️ [Text](https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html#pure-text-data) | 🖼️ [Single-Image](https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html#single-image-data) | 🖼️🖼️ [Multi-Image](https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html#multi-image-data) | 🎥 [Video](https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html#video-data)
- **Local Chat Demo**: 🤖 [Streamlit Demo](https://internvl.readthedocs.io/en/latest/get_started/local_chat_demo.html#streamlit-demo)
- **InternVL-Chat API**: 🌐 [InternVL2.5 API](https://internlm.intern-ai.org.cn/api/document)
- **Tutorials**: 🚀 [Enhancing InternVL2 on COCO Caption Using LoRA Fine-Tuning](https://internvl.readthedocs.io/en/latest/tutorials/coco_caption_finetune.html)

### 🏆 **InternVL Family**

- **InternVL 2.5**: 📖 [Intro](https://internvl.readthedocs.io/en/latest/internvl2.5/introduction.html) | ⚡ [Quick Start](https://internvl.readthedocs.io/en/latest/internvl2.5/quick_start.html) | ✨ [Finetune](https://internvl.readthedocs.io/en/latest/internvl2.5/finetune.html) | 📊 [Evaluate](https://internvl.readthedocs.io/en/latest/internvl2.5/evaluation.html) | 📦 [Deploy](https://internvl.readthedocs.io/en/latest/internvl2.5/deployment.html) | 🎯 [MPO](https://internvl.readthedocs.io/en/latest/internvl2.5/preference_optimization.html)
- **InternVL 2.0**: 📖 [Intro](https://internvl.readthedocs.io/en/latest/internvl2.0/introduction.html) | ⚡ [Quick Start](https://internvl.readthedocs.io/en/latest/internvl2.0/quick_start.html) | ✨ [Finetune](https://internvl.readthedocs.io/en/latest/internvl2.0/finetune.html) | 📊 [Evaluate](https://internvl.readthedocs.io/en/latest/internvl2.0/evaluation.html) | 📦 [Deploy](https://internvl.readthedocs.io/en/latest/internvl2.0/deployment.html) | 🎯 [MPO](https://internvl.readthedocs.io/en/latest/internvl2.0/preference_optimization.html)
- **InternVL 1.5**: 📖 [Intro](https://internvl.readthedocs.io/en/latest/internvl1.5/introduction.html) | ⚡ [Quick Start](https://internvl.readthedocs.io/en/latest/internvl1.5/quick_start.html) | ✨ [Finetune](https://internvl.readthedocs.io/en/latest/internvl1.5/finetune.html) | 📊 [Evaluate](https://internvl.readthedocs.io/en/latest/internvl1.5/evaluation.html) | 📦 [Deploy](https://internvl.readthedocs.io/en/latest/internvl1.5/deployment.html)
- **InternVL 1.2**: 📖 [Intro](https://internvl.readthedocs.io/en/latest/internvl1.2/introduction.html) | ⚡ [Quick Start](https://internvl.readthedocs.io/en/latest/internvl1.2/quick_start.html) | ✨ [Finetune](https://internvl.readthedocs.io/en/latest/internvl1.2/finetune.html) | 📊 [Evaluate](https://internvl.readthedocs.io/en/latest/internvl1.2/evaluation.html)
- **InternVL 1.1**: 📖 [Intro](https://internvl.readthedocs.io/en/latest/internvl1.1/introduction.html) | ⚡ [Quick Start](https://internvl.readthedocs.io/en/latest/internvl1.1/quick_start.html) | 📊 [Evaluation](https://internvl.readthedocs.io/en/latest/internvl1.1/evaluation.html)
- **InternVL 1.0**: 🖼️ [Classification](https://internvl.readthedocs.io/en/latest/internvl1.0/classification.html) | 📊 [CLIP-Benchmark](https://internvl.readthedocs.io/en/latest/internvl1.0/clip_benchmark.html) | 🎨 [Segmentation](https://internvl.readthedocs.io/en/latest/internvl1.0/segmentation.html) | 💬 [Chat-LLaVA](https://internvl.readthedocs.io/en/latest/internvl1.0/internvl_chat_llava.html) | ✨ [InternVL-G](https://internvl.readthedocs.io/en/latest/internvl1.0/internvl_g.html)

## Model Zoo

#### Multimodal Large Language Model (InternVL 3.5)

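To stay consistent with previous generations, we provide the model weights in two formats: the [GitHub format](https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B), which matches the weight format of previous generations, and the [HuggingFace format](https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B-HF), which matches the format of the `transformers` library.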

> If you want to convert weights between these two formats, please refer to our scripts: [custom2hf](https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat/tools/internvl_custom2hf.py) and [hf2custom](https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat/tools/internvl_hf2custom.py).

**GitHub Format**

| Model                 | #Vision Param | #Language Param | #Total Param | HF Link                                                                        | ModelScope Link                                                                          |
| --------------------- | ------------- | --------------- | ------------ | ------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------- |
| InternVL3.5-1B        | 0.3B          | 0.8B            | 1.1B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-1B)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-1B)                      |
| InternVL3.5-2B        | 0.3B          | 2.0B            | 2.3B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-2B)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-2B)                      |
| InternVL3.5-4B        | 0.3B          | 4.4B            | 4.7B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-4B)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-4B)                      |
| InternVL3.5-8B        | 0.3B          | 8.2B            | 8.5B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-8B)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-8B)                      |
| InternVL3.5-14B       | 0.3B          | 14.8B           | 15.1B        | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-14B)                     | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-14B)                     |
| InternVL3.5-38B       | 5.5B          | 32.8B           | 38.4B        | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-38B)                     | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-38B)                     |
| InternVL3.5-20B-A4B   | 0.3B          | 20.9B           | 21.2B-A4B    | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview) | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview) |
| InternVL3.5-30B-A3B   | 0.3B          | 30.5B           | 30.8B-A3B    | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-30B-A3B)                 | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-30B-A3B)                 |
| InternVL3.5-241B-A28B | 5.5B          | 235.1B          | 240.7B-A28B  | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B)               | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-241B-A28B)               |

**HuggingFace Format**

| Model                    | #Vision Param | #Language Param | #Total Param | HF Link                                                                           | ModelScope Link                                                                             |
| ------------------------ | ------------- | --------------- | ------------ | --------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- |
| InternVL3.5-1B-HF        | 0.3B          | 0.8B            | 1.1B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-1B-HF)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-1B-HF)                      |
| InternVL3.5-2B-HF        | 0.3B          | 2.0B            | 2.3B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-2B-HF)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-2B-HF)                      |
| InternVL3.5-4B-HF        | 0.3B          | 4.4B            | 4.7B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-4B-HF)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-4B-HF)                      |
| InternVL3.5-8B-HF        | 0.3B          | 8.2B            | 8.5B         | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-8B-HF)                      | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-8B-HF)                      |
| InternVL3.5-14B-HF       | 0.3B          | 14.8B           | 15.1B        | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-14B-HF)                     | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-14B-HF)                     |
| InternVL3.5-38B-HF       | 5.5B          | 32.8B           | 38.4B        | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-38B-HF)                     | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-38B-HF)                     |
| InternVL3.5-20B-A4B-HF   | 0.3B          | 20.9B           | 21.2B-A4B    | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF) | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF) |
| InternVL3.5-30B-A3B-HF   | 0.3B          | 30.5B           | 30.8B-A3B    | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-30B-A3B-HF)                 | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-30B-A3B-HF)                 |
| InternVL3.5-241B-A28B-HF | 5.5B          | 235.1B          | 240.7B-A28B  | [🤗 link](https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B-HF)               | [🤖 link](https://www.modelscope.cn/models/OpenGVLab/InternVL3_5-241B-A28B-HF)               |



#### Multimodal Large Language Model (InternVL 2.5)

<table>
  <tr>
    <th>Model Name</th>
    <th>Vision Part</th>
    <th>Language Part</th>
    <th>HF&nbsp;Link</th>
    <th>MS&nbsp;Link</th>
  </tr>
  <tr>
    <td>InternVL2_5-1B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT&#8209;300M&#8209;448px&#8209;V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct">Qwen2.5&#8209;0.5B&#8209;Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-1B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-1B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-2B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2_5-1_8b-chat">internlm2_5-1_8b-chat</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-2B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-2B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-4B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-3B-Instruct">Qwen2.5-3B-Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-4B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-4B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-8B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2_5-7b-chat">internlm2_5-7b-chat</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-8B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-8B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-26B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">InternViT-6B-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2_5-20b-chat">internlm2_5-20b-chat</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-26B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-26B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-38B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">InternViT-6B-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-32B-Instruct">Qwen2.5-32B-Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-38B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-38B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-78B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">InternViT-6B-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-72B-Instruct">Qwen2.5-72B-Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-78B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-78B">🤖 link</a></td>
  </tr>
</table>

<table>
  <tr>
    <th>Model Name</th>
    <th>Vision Part</th>
    <th>Language Part</th>
    <th>HF&nbsp;Link</th>
    <th>MS&nbsp;Link</th>
  </tr>
  <tr>
    <td>InternVL2_5-1B-MPO</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT&#8209;300M&#8209;448px&#8209;V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct">Qwen2.5&#8209;0.5B&#8209;Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-1B-MPO">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-1B-MPO">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-2B-MPO</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2_5-1_8b-chat">internlm2_5-1_8b-chat</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-2B-MPO">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-2B-MPO">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-4B-MPO</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-3B-Instruct">Qwen2.5-3B-Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-4B-MPO">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-4B-MPO">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-8B-MPO</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">InternViT-300M-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2_5-7b-chat">internlm2_5-7b-chat</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-8B-MPO">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-8B-MPO">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-26B-MPO</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">InternViT-6B-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2_5-20b-chat">internlm2_5-20b-chat</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-26B-MPO">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-26B-MPO">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-38B-MPO</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">InternViT-6B-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-32B-Instruct">Qwen2.5-32B-Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-38B-MPO">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-38B-MPO">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2_5-78B-MPO</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">InternViT-6B-448px-V2_5</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2.5-72B-Instruct">Qwen2.5-72B-Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2_5-78B-MPO">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2_5-78B-MPO">🤖 link</a></td>
  </tr>
</table>

#### Multimodal Large Language Models (InternVL 2.0)

<table>
  <tr>
    <th>Model Name</th>
    <th>Vision Part</th>
    <th>Language Part</th>
    <th>HF&nbsp;Link</th>
    <th>MS&nbsp;Link</th>
  </tr>
  <tr>
    <td>InternVL2-1B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px">InternViT-300M-448px</a></td>
    <td><a href="https://huggingface.co/Qwen/Qwen2-0.5B-Instruct">Qwen2-0.5B-Instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2-1B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2-1B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2-2B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px">InternViT-300M-448px</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2-chat-1_8b">internlm2-chat-1-8b</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2-2B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2-2B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2-4B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px">InternViT-300M-448px</a></td>
    <td><a href="https://huggingface.co/microsoft/Phi-3-mini-128k-instruct">Phi&#8209;3&#8209;mini&#8209;128k&#8209;instruct</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2-4B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2-4B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2-8B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px">InternViT-300M-448px</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2_5-7b-chat">internlm2_5-7b-chat</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2-8B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2-8B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2-26B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5">InternViT-6B-448px-V1-5</a></td>
    <td><a href="https://huggingface.co/internlm/internlm2-chat-20b">internlm2-chat-20b</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2-26B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2-26B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2-40B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5">InternViT&#8209;6B&#8209;448px&#8209;V1&#8209;5</a></td>
    <td><a href="https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B">Nous&#8209;Hermes&#8209;2&#8209;Yi&#8209;34B</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2-40B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2-40B">🤖 link</a></td>
  </tr>
  <tr>
    <td>InternVL2&#8209;Llama3-76B</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5">InternViT-6B-448px-V1-5</a></td>
    <td><a href="https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-70B">Hermes‑2‑Theta‑<br>Llama‑3‑70B</a></td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL2-Llama3-76B">🤖 link</a></td>
  </tr>
</table>

#### Multimodal Large Language Models (InternVL 1.0-1.5)

<table>
  <tr>
    <th>Model</th>
    <th>Date</th>
    <th>HF&nbsp;Link</th>
    <th>MS&nbsp;Link</th>
    <th>Note</th>
  </tr>
  <tr>
    <td>Mini&#8209;InternVL&#8209;Chat&#8209;4B&#8209;V1&#8209;5</td>
    <td>2024.05.28</td>
    <td><a href="https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-4B-V1-5">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-4B-V1-5">🤖 link</a></td>
    <td>🚀🚀 16% of the model size, 90% of the performance</td>
  </tr>
  <tr>
    <td>Mini-InternVL-Chat-2B-V1-5</td>
    <td>2024.05.19</td>
    <td><a href="https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-2B-V1-5">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-2B-V1-5">🤖 link</a></td>
    <td>🚀 8% of the model size, 80% of the performance</td>
  </tr>
  <tr>
    <td>InternVL-Chat-V1-5</td>
    <td>2024.04.18</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL-Chat-V1-5">🤖 link</a></td>
    <td>Supports 4K images; strong OCR; performance close to GPT-4V and Gemini Pro on benchmarks such as MMMU, DocVQA, ChartQA, and MathVista</td>
  </tr>
  <tr>
    <td>InternVL-Chat-V1-2-Plus</td>
    <td>2024.02.21</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL-Chat-V1-2-Plus">🤖 link</a></td>
    <td>More SFT data and stronger performance</td>
  </tr>
  <tr>
    <td>InternVL-Chat-V1-2</td>
    <td>2024.02.11</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL-Chat-V1-2">🤖 link</a></td>
    <td>Scales the LLM up to 34B</td>
  </tr>
  <tr>
    <td>InternVL-Chat-V1-1</td>
    <td>2024.01.24</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL-Chat-V1-1">🤖 link</a></td>
    <td>Supports Chinese and offers stronger OCR</td>
  </tr>
  <tr>
    <td>InternVL-Chat-19B</td>
    <td>2023.12.25</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-13B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-13B">🤖 link</a></td>
    <td>English multimodal dialogue</td>
  </tr>
  <tr>
    <td>InternVL-Chat-13B</td>
    <td>2023.12.25</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-7B">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-7B">🤖 link</a></td>
    <td>English multimodal dialogue</td>
  </tr>
</table>

#### CLIP-like Models (InternVL 1.0-2.5)

<table>
  <tr>
    <th>Model</th>
    <th>Date</th>
    <th>HF&nbsp;Link</th>
    <th>MS&nbsp;Link</th>
    <th>Note</th>
  </tr>
  <tr>
    <td>InternViT-300M-448px-V2_5</td>
    <td>2024.12.05</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px-V2_5">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternViT-300M-448px-V2_5">🤖 link</a></td>
    <td>🚀🚀 A more powerful lightweight visual encoder (🔥new)</td>
  </tr>
  <tr>
    <td>InternViT-6B-448px-V2_5</td>
    <td>2024.12.05</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V2_5">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternViT-6B-448px-V2_5">🤖 link</a></td>
    <td>🚀🚀 Stronger visual feature extraction (🔥new)</td>
  </tr>
  <tr>
    <td>InternViT-300M-448px</td>
    <td>2024.05.25</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-300M-448px">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternViT-300M-448px">🤖 link</a></td>
    <td>Distilled small vision foundation model with 300M parameters</td>
  </tr>
  <tr>
    <td>InternViT&#8209;6B&#8209;448px&#8209;V1&#8209;5</td>
    <td>2024.04.20</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternViT-6B-448px-V1-5">🤖 link</a></td>
    <td>Supports dynamic resolution and strong OCR feature extraction through incremental pre-training</td>
  </tr>
  <tr>
    <td>InternViT-6B-448px-V1-2</td>
    <td>2024.02.11</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternViT-6B-448px-V1-2">🤖 link</a></td>
    <td>Supports 448 resolution through incremental pre-training</td>
  </tr>
  <tr>
    <td>InternViT-6B-448px-V1-0</td>
    <td>2024.01.30</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternViT-6B-448px-V1-0">🤖 link</a></td>
    <td>Supports 448 resolution through incremental pre-training</td>
  </tr>
  <tr>
    <td>InternViT-6B-224px</td>
    <td>2023.12.22</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternViT-6B-224px">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternViT-6B-224px">🤖 link</a></td>
    <td>The first version of InternViT-6B, extracted from InternVL‑14B‑224px</td>
  </tr>
</table>

#### Vision-Language Foundation Models (InternVL 1.0)

<table>
  <tr>
    <th>Model</th>
    <th>Date</th>
    <th>HF&nbsp;Link</th>
    <th>MS&nbsp;Link</th>
    <th>Note</th>
  </tr>
  <tr>
    <td>InternVL&#8209;14B&#8209;224px</td>
    <td>2023.12.22</td>
    <td><a href="https://huggingface.co/OpenGVLab/InternVL-14B-224px">🤗 link</a></td>
    <td><a href="https://modelscope.cn/models/OpenGVLab/InternVL-14B-224px">🤖 link</a></td>
    <td>Vision-language foundation model (InternViT-6B + QLLaMA); can be used for CLIP-like image-text retrieval</td>
  </tr>
</table>

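## TODO List

- [x] Release training / evaluation code for the InternVL2.5 series
- [x] Support liger kernels to save GPU memory
- [x] Release the code, models, and data of MPO
- [x] Support multimodal packed datasets
- [ ] Support vLLM and Ollama
- [ ] Support video and PDF input in the demo
- [ ] Release InternVL2 with VisionLLMv2 integration
- [x] Rebuild the documentation using readthedocs
- [x] Support fine-tuning different LLMs with LoRA
- [x] Release `requirements.txt` for InternVL2
- [x] Release training / evaluation code for the InternVL2 series
- [x] Release the Streamlit web UI for InternVL1.5 and InternVL2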

## What can InternVL do?

<details>
  <summary>Visual Perception (click to expand)</summary>

- Linear-Probe Image Classification [\[see details\]](./classification#-evaluation)

  \*ViT-22B uses the private JFT-3B dataset.

  | method              | #param | IN-1K | IN-ReaL | IN-V2 | IN-A  | IN-R  | IN-Sketch |
  | ------------------- | :----: | :---: | :-----: | :---: | :---: | :---: | :-------: |
  | OpenCLIP-G          |  1.8B  | 86.2  |  89.4   | 77.2  | 63.8  | 87.8  |   66.4    |
  | DINOv2-g            |  1.1B  | 86.5  |  89.6   | 78.4  | 75.9  | 78.8  |   62.5    |
  | EVA-01-CLIP-g       |  1.1B  | 86.5  |  89.3   | 77.4  | 70.5  | 87.7  |   63.1    |
  | MAWS-ViT-6.5B       |  6.5B  | 87.8  |    -    |   -   |   -   |   -   |     -     |
  | ViT-22B\*           | 21.7B  | 89.5  |  90.9   | 83.2  | 83.8  | 87.4  |     -     |
  | InternViT-6B (ours) |  5.9B  | 88.2  |  90.4   | 79.9  | 77.5  | 89.8  |   69.1    |

- Semantic Segmentation [\[see details\]](./segmentation#-evaluation)

  | method                | decoder | #param (train/total) | crop size | mIoU         |
  | --------------------- | :-----: | :------------------: | :-------: | ------------ |
  | OpenCLIP-G (frozen)   | Linear  |     0.3M / 1.8B      |    512    | 39.3         |
  | ViT-22B (frozen)      | Linear  |     0.9M / 21.7B     |    504    | 34.6         |
  | InternViT-6B (frozen) | Linear  |     0.5M / 5.9B      |    504    | 47.2 (+12.6) |
  | ViT-22B (frozen)      | UperNet |     0.8B / 22.5B     |    504    | 52.7         |
  | InternViT-6B (frozen) | UperNet |     0.4B / 6.3B      |    504    | 54.9 (+2.2)  |
  | ViT-22B               | UperNet |    22.5B / 22.5B     |    504    | 55.3         |
  | InternViT-6B          | UperNet |     6.3B / 6.3B      |    504    | 58.9 (+3.6)  |

- Zero-Shot Image Classification [\[see details\]](./clip_benchmark#imagenet-variants-and-objectnet)

  | method            | IN-1K | IN-A  | IN-R  | IN-V2 | IN-Sketch | ObjectNet |
  | ----------------- | :---: | :---: | :---: | :---: | :-------: | :-------: |
  | OpenCLIP-G        | 80.1  | 69.3  | 92.1  | 73.6  |   68.9    |   73.0    |
  | EVA-02-CLIP-E+    | 82.0  | 82.1  | 94.5  | 75.7  |   71.6    |   79.6    |
  | ViT-22B\*         | 85.9  | 90.1  | 96.0  | 80.9  |     -     |   87.6    |
  | InternVL-C (ours) | 83.2  | 83.8  | 95.5  | 77.3  |   73.9    |   80.6    |

- Multilingual Zero-Shot Image Classification [\[see details\]](./clip_benchmark#multilingual-imagenet-1k)

  EN: English, ZH: Chinese, JP: Japanese, AR: Arabic, IT: Italian

  | method            | IN-1K (EN) | IN-1K (ZH) | IN-1K (JP) | IN-1K (AR) | IN-1K (IT) |
  | ----------------- | :--------: | :--------: | :--------: | :--------: | :--------: |
  | Taiyi-CLIP-ViT-H  |     -      |    54.4    |     -      |     -      |     -      |
  | WuKong-ViT-L-G    |     -      |    57.5    |     -      |     -      |     -      |
  | CN-CLIP-ViT-H     |     -      |    59.6    |     -      |     -      |     -      |
  | AltCLIP-ViT-L     |    74.5    |    59.6    |     -      |     -      |     -      |
  | EVA-02-CLIP-E+    |    82.0    |     -      |     -      |     -      |    41.2    |
  | OpenCLIP-XLM-R-H  |    77.0    |    55.7    |    53.1    |    37.0    |    56.8    |
  | InternVL-C (ours) |    83.2    |    64.5    |    61.5    |    44.9    |    65.7    |

- Zero-Shot Video Classification

  | method            | #frame | K400  | K600  | K700  |
  | ----------------- | :----: | :---: | :---: | :---: |
  | OpenCLIP-G        |   1    | 65.9  | 66.1  | 59.2  |
  | EVA-02-CLIP-E+    |   1    | 69.8  | 69.3  | 63.4  |
  | InternVL-C (ours) |   1    | 71.0  | 71.3  | 65.7  |
  | ViCLIP            |   8    | 75.7  | 73.5  | 66.4  |
  | InternVL-C (ours) |   8    | 79.4  | 78.8  | 71.5  |

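The bracketed gains in the semantic segmentation table above appear to be measured against ViT-22B under the same decoder and training setting. That pairing is an assumption read off the table layout rather than something the table states; a quick arithmetic check of it:

```python
# mIoU values copied from the semantic segmentation table above.
# Pairing each InternViT-6B row with the ViT-22B row of the same setting
# is an assumption made for this check.
pairs = {
    "Linear, frozen backbone": (47.2, 34.6),
    "UperNet, frozen backbone": (54.9, 52.7),
    "UperNet, backbone trained": (58.9, 55.3),
}

# Gain of InternViT-6B over ViT-22B in each setting, rounded to one decimal.
gains = {name: round(ours - ref, 1) for name, (ours, ref) in pairs.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain} mIoU")
```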
</details>

<details>
  <summary>Cross-Modal Retrieval (click to expand)</summary>

- English Zero-Shot Image-Text Retrieval [\[see details\]](./clip_benchmark#flickr30k--coco)

  <table>
    <tr align=center>
        <td rowspan="3" align=left><b>model</b></td>
        <td colspan="6" align=center><b>Flickr30K</b></td>
        <td colspan="6" align=center><b>COCO</b></td>
        <td rowspan="3" align=center><b>avg</b></td>
    </tr>
     <tr align=center>
        <td colspan="3" align=center><b>image-to-text</b></td>
        <td colspan="3" align=center><b>text-to-image</b></td>
         <td colspan="3" align=center><b>image-to-text</b></td>
        <td colspan="3" align=center><b>text-to-image</b></td>
     </tr>
     <tr>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
     </tr>
  <tr align=center>
        <td align=left>OpenCLIP-G</td>
        <td>92.9</td>
        <td>99.3</td>
        <td>99.8</td>
        <td>79.5</td>
        <td>95.0</td>
        <td>97.1</td>
        <td>67.3</td>
        <td>86.9</td>
        <td>92.6</td>
        <td>51.4</td>
        <td>74.9</td>
        <td>83.0</td>
        <td>85.0</td>
     </tr>
  <tr align=center>
        <td align=left>EVA-02-CLIP-E+</td>
        <td>93.9</td>
        <td>99.4</td>
        <td>99.8</td>
        <td>78.8</td>
        <td>94.2</td>
        <td>96.8</td>
        <td>68.8</td>
        <td>87.8</td>
        <td>92.8</td>
        <td>51.1</td>
        <td>75.0</td>
        <td>82.7</td>
        <td>85.1</td>
     </tr>
    <tr align=center>
        <td align=left>EVA-CLIP-8B</td>
        <td>95.6</td>
        <td>99.6</td>
        <td>99.9</td>
        <td>80.8</td>
        <td>95.5</td>
        <td>97.6</td>
        <td>70.3</td>
        <td>89.3</td>
        <td>93.9</td>
        <td>53.0</td>
        <td>76.0</td>
        <td>83.4</td>
        <td>86.2</td>
     </tr>
  <tr align=center>
        <td align=left>InternVL-C (ours)</td>
        <td>94.7</td>
        <td>99.6</td>
        <td>99.9</td>
        <td>81.7</td>
        <td>96.0</td>
        <td>98.2</td>
        <td>70.6</td>
        <td>89.0</td>
        <td>93.5</td>
        <td>54.1</td>
        <td>77.3</td>
        <td>84.6</td>
        <td>86.6</td>
     </tr>
  <tr align=center>
        <td align=left>InternVL-G (ours)</td>
        <td>95.7</td>
        <td>99.7</td>
        <td>99.9</td>
        <td>85.0</td>
        <td>97.0</td>
        <td>98.6</td>
        <td>74.9</td>
        <td>91.3</td>
        <td>95.2</td>
        <td>58.6</td>
        <td>81.3</td>
        <td>88.0</td>
        <td>88.8</td>
     </tr>

  </table>

- Chinese Zero-Shot Image-Text Retrieval [\[see details\]](./clip_benchmark#flickr30k-cn--coco-cn)

  <table>
    <tr  align=center>
        <td rowspan="3" align=left><b>model</b></td>
        <td colspan="6" align=center><b>Flickr30K-CN</b></td>
        <td colspan="6" align=center><b>COCO-CN</b></td>
        <td rowspan="3" align=center><b>avg</b></td>

  </tr>
     <tr  align=center>
        <td colspan="3" align=center><b>image-to-text</b></td>
        <td colspan="3" align=center><b>text-to-image</b></td>
         <td colspan="3" align=center><b>image-to-text</b></td>
        <td colspan="3" align=center><b>text-to-image</b></td>
     </tr>
     <tr>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
        <td>R@1</td>
        <td>R@5</td>
        <td>R@10</td>
     </tr>

  <tr align=center>
        <td align=left>CN-CLIP-ViT-H</td>
        <td>81.6</td>
        <td>97.5</td>
        <td>98.8</td>
        <td>71.2</td>
        <td>91.4</td>
        <td>95.5</td>
        <td>63.0</td>
        <td>86.6</td>
        <td>92.9</td>
        <td>69.2</td>
        <td>89.9</td>
        <td>96.1</td>
        <td>86.1</td>
     </tr>

  <tr align=center>
        <td align=left>OpenCLIP-XLM-R-H</td>
        <td>86.1</td>
        <td>97.5</td>
        <td>99.2</td>
        <td>71.0</td>
        <td>90.5</td>
        <td>94.9</td>
        <td>70.0</td>
        <td>91.5</td>
        <td>97.0</td>
        <td>66.1</td>
        <td>90.8</td>
        <td>96.0</td>
        <td>87.6</td>
     </tr>

  <tr align=center>
        <td align=left>InternVL-C (ours)</td>
        <td>90.3</td>
        <td>98.8</td>
        <td>99.7</td>
        <td>75.1</td>
        <td>92.9</td>
        <td>96.4</td>
        <td>68.8</td>
        <td>92.0</td>
        <td>96.7</td>
        <td>68.9</td>
        <td>91.9</td>
        <td>96.5</td>
        <td>89.0</td>
     </tr>
  <tr align=center>
        <td align=left>InternVL-G (ours)</td>
        <td>92.9</td>
        <td>99.4</td>
        <td>99.8</td>
        <td>77.7</td>
        <td>94.8</td>
        <td>97.3</td>
        <td>71.4</td>
        <td>93.9</td>
        <td>97.7</td>
        <td>73.8</td>
        <td>94.4</td>
        <td>98.1</td>
        <td>90.9</td>
     </tr>

  </table>

- Multilingual Zero-Shot Image-Text Retrieval [\[see details\]](./clip_benchmark#xtd)

  | method            |  EN   |  ES   |  FR   |  ZH   |  IT   |  KO   |  RU   |  JP   | average |
  | ----------------- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :-----: |
  | AltCLIP           | 95.4  | 94.1  | 92.9  | 95.1  | 94.2  | 94.4  | 91.8  | 91.7  |  93.7   |
  | OpenCLIP-XLM-R-H  | 97.3  | 96.1  | 94.5  | 94.7  | 96.0  | 90.2  | 93.9  | 94.0  |  94.6   |
  | InternVL-C (ours) | 97.3  | 95.7  | 95.1  | 95.6  | 96.0  | 92.2  | 93.3  | 95.5  |  95.1   |
  | InternVL-G (ours) | 98.6  | 97.7  | 96.5  | 96.7  | 96.9  | 95.1  | 94.8  | 96.1  |  96.6   |

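The retrieval tables above report Recall@K (R@1, R@5, R@10), and their `avg` column is consistent with the plain mean of the twelve recall values in a row. The sketch below shows one common way to compute Recall@K on a made-up similarity matrix, then re-derives two `avg` entries from the English table. It assumes exactly one correct caption per image, which is a simplification (Flickr30K and COCO provide several captions per image); `recall_at_k` is an illustrative helper, not a function from this repository.

```python
def recall_at_k(similarity, k):
    """Percentage of queries whose ground-truth item (same index) is in the top-k."""
    hits = 0
    for query_idx, row in enumerate(similarity):
        # Candidate indices sorted from most to least similar.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if query_idx in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy image-to-text similarity matrix (made-up numbers): entry [i][j] scores
# image i against caption j, and caption i is the correct match for image i.
toy = [
    [0.9, 0.1, 0.3, 0.2],
    [0.2, 0.4, 0.8, 0.1],  # correct caption (index 1) is ranked 2nd
    [0.1, 0.2, 0.7, 0.3],
    [0.6, 0.5, 0.2, 0.4],  # correct caption (index 3) is ranked 3rd
]
r1, r2, r3 = (recall_at_k(toy, k) for k in (1, 2, 3))
print(r1, r2, r3)  # 50.0 75.0 100.0

# The twelve recall values copied from two rows of the English table above.
openclip_g = [92.9, 99.3, 99.8, 79.5, 95.0, 97.1, 67.3, 86.9, 92.6, 51.4, 74.9, 83.0]
internvl_g = [95.7, 99.7, 99.9, 85.0, 97.0, 98.6, 74.9, 91.3, 95.2, 58.6, 81.3, 88.0]
avg_openclip_g = sum(openclip_g) / len(openclip_g)
avg_internvl_g = sum(internvl_g) / len(internvl_g)
print(f"{avg_openclip_g:.1f} {avg_internvl_g:.1f}")
```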
</details>

<details>
  <summary>Multimodal Dialogue</summary>

</details>

## Quick Start with HuggingFace

<details>
  <summary>Using InternViT-6B for visual feature extraction (click to expand)</summary>

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

model = AutoModel.from_pretrained(
    'OpenGVLab/InternViT-6B-448px-V2_5',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

image = Image.open('./examples/image1.jpg').convert('RGB')

image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternViT-6B-448px-V2_5')

pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

outputs = model(pixel_values)
```

</details>

<details>
  <summary>Using InternVL-C(ontrastive) and InternVL-G(enerative) for cross-modal retrieval (click to expand)</summary>

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor
from transformers import AutoTokenizer


model = AutoModel.from_pretrained(
    'OpenGVLab/InternVL-14B-224px',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternVL-14B-224px')

tokenizer = AutoTokenizer.from_pretrained(
    'OpenGVLab/InternVL-14B-224px', use_fast=False, add_eos_token=True)
tokenizer.pad_token_id = 0  # set pad_token_id to 0

images = [
    Image.open('./examples/image1.jpg').convert('RGB'),
    Image.open('./examples/image2.jpg').convert('RGB'),
    Image.open('./examples/image3.jpg').convert('RGB')
]
prefix = 'summarize:'
texts = [
    prefix + 'a photo of a red panda',  # English
    prefix + '一张熊猫的照片',  # Chinese
    prefix + '二匹の猫の写真'  # Japanese
]

pixel_values = image_processor(images=images, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()
input_ids = tokenizer(texts, return_tensors='pt', max_length=80,
                      truncation=True, padding='max_length').input_ids.cuda()

# InternVL-C
logits_per_image, logits_per_text = model(
    image=pixel_values, text=input_ids, mode='InternVL-C')
probs = logits_per_image.softmax(dim=-1)
# tensor([[9.9609e-01, 5.2185e-03, 6.0070e-08],
#         [2.2949e-02, 9.7656e-01, 5.9903e-06],
#         [3.2932e-06, 7.4863e-05, 1.0000e+00]], device='cuda:0',
#        dtype=torch.bfloat16, grad_fn=<SoftmaxBackward0>)

# InternVL-G
logits_per_image, logits_per_text = model(
    image=pixel_values, text=input_ids, mode='InternVL-G')
probs = logits_per_image.softmax(dim=-1)
# tensor([[9.9609e-01, 3.1738e-03, 3.6322e-08],
#         [8.6060e-03, 9.9219e-01, 2.8759e-06],
#         [1.7583e-06, 3.1233e-05, 1.0000e+00]], device='cuda:0',
#        dtype=torch.bfloat16, grad_fn=<SoftmaxBackward0>)

# please set add_eos_token to False for generation
tokenizer.add_eos_token = False
image = Image.open('./examples/image1.jpg').convert('RGB')
pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

tokenized = tokenizer("English caption:", return_tensors='pt')
pred = model.generate(
    pixel_values=pixel_values,
    input_ids=tokenized.input_ids.cuda(),
    attention_mask=tokenized.attention_mask.cuda(),
    num_beams=5,
    min_new_tokens=8,
)
caption = tokenizer.decode(pred[0].cpu(), skip_special_tokens=True).strip()
# English caption: a red panda sitting on top of a wooden platform
```
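
Each row of `probs` in the example above is a distribution over the three texts for one image, so the retrieved text is simply the column with the highest probability. As a small dependency-free sketch, the snippet below applies that step to the InternVL-C values printed in the comments; the plain Python lists stand in for the tensor.

```python
# Text prompts from the example above (without the 'summarize:' prefix).
texts = ['a photo of a red panda', '一张熊猫的照片', '二匹の猫の写真']

# InternVL-C probabilities copied from the printed tensor in the example above.
probs = [
    [9.9609e-01, 5.2185e-03, 6.0070e-08],
    [2.2949e-02, 9.7656e-01, 5.9903e-06],
    [3.2932e-06, 7.4863e-05, 1.0000e+00],
]

# For every image (row), pick the index of the most probable text (column).
best = [max(range(len(row)), key=row.__getitem__) for row in probs]

for image_idx, text_idx in enumerate(best):
    print(f'image{image_idx + 1}.jpg -> {texts[text_idx]!r} '
          f'(p={probs[image_idx][text_idx]:.4f})')
```

On the real tensor, `probs.argmax(dim=-1)` gives the same indices.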

</details>

<details>
  <summary>Using InternVL 2.5 for multimodal chat (click to expand)</summary>

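The chat example in this section splits each image into 448×448 tiles with `dynamic_preprocess`, choosing a tile grid whose aspect ratio best matches the image. The following is a rough, dependency-free sketch of just that grid-selection step, mirroring the logic of `find_closest_aspect_ratio`; the names `candidate_grids` and `choose_grid` are illustrative and not part of the InternVL API.

```python
def candidate_grids(min_num=1, max_num=12):
    """All (columns, rows) grids whose tile count lies in [min_num, max_num]."""
    grids = {
        (cols, rows)
        for cols in range(1, max_num + 1)
        for rows in range(1, max_num + 1)
        if min_num <= cols * rows <= max_num
    }
    # Smaller grids first, as in the chat example.
    return sorted(grids, key=lambda g: g[0] * g[1])


def choose_grid(width, height, image_size=448, min_num=1, max_num=12):
    """Pick the grid whose aspect ratio is closest to the image's.

    On an exact tie, a later candidate replaces the current choice only if the
    image area exceeds half the pixel area of that candidate grid.
    """
    aspect_ratio = width / height
    best_diff, best = float('inf'), (1, 1)
    for cols, rows in candidate_grids(min_num, max_num):
        diff = abs(aspect_ratio - cols / rows)
        if diff < best_diff:
            best_diff, best = diff, (cols, rows)
        elif diff == best_diff and width * height > 0.5 * image_size ** 2 * cols * rows:
            best = (cols, rows)
    return best


# Two 2:1 images: the small one keeps a 2x1 grid, the large one earns a 4x2 grid.
small = choose_grid(800, 400)
large = choose_grid(1600, 800)
print(small, large)  # (2, 1) (4, 2)

# With use_thumbnail=True, one extra thumbnail tile is appended when the grid
# has more than one tile, so the large image yields 4 * 2 + 1 = 9 crops.
num_crops_large = large[0] * large[1] + 1
```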
Here, we take the smaller `OpenGVLab/InternVL2_5-8B` as an example:

```python
import numpy as np
import torch
import torchvision.transforms as T
from decord import VideoReader, cpu
from PIL import Image
from torchvision.transforms.functional import InterpolationMode
from transformers import AutoModel, AutoTokenizer

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def build_transform(input_size):
    MEAN, STD = IMAGENET_MEAN, IMAGENET_STD
    transform = T.Compose([
        T.Lambda(lambda img: img.convert('RGB') if img.mode != 'RGB' else img),
        T.Resize((input_size, input_size), interpolation=InterpolationMode.BICUBIC),
        T.ToTensor(),
        T.Normalize(mean=MEAN, std=STD)
    ])
    return transform

def find_closest_aspect_ratio(aspect_ratio, target_ratios, width, height, image_size):
    best_ratio_diff = float('inf')
    best_ratio = (1, 1)
    area = width * height
    for ratio in target_ratios:
        target_aspect_ratio = ratio[0] / ratio[1]
        ratio_diff = abs(aspect_ratio - target_aspect_ratio)
        if ratio_diff < best_ratio_diff:
            best_ratio_diff = ratio_diff
            best_ratio = ratio
        elif ratio_diff == best_ratio_diff:
            if area > 0.5 * image_size * image_size * ratio[0] * ratio[1]:
                best_ratio = ratio
    return best_ratio

def dynamic_preprocess(image, min_num=1, max_num=12, image_size=448, use_thumbnail=False):
    orig_width, orig_height = image.size
    aspect_ratio = orig_width / orig_height

    # enumerate candidate tile grids (columns, rows) with min_num..max_num tiles
    target_ratios = set(
        (i, j) for n in range(min_num, max_num + 1) for i in range(1, n + 1) for j in range(1, n + 1) if
        i * j <= max_num and i * j >= min_num)
    target_ratios = sorted(target_ratios, key=lambda x: x[0] * x[1])

    # find the closest aspect ratio to the target
    target_aspect_ratio = find_closest_aspect_ratio(
        aspect_ratio, target_ratios, orig_width, orig_height, image_size)

    # calculate the target width and height
    target_width = image_size * target_aspect_ratio[0]
    target_height = image_size * target_aspect_ratio[1]
    blocks = target_aspect_ratio[0] * target_aspect_ratio[1]

    # resize the image
    resized_img = image.resize((target_width, target_height))
    processed_images = []
    for i in range(blocks):
        box = (
            (i % (target_width // image_size)) * image_size,
            (i // (target_width // image_size)) * image_size,
            ((i % (target_width // image_size)) + 1) * image_size,
            ((i // (target_width // image_size)) + 1) * image_size
        )
        # split the image
        split_img = resized_img.crop(box)
        processed_images.append(split_img)
    assert len(processed_images) == blocks
    if use_thumbnail and len(processed_images) != 1:
        thumbnail_img = image.resize((image_size, image_size))
        processed_images.append(thumbnail_img)
    return processed_images

def load_image(image_file, input_size=448, max_num=12):
    image = Image.open(image_file).convert('RGB')
    transform = build_transform(input_size=input_size)
    images = dynamic_preprocess(image, image_size=input_size, use_thumbnail=True, max_num=max_num)
    pixel_values = [transform(image) for image in images]
    pixel_values = torch.stack(pixel_values)
    return pixel_values

# If you have an 80G A100 GPU, you can put the entire model on a single GPU.
# Otherwise, load the model across multiple GPUs; see the `Multiple GPUs` section.
path = 'OpenGVLab/InternVL2_5-8B'
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

# set the max number of tiles in `max_num`
pixel_values = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
generation_config = dict(max_new_tokens=1024, do_sample=False)

# pure-text conversation
question = 'Hello, who are you?'
response, history = model.chat(tokenizer, None, question, generation_config, history=None, return_history=True)
print(f'User: {question}\nAssistant: {response}')

question = 'Can you tell me a story?'
response, history = model.chat(tokenizer, None, question, generation_config, history=history, return_history=True)
print(f'User: {question}\nAssistant: {response}')

# single-image single-round conversation
question = '<image>\nPlease describe the image shortly.'
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(f'User: {question}\nAssistant: {response}')

# single-image multi-round conversation
question = '<image>\nPlease describe the image in detail.'
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
print(f'User: {question}\nAssistant: {response}')

question = 'Please write a poem according to the image.'
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=history, return_history=True)
print(f'User: {question}\nAssistant: {response}')

# multi-image multi-round conversation, combined images
pixel_values1 = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=12).to(torch.bfloat16).cuda()
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)

question = '<image>\nDescribe the two images in detail.'
response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                               history=None, return_history=True)
print(f'User: {question}\nAssistant: {response}')

question = 'What are the similarities and differences between these two images.'
response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                               history=history, return_history=True)
print(f'User: {question}\nAssistant: {response}')

# multi-image multi-round conversation, separate images
pixel_values1 = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=12).to(torch.bfloat16).cuda()
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)
num_patches_list = [pixel_values1.size(0), pixel_values2.size(0)]

question = 'Image-1: <image>\nImage-2: <image>\nDescribe the two images in detail.'
response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                               num_patches_list=num_patches_list,
                               history=None, return_history=True)
print(f'User: {question}\nAssistant: {response}')

question = 'What are the similarities and differences between these two images.'
response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                               num_patches_list=num_patches_list,
                               history=history, return_history=True)
print(f'User: {question}\nAssistant: {response}')

# batch inference, single image per sample
pixel_values1 = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=12).to(torch.bfloat16).cuda()
num_patches_list = [pixel_values1.size(0), pixel_values2.size(0)]
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)

questions = ['<image>\nDescribe the image in detail.'] * len(num_patches_list)
responses = model.batch_chat(tokenizer, pixel_values,
                             num_patches_list=num_patches_list,
                             questions=questions,
                             generation_config=generation_config)
for question, response in zip(questions, responses):
    print(f'User: {question}\nAssistant: {response}')

# video multi-round conversation
def get_index(bound, fps, max_frame, first_idx=0, num_segments=32):
    if bound:
        start, end = bound[0], bound[1]
    else:
        start, end = -100000, 100000
    start_idx = max(first_idx, round(start * fps))
    end_idx = min(round(end * fps), max_frame)
    seg_size = float(end_idx - start_idx) / num_segments
    frame_indices = np.array([
        int(start_idx + (seg_size / 2) + np.round(seg_size * idx))
        for idx in range(num_segments)
    ])
    return frame_indices

def load_video(video_path, bound=None, input_size=448, max_num=1, num_segments=32):
    vr = VideoReader(video_path, ctx=cpu(0), num_threads=1)
    max_frame = len(vr) - 1
    fps = float(vr.get_avg_fps())

    pixel_values_list, num_patches_list = [], []
    transform = build_transform(input_size=input_size)
    frame_indices = get_index(bound, fps, max_frame, first_idx=0, num_segments=num_segments)
    for frame_index in frame_indices:
        img = Image.fromarray(vr[frame_index].asnumpy()).convert('RGB')
        img = dynamic_preprocess(img, image_size=input_size, use_thumbnail=True, max_num=max_num)
        pixel_values = [transform(tile) for tile in img]
        pixel_values = torch.stack(pixel_values)
        num_patches_list.append(pixel_values.shape[0])
        pixel_values_list.append(pixel_values)
    pixel_values = torch.cat(pixel_values_list)
    return pixel_values, num_patches_list

video_path = './examples/red-panda.mp4'
pixel_values, num_patches_list = load_video(video_path, num_segments=8, max_num=1)
pixel_values = pixel_values.to(torch.bfloat16).cuda()
video_prefix = ''.join([f'Frame-{i+1}: <image>\n' for i in range(len(num_patches_list))])
question = video_prefix + 'What is the red panda doing?'
# Frame-1: <image>\nFrame-2: <image>\n...\nFrame-8: <image>\n{question}
response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                               num_patches_list=num_patches_list, history=None, return_history=True)
print(f'User: {question}\nAssistant: {response}')

question = 'Describe this video in detail.'
response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                               num_patches_list=num_patches_list, history=history, return_history=True)
print(f'User: {question}\nAssistant: {response}')
```
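
To make the frame sampling in `get_index` concrete, here is a small worked example. It repeats the function from the listing above and applies it to a hypothetical clip of 241 frames at 30 fps. Both values are assumptions chosen to keep the arithmetic simple; they are not properties of `red-panda.mp4`.

```python
import numpy as np

def get_index(bound, fps, max_frame, first_idx=0, num_segments=32):
    # Same logic as the function in the listing above.
    if bound:
        start, end = bound[0], bound[1]
    else:
        start, end = -100000, 100000
    start_idx = max(first_idx, round(start * fps))
    end_idx = min(round(end * fps), max_frame)
    seg_size = float(end_idx - start_idx) / num_segments
    return np.array([
        int(start_idx + (seg_size / 2) + np.round(seg_size * idx))
        for idx in range(num_segments)
    ])

# Hypothetical clip: 241 frames (max_frame = 240) at 30 fps, split into 8 segments.
# seg_size = 240 / 8 = 30, so each sampled frame is the midpoint of its segment.
indices = get_index(bound=None, fps=30.0, max_frame=240, num_segments=8)
print(indices.tolist())  # [15, 45, 75, 105, 135, 165, 195, 225]

# With a time window in seconds, only frames 60..120 are considered.
windowed = get_index(bound=(2.0, 4.0), fps=30.0, max_frame=240, num_segments=4)
print(windowed.tolist())  # [67, 82, 97, 112]
```

When `bound` is `None`, the very large default window is clipped to the clip length, so the whole video is sampled uniformly.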

</details>

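## License

This project is released under the [MIT License](LICENSE). Parts of the code and models in this project come from other sources and are subject to their original licenses.

## Citation

If you find this project useful in your research, please consider citing: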

```BibTeX
@article{chen2024expanding,
  title={Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling},
  author={Chen, Zhe and Wang, Weiyun and Cao, Yue and Liu, Yangzhou and Gao, Zhangwei and Cui, Erfei and Zhu, Jinguo and Ye, Shenglong and Tian, Hao and Liu, Zhaoyang and others},
  journal={arXiv preprint arXiv:2412.05271},
  year={2024}
}
@article{wang2024mpo,
  title={Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization},
  author={Wang, Weiyun and Chen, Zhe and Wang, Wenhai and Cao, Yue and Liu, Yangzhou and Gao, Zhangwei and Zhu, Jinguo and Zhu, Xizhou and Lu, Lewei and Qiao, Yu and Dai, Jifeng},
  journal={arXiv preprint arXiv:2411.10442},
  year={2024}
}
@article{gao2024mini,
  title={Mini-InternVL: a flexible-transfer pocket multi-modal model with 5\% parameters and 90\% performance},
  author={Gao, Zhangwei and Chen, Zhe and Cui, Erfei and Ren, Yiming and Wang, Weiyun and Zhu, Jinguo and Tian, Hao and Ye, Shenglong and He, Junjun and Zhu, Xizhou and others},
  journal={Visual Intelligence},
  volume={2},
  number={1},
  pages={1--17},
  year={2024},
  publisher={Springer}
}
@article{chen2024far,
  title={How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites},
  author={Chen, Zhe and Wang, Weiyun and Tian, Hao and Ye, Shenglong and Gao, Zhangwei and Cui, Erfei and Tong, Wenwen and Hu, Kongzhi and Luo, Jiapeng and Ma, Zheng and others},
  journal={Science China Information Sciences},
  volume={67},
  number={12},
  pages={220101},
  year={2024},
  publisher={Springer}
}
@inproceedings{chen2024internvl,
  title={Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks},
  author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={24185--24198},
  year={2024}
}
```

## Acknowledgement

InternVL is built with reference to the code of the following projects: [OpenAI CLIP](https://github.com/openai/CLIP), [Open CLIP](https://github.com/mlfoundations/open_clip), [CLIP Benchmark](https://github.com/LAION-AI/CLIP_benchmark), [EVA](https://github.com/baaivision/EVA/tree/master), [InternImage](https://github.com/OpenGVLab/InternImage), [ViT-Adapter](https://github.com/czczup/ViT-Adapter), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation), [Transformers](https://github.com/huggingface/transformers), [DINOv2](https://github.com/facebookresearch/dinov2), [BLIP-2](https://github.com/salesforce/LAVIS/tree/main/projects/blip2), [Qwen-VL](https://github.com/QwenLM/Qwen-VL/tree/master/eval_mm), and [LLaVA-1.5](https://github.com/haotian-liu/LLaVA). Thanks for their outstanding work.

______________________________________________________________________

Scan the QR code below to join our project's WeChat group.

<p align="center"><img width="300" alt="image" src="https://github.com/user-attachments/assets/f776df09-ebba-4fd5-80c2-fec4ff1518be"></p>


================================================
FILE: classification/README.md
================================================
# InternViT-6B for Image Classification

This folder contains the implementation of InternViT-6B for image classification, which corresponds to Section 4.2.1 of our [InternVL 1.0 paper](https://arxiv.org/pdf/2312.14238).
The codebase for this part is derived from [InternImage](https://github.com/OpenGVLab/InternImage), with some code references to [EVA](https://github.com/baaivision/EVA/tree/master) and [DINOv2](https://github.com/facebookresearch/dinov2). Thanks for their great work.

In this part, we validate the visual perception capabilities of InternViT-6B, the core component of InternVL 1.0.
We evaluate the quality of the visual representations produced by InternViT-6B on the ImageNet-1K dataset. Following common practice, we adopt linear probing, i.e., we train a linear classifier while keeping the backbone frozen. In addition to the ImageNet-1K validation set,
we report results on several ImageNet variants to benchmark domain generalization.
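
The linear probing protocol described above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the training code in `main.py`: the backbone is a tiny hypothetical stand-in for InternViT-6B, and the images and labels are random tensors.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a pretrained backbone (hypothetical; NOT InternViT-6B).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 32), nn.GELU())
for p in backbone.parameters():
    p.requires_grad = False  # freeze the backbone
backbone.eval()

num_classes = 10
head = nn.Linear(32, num_classes)  # the only trainable module
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# Random tensors standing in for one batch of images and labels.
images = torch.randn(16, 3, 8, 8)
labels = torch.randint(0, num_classes, (16,))

backbone_before = [p.clone() for p in backbone.parameters()]
losses = []
for _ in range(20):
    with torch.no_grad():  # no gradients flow into the frozen backbone
        feats = backbone(images)
    loss = criterion(head(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

backbone_unchanged = all(
    torch.equal(a, b) for a, b in zip(backbone_before, backbone.parameters())
)
print(f'first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, '
      f'backbone unchanged: {backbone_unchanged}')
```

Only the parameters of `head` are passed to the optimizer, so the backbone weights stay exactly as loaded; this is what makes the resulting accuracy a measure of the frozen representation.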

InternViT-6B follows the structure of vanilla ViT, and its hyperparameters are listed in the table below.

<img width="558" alt="image" src="https://github.com/OpenGVLab/InternVL/assets/23737120/e6bb0151-ab2f-4436-982f-6c68c5a69bc4">

## 🛠️ Installation

Follow the [installation guide](../INSTALLATION.md) to set up the environment.

## 📦 Data Preparation

> Please prepare the dataset according to your needs.

- `ImageNet-1K`: We use the standard ImageNet dataset, which you can download from [http://image-net.org/](http://image-net.org/).

- `ImageNet-A`: Download it from [https://people.eecs.berkeley.edu/~hendrycks/imagenet-a.tar](https://people.eecs.berkeley.edu/~hendrycks/imagenet-a.tar).

- `ImageNet-R`: Download it from [https://people.eecs.berkeley.edu/~hendrycks/imagenet-r.tar](https://people.eecs.berkeley.edu/~hendrycks/imagenet-r.tar).

- `ImageNetV2`: Download it from [https://imagenetv2public.s3-us-west-2.amazonaws.com/imagenetv2-matched-frequency.tar.gz](https://imagenetv2public.s3-us-west-2.amazonaws.com/imagenetv2-matched-frequency.tar.gz).

- `ImageNet-Sketch`: Download it using `gdown`.

  ```shell
  # GDown is needed to download the dataset.
  # Please install it via `pip install gdown`
  gdown --id 1Mj0i5HBthqH1p_yeXzsg22gZduvgoNeA
  ```

First, please prepare the `ImageNet-1K`, `ImageNet-A`, `ImageNet-R`, `ImageNetV2`, and `ImageNet-Sketch` datasets following the directory structure outlined below.

```bash
$ tree data
data
├── imagenet-1k
│         ├── train
│         │    ├── n01498041
│         │    └── ...
│         └── val
│              ├── ILSVRC2012_val_00000001.JPEG
│              └── ...
├── imagenet-a
│         ├── n01498041
│         └── ...
├── imagenet-r
│         ├── n01443537
│         └── ...
├── imagenet-sketch
│         ├── n01440764
│         └── ...
└── imagenetv2
    └── ImageNetV2-matched-frequency
```

Then, unzip the `train.txt.zip` and `val.txt.zip` in `meta_data/`.

```shell
cd meta_data/
unzip train.txt.zip
unzip val.txt.zip
```

## 📦 Model Preparation

| model name                   | type    | download                                                                                       |  size   |
| ---------------------------- | ------- | ---------------------------------------------------------------------------------------------- | :-----: |
| intern_vit_6b_224px.pth      | pytorch | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL/blob/main/intern_vit_6b_224px.pth)      |  12 GB  |
| intern_vit_6b_224px_head.pth | pytorch | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL/blob/main/intern_vit_6b_224px_head.pth) | 25.7 MB |

Please download the above model weights and place them in the `pretrained/` folder.

```sh
cd pretrained
wget https://huggingface.co/OpenGVLab/InternVL/resolve/main/intern_vit_6b_224px.pth
wget https://huggingface.co/OpenGVLab/InternVL/resolve/main/intern_vit_6b_224px_head.pth
```

The directory structure is:

```sh
pretrained
├── intern_vit_6b_224px_head.pth
└── intern_vit_6b_224px.pth
```

## 🔍 Linear Probing on ImageNet-1K

> **Warning**: Please install `apex` before training (see [installation guide](../INSTALLATION.md#additional-instructions) for details).

To train a linear classifier for `InternViT-6B` on ImageNet with 8 GPUs, run:

```bash
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --cfg configs/intern_vit_6b_1k_224.yaml
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224.yaml --launcher slurm
```

Note that the following message may appear during training; this is expected and it can be safely ignored:

> \_IncompatibleKeys(missing_keys=\[\], unexpected_keys=\['clip_projector.norm1_q.weight', 'clip_projector.norm1_q.bias', 'clip_projector.norm1_k.weight', 'clip_projector.norm1_k.bias', 'clip_projector.norm1_v.weight', 'clip_projector.norm1_v.bias', 'clip_projector.cross_attn.q_bias', 'clip_projector.cross_attn.k_bias', 'clip_projector.cross_attn.v_bias', 'clip_projector.cross_attn.q.weight', 'clip_projector.cross_attn.k.weight', 'clip_projector.cross_attn.v.weight', 'clip_projector.cross_attn.proj.weight', 'clip_projector.cross_attn.proj.bias'\])
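
A message of this form is what PyTorch's `load_state_dict` returns when it is called with `strict=False` and the checkpoint contains keys that the model does not define. The sketch below reproduces the behavior with two toy modules; the class names are hypothetical, and it is an assumption (not confirmed here) that the classification model loads the backbone checkpoint this way.

```python
import torch.nn as nn

class FullModel(nn.Module):
    # Toy checkpoint source with an extra projector (hypothetical).
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)
        self.clip_projector = nn.Linear(4, 2)

class EncoderOnly(nn.Module):
    # Toy target model that has no projector.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)

checkpoint = FullModel().state_dict()

# strict=False skips keys the target model does not have instead of raising.
result = EncoderOnly().load_state_dict(checkpoint, strict=False)
print(result.missing_keys)     # []
print(result.unexpected_keys)  # ['clip_projector.weight', 'clip_projector.bias']
```

An empty `missing_keys` list means every parameter of the target model was found in the checkpoint, which is why unexpected `clip_projector.*` entries alone are harmless.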

## 📊 Evaluation

> **Warning**: Please install `apex` before evaluation (see [installation guide](../INSTALLATION.md#additional-instructions) for details).

| model name                                                     | IN-1K | IN-ReaL | IN-V2 | IN-A | IN-R | IN-Sketch |                                                                       download                                                                       |
| -------------------------------------------------------------- | :---: | :-----: | :---: | :--: | :--: | :-------: | :--------------------------------------------------------------------------------------------------------------------------------------------------: |
| [intern_vit_6b_1k_224.yaml](configs/intern_vit_6b_1k_224.yaml) | 88.2  |  90.4   | 79.9  | 77.5 | 89.8 |   69.1    | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/intern_vit_6b_224px_head.pth) \| [log](./work_dirs/intern_vit_6b_1k_224/log_rank0.txt) |

<details>
  <summary>Evaluate InternViT-6B on <b>ImageNet-1K val</b> with 8 GPUs (click to expand).</summary>

```bash
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm
```

Expected results:

```
 * Acc@1 88.230 Acc@5 98.474
Accuracy of the network on the 50000 test images: 88.2%
```
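
For readers unfamiliar with the `Acc@1` and `Acc@5` figures in the log above, the following sketch shows how top-k accuracy is commonly computed. It is a NumPy illustration on made-up scores, not the metric code used by `main.py`.

```python
import numpy as np

def topk_accuracy(logits, targets, k):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    topk = np.argsort(-logits, axis=1)[:, :k]
    hits = (topk == targets[:, None]).any(axis=1)
    return 100.0 * hits.mean()

# Made-up scores for 4 samples over 3 classes.
logits = np.array([
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.1, 0.3],
])
targets = np.array([1, 1, 2, 2])

print(topk_accuracy(logits, targets, k=1))  # 50.0
print(topk_accuracy(logits, targets, k=2))  # 100.0
```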

</details>

<details>
  <summary>Evaluate InternViT-6B on <b>ImageNet-ReaL</b> with 1 GPU (click to expand).</summary>

**Note: ImageNet-ReaL currently supports only single-GPU testing.**

```bash
python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224_test_imagenet_real.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=1 GPUS_PER_NODE=1 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_real.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm
```

Expected results:

```
* ReaL Acc@1 90.437 Acc@5 98.567 loss 0.605
ReaL Accuracy of the network on the 50000 test images: 90.4%
```

</details>

<details>
  <summary>Evaluate InternViT-6B on <b>ImageNetV2</b> with 8 GPUs (click to expand).</summary>

```bash
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224_test_imagenetv2.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenetv2.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm
```

Expected results:

```
 * Acc@1 79.940 Acc@5 95.340
Accuracy of the network on the 10000 test images: 79.9%
```

</details>

<details>
  <summary>Evaluate InternViT-6B on <b>ImageNet-A</b> with 8 GPUs (click to expand).</summary>

```bash
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224_test_imagenet_a.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_a.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm
```

Expected results:

```
 * Acc@1 77.479 Acc@5 92.737
Accuracy of the network on the 7500 test images: 77.5%
```

</details>

<details>
  <summary>Evaluate InternViT-6B on <b>ImageNet-R</b> with 8 GPUs (click to expand).</summary>

```bash
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224_test_imagenet_r.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_r.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm
```

Expected results:

```
 * Acc@1 89.777 Acc@5 97.023
Accuracy of the network on the 30000 test images: 89.8%
```

</details>

<details>
  <summary>Evaluate InternViT-6B on <b>ImageNet-Sketch</b> with 8 GPUs (click to expand).</summary>

```bash
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224_test_imagenet_sketch.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_sketch.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm
```

Expected results:

```
 * Acc@1 69.117 Acc@5 88.341
Accuracy of the network on the 50889 test images: 69.1%
```

</details>


================================================
FILE: classification/config.py
================================================
# --------------------------------------------------------
# InternVL
# Copyright (c) 2022 OpenGVLab
# Licensed under The MIT License [see LICENSE for details]
# --------------------------------------------------------

import os

import yaml
from yacs.config import CfgNode as CN

_C = CN()

# Base config files
_C.BASE = ['']

# -----------------------------------------------------------------------------
# Data settings
# -----------------------------------------------------------------------------
_C.DATA = CN()
# Batch size for a single GPU, could be overwritten by command line argument
_C.DATA.BATCH_SIZE = 128
# Path to dataset, could be overwritten by command line argument
_C.DATA.DATA_PATH = ''
# Dataset name
_C.DATA.DATASET = 'imagenet'
# Input image size
_C.DATA.IMG_SIZE = 224
# Interpolation to resize image (random, bilinear, bicubic)
_C.DATA.INTERPOLATION = 'bicubic'
# Use zipped dataset instead of folder dataset
# could be overwritten by command line argument
_C.DATA.ZIP_MODE = False
# Cache Data in Memory, could be overwritten by command line argument
_C.DATA.CACHE_MODE = 'part'
# Pin CPU memory in DataLoader for more efficient (sometimes) transfer to GPU.
_C.DATA.PIN_MEMORY = True
# Number of data loading threads
_C.DATA.NUM_WORKERS = 8
# Load data to memory
_C.DATA.IMG_ON_MEMORY = False
# Name of the build_transform function
_C.DATA.TRANSFORM = 'build_transform'

# -----------------------------------------------------------------------------
# Model settings
# -----------------------------------------------------------------------------
_C.MODEL = CN()
# Model type
_C.MODEL.TYPE = 'intern_vit_6b'
# Model name
_C.MODEL.NAME = 'intern_vit_6b'
# Pretrained weight from checkpoint, could be imagenet22k pretrained weight
# could be overwritten by command line argument
_C.MODEL.PRETRAINED = ''
# Checkpoint to resume, could be overwritten by command line argument
_C.MODEL.RESUME = ''
# Number of classes, overwritten in data preparation
_C.MODEL.NUM_CLASSES = 1000
# Dropout rate
_C.MODEL.DROP_RATE = 0.0
# Drop path rate
_C.MODEL.DROP_PATH_RATE = 0.1
# Drop path type
_C.MODEL.DROP_PATH_TYPE = 'linear'  # linear, uniform
# Label Smoothing
_C.MODEL.LABEL_SMOOTHING = 0.1

# INTERN_VIT_6B parameters
_C.MODEL.INTERN_VIT_6B = CN()
_C.MODEL.INTERN_VIT_6B.PATCH_SIZE = 14
_C.MODEL.INTERN_VIT_6B.PRETRAIN_SIZE = 224
_C.MODEL.INTERN_VIT_6B.QKV_BIAS = False
_C.MODEL.INTERN_VIT_6B.EMBED_DIM = 3200
_C.MODEL.INTERN_VIT_6B.NUM_HEADS = 25
_C.MODEL.INTERN_VIT_6B.MLP_RATIO = 4
_C.MODEL.INTERN_VIT_6B.INIT_VALUES = 0.1
_C.MODEL.INTERN_VIT_6B.QK_NORMALIZATION = True
_C.MODEL.INTERN_VIT_6B.DEPTH = 48
_C.MODEL.INTERN_VIT_6B.USE_FLASH_ATTN = True
_C.MODEL.INTERN_VIT_6B.FREEZE_VIT = True
_C.MODEL.INTERN_VIT_6B.PRETRAINED = None
_C.MODEL.INTERN_VIT_6B.CLS_TARGET = 'cls_patch_concat'
_C.MODEL.INTERN_VIT_6B.NORM_TYPE = 'rms'

# CLIP_VIT parameters
_C.MODEL.CLIP_VIT = CN()
_C.MODEL.CLIP_VIT.PATCH_SIZE = 14
_C.MODEL.CLIP_VIT.PRETRAIN_SIZE = 336
_C.MODEL.CLIP_VIT.EMBED_DIM = 1024
_C.MODEL.CLIP_VIT.NUM_HEADS = 16
_C.MODEL.CLIP_VIT.MLP_RATIO = 4
_C.MODEL.CLIP_VIT.DEPTH = 24
_C.MODEL.CLIP_VIT.FREEZE_VIT = True
_C.MODEL.CLIP_VIT.PRETRAINED = 'openai/clip-vit-large-patch14-336'
_C.MODEL.CLIP_VIT.CLS_TARGET = 'cls_patch_concat'

# -----------------------------------------------------------------------------
# Training settings
# -----------------------------------------------------------------------------
_C.TRAIN = CN()
_C.TRAIN.START_EPOCH = 0
_C.TRAIN.EPOCHS = 300
_C.TRAIN.WARMUP_EPOCHS = 20
_C.TRAIN.WEIGHT_DECAY = 0.05
_C.TRAIN.BASE_LR = 5e-4
_C.TRAIN.WARMUP_LR = 5e-7
_C.TRAIN.MIN_LR = 5e-6
# Clip gradient norm
_C.TRAIN.CLIP_GRAD = 5.0
# Auto resume from latest checkpoint
_C.TRAIN.AUTO_RESUME = True
# Gradient accumulation steps
# could be overwritten by command line argument
_C.TRAIN.ACCUMULATION_STEPS = 0
# Whether to use gradient checkpointing to save memory
# could be overwritten by command line argument
_C.TRAIN.USE_CHECKPOINT = False

# LR scheduler
_C.TRAIN.LR_SCHEDULER = CN()
_C.TRAIN.LR_SCHEDULER.NAME = 'cosine'
# Epoch interval to decay LR, used in StepLRScheduler
_C.TRAIN.LR_SCHEDULER.DECAY_EPOCHS = 30
# LR decay rate, used in StepLRScheduler
_C.TRAIN.LR_SCHEDULER.DECAY_RATE = 0.1

# Optimizer
_C.TRAIN.OPTIMIZER = CN()
_C.TRAIN.OPTIMIZER.NAME = 'adamw'
# Optimizer Epsilon
_C.TRAIN.OPTIMIZER.EPS = 1e-8
# Optimizer Betas
_C.TRAIN.OPTIMIZER.BETAS = (0.9, 0.999)
# SGD momentum
_C.TRAIN.OPTIMIZER.MOMENTUM = 0.9
# ZeRO
_C.TRAIN.OPTIMIZER.USE_ZERO = False
# freeze backbone
_C.TRAIN.OPTIMIZER.FREEZE_BACKBONE = None
# dcn lr
_C.TRAIN.OPTIMIZER.DCN_LR_MUL = None

# EMA
_C.TRAIN.EMA = CN()
_C.TRAIN.EMA.ENABLE = False
_C.TRAIN.EMA.DECAY = 0.9998

# LR_LAYER_DECAY
_C.TRAIN.LR_LAYER_DECAY = False
_C.TRAIN.LR_LAYER_DECAY_RATIO = 0.875

# FT head init weights
_C.TRAIN.RAND_INIT_FT_HEAD = False

# -----------------------------------------------------------------------------
# Augmentation settings
# -----------------------------------------------------------------------------
_C.AUG = CN()
# Color jitter factor
_C.AUG.COLOR_JITTER = 0.4
# Use AutoAugment policy. "v0" or "original"
_C.AUG.AUTO_AUGMENT = 'rand-m9-mstd0.5-inc1'
# Random erase prob
_C.AUG.REPROB = 0.25
# Random erase mode
_C.AUG.REMODE = 'pixel'
# Random erase count
_C.AUG.RECOUNT = 1
# Mixup alpha, mixup enabled if > 0
_C.AUG.MIXUP = 0.8
# Cutmix alpha, cutmix enabled if > 0
_C.AUG.CUTMIX = 1.0
# Cutmix min/max ratio, overrides alpha and enables cutmix if set
_C.AUG.CUTMIX_MINMAX = None
# Probability of performing mixup or cutmix when either/both is enabled
_C.AUG.MIXUP_PROB = 1.0
# Probability of switching to cutmix when both mixup and cutmix enabled
_C.AUG.MIXUP_SWITCH_PROB = 0.5
# How to apply mixup/cutmix params. Per "batch", "pair", or "elem"
_C.AUG.MIXUP_MODE = 'batch'
# RandomResizedCrop
_C.AUG.RANDOM_RESIZED_CROP = False
_C.AUG.MEAN = (0.485, 0.456, 0.406)
_C.AUG.STD = (0.229, 0.224, 0.225)

# -----------------------------------------------------------------------------
# Testing settings
# -----------------------------------------------------------------------------
_C.TEST = CN()
# Whether to use center crop when testing
_C.TEST.CROP = True

# Whether to use SequentialSampler as validation sampler
_C.TEST.SEQUENTIAL = False

# -----------------------------------------------------------------------------
# Misc
# -----------------------------------------------------------------------------
# Mixed precision opt level, if O0, no amp is used ('O0', 'O1', 'O2')
# overwritten by command line argument
_C.AMP_OPT_LEVEL = ''
# Path to output folder, overwritten by command line argument
_C.OUTPUT = ''
# Tag of experiment, overwritten by command line argument
_C.TAG = 'default'
# Frequency to save checkpoint
_C.SAVE_FREQ = 1
# Frequency of logging info
_C.PRINT_FREQ = 10
# eval freq
_C.EVAL_FREQ = 1
# Fixed random seed
_C.SEED = 0
# Perform evaluation only, overwritten by command line argument
_C.EVAL_MODE = False
# Test throughput only, overwritten by command line argument
_C.THROUGHPUT_MODE = False
# local rank for DistributedDataParallel, given by command line argument
_C.LOCAL_RANK = 0
_C.EVAL_22K_TO_1K = False

_C.AMP_TYPE = 'float16'


def _update_config_from_file(config, cfg_file):
    config.defrost()
    with open(cfg_file, 'r') as f:
        yaml_cfg = yaml.load(f, Loader=yaml.FullLoader)

    for cfg in yaml_cfg.setdefault('BASE', ['']):
        if cfg:
            _update_config_from_file(
                config, os.path.join(os.path.dirname(cfg_file), cfg))
    print('=> merge config from {}'.format(cfg_file))
    config.merge_from_file(cfg_file)
    config.freeze()


def update_config(config, args):
    _update_config_from_file(config, args.cfg)

    config.defrost()
    if hasattr(args, 'opts') and args.opts:
        config.merge_from_list(args.opts)

    # merge from specific arguments
    if hasattr(args, 'batch_size') and args.batch_size:
        config.DATA.BATCH_SIZE = args.batch_size
    if hasattr(args, 'dataset') and args.dataset:
        config.DATA.DATASET = args.dataset
    if hasattr(args, 'data_path') and args.data_path:
        config.DATA.DATA_PATH = args.data_path
    if hasattr(args, 'zip') and args.zip:
        config.DATA.ZIP_MODE = True
    if hasattr(args, 'cache_mode') and args.cache_mode:
        config.DATA.CACHE_MODE = args.cache_mode
    if hasattr(args, 'pretrained') and args.pretrained:
        config.MODEL.PRETRAINED = args.pretrained
    if hasattr(args, 'resume') and args.resume:
        config.MODEL.RESUME = args.resume
    if hasattr(args, 'accumulation_steps') and args.accumulation_steps:
        config.TRAIN.ACCUMULATION_STEPS = args.accumulation_steps
    if hasattr(args, 'use_checkpoint') and args.use_checkpoint:
        config.TRAIN.USE_CHECKPOINT = True
    if hasattr(args, 'amp_opt_level') and args.amp_opt_level:
        config.AMP_OPT_LEVEL = args.amp_opt_level
    if hasattr(args, 'output') and args.output:
        config.OUTPUT = args.output
    if hasattr(args, 'tag') and args.tag:
        config.TAG = args.tag
    if hasattr(args, 'eval') and args.eval:
        config.EVAL_MODE = True
    if hasattr(args, 'throughput') and args.throughput:
        config.THROUGHPUT_MODE = True
    if hasattr(args, 'save_ckpt_num') and args.save_ckpt_num:
        config.SAVE_CKPT_NUM = args.save_ckpt_num
    if hasattr(args, 'use_zero') and args.use_zero:
        config.TRAIN.OPTIMIZER.USE_ZERO = True
    # set local rank for distributed training
    if hasattr(args, 'local_rank') and args.local_rank:
        config.LOCAL_RANK = args.local_rank

    # output folder
    config.MODEL.NAME = args.cfg.split('/')[-1].replace('.yaml', '')
    config.OUTPUT = os.path.join(config.OUTPUT, config.MODEL.NAME)
    # config.OUTPUT = os.path.join(config.OUTPUT, config.MODEL.NAME, config.TAG)

    config.freeze()


def get_config(args):
    """Get a yacs CfgNode object with default values."""
    # Return a clone so that the defaults will not be altered
    # This is for the "local variable" use pattern
    config = _C.clone()
    update_config(config, args)

    return config


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-1k'
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 224
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 48
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_a.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet_a'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-a'
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 224
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 48
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_r.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet_r'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-r'
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 224
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 48
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_real.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet-real'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-1k'
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 224
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 48
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_sketch.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet_sketch'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-sketch'
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 224
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 48
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenetv2.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenetv2'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenetv2'
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 224
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 48
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-1k'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 224
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 48
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'
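
Note on the 224to448 setting: this config keeps PRETRAIN_SIZE at 224 but sets IMG_SIZE to 448, with PATCH_SIZE 14 in both cases. The sketch below only does the arithmetic for the patch grid and token count at each resolution. How the repository adapts the position embedding between the two grids is not shown in this file, so nothing here should be read as its actual implementation. The helper name patch_grid is hypothetical.

```python
def patch_grid(img_size: int, patch_size: int) -> int:
    """Number of patches along one side of a square image."""
    if img_size % patch_size != 0:
        raise ValueError("image size must be a multiple of the patch size")
    return img_size // patch_size


PATCH_SIZE = 14      # MODEL.INTERN_VIT_6B.PATCH_SIZE
PRETRAIN_SIZE = 224  # MODEL.INTERN_VIT_6B.PRETRAIN_SIZE
IMG_SIZE = 448       # DATA.IMG_SIZE

pretrain_grid = patch_grid(PRETRAIN_SIZE, PATCH_SIZE)
probe_grid = patch_grid(IMG_SIZE, PATCH_SIZE)

# Patch tokens only; any class token is not counted here.
print(f"pretrain: {pretrain_grid}x{pretrain_grid} = {pretrain_grid ** 2} patch tokens")
print(f"probing:  {probe_grid}x{probe_grid} = {probe_grid ** 2} patch tokens")
```

Doubling the resolution quadruples the number of patch tokens, which is the main cost difference between the 224 and 224to448 configs.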


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_a.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet_a'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-a'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 224
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 48
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_r.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet_r'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-r'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 224
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 48
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_real.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet-real'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-1k'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 224
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 48
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_sketch.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet_sketch'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-sketch'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 224
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 48
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenetv2.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenetv2'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenetv2'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 224
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 48
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_224px.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-1k'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 448
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 45
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_448px_v1_0.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'
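
Note on model size: the configs in this directory use EMBED_DIM 3200, NUM_HEADS 25 and MLP_RATIO 4, with DEPTH 48 for the 224px checkpoints and DEPTH 45 for the 448px ones. The sketch below gives a rough, back-of-the-envelope estimate of the per-head width and of the weight-matrix parameters in the transformer blocks. It assumes a standard ViT block (fused QKV, output projection, two-layer MLP) and deliberately ignores biases, norms, layer scale, patch and position embeddings, and the probing head, so the totals are approximations rather than official figures. The function names are hypothetical.

```python
def head_dim(embed_dim: int, num_heads: int) -> int:
    """Width of a single attention head."""
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    return embed_dim // num_heads


def block_weight_params(embed_dim: int, mlp_ratio: int) -> int:
    """Weight-matrix parameters in one standard ViT block (no biases or norms)."""
    attention = 4 * embed_dim * embed_dim        # QKV (3 d^2) + output projection (d^2)
    mlp = 2 * mlp_ratio * embed_dim * embed_dim  # d -> r*d and r*d -> d
    return attention + mlp


EMBED_DIM, NUM_HEADS, MLP_RATIO = 3200, 25, 4

print("head dim:", head_dim(EMBED_DIM, NUM_HEADS))
for depth in (45, 48):
    total = depth * block_weight_params(EMBED_DIM, MLP_RATIO)
    print(f"depth {depth}: about {total / 1e9:.2f}B block weights")
```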


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_a.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet_a'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-a'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 448
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 45
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_448px_v1_0.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'
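
Note on TRAIN.EMA: the config enables an exponential moving average with DECAY 0.998. The sketch below shows the textbook EMA update on plain Python floats, as an illustration of what such a decay value means. It is not taken from the repository's training code, which may differ (for example in warmup of the decay or in which tensors are averaged). The function name ema_update is hypothetical.

```python
def ema_update(ema: list[float], params: list[float], decay: float) -> list[float]:
    """One EMA step: keep `decay` of the old average, blend in the new values."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema, params)]


DECAY = 0.998  # TRAIN.EMA.DECAY

# Toy example: the "trained" weight is constant at 1.0, the average starts at 0.0.
ema = [0.0]
for _ in range(500):
    ema = ema_update(ema, [1.0], DECAY)

# After n steps the average has closed a fraction 1 - DECAY**n of the gap.
print(f"after 500 steps: {ema[0]:.3f}")
print(f"rough averaging horizon: {1.0 / (1.0 - DECAY):.0f} steps")
```

With a decay of 0.998 the average weights roughly the last few hundred optimizer steps, so it smooths short-term noise without lagging far behind a 10-epoch run.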


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_r.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet_r'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-r'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 448
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 45
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_448px_v1_0.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'
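
Note on the learning-rate schedule: the TRAIN block sets EPOCHS 10, WARMUP_EPOCHS 1, BASE_LR 0.1, and both WARMUP_LR and MIN_LR to 0. The scheduler type is not given in this file. The sketch below assumes linear warmup followed by cosine decay, which is a common choice but an assumption here, and the function name lr_at is hypothetical.

```python
import math


def lr_at(epoch: float, base_lr: float = 0.1, warmup_lr: float = 0.0,
          min_lr: float = 0.0, warmup_epochs: int = 1, epochs: int = 10) -> float:
    """Learning rate at a (possibly fractional) epoch.

    Assumed shape: linear warmup from warmup_lr to base_lr, then cosine
    decay from base_lr down to min_lr at the final epoch.
    """
    if epoch < warmup_epochs:
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for e in (0.0, 0.5, 1.0, 5.5, 10.0):
    print(f"epoch {e:>4}: lr = {lr_at(e):.4f}")
```

Under these assumptions the rate starts at zero, peaks at BASE_LR after the first epoch, and returns to zero at the end of training.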


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_real.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet-real'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-1k'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 448
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 45
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_448px_v1_0.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_sketch.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet_sketch'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-sketch'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 448
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 45
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_448px_v1_0.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenetv2.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenetv2'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenetv2'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 448
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 45
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_448px_v1_0.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-1k'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 448
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 45
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_448px_v1_2.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_a.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet_a'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-a'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 448
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 45
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_448px_v1_2.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_r.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet_r'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-r'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 448
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 45
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_448px_v1_2.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_real.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet-real'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-1k'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 448
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 45
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_448px_v1_2.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_sketch.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenet_sketch'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenet-sketch'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 448
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 45
    USE_FLASH_ATTN: True
    PRETRAINED: "./pretrained/intern_vit_6b_448px_v1_2.pth"
    CLS_TARGET: 'attention_pooling'
TRAIN:
  EMA:
    ENABLE: True
    DECAY: 0.998
  EPOCHS: 10
  WARMUP_EPOCHS: 1
  WEIGHT_DECAY: 0.0
  BASE_LR: 0.1 # 512
  WARMUP_LR: .0
  MIN_LR: .0
  LR_LAYER_DECAY: false
  OPTIMIZER:
    NAME: 'sgd'


================================================
FILE: classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenetv2.yaml
================================================
DATA:
  IMG_ON_MEMORY: False
  BATCH_SIZE: 16 # single GPU batch size
  DATASET: 'imagenetv2'
  TRANSFORM: 'build_transform_for_linear_probe'
  DATA_PATH: './data/imagenetv2'
  IMG_SIZE: 448
MODEL:
  TYPE: intern_vit_6b
  DROP_PATH_RATE: 0.0
  INTERN_VIT_6B:
    FREEZE_VIT: True
    PATCH_SIZE: 14
    PRETRAIN_SIZE: 448
    QKV_BIAS: False
    EMBED_DIM: 3200
    NUM_HEADS: 25
    MLP_RATIO: 4
    INIT_VALUES: 0.1
    QK_NORMALIZATION: True
    DEPTH: 45
    USE_FLASH_ATTN: True
    PRETRAINED: "./

gitextract_i3i5r_p7/

├── .flake8
├── .github/
│   ├── CONTRIBUTING.md
│   └── ISSUE_TEMPLATE/
│       ├── 1-bug-report.yml
│       ├── 2-feature-request.yml
│       └── 3-documentation.yml
├── .gitignore
├── .isort.cfg
├── .pre-commit-config.yaml
├── INSTALLATION.md
├── LICENSE
├── README.md
├── README_zh.md
├── classification/
│   ├── README.md
│   ├── config.py
│   ├── configs/
│   │   ├── attn_pooling_probing/
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_a.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_r.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_real.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_sketch.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenetv2.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_a.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_r.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_real.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_sketch.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenetv2.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_a.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_r.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_real.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_sketch.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenetv2.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_a.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_r.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_real.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_sketch.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenetv2.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_a.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_r.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_real.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_sketch.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenetv2.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_a.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_r.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_real.yaml
│   │   │   ├── attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_sketch.yaml
│   │   │   └── attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenetv2.yaml
│   │   ├── intern_vit_6b_1k_224.yaml
│   │   ├── intern_vit_6b_1k_224_test_imagenet_a.yaml
│   │   ├── intern_vit_6b_1k_224_test_imagenet_r.yaml
│   │   ├── intern_vit_6b_1k_224_test_imagenet_real.yaml
│   │   ├── intern_vit_6b_1k_224_test_imagenet_sketch.yaml
│   │   ├── intern_vit_6b_1k_224_test_imagenetv2.yaml
│   │   └── linear_probing/
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224_64gpu.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_a.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_r.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_real.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_sketch.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenetv2.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_a.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_r.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_real.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_sketch.yaml
│   │       ├── linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenetv2.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_a.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_r.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_real.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_sketch.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenetv2.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_a.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_r.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_real.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_sketch.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenetv2.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_a.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_r.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_real.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_sketch.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenetv2.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_a.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_r.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_real.yaml
│   │       ├── linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_sketch.yaml
│   │       └── linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenetv2.yaml
│   ├── dataset/
│   │   ├── __init__.py
│   │   ├── build.py
│   │   ├── cached_image_folder.py
│   │   ├── imagenet_a_r_indices.py
│   │   ├── imagenet_real.py
│   │   ├── imagenetv2.py
│   │   ├── samplers.py
│   │   └── zipreader.py
│   ├── ddp_hooks.py
│   ├── gflops.py
│   ├── hf2pytorch.py
│   ├── logger.py
│   ├── lr_scheduler.py
│   ├── main.py
│   ├── meta_data/
│   │   ├── 22k_class_to_idx.json
│   │   ├── imagenet_classes.json
│   │   ├── map22kto1k.txt
│   │   └── real.json
│   ├── models/
│   │   ├── __init__.py
│   │   ├── build.py
│   │   ├── clip_vit.py
│   │   ├── flash_attention.py
│   │   └── intern_vit_6b.py
│   ├── optimizer.py
│   ├── train_in1k.sh
│   ├── utils.py
│   └── work_dirs/
│       └── intern_vit_6b_1k_224/
│           └── log_rank0.txt
├── clip_benchmark/
│   ├── AUTHORS.rst
│   ├── CONTRIBUTING.rst
│   ├── HISTORY.rst
│   ├── LICENSE
│   ├── MANIFEST.in
│   ├── Makefile
│   ├── README.md
│   ├── benchmark/
│   │   ├── README.md
│   │   ├── benchmark.csv
│   │   ├── dataset_type.csv
│   │   ├── datasets.txt
│   │   ├── datasets_multilingual.txt
│   │   ├── models.txt
│   │   ├── results.ipynb
│   │   └── webdatasets.txt
│   ├── clip_benchmark/
│   │   ├── __init__.py
│   │   ├── cli.py
│   │   ├── datasets/
│   │   │   ├── __init__.py
│   │   │   ├── ar_classnames.json
│   │   │   ├── ar_zeroshot_classification_templates.json
│   │   │   ├── birdsnap.py
│   │   │   ├── builder.py
│   │   │   ├── caltech101.py
│   │   │   ├── cn_classnames.json
│   │   │   ├── cn_zeroshot_classification_templates.json
│   │   │   ├── cupl_prompts.json
│   │   │   ├── en_classnames.json
│   │   │   ├── en_zeroshot_classification_templates.json
│   │   │   ├── flickr.py
│   │   │   ├── imagenetv2.py
│   │   │   ├── it_classnames.json
│   │   │   ├── it_zeroshot_classification_templates.json
│   │   │   ├── jp_classnames.json
│   │   │   ├── jp_zeroshot_classification_templates.json
│   │   │   ├── kitti.py
│   │   │   ├── multilingual_mscoco.py
│   │   │   ├── objectnet.py
│   │   │   ├── tfds.py
│   │   │   ├── tools.py
│   │   │   └── voc2007.py
│   │   ├── metrics/
│   │   │   ├── __init__.py
│   │   │   ├── linear_probe.py
│   │   │   ├── mscoco_generative.py
│   │   │   ├── zeroshot_classification.py
│   │   │   └── zeroshot_retrieval.py
│   │   ├── model_collection.py
│   │   ├── models/
│   │   │   ├── __init__.py
│   │   │   ├── intern_vit_6b/
│   │   │   │   ├── configuration_intern_vit.py
│   │   │   │   ├── flash_attention.py
│   │   │   │   └── modeling_intern_vit.py
│   │   │   ├── internvl.py
│   │   │   ├── internvl_c_pytorch/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── chinese_alpaca_lora_7b/
│   │   │   │   │   ├── config.json
│   │   │   │   │   ├── generation_config.json
│   │   │   │   │   ├── pytorch_model.bin.index.json
│   │   │   │   │   ├── special_tokens_map.json
│   │   │   │   │   ├── tokenizer.model
│   │   │   │   │   └── tokenizer_config.json
│   │   │   │   ├── flash_attention.py
│   │   │   │   └── internvl_c.py
│   │   │   ├── internvl_huggingface/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── configuration_intern_vit.py
│   │   │   │   ├── configuration_internvl.py
│   │   │   │   ├── flash_attention.py
│   │   │   │   ├── modeling_intern_vit.py
│   │   │   │   ├── modeling_internvl.py
│   │   │   │   └── modeling_qllama.py
│   │   │   ├── japanese_clip.py
│   │   │   └── open_clip.py
│   │   └── webdataset_builder.py
│   ├── data/
│   │   ├── birdsnap/
│   │   │   └── test_images_valid.txt
│   │   ├── flickr30k/
│   │   │   └── flickr30k_cn_test.txt
│   │   └── mscoco_captions/
│   │       └── coco-cn_test.json
│   ├── probe_benchmark/
│   │   ├── PROBES.md
│   │   ├── build_df_scaling_experiments.py
│   │   ├── clip_table_2.csv
│   │   ├── generate_table.py
│   │   ├── laion5b_fewshot_experiments.py
│   │   ├── openclip_results.csv
│   │   ├── process_vtab.py
│   │   ├── scaling_experiment_data2.json
│   │   ├── scaling_experiment_data_vtab.json
│   │   ├── scaling_experiments.py
│   │   └── scaling_plot.ipynb
│   ├── requirements-test.txt
│   ├── requirements.txt
│   ├── setup.cfg
│   ├── setup.py
│   ├── test_internvl_c_classification.sh
│   ├── test_internvl_c_imagenet.sh
│   ├── test_internvl_c_retrieval.sh
│   ├── test_internvl_c_xtd.sh
│   ├── test_internvl_g_classification.sh
│   ├── test_internvl_g_imagenet.sh
│   ├── test_internvl_g_retrieval.sh
│   ├── test_internvl_g_retrieval_finetune.sh
│   ├── test_internvl_g_xtd.sh
│   ├── tests/
│   │   └── test_clip_benchmark.py
│   └── tox.ini
├── internvl_chat/
│   ├── README.md
│   ├── eval/
│   │   ├── README.md
│   │   ├── caption/
│   │   │   ├── README.md
│   │   │   └── evaluate_caption.py
│   │   ├── domain_specific/
│   │   │   ├── drivelm/
│   │   │   │   └── evaluate.py
│   │   │   ├── mme_rw/
│   │   │   │   └── evaluate.py
│   │   │   ├── rs_det/
│   │   │   │   ├── caculate.py
│   │   │   │   └── evaluate.py
│   │   │   └── rs_vqa/
│   │   │       ├── evaluate.py
│   │   │       └── score.py
│   │   ├── llava_bench/
│   │   │   ├── README.md
│   │   │   ├── eval_gpt_review_bench.py
│   │   │   ├── evaluate_llava_bench.py
│   │   │   ├── rule.json
│   │   │   └── summarize_gpt_review.py
│   │   ├── mantis_eval/
│   │   │   ├── README.md
│   │   │   └── evaluate_mantis.py
│   │   ├── mathvista/
│   │   │   ├── README.md
│   │   │   ├── calculate_score.py
│   │   │   ├── evaluate_mathvista.py
│   │   │   ├── extract_answer.py
│   │   │   ├── prompts/
│   │   │   │   └── ext_ans.py
│   │   │   └── utilities.py
│   │   ├── mirb/
│   │   │   ├── README.md
│   │   │   └── evaluate_mirb.py
│   │   ├── mmbench/
│   │   │   ├── README.md
│   │   │   └── evaluate_mmbench.py
│   │   ├── mme/
│   │   │   ├── README.md
│   │   │   ├── Your_Results/
│   │   │   │   ├── OCR.txt
│   │   │   │   ├── artwork.txt
│   │   │   │   ├── celebrity.txt
│   │   │   │   ├── code_reasoning.txt
│   │   │   │   ├── color.txt
│   │   │   │   ├── commonsense_reasoning.txt
│   │   │   │   ├── count.txt
│   │   │   │   ├── existence.txt
│   │   │   │   ├── landmark.txt
│   │   │   │   ├── numerical_calculation.txt
│   │   │   │   ├── position.txt
│   │   │   │   ├── posters.txt
│   │   │   │   ├── scene.txt
│   │   │   │   └── text_translation.txt
│   │   │   ├── calculation.py
│   │   │   └── eval.py
│   │   ├── mmhal/
│   │   │   ├── README.md
│   │   │   ├── eval_gpt_mmhal.py
│   │   │   └── evaluate_mmhal.py
│   │   ├── mmiu/
│   │   │   ├── README.md
│   │   │   ├── evaluate_mmiu.py
│   │   │   └── mmiu.jsonl
│   │   ├── mmmu/
│   │   │   ├── README.md
│   │   │   ├── answer_dict_val.json
│   │   │   ├── data_utils.py
│   │   │   ├── eval_utils.py
│   │   │   ├── evaluate_mmmu.py
│   │   │   └── main_eval_only.py
│   │   ├── mmmu_pro/
│   │   │   ├── README.md
│   │   │   ├── evaluate.py
│   │   │   ├── evaluate_mmmu_pro.py
│   │   │   └── prompts.yaml
│   │   ├── mmvet/
│   │   │   ├── README.md
│   │   │   └── evaluate_mmvet.py
│   │   ├── mmvetv2/
│   │   │   ├── README.md
│   │   │   └── evaluate_mmvet_v2.py
│   │   ├── mmvp/
│   │   │   ├── README.md
│   │   │   └── evaluate_mmvp.py
│   │   ├── mpdocvqa/
│   │   │   ├── README.md
│   │   │   ├── evaluate_vqa.py
│   │   │   └── infographicsvqa_eval.py
│   │   ├── mvbench/
│   │   │   ├── README.md
│   │   │   └── evaluate_mvbench.py
│   │   ├── pope/
│   │   │   ├── README.md
│   │   │   ├── eval_pope.py
│   │   │   └── evaluate_pope.py
│   │   ├── refcoco/
│   │   │   ├── README.md
│   │   │   └── evaluate_grounding.py
│   │   ├── scienceqa/
│   │   │   ├── README.md
│   │   │   └── evaluate_scienceqa.py
│   │   ├── seed/
│   │   │   ├── README.md
│   │   │   ├── calculation.py
│   │   │   └── evaluate_seed.py
│   │   ├── tiny_lvlm/
│   │   │   ├── README.md
│   │   │   ├── calculate_score.py
│   │   │   ├── evaluate_lvlm.py
│   │   │   └── tools.py
│   │   └── vqa/
│   │       ├── README.md
│   │       ├── convert_gqa_for_eval.py
│   │       ├── evaluate_vqa.py
│   │       ├── infographicsvqa_eval.py
│   │       └── textvqa_eval.py
│   ├── evaluate.sh
│   ├── internvl/
│   │   ├── conversation.py
│   │   ├── dist_utils.py
│   │   ├── model/
│   │   │   ├── __init__.py
│   │   │   ├── internlm2/
│   │   │   │   ├── configuration_internlm2.py
│   │   │   │   ├── modeling_internlm2.py
│   │   │   │   ├── tokenization_internlm2.py
│   │   │   │   └── tokenization_internlm2_fast.py
│   │   │   ├── internvl_chat/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── configuration_intern_vit.py
│   │   │   │   ├── configuration_internvl_chat.py
│   │   │   │   ├── modeling_intern_vit.py
│   │   │   │   └── modeling_internvl_chat.py
│   │   │   └── phi3/
│   │   │       ├── configuration_phi3.py
│   │   │       └── modeling_phi3.py
│   │   ├── patch/
│   │   │   ├── __init__.py
│   │   │   ├── internlm2_packed_training_patch.py
│   │   │   ├── internvit_liger_monkey_patch.py
│   │   │   ├── llama2_flash_attn_monkey_patch.py
│   │   │   ├── llama_flash_attn_monkey_patch.py
│   │   │   ├── llama_packed_training_patch.py
│   │   │   ├── llama_rmsnorm_monkey_patch.py
│   │   │   ├── pad_data_collator.py
│   │   │   ├── phi3_packed_training_patch.py
│   │   │   ├── qwen2_packed_training_patch.py
│   │   │   ├── train_dataloader_patch.py
│   │   │   └── train_sampler_patch.py
│   │   └── train/
│   │       ├── __init__.py
│   │       ├── constants.py
│   │       ├── dataset.py
│   │       ├── dataset_packed.py
│   │       ├── internvl_chat_finetune.py
│   │       ├── internvl_chat_mpo.py
│   │       ├── internvl_chat_pretrain.py
│   │       └── trainer_dpo.py
│   ├── pyproject.toml
│   ├── shell/
│   │   ├── data/
│   │   │   ├── coco_caption.json
│   │   │   ├── internvl_1_2_finetune.json
│   │   │   └── internvl_1_2_finetune_custom.json
│   │   ├── internvl1.2/
│   │   │   ├── 2nd_finetune/
│   │   │   │   ├── internvl_chat_v1_2_hermes2_yi34b_448_res_2nd_finetune_full.sh
│   │   │   │   └── internvl_chat_v1_2_hermes2_yi34b_448_res_2nd_finetune_lora.sh
│   │   │   └── hermes2_yi34b/
│   │   │       └── internvl_chat_v1_2_hermes2_yi34b_448_res_finetune.sh
│   │   ├── internvl1.5/
│   │   │   ├── 2nd_finetune/
│   │   │   │   ├── internvl_chat_v1_5_internlm2_1_8b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl_chat_v1_5_internlm2_1_8b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl_chat_v1_5_internlm2_20b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl_chat_v1_5_internlm2_20b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl_chat_v1_5_phi3_3_8b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   └── internvl_chat_v1_5_phi3_3_8b_dynamic_res_2nd_finetune_lora.sh
│   │   │   ├── hermes2_yi34b/
│   │   │   │   ├── internvl_chat_v1_5_hermes2_yi34b_dynamic_res_finetune.sh
│   │   │   │   └── internvl_chat_v1_5_hermes2_yi34b_dynamic_res_pretrain.sh
│   │   │   ├── internlm2_1_8b/
│   │   │   │   ├── internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune.sh
│   │   │   │   └── internvl_chat_v1_5_internlm2_1_8b_dynamic_res_pretrain.sh
│   │   │   ├── internlm2_20b/
│   │   │   │   ├── internvl_chat_v1_5_internlm2_20b_dynamic_res_finetune.sh
│   │   │   │   └── internvl_chat_v1_5_internlm2_20b_dynamic_res_pretrain.sh
│   │   │   └── phi3_3_8b/
│   │   │       ├── internvl_chat_v1_5_phi3_3_8b_dynamic_res_finetune.sh
│   │   │       └── internvl_chat_v1_5_phi3_3_8b_dynamic_res_pretrain.sh
│   │   ├── internvl2.0/
│   │   │   └── 2nd_finetune/
│   │   │       ├── internvl2_1b_qwen2_0_5b_dynamic_res_2nd_finetune_full.sh
│   │   │       ├── internvl2_1b_qwen2_0_5b_dynamic_res_2nd_finetune_lora.sh
│   │   │       ├── internvl2_26b_internlm2_20b_dynamic_res_2nd_finetune_full.sh
│   │   │       ├── internvl2_26b_internlm2_20b_dynamic_res_2nd_finetune_lora.sh
│   │   │       ├── internvl2_2b_internlm2_1_8b_dynamic_res_2nd_finetune_full.sh
│   │   │       ├── internvl2_2b_internlm2_1_8b_dynamic_res_2nd_finetune_lora.sh
│   │   │       ├── internvl2_2b_internlm2_1_8b_dynamic_res_2nd_finetune_lora_coco.sh
│   │   │       ├── internvl2_40b_hermes2_yi_34b_dynamic_res_2nd_finetune_full.sh
│   │   │       ├── internvl2_40b_hermes2_yi_34b_dynamic_res_2nd_finetune_lora.sh
│   │   │       ├── internvl2_4b_phi3_3_8b_dynamic_res_2nd_finetune_full.sh
│   │   │       ├── internvl2_4b_phi3_3_8b_dynamic_res_2nd_finetune_lora.sh
│   │   │       ├── internvl2_76b_hermes2_llama3_70b_dynamic_res_2nd_finetune_full.sh
│   │   │       ├── internvl2_76b_hermes2_llama3_70b_dynamic_res_2nd_finetune_lora.sh
│   │   │       ├── internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_full.sh
│   │   │       └── internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_lora.sh
│   │   ├── internvl2.0_mpo/
│   │   │   ├── README.md
│   │   │   └── preference_optimization/
│   │   │       └── internvl2_8b_internlm2_7b_dynamic_res_mpo_full.sh
│   │   ├── internvl2.5/
│   │   │   ├── 2nd_finetune/
│   │   │   │   ├── internvl2_5_1b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl2_5_1b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl2_5_26b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl2_5_26b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl2_5_2b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl2_5_2b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl2_5_2b_dynamic_res_2nd_finetune_lora_coco.sh
│   │   │   │   ├── internvl2_5_38b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl2_5_38b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl2_5_4b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl2_5_4b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl2_5_78b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl2_5_78b_dynamic_res_2nd_finetune_lora.sh
│   │   │   │   ├── internvl2_5_8b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   └── internvl2_5_8b_dynamic_res_2nd_finetune_lora.sh
│   │   │   ├── stage1/
│   │   │   │   ├── internvl2_5_1b_qwen2_5_0_5b_dynamic_res_stage1.sh
│   │   │   │   ├── internvl2_5_26b_internlm2_5_20b_dynamic_res_stage1.sh
│   │   │   │   ├── internvl2_5_2b_internlm2_5_1_8b_dynamic_res_stage1.sh
│   │   │   │   ├── internvl2_5_38b_qwen2_5_32b_dynamic_res_stage1.sh
│   │   │   │   ├── internvl2_5_4b_qwen2_5_3b_dynamic_res_stage1.sh
│   │   │   │   ├── internvl2_5_78b_qwen2_5_72b_dynamic_res_stage1.sh
│   │   │   │   └── internvl2_5_8b_internlm2_5_7b_dynamic_res_stage1.sh
│   │   │   ├── stage1.5/
│   │   │   │   ├── internvl2_5_26b_internlm2_5_20b_dynamic_res_stage1_5.sh
│   │   │   │   └── internvl2_5_8b_internlm2_5_7b_dynamic_res_stage1_5.sh
│   │   │   └── stage2/
│   │   │       ├── internvl2_5_1b_qwen2_5_0_5b_dynamic_res_stage2.sh
│   │   │       ├── internvl2_5_26b_internlm2_5_20b_dynamic_res_stage2.sh
│   │   │       ├── internvl2_5_2b_internlm2_5_1_8b_dynamic_res_stage2.sh
│   │   │       ├── internvl2_5_38b_qwen2_5_32b_dynamic_res_stage2.sh
│   │   │       ├── internvl2_5_4b_qwen2_5_3b_dynamic_res_stage2.sh
│   │   │       ├── internvl2_5_78b_qwen2_5_72b_dynamic_res_stage2.sh
│   │   │       └── internvl2_5_8b_internlm2_5_7b_dynamic_res_stage2.sh
│   │   ├── internvl2.5_mpo/
│   │   │   └── preference_optimization/
│   │   │       ├── internvl2_5_1b_qwen2_5_0_5b_dynamic_res_mpo.sh
│   │   │       ├── internvl2_5_26b_internlm2_5_20b_dynamic_res_mpo.sh
│   │   │       ├── internvl2_5_2b_internlm2_5_1_8b_dynamic_res_mpo.sh
│   │   │       ├── internvl2_5_38b_qwen2_5_32b_dynamic_res_mpo.sh
│   │   │       ├── internvl2_5_4b_qwen2_5_3b_dynamic_res_mpo.sh
│   │   │       ├── internvl2_5_78b_qwen2_5_72b_dynamic_res_mpo.sh
│   │   │       └── internvl2_5_8b_internlm2_5_7b_dynamic_res_mpo.sh
│   │   ├── internvl3.0/
│   │   │   ├── 2nd_finetune/
│   │   │   │   ├── internvl3_14b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl3_1b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl3_2b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl3_38b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl3_78b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   ├── internvl3_8b_dynamic_res_2nd_finetune_full.sh
│   │   │   │   └── internvl3_9b_dynamic_res_2nd_finetune_full.sh
│   │   │   ├── mpo/
│   │   │   │   ├── internvl3_14b_mpo.sh
│   │   │   │   ├── internvl3_1b_mpo.sh
│   │   │   │   ├── internvl3_2b_mpo.sh
│   │   │   │   ├── internvl3_38b_mpo.sh
│   │   │   │   ├── internvl3_78b_mpo.sh
│   │   │   │   ├── internvl3_8b_mpo.sh
│   │   │   │   └── internvl3_9b_mpo.sh
│   │   │   ├── mpo_data_construction/
│   │   │   │   ├── correctness_build_data.sh
│   │   │   │   └── correctness_mmpr_8b.sh
│   │   │   └── visualprm_data_construction/
│   │   │       ├── visualprm_build_data.sh
│   │   │       └── visualprm_mmpr_8b.sh
│   │   └── mini_internvl/
│   │       ├── README.md
│   │       └── domain_adaptation/
│   │           ├── internvl2_1b_qwen2_0_5b_dynamic_res_finetune_bdd.sh
│   │           ├── internvl2_1b_qwen2_0_5b_dynamic_res_finetune_drivelm.sh
│   │           ├── internvl2_1b_qwen2_0_5b_dynamic_res_finetune_medical.sh
│   │           ├── internvl2_1b_qwen2_0_5b_dynamic_res_finetune_remote.sh
│   │           ├── internvl2_2b_internlm2_1_8b_dynamic_res_finetune_bdd.sh
│   │           ├── internvl2_2b_internlm2_1_8b_dynamic_res_finetune_drivelm.sh
│   │           ├── internvl2_2b_internlm2_1_8b_dynamic_res_finetune_medical.sh
│   │           ├── internvl2_2b_internlm2_1_8b_dynamic_res_finetune_remote.sh
│   │           ├── internvl2_4b_phi3_3_8b_dynamic_res_finetune_bdd.sh
│   │           ├── internvl2_4b_phi3_3_8b_dynamic_res_finetune_drivelm.sh
│   │           ├── internvl2_4b_phi3_3_8b_dynamic_res_finetune_medical.sh
│   │           └── internvl2_4b_phi3_3_8b_dynamic_res_finetune_remote.sh
│   ├── tools/
│   │   ├── README.md
│   │   ├── convert_to_int8.py
│   │   ├── extract_mlp.py
│   │   ├── extract_video_frames.py
│   │   ├── extract_vit.py
│   │   ├── images_stitching.py
│   │   ├── internvl_custom2hf.py
│   │   ├── internvl_hf2custom.py
│   │   ├── json2jsonl.py
│   │   ├── jsonl2jsonl.py
│   │   ├── merge_lora.py
│   │   ├── reasoning_data_pipeline/
│   │   │   ├── mmpr_data_pipeline_correctness.py
│   │   │   ├── mmpr_data_pipeline_correctness_postprocess.py
│   │   │   ├── mmpr_data_pipeline_dropout_ntp.py
│   │   │   ├── utils/
│   │   │   │   ├── accuracy_reward.py
│   │   │   │   ├── constants.py
│   │   │   │   └── utils.py
│   │   │   ├── visualprm_data_pieline.py
│   │   │   └── visualprm_data_pipeline_postprocess.py
│   │   ├── replace_llm.py
│   │   └── resize_pos_embed.py
│   ├── zero_stage1_config.json
│   ├── zero_stage2_config.json
│   ├── zero_stage3_config.json
│   ├── zero_stage3_config_100b.json
│   ├── zero_stage3_config_100b_1e7_offload.json
│   ├── zero_stage3_config_100b_1e8.json
│   ├── zero_stage3_config_34b.json
│   └── zero_stage3_config_70b.json
├── internvl_chat_gpt_oss/
│   ├── README.md
│   ├── internvl/
│   │   ├── dist_utils.py
│   │   ├── model/
│   │   │   └── internvl_chat/
│   │   │       ├── __init__.py
│   │   │       ├── configuration_intern_vit.py
│   │   │       ├── configuration_internvl_chat.py
│   │   │       ├── conversation.py
│   │   │       ├── modeling_intern_vit.py
│   │   │       └── modeling_internvl_chat.py
│   │   ├── patch/
│   │   │   ├── __init__.py
│   │   │   ├── flash_sink_attn/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── flash_attn_with_sink.py
│   │   │   │   ├── flash_sink_attn.py
│   │   │   │   ├── flash_sink_attn_gpt_oss.py
│   │   │   │   ├── flash_sink_varlen_attn_gpt_oss.py
│   │   │   │   └── sliding_cache.py
│   │   │   ├── flash_sink_attn_monkey_patch.py
│   │   │   ├── pad_data_collator.py
│   │   │   ├── qwen3_flash_monkey_patch.py
│   │   │   └── train_dataloader_patch.py
│   │   ├── train/
│   │   │   ├── constants.py
│   │   │   ├── dataset.py
│   │   │   ├── dataset_packed.py
│   │   │   ├── internvl_chat_finetune.py
│   │   │   ├── internvl_chat_mpo.py
│   │   │   └── trainer_dpo.py
│   │   └── utils/
│   │       ├── s3_config.py
│   │       ├── s3_exception.py
│   │       └── s3_fileio.py
│   ├── requirements.txt
│   ├── shell/
│   │   ├── data/
│   │   │   ├── debug_mpo.json
│   │   │   └── debug_sft.json
│   │   ├── internvl3_5_gpt_oss/
│   │   │   ├── internvl3_5_gpt_oss_20b_stage0_mlp_warmup.sh
│   │   │   ├── internvl3_5_gpt_oss_20b_stage1_pretrain.sh
│   │   │   ├── internvl3_5_gpt_oss_20b_stage2_sft.sh
│   │   │   └── internvl3_5_gpt_oss_20b_stage3_mpo.sh
│   │   └── internvl3_5_qwen3/
│   │       ├── internvl3_5_14b_mpo.sh
│   │       ├── internvl3_5_14b_sft.sh
│   │       ├── internvl3_5_1b_mpo.sh
│   │       ├── internvl3_5_1b_sft.sh
│   │       ├── internvl3_5_241b_mpo.sh
│   │       ├── internvl3_5_241b_sft.sh
│   │       ├── internvl3_5_2b_mpo.sh
│   │       ├── internvl3_5_2b_sft.sh
│   │       ├── internvl3_5_30b_mpo.sh
│   │       ├── internvl3_5_30b_sft.sh
│   │       ├── internvl3_5_38b_mpo.sh
│   │       ├── internvl3_5_38b_sft.sh
│   │       ├── internvl3_5_4b_mpo.sh
│   │       ├── internvl3_5_4b_sft.sh
│   │       ├── internvl3_5_8b_mpo.sh
│   │       └── internvl3_5_8b_sft.sh
│   ├── zero_stage1_config.json
│   └── zero_stage3_config.json
├── internvl_chat_llava/
│   ├── LICENSE
│   ├── README.md
│   ├── docs/
│   │   ├── Customize_Component.md
│   │   ├── Data.md
│   │   ├── Evaluation.md
│   │   ├── LLaVA_Bench.md
│   │   ├── LLaVA_from_LLaMA2.md
│   │   ├── LoRA.md
│   │   ├── MODEL_ZOO.md
│   │   └── ScienceQA.md
│   ├── llava/
│   │   ├── __init__.py
│   │   ├── constants.py
│   │   ├── conversation.py
│   │   ├── eval/
│   │   │   ├── eval_gpt_review.py
│   │   │   ├── eval_gpt_review_bench.py
│   │   │   ├── eval_gpt_review_visual.py
│   │   │   ├── eval_pope.py
│   │   │   ├── eval_science_qa.py
│   │   │   ├── eval_science_qa_gpt4.py
│   │   │   ├── eval_science_qa_gpt4_requery.py
│   │   │   ├── eval_textvqa.py
│   │   │   ├── generate_webpage_data_from_table.py
│   │   │   ├── m4c_evaluator.py
│   │   │   ├── model_qa.py
│   │   │   ├── model_vqa.py
│   │   │   ├── model_vqa_loader.py
│   │   │   ├── model_vqa_mmbench.py
│   │   │   ├── model_vqa_science.py
│   │   │   ├── qa_baseline_gpt35.py
│   │   │   ├── run_llava.py
│   │   │   ├── summarize_gpt_review.py
│   │   │   ├── table/
│   │   │   │   ├── answer/
│   │   │   │   │   ├── answer_alpaca-13b.jsonl
│   │   │   │   │   ├── answer_bard.jsonl
│   │   │   │   │   ├── answer_gpt35.jsonl
│   │   │   │   │   ├── answer_llama-13b.jsonl
│   │   │   │   │   └── answer_vicuna-13b.jsonl
│   │   │   │   ├── caps_boxes_coco2014_val_80.jsonl
│   │   │   │   ├── model.jsonl
│   │   │   │   ├── prompt.jsonl
│   │   │   │   ├── question.jsonl
│   │   │   │   ├── review/
│   │   │   │   │   ├── review_alpaca-13b_vicuna-13b.jsonl
│   │   │   │   │   ├── review_bard_vicuna-13b.jsonl
│   │   │   │   │   ├── review_gpt35_vicuna-13b.jsonl
│   │   │   │   │   └── review_llama-13b_vicuna-13b.jsonl
│   │   │   │   ├── reviewer.jsonl
│   │   │   │   └── rule.json
│   │   │   └── webpage/
│   │   │       ├── index.html
│   │   │       ├── script.js
│   │   │       └── styles.css
│   │   ├── mm_utils.py
│   │   ├── model/
│   │   │   ├── __init__.py
│   │   │   ├── apply_delta.py
│   │   │   ├── builder.py
│   │   │   ├── consolidate.py
│   │   │   ├── language_model/
│   │   │   │   ├── llava_llama.py
│   │   │   │   ├── llava_mpt.py
│   │   │   │   └── mpt/
│   │   │   │       ├── adapt_tokenizer.py
│   │   │   │       ├── attention.py
│   │   │   │       ├── blocks.py
│   │   │   │       ├── configuration_mpt.py
│   │   │   │       ├── custom_embedding.py
│   │   │   │       ├── flash_attn_triton.py
│   │   │   │       ├── hf_prefixlm_converter.py
│   │   │   │       ├── meta_init_context.py
│   │   │   │       ├── modeling_mpt.py
│   │   │   │       ├── norm.py
│   │   │   │       └── param_init_fns.py
│   │   │   ├── llava_arch.py
│   │   │   ├── make_delta.py
│   │   │   ├── multimodal_encoder/
│   │   │   │   ├── builder.py
│   │   │   │   ├── clip_encoder.py
│   │   │   │   ├── eva_clip/
│   │   │   │   │   ├── configuration_evaclip.py
│   │   │   │   │   └── modeling_evaclip.py
│   │   │   │   ├── intern_vit_6b/
│   │   │   │   │   ├── configuration_intern_vit.py
│   │   │   │   │   ├── flash_attention.py
│   │   │   │   │   └── modeling_intern_vit.py
│   │   │   │   └── internvl_14b/
│   │   │   │       ├── __init__.py
│   │   │   │       ├── configuration_intern_vit.py
│   │   │   │       ├── configuration_internvl.py
│   │   │   │       ├── flash_attention.py
│   │   │   │       ├── modeling_intern_vit.py
│   │   │   │       ├── modeling_internvl.py
│   │   │   │       └── modeling_qllama.py
│   │   │   ├── multimodal_projector/
│   │   │   │   └── builder.py
│   │   │   └── utils.py
│   │   ├── serve/
│   │   │   ├── __init__.py
│   │   │   ├── cli.py
│   │   │   ├── controller.py
│   │   │   ├── gradio_web_server.py
│   │   │   ├── model_worker.py
│   │   │   ├── register_worker.py
│   │   │   └── test_message.py
│   │   ├── train/
│   │   │   ├── dist_utils.py
│   │   │   ├── llama_flash_attn_monkey_patch.py
│   │   │   ├── llava_trainer.py
│   │   │   ├── train.py
│   │   │   ├── train_custom.py
│   │   │   ├── train_mem.py
│   │   │   └── train_mem_custom.py
│   │   └── utils.py
│   ├── pyproject.toml
│   ├── scripts/
│   │   ├── convert_gqa_for_eval.py
│   │   ├── convert_mmbench_for_submission.py
│   │   ├── convert_mmvet_for_eval.py
│   │   ├── convert_seed_for_submission.py
│   │   ├── convert_sqa_to_llava.py
│   │   ├── convert_sqa_to_llava_base_prompt.py
│   │   ├── convert_vizwiz_for_submission.py
│   │   ├── convert_vqav2_for_submission.py
│   │   ├── finetune.sh
│   │   ├── finetune_full_schedule.sh
│   │   ├── finetune_lora.sh
│   │   ├── finetune_qlora.sh
│   │   ├── finetune_sqa.sh
│   │   ├── merge_lora_weights.py
│   │   ├── pretrain.sh
│   │   ├── sqa_eval_batch.sh
│   │   ├── sqa_eval_gather.sh
│   │   ├── v1_5/
│   │   │   ├── eval/
│   │   │   │   ├── gqa.sh
│   │   │   │   ├── llavabench.sh
│   │   │   │   ├── mmbench.sh
│   │   │   │   ├── mmbench_cn.sh
│   │   │   │   ├── mme.sh
│   │   │   │   ├── mmvet.sh
│   │   │   │   ├── pope.sh
│   │   │   │   ├── seed.sh
│   │   │   │   ├── sqa.sh
│   │   │   │   ├── textvqa.sh
│   │   │   │   ├── vizwiz.sh
│   │   │   │   └── vqav2.sh
│   │   │   ├── finetune.sh
│   │   │   └── pretrain.sh
│   │   ├── zero1.json
│   │   ├── zero2.json
│   │   ├── zero3.json
│   │   └── zero3_offload.json
│   └── scripts_internvl/
│       ├── eval/
│       │   ├── gqa.sh
│       │   ├── llavabench.sh
│       │   ├── mmbench.sh
│       │   ├── mme.sh
│       │   ├── mmvet.sh
│       │   ├── pope.sh
│       │   ├── sqa.sh
│       │   ├── textvqa.sh
│       │   ├── vizwiz.sh
│       │   └── vqav2.sh
│       ├── finetune_internvit6b_224to336_vicuna13b.sh
│       ├── finetune_internvit6b_224to336_vicuna13b_custom_data.sh
│       ├── finetune_internvit6b_224to336_vicuna7b.sh
│       ├── finetune_internvit6b_448_v1_2_vicuna13b.sh
│       ├── finetune_internvit6b_448_v1_5_vicuna13b.sh
│       ├── finetune_internvit6b_448_vicuna13b.sh
│       ├── finetune_internvit6b_448_vicuna7b.sh
│       ├── meta/
│       │   └── custom_data.json
│       ├── pretrain_internvit6b_224to336_vicuna13b.sh
│       ├── pretrain_internvit6b_224to336_vicuna7b.sh
│       ├── pretrain_internvit6b_448_v1_2_vicuna13b.sh
│       ├── pretrain_internvit6b_448_v1_5_vicuna13b.sh
│       ├── pretrain_internvit6b_448_vicuna13b.sh
│       └── pretrain_internvit6b_448_vicuna7b.sh
├── internvl_g/
│   ├── README.md
│   ├── eval/
│   │   └── evaluate_caption.py
│   ├── evaluate.sh
│   ├── internvl/
│   │   ├── dist_utils.py
│   │   ├── model/
│   │   │   ├── __init__.py
│   │   │   ├── internvl_stage2/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── configuration_intern_vit.py
│   │   │   │   ├── configuration_internvl.py
│   │   │   │   ├── flash_attention.py
│   │   │   │   ├── modeling_intern_vit.py
│   │   │   │   ├── modeling_internvl.py
│   │   │   │   └── modeling_qllama.py
│   │   │   └── internvl_stage2_retrieval/
│   │   │       ├── __init__.py
│   │   │       ├── configuration_intern_vit.py
│   │   │       ├── configuration_internvl.py
│   │   │       ├── flash_attention.py
│   │   │       ├── modeling_intern_vit.py
│   │   │       ├── modeling_internvl.py
│   │   │       └── modeling_qllama.py
│   │   └── train/
│   │       ├── __init__.py
│   │       ├── dataset.py
│   │       ├── internvl_stage2_finetune.py
│   │       └── trainer_monkey_patch.py
│   ├── shell/
│   │   ├── finetune/
│   │   │   ├── internvl_stage2_finetune_coco_364_bs1024_ep5.sh
│   │   │   ├── internvl_stage2_finetune_flickr_364_bs1024_ep10.sh
│   │   │   └── internvl_stage2_finetune_flickrcn_364_bs1024_ep10.sh
│   │   ├── head_finetune/
│   │   │   ├── internvl_stage2_finetune_coco_224_bs1024_ep5_head_4gpu.sh
│   │   │   ├── internvl_stage2_finetune_flickr_224_bs1024_ep10_head_4gpu.sh
│   │   │   └── internvl_stage2_finetune_flickrcn_224_bs1024_ep10_head_4gpu.sh
│   │   └── lora_finetune/
│   │       ├── internvl_stage2_finetune_coco_224_bs1024_ep5_lora16_4gpu.sh
│   │       ├── internvl_stage2_finetune_flickr_224_bs1024_ep10_lora16_4gpu.sh
│   │       └── internvl_stage2_finetune_flickrcn_224_bs1024_ep10_lora16_4gpu.sh
│   ├── zero_stage1_config.json
│   ├── zero_stage2_config.json
│   └── zero_stage3_config.json
├── requirements/
│   ├── classification.txt
│   ├── clip_benchmark.txt
│   ├── internvl_chat.txt
│   ├── segmentation.txt
│   └── streamlit_demo.txt
├── requirements.txt
├── segmentation/
│   ├── README.md
│   ├── configs/
│   │   ├── _base_/
│   │   │   ├── datasets/
│   │   │   │   ├── ade20k.py
│   │   │   │   ├── ade20k_504x504.py
│   │   │   │   ├── ade20k_504x504_1of16.py
│   │   │   │   ├── ade20k_504x504_1of2.py
│   │   │   │   ├── ade20k_504x504_1of4.py
│   │   │   │   ├── ade20k_504x504_1of8.py
│   │   │   │   ├── ade20k_640x640.py
│   │   │   │   ├── ade20k_896x896.py
│   │   │   │   ├── chase_db1.py
│   │   │   │   ├── cityscapes.py
│   │   │   │   ├── cityscapes_1024x1024.py
│   │   │   │   ├── cityscapes_768x768.py
│   │   │   │   ├── cityscapes_769x769.py
│   │   │   │   ├── cityscapes_832x832.py
│   │   │   │   ├── coco-stuff10k.py
│   │   │   │   ├── coco-stuff164k.py
│   │   │   │   ├── coco-stuff164k_896x896.py
│   │   │   │   ├── drive.py
│   │   │   │   ├── hrf.py
│   │   │   │   ├── isaid.py
│   │   │   │   ├── loveda.py
│   │   │   │   ├── pascal_context.py
│   │   │   │   ├── pascal_context_59.py
│   │   │   │   ├── pascal_voc12.py
│   │   │   │   ├── pascal_voc12_aug.py
│   │   │   │   ├── potsdam.py
│   │   │   │   ├── stare.py
│   │   │   │   └── vaihingen.py
│   │   │   ├── default_runtime.py
│   │   │   ├── models/
│   │   │   │   ├── ann_r50-d8.py
│   │   │   │   ├── apcnet_r50-d8.py
│   │   │   │   ├── bisenetv1_r18-d32.py
│   │   │   │   ├── bisenetv2.py
│   │   │   │   ├── ccnet_r50-d8.py
│   │   │   │   ├── cgnet.py
│   │   │   │   ├── danet_r50-d8.py
│   │   │   │   ├── deeplabv3_r50-d8.py
│   │   │   │   ├── deeplabv3_unet_s5-d16.py
│   │   │   │   ├── deeplabv3plus_r50-d8.py
│   │   │   │   ├── dmnet_r50-d8.py
│   │   │   │   ├── dnl_r50-d8.py
│   │   │   │   ├── dpt_vit-b16.py
│   │   │   │   ├── emanet_r50-d8.py
│   │   │   │   ├── encnet_r50-d8.py
│   │   │   │   ├── erfnet_fcn.py
│   │   │   │   ├── fast_scnn.py
│   │   │   │   ├── fastfcn_r50-d32_jpu_psp.py
│   │   │   │   ├── fcn_hr18.py
│   │   │   │   ├── fcn_r50-d8.py
│   │   │   │   ├── fcn_unet_s5-d16.py
│   │   │   │   ├── fpn_r50.py
│   │   │   │   ├── gcnet_r50-d8.py
│   │   │   │   ├── icnet_r50-d8.py
│   │   │   │   ├── isanet_r50-d8.py
│   │   │   │   ├── lraspp_m-v3-d8.py
│   │   │   │   ├── mask2former_beit.py
│   │   │   │   ├── nonlocal_r50-d8.py
│   │   │   │   ├── ocrnet_hr18.py
│   │   │   │   ├── ocrnet_r50-d8.py
│   │   │   │   ├── pointrend_r50.py
│   │   │   │   ├── psanet_r50-d8.py
│   │   │   │   ├── pspnet_r50-d8.py
│   │   │   │   ├── pspnet_unet_s5-d16.py
│   │   │   │   ├── segformer_mit-b0.py
│   │   │   │   ├── segmenter_vit-b16_mask.py
│   │   │   │   ├── setr_mla.py
│   │   │   │   ├── setr_naive.py
│   │   │   │   ├── setr_pup.py
│   │   │   │   ├── stdc.py
│   │   │   │   ├── twins_pcpvt-s_fpn.py
│   │   │   │   ├── twins_pcpvt-s_upernet.py
│   │   │   │   ├── upernet_beit.py
│   │   │   │   ├── upernet_convnext.py
│   │   │   │   ├── upernet_mae.py
│   │   │   │   ├── upernet_r50.py
│   │   │   │   ├── upernet_swin.py
│   │   │   │   └── upernet_vit-b16_ln_mln.py
│   │   │   └── schedules/
│   │   │       ├── schedule_10k.py
│   │   │       ├── schedule_160k.py
│   │   │       ├── schedule_20k.py
│   │   │       ├── schedule_320k.py
│   │   │       ├── schedule_40k.py
│   │   │       ├── schedule_5k.py
│   │   │       └── schedule_80k.py
│   │   └── intern_vit_6b/
│   │       ├── few_shot/
│   │       │   ├── linear_intern_vit_6b_504_10k_ade20k_bs16_lr4e-5_1of8.py
│   │       │   ├── linear_intern_vit_6b_504_20k_ade20k_bs16_lr4e-5_1of4.py
│   │       │   ├── linear_intern_vit_6b_504_40k_ade20k_bs16_lr4e-5_1of2.py
│   │       │   ├── linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py
│   │       │   └── linear_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_1of1.py
│   │       ├── full_tuning/
│   │       │   └── upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.py
│   │       ├── head_tuning/
│   │       │   └── upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.py
│   │       └── linear_probing/
│   │           └── linear_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.py
│   ├── dist_test.sh
│   ├── dist_train.sh
│   ├── mmcv_custom/
│   │   ├── __init__.py
│   │   ├── ddp_hooks.py
│   │   └── layer_decay_optimizer_constructor.py
│   ├── mmseg_custom/
│   │   ├── __init__.py
│   │   ├── datasets/
│   │   │   ├── __init__.py
│   │   │   ├── ade.py
│   │   │   └── pipelines/
│   │   │       ├── __init__.py
│   │   │       └── transform.py
│   │   └── models/
│   │       ├── __init__.py
│   │       ├── backbones/
│   │       │   ├── __init__.py
│   │       │   ├── flash_attention.py
│   │       │   └── intern_vit_6b.py
│   │       └── decode_heads/
│   │           ├── __init__.py
│   │           └── fcn_head.py
│   ├── release.py
│   ├── slurm_test.sh
│   ├── slurm_train.sh
│   ├── test.py
│   ├── train.py
│   └── zero_configs/
│       ├── adam_fp16.json
│       ├── adam_zero1_amp.json
│       ├── adam_zero1_bf16.json
│       ├── adam_zero1_fp16.json
│       ├── adam_zero2_bf16.json
│       ├── adam_zero2_fp16.json
│       └── adam_zero3_fp16.json
├── streamlit_demo/
│   ├── .streamlit/
│   │   └── config.toml
│   ├── api.py
│   ├── app.py
│   ├── constants.py
│   ├── controller.py
│   ├── library.py
│   ├── model_worker.py
│   ├── sd_worker.py
│   └── utils.py
└── video_retrieval/
    └── test_msrvtt.py

SYMBOL INDEX (2771 symbols across 274 files)

FILE: classification/config.py
  function _update_config_from_file (line 226) | def _update_config_from_file(config, cfg_file):
  function update_config (line 240) | def update_config(config, args):
  function get_config (line 292) | def get_config(args):

FILE: classification/dataset/build.py
  function _pil_interp (line 22) | def _pil_interp(method):
  class TTA (line 35) | class TTA(torch.nn.Module):
    method __init__ (line 37) | def __init__(self, size, scales=[1.0, 1.05, 1.1]):
    method forward (line 42) | def forward(self, img):
    method __repr__ (line 54) | def __repr__(self) -> str:
  function build_loader (line 58) | def build_loader(config):
  function build_loader2 (line 148) | def build_loader2(config):
  function build_dataset (line 199) | def build_dataset(split, config):
  function build_transform_for_linear_probe (line 266) | def build_transform_for_linear_probe(is_train, config):
  function build_transform (line 287) | def build_transform(is_train, config):

FILE: classification/dataset/cached_image_folder.py
  function has_file_allowed_extension (line 32) | def has_file_allowed_extension(filename, extensions):
  function find_classes (line 44) | def find_classes(dir):
  function make_dataset (line 53) | def make_dataset(dir, class_to_idx, extensions):
  function make_dataset_with_ann (line 70) | def make_dataset_with_ann(ann_file, img_prefix, extensions):
  class DatasetFolder (line 85) | class DatasetFolder(data.Dataset):
    method __init__ (line 107) | def __init__(self,
    method init_cache (line 146) | def init_cache(self):
    method __getitem__ (line 170) | def __getitem__(self, index):
    method __len__ (line 186) | def __len__(self):
    method __repr__ (line 189) | def __repr__(self):
  function pil_loader (line 209) | def pil_loader(path):
  function accimage_loader (line 224) | def accimage_loader(path):
  function default_img_loader (line 233) | def default_img_loader(path):
  class CachedImageFolder (line 241) | class CachedImageFolder(DatasetFolder):
    method __init__ (line 261) | def __init__(self,
    method __getitem__ (line 280) | def __getitem__(self, index):
  class ImageCephDataset (line 299) | class ImageCephDataset(data.Dataset):
    method __init__ (line 301) | def __init__(self,
    method __getitem__ (line 324) | def __getitem__(self, index):
    method __len__ (line 335) | def __len__(self):
    method filename (line 338) | def filename(self, index, basename=False, absolute=False):
    method filenames (line 341) | def filenames(self, basename=False, absolute=False):
  class Parser (line 345) | class Parser:
    method __init__ (line 347) | def __init__(self):
    method _filename (line 351) | def _filename(self, index, basename=False, absolute=False):
    method filename (line 354) | def filename(self, index, basename=False, absolute=False):
    method filenames (line 357) | def filenames(self, basename=False, absolute=False):
  class ParserCephImage (line 364) | class ParserCephImage(Parser):
    method __init__ (line 366) | def __init__(self,
    method load_onto_memory (line 412) | def load_onto_memory(self):
    method load_onto_memory_v2 (line 427) | def load_onto_memory_v2(self):
    method __getitem__ (line 456) | def __getitem__(self, index):
    method __len__ (line 492) | def __len__(self):
    method _filename (line 495) | def _filename(self, index, basename=False, absolute=False):
  function get_temporal_info (line 502) | def get_temporal_info(date, miss_hour=False):
  function get_spatial_info (line 534) | def get_spatial_info(latitude, longitude):

FILE: classification/dataset/imagenet_real.py
  class RealLabelsImagenet (line 20) | class RealLabelsImagenet:
    method __init__ (line 22) | def __init__(self, filenames, real_json='real.json', topk=(1, 5)):
    method add_result (line 33) | def add_result(self, output):
    method get_accuracy (line 46) | def get_accuracy(self, k=None):

FILE: classification/dataset/imagenetv2.py
  class ImageNetV2Dataset (line 26) | class ImageNetV2Dataset(Dataset):
    method __init__ (line 27) | def __init__(self, variant='matched-frequency', transform=None, locati...
    method __len__ (line 52) | def __len__(self):
    method __getitem__ (line 55) | def __getitem__(self, i):

FILE: classification/dataset/samplers.py
  class SubsetRandomSampler (line 16) | class SubsetRandomSampler(torch.utils.data.Sampler):
    method __init__ (line 24) | def __init__(self, indices):
    method __iter__ (line 28) | def __iter__(self):
    method __len__ (line 31) | def __len__(self):
    method set_epoch (line 34) | def set_epoch(self, epoch):
  class NodeDistributedSampler (line 38) | class NodeDistributedSampler(Sampler):
    method __init__ (line 53) | def __init__(self,
    method __iter__ (line 85) | def __iter__(self):
    method __len__ (line 112) | def __len__(self):
    method set_epoch (line 115) | def set_epoch(self, epoch):

FILE: classification/dataset/zipreader.py
  function is_zip_path (line 17) | def is_zip_path(img_or_path):
  class ZipReader (line 22) | class ZipReader(object):
    method __init__ (line 26) | def __init__(self):
    method get_zipfile (line 30) | def get_zipfile(path):
    method split_zip_style_path (line 38) | def split_zip_style_path(path):
    method list_folder (line 48) | def list_folder(path):
    method list_files (line 66) | def list_files(path, extension=None):
    method read (line 85) | def read(path):
    method imread (line 92) | def imread(path):

FILE: classification/ddp_hooks.py
  function _allreduce_fut (line 12) | def _allreduce_fut(process_group: dist.ProcessGroup,
  function allreduce_hook (line 25) | def allreduce_hook(
  function fp16_compress_hook (line 43) | def fp16_compress_hook(
  function bf16_compress_hook (line 77) | def bf16_compress_hook(
  function fp16_compress_wrapper (line 111) | def fp16_compress_wrapper(
  function bf16_compress_wrapper (line 147) | def bf16_compress_wrapper(

FILE: classification/gflops.py
  function sa_flops (line 60) | def sa_flops(h, w, dim):
  function get_flops (line 64) | def get_flops(model, input_shape):

FILE: classification/logger.py
  function create_logger (line 16) | def create_logger(output_dir, dist_rank=0, name=''):

FILE: classification/lr_scheduler.py
  function build_scheduler (line 13) | def build_scheduler(config, optimizer, n_iter_per_epoch):
  class LinearLRScheduler (line 53) | class LinearLRScheduler(Scheduler):
    method __init__ (line 55) | def __init__(
    method _get_lr (line 89) | def _get_lr(self, t):
    method get_epoch_values (line 101) | def get_epoch_values(self, epoch: int):
    method get_update_values (line 107) | def get_update_values(self, num_updates: int):

FILE: classification/main.py
  function obsolete_torch_version (line 51) | def obsolete_torch_version(torch_version, version_threshold):
  function parse_option (line 55) | def parse_option():
  function throughput (line 146) | def throughput(data_loader, model, logger):
  function main (line 167) | def main(config):
  function train_one_epoch (line 403) | def train_one_epoch(config,
  function validate_real (line 539) | def validate_real(config, data_loader, model, real_labels, amp_autocast=...
  function validate (line 605) | def validate(config, data_loader, model, epoch=None, amp_autocast=suppre...

FILE: classification/models/build.py
  function build_model (line 11) | def build_model(config):

FILE: classification/models/clip_vit.py
  function _freeze_params (line 15) | def _freeze_params(module):
  class CrossAttention (line 20) | class CrossAttention(nn.Module):
    method __init__ (line 21) | def __init__(
    method forward (line 52) | def forward(self, x, k=None, v=None):
  class AttentiveBlock (line 85) | class AttentiveBlock(nn.Module):
    method __init__ (line 87) | def __init__(self, dim, num_heads, qkv_bias=False, qk_scale=None, drop...
    method forward (line 100) | def forward(self, x_q, x_kv, pos_q, pos_k, bool_masked_pos, rel_pos_bi...
  class AttentionPoolingBlock (line 109) | class AttentionPoolingBlock(AttentiveBlock):
    method forward (line 111) | def forward(self, x):
  class CLIPViT (line 119) | class CLIPViT(nn.Module):
    method __init__ (line 121) | def __init__(self, patch_size=14, img_size=336, pretrain_size=336, emb...
    method dtype (line 159) | def dtype(self):
    method forward_features (line 162) | def forward_features(self, x):
    method forward (line 168) | def forward(self, x):
    method lr_decay_keywords (line 181) | def lr_decay_keywords(self, decay_ratio=0.95):

FILE: classification/models/flash_attention.py
  class FlashAttention (line 14) | class FlashAttention(nn.Module):
    method __init__ (line 25) | def __init__(self, softmax_scale=None, attention_dropout=0.0, device=N...
    method forward (line 30) | def forward(self, qkv, key_padding_mask=None, causal=False, cu_seqlens...

FILE: classification/models/intern_vit_6b.py
  function _freeze_params (line 23) | def _freeze_params(module):
  class CrossAttention (line 28) | class CrossAttention(nn.Module):
    method __init__ (line 29) | def __init__(
    method forward (line 60) | def forward(self, x, k=None, v=None):
  class AttentiveBlock (line 93) | class AttentiveBlock(nn.Module):
    method __init__ (line 95) | def __init__(self, dim, num_heads, qkv_bias=False, qk_scale=None, drop...
    method forward (line 108) | def forward(self, x_q, x_kv, pos_q, pos_k, bool_masked_pos, rel_pos_bi...
  class AttentionPoolingBlock (line 117) | class AttentionPoolingBlock(AttentiveBlock):
    method forward (line 119) | def forward(self, x):
  class RMSNorm (line 127) | class RMSNorm(nn.Module):
    method __init__ (line 128) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 133) | def forward(self, hidden_states):
  class LayerScale (line 155) | class LayerScale(nn.Module):
    method __init__ (line 156) | def __init__(self, dim, init_values=1e-5, inplace=False, force_fp32=Fa...
    method forward (line 163) | def forward(self, x):
  class Attention (line 173) | class Attention(nn.Module):
    method __init__ (line 174) | def __init__(self, dim, num_heads=8, qkv_bias=False, attn_drop=0., pro...
    method _naive_attn (line 196) | def _naive_attn(self, x):
    method _flash_attn (line 215) | def _flash_attn(self, x, key_padding_mask=None, need_weights=False):
    method forward (line 232) | def forward(self, x):
  class Mlp (line 237) | class Mlp(nn.Module):
    method __init__ (line 241) | def __init__(self, in_features, hidden_features=None, out_features=Non...
    method forward (line 255) | def forward(self, x):
  class Block (line 264) | class Block(nn.Module):
    method __init__ (line 266) | def __init__(
    method forward (line 290) | def forward(self, x):
  class PatchEmbed (line 303) | class PatchEmbed(nn.Module):
    method __init__ (line 307) | def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=...
    method forward (line 321) | def forward(self, x, **kwargs):
  class InternViT6B (line 330) | class InternViT6B(nn.Module):
    method __init__ (line 332) | def __init__(self, in_chans=3, patch_size=14, img_size=224, pretrain_s...
    method init_weights (line 408) | def init_weights(self, pretrained=None):
    method dtype (line 438) | def dtype(self):
    method forward_features (line 441) | def forward_features(self, x):
    method forward (line 452) | def forward(self, x):
    method lr_decay_keywords (line 467) | def lr_decay_keywords(self, decay_ratio=0.95):

FILE: classification/optimizer.py
  function build_optimizer (line 11) | def build_optimizer(config, model):
  function check_keywords_in_name (line 90) | def check_keywords_in_name(name, keywords=()):
  function check_keywords_in_dict (line 98) | def check_keywords_in_dict(name, keywords_dict):
  function set_weight_decay_and_lr (line 105) | def set_weight_decay_and_lr(

FILE: classification/utils.py
  function load_ema_checkpoint (line 23) | def load_ema_checkpoint(config, model_ema, logger):
  function load_checkpoint (line 59) | def load_checkpoint(config, model, optimizer, lr_scheduler, scaler, logg...
  function load_pretrained (line 103) | def load_pretrained(config, model, logger):
  function convert_22k_head_to_1k (line 243) | def convert_22k_head_to_1k(model, logger):
  function save_checkpoint (line 263) | def save_checkpoint(config,
  function get_grad_norm (line 316) | def get_grad_norm(parameters, norm_type=2):
  function auto_resume_helper (line 329) | def auto_resume_helper(output_dir):
  function reduce_tensor (line 344) | def reduce_tensor(tensor):
  class NativeScalerWithGradNormCount (line 352) | class NativeScalerWithGradNormCount:
    method __init__ (line 355) | def __init__(self):
    method __call__ (line 358) | def __call__(self,
    method state_dict (line 380) | def state_dict(self):
    method load_state_dict (line 383) | def load_state_dict(self, state_dict):
  class MyAverageMeter (line 387) | class MyAverageMeter(object):
    method __init__ (line 390) | def __init__(self, max_len=-1):
    method update (line 398) | def update(self, val):

FILE: clip_benchmark/clip_benchmark/cli.py
  function get_parser_args (line 23) | def get_parser_args():
  function main (line 90) | def main():
  function main_build (line 98) | def main_build(base):
  function main_eval (line 119) | def main_eval(base):
  function _as_list (line 172) | def _as_list(l):
  function run (line 178) | def run(args):

FILE: clip_benchmark/clip_benchmark/datasets/birdsnap.py
  class Birdsnap (line 14) | class Birdsnap(torch.utils.data.Dataset):
    method __init__ (line 29) | def __init__(self, root, split='train', transform=None, target_transfo...
    method _check_integrity_of_metadata (line 44) | def _check_integrity_of_metadata(self, chunk_size=8192):
    method check_integrity (line 55) | def check_integrity(self):
    method download (line 69) | def download(self):
    method __len__ (line 86) | def __len__(self):
    method __getitem__ (line 90) | def __getitem__(self, index):
    method _parse_metadata (line 109) | def _parse_metadata(self):
    method _verify_image (line 127) | def _verify_image(self, idx):
    method scrape_images (line 137) | def scrape_images(self, missing_ids, chunk_size=8196):
    method _purge_missing_data (line 165) | def _purge_missing_data(self):
  class BirdsnapV2 (line 185) | class BirdsnapV2(torch.utils.data.Dataset):
    method __init__ (line 186) | def __init__(self, root, split='test', transform=None, target_transfor...
    method __len__ (line 209) | def __len__(self):
    method __getitem__ (line 213) | def __getitem__(self, index):

FILE: clip_benchmark/clip_benchmark/datasets/builder.py
  function _load_classnames_and_classification_templates (line 20) | def _load_classnames_and_classification_templates(dataset_name, current_...
  function build_dataset (line 43) | def build_dataset(dataset_name, root='root', transform=None, split='test...
  class Dummy (line 488) | class Dummy():
    method __init__ (line 490) | def __init__(self):
    method __getitem__ (line 493) | def __getitem__(self, i):
    method __len__ (line 496) | def __len__(self):
  function get_dataset_default_task (line 500) | def get_dataset_default_task(dataset):
  function get_dataset_collate_fn (line 507) | def get_dataset_collate_fn(dataset_name):
  function has_gdown (line 514) | def has_gdown():
  function has_kaggle (line 518) | def has_kaggle():
  function build_vtab_dataset (line 522) | def build_vtab_dataset(dataset_name, transform, download=True, split='te...
  function build_tfds_dataset (line 664) | def build_tfds_dataset(name, transform, download=True, split='test', dat...
  function build_wds_dataset (line 679) | def build_wds_dataset(dataset_name, transform, split='test', data_dir='r...
  function _extract_task (line 779) | def _extract_task(dataset_name):
  function image_captions_collate_fn (line 785) | def image_captions_collate_fn(batch):
  function get_dataset_collection_from_file (line 792) | def get_dataset_collection_from_file(path):

FILE: clip_benchmark/clip_benchmark/datasets/caltech101.py
  class Caltech101 (line 17) | class Caltech101(VisionDataset):
    method __init__ (line 41) | def __init__(
    method __getitem__ (line 82) | def __getitem__(self, index: int) -> Tuple[Any, Any]:
    method _check_integrity (line 125) | def _check_integrity(self) -> bool:
    method __len__ (line 129) | def __len__(self) -> int:
    method download (line 132) | def download(self) -> None:
    method extra_repr (line 150) | def extra_repr(self) -> str:
  class Caltech256 (line 154) | class Caltech256(VisionDataset):
    method __init__ (line 169) | def __init__(
    method __getitem__ (line 199) | def __getitem__(self, index: int) -> Tuple[Any, Any]:
    method _check_integrity (line 226) | def _check_integrity(self) -> bool:
    method __len__ (line 230) | def __len__(self) -> int:
    method download (line 233) | def download(self) -> None:

FILE: clip_benchmark/clip_benchmark/datasets/flickr.py
  class Flickr (line 13) | class Flickr(VisionDataset):
    method __init__ (line 15) | def __init__(
    method __getitem__ (line 36) | def __getitem__(self, index: int) -> Tuple[Any, Any]:
    method __len__ (line 58) | def __len__(self) -> int:

FILE: clip_benchmark/clip_benchmark/datasets/imagenetv2.py
  class ImageNetValDataset (line 29) | class ImageNetValDataset(Dataset):
    method __init__ (line 30) | def __init__(self, transform=None, location='.'):
    method __len__ (line 55) | def __len__(self):
    method __getitem__ (line 58) | def __getitem__(self, i):
  class ImageNetV2Dataset (line 65) | class ImageNetV2Dataset(Dataset):
    method __init__ (line 66) | def __init__(self, variant='matched-frequency', transform=None, locati...
    method __len__ (line 91) | def __len__(self):
    method __getitem__ (line 94) | def __getitem__(self, i):

FILE: clip_benchmark/clip_benchmark/datasets/kitti.py
  function _count_all_pp (line 26) | def _count_all_pp(x):
  function _count_vehicles_pp (line 34) | def _count_vehicles_pp(x):
  function _count_left_pp (line 44) | def _count_left_pp(x):
  function _count_far_pp (line 54) | def _count_far_pp(x):
  function _count_near_pp (line 65) | def _count_near_pp(x):
  function _closest_object_distance_pp (line 76) | def _closest_object_distance_pp(x):
  function _closest_vehicle_distance_pp (line 87) | def _closest_vehicle_distance_pp(x):
  function _closest_object_x_location_pp (line 103) | def _closest_object_x_location_pp(x):
  class KittiData (line 152) | class KittiData(base.ImageTfdsData):
    method __init__ (line 164) | def __init__(self, task, data_dir=None):

FILE: clip_benchmark/clip_benchmark/datasets/multilingual_mscoco.py
  class Multilingual_MSCOCO (line 21) | class Multilingual_MSCOCO(VisionDataset):
    method __init__ (line 23) | def __init__(self, root, ann_file, transform=None, target_transform=No...
    method __getitem__ (line 31) | def __getitem__(self, index):
    method __len__ (line 46) | def __len__(self) -> int:
  function _get_downloadable_file (line 50) | def _get_downloadable_file(filename, download_url, is_json=True):
  function create_annotation_file (line 60) | def create_annotation_file(root, lang_code):

FILE: clip_benchmark/clip_benchmark/datasets/objectnet.py
  function get_metadata (line 13) | def get_metadata(folder):
  class ObjectNetDataset (line 43) | class ObjectNetDataset(datasets.ImageFolder):
    method __init__ (line 45) | def __init__(self, root, transform):
    method __len__ (line 62) | def __len__(self):
    method __getitem__ (line 65) | def __getitem__(self, index):

FILE: clip_benchmark/clip_benchmark/datasets/tfds.py
  function download_tfds_dataset (line 5) | def download_tfds_dataset(name, data_dir=None):
  function disable_gpus_on_tensorflow (line 11) | def disable_gpus_on_tensorflow():
  class VTABIterableDataset (line 16) | class VTABIterableDataset(torch.utils.data.IterableDataset):
    method __init__ (line 18) | def __init__(self, tfds_dataset, split='test', input_name='image', lab...
    method __iter__ (line 33) | def __iter__(self):
    method __len__ (line 50) | def __len__(self):

FILE: clip_benchmark/clip_benchmark/datasets/tools.py
  function process_single_caption (line 4) | def process_single_caption(caption, max_words=50):
  function pre_caption (line 17) | def pre_caption(caption, max_words=50):

FILE: clip_benchmark/clip_benchmark/datasets/voc2007.py
  function download_url (line 35) | def download_url(url, path):
  function download_voc2007 (line 40) | def download_voc2007(root):
  function read_split (line 137) | def read_split(root, dataset, split):
  function read_bndbox (line 152) | def read_bndbox(root, dataset, paths):
  class PASCALVoc2007 (line 170) | class PASCALVoc2007(data.Dataset):
    method __init__ (line 177) | def __init__(self, root, set, transform=None, download=False, target_t...
    method __getitem__ (line 200) | def __getitem__(self, index):
    method __len__ (line 210) | def __len__(self):
  class PASCALVoc2007Cropped (line 214) | class PASCALVoc2007Cropped(data.Dataset):
    method __init__ (line 222) | def __init__(self, root, set, transform=None, download=False, target_t...
    method __getitem__ (line 240) | def __getitem__(self, index):
    method __len__ (line 250) | def __len__(self):

FILE: clip_benchmark/clip_benchmark/metrics/linear_probe.py
  function assign_learning_rate (line 15) | def assign_learning_rate(param_group, new_lr):
  function _warmup_lr (line 19) | def _warmup_lr(base_lr, warmup_length, step):
  function cosine_lr (line 23) | def cosine_lr(optimizer, base_lrs, warmup_length, steps):
  class Featurizer (line 41) | class Featurizer(torch.nn.Module):
    method __init__ (line 42) | def __init__(self, model):
    method forward (line 46) | def forward(self, input):
  class FeatureDataset (line 53) | class FeatureDataset(Dataset):
    method __init__ (line 54) | def __init__(self, features, targets):
    method __len__ (line 58) | def __len__(self):
    method __getitem__ (line 61) | def __getitem__(self, i):
  function evaluate (line 65) | def evaluate(model, train_dataloader, dataloader, fewshot_k, batch_size,...

FILE: clip_benchmark/clip_benchmark/metrics/mscoco_generative.py
  function evaluate (line 8) | def evaluate(model, dataloader, batch_size, device, transform, train_dat...

FILE: clip_benchmark/clip_benchmark/metrics/zeroshot_classification.py
  function zero_shot_classifier (line 13) | def zero_shot_classifier(model, tokenizer, classnames, templates, device...
  function accuracy (line 54) | def accuracy(output, target, topk=(1,)):
  function run_classification (line 79) | def run_classification(model, classifier, dataloader, device, amp=True):
  function average_precision_per_class (line 120) | def average_precision_per_class(scores, targets):
  function evaluate (line 161) | def evaluate(model, dataloader, tokenizer, classnames, templates, device...

FILE: clip_benchmark/clip_benchmark/metrics/zeroshot_retrieval.py
  function evaluate (line 8) | def evaluate(model, dataloader, tokenizer, device, amp=True, recall_k_li...
  function dataloader_with_indices (line 93) | def dataloader_with_indices(dataloader):
  function recall_at_k (line 102) | def recall_at_k(scores, positive_pairs, k):
  function batchify (line 126) | def batchify(func, X, Y, batch_size, device, *args, **kwargs):

FILE: clip_benchmark/clip_benchmark/model_collection.py
  function get_model_collection_from_file (line 4) | def get_model_collection_from_file(path):

FILE: clip_benchmark/clip_benchmark/models/__init__.py
  function load_clip (line 18) | def load_clip(

FILE: clip_benchmark/clip_benchmark/models/intern_vit_6b/configuration_intern_vit.py
  class InternVisionConfig (line 15) | class InternVisionConfig(PretrainedConfig):
    method __init__ (line 63) | def __init__(
    method from_pretrained (line 105) | def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os....

FILE: clip_benchmark/clip_benchmark/models/intern_vit_6b/flash_attention.py
  class FlashAttention (line 14) | class FlashAttention(nn.Module):
    method __init__ (line 25) | def __init__(self, softmax_scale=None, attention_dropout=0.0, device=N...
    method forward (line 30) | def forward(self, qkv, key_padding_mask=None, causal=False, cu_seqlens...

FILE: clip_benchmark/clip_benchmark/models/intern_vit_6b/modeling_intern_vit.py
  class InternRMSNorm (line 33) | class InternRMSNorm(nn.Module):
    method __init__ (line 34) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 39) | def forward(self, hidden_states):
  class InternVisionEmbeddings (line 61) | class InternVisionEmbeddings(nn.Module):
    method __init__ (line 62) | def __init__(self, config: InternVisionConfig):
    method forward (line 82) | def forward(self, pixel_values: torch.FloatTensor) -> torch.Tensor:
  class InternAttention (line 93) | class InternAttention(nn.Module):
    method __init__ (line 96) | def __init__(self, config: InternVisionConfig):
    method _naive_attn (line 126) | def _naive_attn(self, x):
    method _flash_attn (line 145) | def _flash_attn(self, x, key_padding_mask=None, need_weights=False):
    method forward (line 162) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class InternMLP (line 167) | class InternMLP(nn.Module):
    method __init__ (line 168) | def __init__(self, config: InternVisionConfig):
    method forward (line 175) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class InternVisionEncoderLayer (line 182) | class InternVisionEncoderLayer(nn.Module):
    method __init__ (line 183) | def __init__(self, config: InternVisionConfig, drop_path_rate: float):
    method forward (line 198) | def forward(
  class InternVisionEncoder (line 213) | class InternVisionEncoder(nn.Module):
    method __init__ (line 223) | def __init__(self, config: InternVisionConfig):
    method forward (line 232) | def forward(
  class InternVisionModel (line 279) | class InternVisionModel(PreTrainedModel):
    method __init__ (line 283) | def __init__(self, config: InternVisionConfig):
    method resize_pos_embeddings (line 290) | def resize_pos_embeddings(self, old_size, new_size, patch_size):
    method get_input_embeddings (line 301) | def get_input_embeddings(self):
    method forward (line 304) | def forward(

FILE: clip_benchmark/clip_benchmark/models/internvl.py
  function load_internvl (line 12) | def load_internvl(model_name, pretrained, cache_dir, device):

FILE: clip_benchmark/clip_benchmark/models/internvl_c_pytorch/__init__.py
  class InternVLTokenizer (line 23) | class InternVLTokenizer(nn.Module):
    method __init__ (line 24) | def __init__(self, model_path):
    method forward (line 30) | def forward(self, text, prefix='summarize:'):
  function build_transform (line 39) | def build_transform(task, image_size=224, mean=[0.485, 0.456, 0.406], st...
  function get_model_and_transform (line 56) | def get_model_and_transform(task, image_size, device):
  function load_internvl_c_pytorch (line 65) | def load_internvl_c_pytorch(ckpt_path, device, task, image_size=224):

FILE: clip_benchmark/clip_benchmark/models/internvl_c_pytorch/flash_attention.py
  class FlashAttention (line 15) | class FlashAttention(nn.Module):
    method __init__ (line 26) | def __init__(self, softmax_scale=None, attention_dropout=0.0, device=N...
    method forward (line 31) | def forward(self, qkv, key_padding_mask=None, causal=False, cu_seqlens...

FILE: clip_benchmark/clip_benchmark/models/internvl_c_pytorch/internvl_c.py
  class CrossAttention (line 25) | class CrossAttention(nn.Module):
    method __init__ (line 26) | def __init__(
    method forward (line 57) | def forward(self, x, k=None, v=None):
  class AttentiveBlock (line 90) | class AttentiveBlock(nn.Module):
    method __init__ (line 92) | def __init__(self, dim, num_heads, qkv_bias=False, qk_scale=None, drop...
    method forward (line 105) | def forward(self, x_q, x_kv, pos_q, pos_k, bool_masked_pos, rel_pos_bi...
  class AttentionPoolingBlock (line 114) | class AttentionPoolingBlock(AttentiveBlock):
    method forward (line 116) | def forward(self, x):
  class RMSNorm (line 124) | class RMSNorm(nn.Module):
    method __init__ (line 125) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 130) | def forward(self, hidden_states):
  class LayerScale (line 152) | class LayerScale(nn.Module):
    method __init__ (line 153) | def __init__(self, dim, init_values=1e-5, inplace=False, force_fp32=Fa...
    method forward (line 160) | def forward(self, x):
  class Attention (line 170) | class Attention(nn.Module):
    method __init__ (line 171) | def __init__(self, dim, num_heads=8, qkv_bias=False, attn_drop=0., pro...
    method _naive_attn (line 193) | def _naive_attn(self, x):
    method _flash_attn (line 212) | def _flash_attn(self, x, key_padding_mask=None, need_weights=False):
    method forward (line 229) | def forward(self, x):
  class Mlp (line 234) | class Mlp(nn.Module):
    method __init__ (line 238) | def __init__(self, in_features, hidden_features=None, out_features=Non...
    method forward (line 252) | def forward(self, x):
  class Block (line 261) | class Block(nn.Module):
    method __init__ (line 263) | def __init__(
    method forward (line 287) | def forward(self, x):
  class PatchEmbed (line 300) | class PatchEmbed(nn.Module):
    method __init__ (line 304) | def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=...
    method forward (line 318) | def forward(self, x, **kwargs):
  class InternVL_C (line 327) | class InternVL_C(nn.Module):
    method __init__ (line 328) | def __init__(self, in_chans=3, patch_size=14, img_size=224, qkv_bias=F...
    method dtype (line 377) | def dtype(self):
    method forward_features (line 380) | def forward_features(self, x):
    method encode_image (line 391) | def encode_image(self, image):
    method encode_text (line 396) | def encode_text(self, text):
    method forward (line 403) | def forward(self, image, text):

FILE: clip_benchmark/clip_benchmark/models/internvl_huggingface/__init__.py
  class InternVLTokenizer (line 23) | class InternVLTokenizer(nn.Module):
    method __init__ (line 24) | def __init__(self, model_path):
    method forward (line 30) | def forward(self, text, prefix='summarize:'):
  function build_transform (line 39) | def build_transform(task, image_size=224, mean=[0.485, 0.456, 0.406], st...
  function load_internvl_c_huggingface (line 56) | def load_internvl_c_huggingface(ckpt_path, device, task):
  function load_internvl_g_huggingface (line 73) | def load_internvl_g_huggingface(ckpt_path, device, task):

FILE: clip_benchmark/clip_benchmark/models/internvl_huggingface/configuration_intern_vit.py
  class InternVisionConfig (line 15) | class InternVisionConfig(PretrainedConfig):
    method __init__ (line 63) | def __init__(
    method from_pretrained (line 105) | def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os....

FILE: clip_benchmark/clip_benchmark/models/internvl_huggingface/configuration_internvl.py
  class InternVLConfig (line 17) | class InternVLConfig(PretrainedConfig):
    method __init__ (line 57) | def __init__(
    method to_dict (line 97) | def to_dict(self):

FILE: clip_benchmark/clip_benchmark/models/internvl_huggingface/flash_attention.py
  class FlashAttention (line 15) | class FlashAttention(nn.Module):
    method __init__ (line 26) | def __init__(self, softmax_scale=None, attention_dropout=0.0, device=N...
    method forward (line 31) | def forward(self, qkv, key_padding_mask=None, causal=False, cu_seqlens...

FILE: clip_benchmark/clip_benchmark/models/internvl_huggingface/modeling_intern_vit.py
  class InternRMSNorm (line 33) | class InternRMSNorm(nn.Module):
    method __init__ (line 34) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 39) | def forward(self, hidden_states):
  class InternVisionEmbeddings (line 61) | class InternVisionEmbeddings(nn.Module):
    method __init__ (line 62) | def __init__(self, config: InternVisionConfig):
    method forward (line 82) | def forward(self, pixel_values: torch.FloatTensor) -> torch.Tensor:
  class InternAttention (line 93) | class InternAttention(nn.Module):
    method __init__ (line 96) | def __init__(self, config: InternVisionConfig):
    method _naive_attn (line 126) | def _naive_attn(self, x):
    method _flash_attn (line 145) | def _flash_attn(self, x, key_padding_mask=None, need_weights=False):
    method forward (line 162) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class InternMLP (line 167) | class InternMLP(nn.Module):
    method __init__ (line 168) | def __init__(self, config: InternVisionConfig):
    method forward (line 175) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class InternVisionEncoderLayer (line 182) | class InternVisionEncoderLayer(nn.Module):
    method __init__ (line 183) | def __init__(self, config: InternVisionConfig, drop_path_rate: float):
    method forward (line 198) | def forward(
  class InternVisionEncoder (line 213) | class InternVisionEncoder(nn.Module):
    method __init__ (line 223) | def __init__(self, config: InternVisionConfig):
    method forward (line 232) | def forward(
  class InternVisionModel (line 279) | class InternVisionModel(PreTrainedModel):
    method __init__ (line 283) | def __init__(self, config: InternVisionConfig):
    method resize_pos_embeddings (line 290) | def resize_pos_embeddings(self, old_size, new_size, patch_size):
    method get_input_embeddings (line 301) | def get_input_embeddings(self):
    method forward (line 304) | def forward(

FILE: clip_benchmark/clip_benchmark/models/internvl_huggingface/modeling_internvl.py
  class InternVLPreTrainedModel (line 33) | class InternVLPreTrainedModel(PreTrainedModel):
    method _init_weights (line 49) | def _init_weights(self, module):
    method _set_gradient_checkpointing (line 67) | def _set_gradient_checkpointing(self, module, value=False):
  class CrossAttention (line 74) | class CrossAttention(nn.Module):
    method __init__ (line 75) | def __init__(
    method forward (line 106) | def forward(self, x, k=None, v=None):
  class AttentiveBlock (line 139) | class AttentiveBlock(nn.Module):
    method __init__ (line 141) | def __init__(self, dim, num_heads, qkv_bias=False, qk_scale=None, drop...
    method forward (line 154) | def forward(self, x_q, x_kv, pos_q, pos_k, bool_masked_pos, rel_pos_bi...
  class AttentionPoolingBlock (line 163) | class AttentionPoolingBlock(AttentiveBlock):
    method forward (line 165) | def forward(self, x):
  class InternVLModel (line 173) | class InternVLModel(InternVLPreTrainedModel):
    method __init__ (line 177) | def __init__(self, config: InternVLConfig):
    method wrap_backbone_lora (line 219) | def wrap_backbone_lora(self, r=128, lora_alpha=256, lora_dropout=0.05):
    method wrap_qllama_lora (line 229) | def wrap_qllama_lora(self, r=128, lora_alpha=256, lora_dropout=0.05):
    method get_input_embeddings (line 240) | def get_input_embeddings(self):
    method set_input_embeddings (line 243) | def set_input_embeddings(self, value):
    method set_output_embeddings (line 246) | def set_output_embeddings(self, new_embeddings):
    method get_output_embeddings (line 249) | def get_output_embeddings(self) -> nn.Module:
    method generate (line 253) | def generate(
    method get_text_features (line 288) | def get_text_features(
    method get_image_features (line 337) | def get_image_features(
    method encode_image (line 383) | def encode_image(self, image, mode):
    method encode_text (line 407) | def encode_text(self, text):
    method forward (line 420) | def forward(self, image, text, mode='InternVL-C'):
  class InternVL_C (line 437) | class InternVL_C(InternVLModel):
    method encode_image (line 439) | def encode_image(self, image):
    method encode_text (line 448) | def encode_text(self, text):
    method forward (line 461) | def forward(self, image, text):
  class InternVL_G (line 477) | class InternVL_G(InternVLModel):
    method encode_image (line 479) | def encode_image(self, image):
    method encode_text (line 493) | def encode_text(self, text):
    method forward (line 506) | def forward(self, image, text):

FILE: clip_benchmark/clip_benchmark/models/internvl_huggingface/modeling_qllama.py
  function _make_causal_mask (line 42) | def _make_causal_mask(
  function _expand_mask (line 60) | def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Option...
  class LlamaRMSNorm (line 74) | class LlamaRMSNorm(nn.Module):
    method __init__ (line 75) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 83) | def forward(self, hidden_states):
  class LlamaRotaryEmbedding (line 109) | class LlamaRotaryEmbedding(torch.nn.Module):
    method __init__ (line 110) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
    method forward (line 124) | def forward(self, x, seq_len=None):
  class FixedLlamaRotaryEmbedding (line 141) | class FixedLlamaRotaryEmbedding(torch.nn.Module):
    method __init__ (line 142) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
    method _set_cos_sin_cache (line 155) | def _set_cos_sin_cache(self, seq_len, device, dtype):
    method forward (line 165) | def forward(self, x, seq_len=None):
  function rotate_half (line 179) | def rotate_half(x):
  function apply_rotary_pos_emb (line 186) | def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
  class LlamaMLP (line 196) | class LlamaMLP(nn.Module):
    method __init__ (line 197) | def __init__(
    method forward (line 209) | def forward(self, x):
  class LlamaAttention (line 213) | class LlamaAttention(nn.Module):
    method __init__ (line 216) | def __init__(self, config: LlamaConfig):
    method _shape (line 235) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
    method forward (line 238) | def forward(
  class LlamaCrossAttention (line 304) | class LlamaCrossAttention(nn.Module):
    method __init__ (line 307) | def __init__(self, config: LlamaConfig):
    method _shape (line 329) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
    method forward (line 332) | def forward(
  class LlamaDecoderLayer (line 408) | class LlamaDecoderLayer(nn.Module):
    method __init__ (line 409) | def __init__(self, config: LlamaConfig, use_cross_attn: bool):
    method forward (line 423) | def forward(
  class LlamaPreTrainedModel (line 520) | class LlamaPreTrainedModel(PreTrainedModel):
    method _init_weights (line 527) | def _init_weights(self, module):
    method _set_gradient_checkpointing (line 538) | def _set_gradient_checkpointing(self, module, value=False):
  class LlamaModel (line 613) | class LlamaModel(LlamaPreTrainedModel):
    method __init__ (line 621) | def __init__(self, config: LlamaConfig):
    method get_input_embeddings (line 636) | def get_input_embeddings(self):
    method set_input_embeddings (line 639) | def set_input_embeddings(self, value):
    method _prepare_decoder_attention_mask (line 643) | def _prepare_decoder_attention_mask(self, attention_mask, input_shape,...
    method forward (line 667) | def forward(
    method forward_train (line 783) | def forward_train(
  class LlamaForCausalLM (line 915) | class LlamaForCausalLM(LlamaPreTrainedModel):
    method __init__ (line 916) | def __init__(self, config):
    method get_input_embeddings (line 925) | def get_input_embeddings(self):
    method set_input_embeddings (line 928) | def set_input_embeddings(self, value):
    method get_output_embeddings (line 931) | def get_output_embeddings(self):
    method set_output_embeddings (line 934) | def set_output_embeddings(self, new_embeddings):
    method set_decoder (line 937) | def set_decoder(self, decoder):
    method get_decoder (line 940) | def get_decoder(self):
    method forward (line 945) | def forward(
    method prepare_inputs_for_generation (line 1035) | def prepare_inputs_for_generation(
    method _reorder_cache (line 1069) | def _reorder_cache(past_key_values, beam_idx):

FILE: clip_benchmark/clip_benchmark/models/japanese_clip.py
  class DictTensor (line 6) | class DictTensor:
    method __init__ (line 11) | def __init__(self, d: Dict[str, torch.Tensor]):
    method to (line 14) | def to(self, device):
  class JaCLIPForBenchmark (line 18) | class JaCLIPForBenchmark:
    method __init__ (line 23) | def __init__(self, model):
    method encode_text (line 26) | def encode_text(self, dict_tensor):
    method encode_image (line 29) | def encode_image(self, image):
  function load_japanese_clip (line 33) | def load_japanese_clip(pretrained: str, device='cpu', **kwargs):

FILE: clip_benchmark/clip_benchmark/models/open_clip.py
  function load_open_clip (line 4) | def load_open_clip(model_name: str = 'ViT-B-32-quickgelu', pretrained: s...

FILE: clip_benchmark/clip_benchmark/webdataset_builder.py
  function get_parser_args (line 16) | def get_parser_args():
  function main (line 52) | def main():
  function run (line 57) | def run(args):
  function PIL_to_bytes (line 92) | def PIL_to_bytes(image_format):
  function path_to_bytes (line 107) | def path_to_bytes(filepath):
  function convert_dataset (line 112) | def convert_dataset(dataset, split, output_folder, *, transform=None,
  function convert_retrieval_dataset (line 213) | def convert_retrieval_dataset(dataset, split, output_folder, *, transfor...

FILE: clip_benchmark/probe_benchmark/build_df_scaling_experiments.py
  function get_us_dataset (line 70) | def get_us_dataset(pretrained):

FILE: clip_benchmark/setup.py
  function load_requirements (line 14) | def load_requirements(f):

FILE: clip_benchmark/tests/test_clip_benchmark.py
  class base_args (line 11) | class base_args:
  function test_base (line 38) | def test_base():

FILE: internvl_chat/eval/caption/evaluate_caption.py
  class CaptionDataset (line 40) | class CaptionDataset(torch.utils.data.Dataset):
    method __init__ (line 42) | def __init__(self, name, root, annotation, prompt, input_size=224, dyn...
    method __len__ (line 57) | def __len__(self):
    method __getitem__ (line 60) | def __getitem__(self, idx):
  function collate_fn (line 89) | def collate_fn(inputs, tokenizer):
  class InferenceSampler (line 98) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 100) | def __init__(self, size):
    method _get_local_indices (line 108) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 117) | def __iter__(self):
    method __len__ (line 120) | def __len__(self):
  function evaluate_chat_model (line 124) | def evaluate_chat_model():

FILE: internvl_chat/eval/domain_specific/drivelm/evaluate.py
  function post_process (line 27) | def post_process(pred):
  function collate_fn (line 68) | def collate_fn(batches, tokenizer):
  class DriveLMDataset (line 77) | class DriveLMDataset(torch.utils.data.Dataset):
    method __init__ (line 79) | def __init__(self, root, split, prompt, image_path, input_size=224, dy...
    method __len__ (line 94) | def __len__(self):
    method __getitem__ (line 97) | def __getitem__(self, idx):
  class InferenceSampler (line 128) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 130) | def __init__(self, size):
    method _get_local_indices (line 138) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 147) | def __iter__(self):
    method __len__ (line 150) | def __len__(self):
  function evaluate_chat_model (line 154) | def evaluate_chat_model():

FILE: internvl_chat/eval/domain_specific/mme_rw/evaluate.py
  function collate_fn (line 29) | def collate_fn(batches, tokenizer):
  class MMERealworldDataset (line 40) | class MMERealworldDataset(torch.utils.data.Dataset):
    method __init__ (line 42) | def __init__(self, root, prompt, language, subtask: Literal[
    method __len__ (line 59) | def __len__(self):
    method __getitem__ (line 62) | def __getitem__(self, idx):
  class InferenceSampler (line 100) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 102) | def __init__(self, size):
    method _get_local_indices (line 110) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 119) | def __iter__(self):
    method __len__ (line 122) | def __len__(self):
  function post_process (line 126) | def post_process(s, choices):
  function evaluate (line 152) | def evaluate(outputs):
  function evaluate_chat_model (line 190) | def evaluate_chat_model():

FILE: internvl_chat/eval/domain_specific/rs_det/caculate.py
  function calculate_iou (line 9) | def calculate_iou(box1, box2):
  function box_iou (line 32) | def box_iou(boxes1, boxes2):
  function transform_bbox (line 48) | def transform_bbox(bbox, image_size):
  function evaluation_metrics (line 59) | def evaluation_metrics(outputs):

FILE: internvl_chat/eval/domain_specific/rs_det/evaluate.py
  function collate_fn (line 26) | def collate_fn(batches, tokenizer):
  class GroundingDataset (line 35) | class GroundingDataset(torch.utils.data.Dataset):
    method __init__ (line 37) | def __init__(self, root, image_root, prompt='', input_size=224, dynami...
    method __len__ (line 50) | def __len__(self):
    method __getitem__ (line 53) | def __getitem__(self, idx):
  function calculate_iou (line 80) | def calculate_iou(box1, box2):
  class InferenceSampler (line 103) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 105) | def __init__(self, size):
    method _get_local_indices (line 113) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 122) | def __iter__(self):
    method __len__ (line 125) | def __len__(self):
  function evaluate_chat_model (line 129) | def evaluate_chat_model():

FILE: internvl_chat/eval/domain_specific/rs_vqa/evaluate.py
  function collate_fn (line 40) | def collate_fn(batches, tokenizer):
  class RSVQADataset (line 50) | class RSVQADataset(torch.utils.data.Dataset):
    method __init__ (line 52) | def __init__(self, root, prompt, image_root, input_size=224, dynamic_i...
    method __len__ (line 65) | def __len__(self):
    method __getitem__ (line 68) | def __getitem__(self, idx):
  function evaluation_metrics (line 96) | def evaluation_metrics(outputs):
  class InferenceSampler (line 121) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 123) | def __init__(self, size):
    method _get_local_indices (line 131) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 140) | def __iter__(self):
    method __len__ (line 143) | def __len__(self):
  function evaluate_chat_model (line 147) | def evaluate_chat_model():

FILE: internvl_chat/eval/domain_specific/rs_vqa/score.py
  function is_correct_count (line 5) | def is_correct_count(response, answer):
  function is_correct_area (line 23) | def is_correct_area(response, answer):
  function calculate_scores (line 32) | def calculate_scores(data):

FILE: internvl_chat/eval/llava_bench/eval_gpt_review_bench.py
  function get_eval (line 11) | def get_eval(content: str, max_tokens: int):
  function parse_score (line 34) | def parse_score(review):

FILE: internvl_chat/eval/llava_bench/evaluate_llava_bench.py
  class VQADataset (line 22) | class VQADataset(torch.utils.data.Dataset):
    method __init__ (line 24) | def __init__(self, root, data, prompt, input_size=224, dynamic_image_s...
    method __len__ (line 35) | def __len__(self):
    method __getitem__ (line 38) | def __getitem__(self, idx):
  function evaluate_chat_model (line 57) | def evaluate_chat_model():

FILE: internvl_chat/eval/llava_bench/summarize_gpt_review.py
  function parse_args (line 9) | def parse_args():

FILE: internvl_chat/eval/mantis_eval/evaluate_mantis.py
  function collate_fn (line 26) | def collate_fn(batches, tokenizer):
  class MantisEvalDataset (line 36) | class MantisEvalDataset(torch.utils.data.Dataset):
    method __init__ (line 38) | def __init__(self, root, split, prompt, input_size=224, dynamic_image_...
    method __len__ (line 49) | def __len__(self):
    method __getitem__ (line 52) | def __getitem__(self, idx):
  class InferenceSampler (line 103) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 105) | def __init__(self, size):
    method _get_local_indices (line 113) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 122) | def __iter__(self):
    method __len__ (line 125) | def __len__(self):
  function evaluate_chat_model (line 129) | def evaluate_chat_model():

FILE: internvl_chat/eval/mathvista/calculate_score.py
  function get_most_similar (line 9) | def get_most_similar(prediction, choices):
  function normalize_extracted_answer (line 19) | def normalize_extracted_answer(extraction, choices, question_type, answe...
  function safe_equal (line 70) | def safe_equal(prediction, answer):
  function get_acc_with_contion (line 83) | def get_acc_with_contion(res_pd, key, value):

FILE: internvl_chat/eval/mathvista/evaluate_mathvista.py
  function collate_fn (line 42) | def collate_fn(batches, tokenizer):
  class MathVistaDataset (line 48) | class MathVistaDataset(torch.utils.data.Dataset):
    method __init__ (line 50) | def __init__(self, root, split, input_size=224, dynamic_image_size=False,
    method __len__ (line 60) | def __len__(self):
    method __getitem__ (line 63) | def __getitem__(self, idx):
  class InferenceSampler (line 83) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 85) | def __init__(self, size):
    method _get_local_indices (line 93) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 102) | def __iter__(self):
    method __len__ (line 105) | def __len__(self):
  function evaluate_chat_model (line 109) | def evaluate_chat_model():

FILE: internvl_chat/eval/mathvista/extract_answer.py
  function verify_extraction (line 21) | def verify_extraction(extraction):
  function create_test_prompt (line 28) | def create_test_prompt(demo_prompt, query, response):
  function _extract_answer (line 35) | def _extract_answer(text):
  function extract_answer (line 42) | def extract_answer(response, problem, quick_extract=False):

FILE: internvl_chat/eval/mathvista/utilities.py
  function create_dir (line 14) | def create_dir(output_dir):
  function read_csv (line 19) | def read_csv(file):
  function read_pandas_csv (line 27) | def read_pandas_csv(csv_path):
  function read_json (line 34) | def read_json(path):
  function read_jsonl (line 39) | def read_jsonl(file):
  function read_pickle (line 45) | def read_pickle(path):
  function save_json (line 50) | def save_json(data, path):
  function save_array_img (line 55) | def save_array_img(path, image):
  function contains_digit (line 59) | def contains_digit(text):
  function contains_number_word (line 66) | def contains_number_word(text):
  function contains_quantity_word (line 86) | def contains_quantity_word(text, special_keep_words=[]):
  function is_bool_word (line 115) | def is_bool_word(text):
  function is_digit_string (line 123) | def is_digit_string(text):
  function is_float_string (line 134) | def is_float_string(text):
  function copy_image (line 145) | def copy_image(image_path, output_image_path):
  function copy_dir (line 150) | def copy_dir(src_dir, dst_dir):
  function get_image_size (line 160) | def get_image_size(img_path):
  function get_chat_response (line 166) | def get_chat_response(promot, api_key, model='gpt-3.5-turbo', temperatur...

FILE: internvl_chat/eval/mirb/evaluate_mirb.py
  function eval_scores (line 42) | def eval_scores(results, dataset):
  function exact_yes_no (line 54) | def exact_yes_no(results):
  function exact_in_match (line 74) | def exact_in_match(results):
  function exact_match (line 107) | def exact_match(results, dataset):
  function collate_fn (line 139) | def collate_fn(batches, tokenizer):
  function get_task_instruction (line 151) | def get_task_instruction(dataset):
  class MIRBDataset (line 166) | class MIRBDataset(torch.utils.data.Dataset):
    method __init__ (line 168) | def __init__(self, root, split, input_size=224, dynamic_image_size=False,
    method __len__ (line 188) | def __len__(self):
    method __getitem__ (line 191) | def __getitem__(self, idx):
  class InferenceSampler (line 237) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 239) | def __init__(self, size):
    method _get_local_indices (line 247) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 256) | def __iter__(self):
    method __len__ (line 259) | def __len__(self):
  function evaluate_chat_model (line 263) | def evaluate_chat_model():

FILE: internvl_chat/eval/mmbench/evaluate_mmbench.py
  function collate_fn (line 64) | def collate_fn(batches, tokenizer):
  class MMBenchDataset (line 73) | class MMBenchDataset(torch.utils.data.Dataset):
    method __init__ (line 75) | def __init__(self, root, prompt, language, input_size=224, dynamic_ima...
    method __len__ (line 86) | def __len__(self):
    method __getitem__ (line 89) | def __getitem__(self, idx):
    method load_from_df (line 132) | def load_from_df(self, idx, key):
  class InferenceSampler (line 139) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 141) | def __init__(self, size):
    method _get_local_indices (line 149) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 158) | def __iter__(self):
    method __len__ (line 161) | def __len__(self):
  function post_process (line 165) | def post_process(pred, option):
  function evaluate_chat_model (line 180) | def evaluate_chat_model():

FILE: internvl_chat/eval/mme/calculation.py
  class calculate_metrics (line 16) | class calculate_metrics:
    method divide_chunks (line 17) | def divide_chunks(self, l, n=2):
    method parse_pred_ans (line 24) | def parse_pred_ans(self, pred_ans):
    method compute_metric (line 40) | def compute_metric(self, gts, preds):
    method process_result (line 84) | def process_result(self, results_dir):

FILE: internvl_chat/eval/mme/eval.py
  function load_image (line 12) | def load_image(image_file, input_size=224):
  function post_processing (line 26) | def post_processing(response):

FILE: internvl_chat/eval/mmhal/evaluate_mmhal.py
  function collate_fn (line 29) | def collate_fn(batches, tokenizer):
  class VQADataset (line 39) | class VQADataset(torch.utils.data.Dataset):
    method __init__ (line 41) | def __init__(
    method __len__ (line 58) | def __len__(self):
    method __getitem__ (line 61) | def __getitem__(self, idx):
  class InferenceSampler (line 97) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 99) | def __init__(self, size):
    method _get_local_indices (line 107) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 116) | def __iter__(self):
    method __len__ (line 119) | def __len__(self):
  function evaluate_chat_model (line 123) | def evaluate_chat_model():

FILE: internvl_chat/eval/mmiu/evaluate_mmiu.py
  function collate_fn (line 25) | def collate_fn(batches, tokenizer):
  class MMIUDataset (line 35) | class MMIUDataset(torch.utils.data.Dataset):
    method __init__ (line 37) | def __init__(self, meta, input_size=224, dynamic_image_size=False,
    method __len__ (line 51) | def __len__(self):
    method __getitem__ (line 54) | def __getitem__(self, idx):
  class InferenceSampler (line 111) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 113) | def __init__(self, size):
    method _get_local_indices (line 121) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 130) | def __iter__(self):
    method __len__ (line 133) | def __len__(self):
  function post_process (line 137) | def post_process(pred, option):
  function evaluate_chat_model (line 152) | def evaluate_chat_model():

FILE: internvl_chat/eval/mmmu/data_utils.py
  function save_json (line 55) | def save_json(filename, ds):
  function get_multi_choice_info (line 60) | def get_multi_choice_info(options):
  function load_yaml (line 76) | def load_yaml(file_path):
  function parse_img_path (line 86) | def parse_img_path(text):
  function process_single_sample (line 91) | def process_single_sample(data):
  function save_json (line 105) | def save_json(filename, ds):
  function save_jsonl (line 110) | def save_jsonl(filename, data):
  function save_args (line 128) | def save_args(args, path_dir):
  function construct_prompt (line 138) | def construct_prompt(sample, config):

FILE: internvl_chat/eval/mmmu/eval_utils.py
  function parse_multi_choice_response (line 11) | def parse_multi_choice_response(response, all_choices, index2ans):
  function check_is_number (line 67) | def check_is_number(string):
  function normalize_str (line 79) | def normalize_str(string):
  function extract_numbers (line 104) | def extract_numbers(string):
  function parse_open_response (line 127) | def parse_open_response(response):
  function eval_multi_choice (line 183) | def eval_multi_choice(gold_i, pred_i):
  function eval_open (line 200) | def eval_open(gold_i, pred_i):
  function evaluate (line 229) | def evaluate(samples):
  function calculate_ins_level_acc (line 255) | def calculate_ins_level_acc(results: Dict):

FILE: internvl_chat/eval/mmmu/evaluate_mmmu.py
  function collate_fn (line 39) | def collate_fn(batches, tokenizer):
  class MMMUDataset (line 48) | class MMMUDataset(torch.utils.data.Dataset):
    method __init__ (line 50) | def __init__(self, root, split, prompt, input_size=224, dynamic_image_...
    method __len__ (line 67) | def __len__(self):
    method __getitem__ (line 70) | def __getitem__(self, idx):
  class InferenceSampler (line 119) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 121) | def __init__(self, size):
    method _get_local_indices (line 129) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 138) | def __iter__(self):
    method __len__ (line 141) | def __len__(self):
  function post_process (line 145) | def post_process(pred, option):
  function evaluate_chat_model (line 160) | def evaluate_chat_model():

FILE: internvl_chat/eval/mmmu_pro/evaluate.py
  function mmmu_process_results (line 16) | def mmmu_process_results(results):
  function extract_subset_name (line 40) | def extract_subset_name(input_string):
  function mmmu_aggregate_results (line 51) | def mmmu_aggregate_results(results):
  function calculate_ins_level_acc (line 94) | def calculate_ins_level_acc(results):
  function eval_multi_choice (line 143) | def eval_multi_choice(gold_i, pred_i):
  function eval_open (line 161) | def eval_open(gold_i, pred_i):
  function evaluate_mmmu (line 190) | def evaluate_mmmu(samples):
  function parse_multi_choice_responses (line 214) | def parse_multi_choice_responses(response):
  function parse_multi_choice_response (line 219) | def parse_multi_choice_response(response, all_choices, index2ans):
  function extract_numbers (line 296) | def extract_numbers(string):
  function check_is_number (line 320) | def check_is_number(string):
  function normalize_str (line 333) | def normalize_str(string):
  function parse_open_response (line 359) | def parse_open_response(response):
  function get_multi_choice_info (line 431) | def get_multi_choice_info(options):
  function check_files (line 451) | def check_files(input_dir):

FILE: internvl_chat/eval/mmmu_pro/evaluate_mmmu_pro.py
  function replace_images_tokens (line 34) | def replace_images_tokens(input_string):
  function parse_options (line 43) | def parse_options(options):
  function construct_prompt (line 49) | def construct_prompt(doc):
  function mmmu_doc_to_text (line 56) | def mmmu_doc_to_text(doc):
  function origin_mmmu_doc_to_visual (line 61) | def origin_mmmu_doc_to_visual(doc):
  function vision_mmmu_doc_to_visual (line 70) | def vision_mmmu_doc_to_visual(doc):
  function process_prompt (line 74) | def process_prompt(data):
  function run_and_save (line 84) | def run_and_save(pipe):

FILE: internvl_chat/eval/mmvet/evaluate_mmvet.py
  function collate_fn (line 24) | def collate_fn(batches, tokenizer):
  class VQADataset (line 33) | class VQADataset(torch.utils.data.Dataset):
    method __init__ (line 35) | def __init__(self, root, data, prompt, input_size=224, dynamic_image_s...
    method __len__ (line 46) | def __len__(self):
    method __getitem__ (line 49) | def __getitem__(self, idx):
  function evaluate_chat_model (line 68) | def evaluate_chat_model():

FILE: internvl_chat/eval/mmvetv2/evaluate_mmvet_v2.py
  function collate_fn (line 26) | def collate_fn(batches, tokenizer):
  class VQADataset (line 35) | class VQADataset(torch.utils.data.Dataset):
    method __init__ (line 37) | def __init__(self, root, data, prompt, input_size=224, dynamic_image_s...
    method __len__ (line 50) | def __len__(self):
    method __getitem__ (line 53) | def __getitem__(self, idx):
  class InferenceSampler (line 80) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 82) | def __init__(self, size):
    method _get_local_indices (line 90) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 99) | def __iter__(self):
    method __len__ (line 102) | def __len__(self):
  function evaluate_chat_model (line 106) | def evaluate_chat_model():

FILE: internvl_chat/eval/mmvp/evaluate_mmvp.py
  function collate_fn (line 25) | def collate_fn(batches, tokenizer):
  class MMVPDataset (line 34) | class MMVPDataset(torch.utils.data.Dataset):
    method __init__ (line 36) | def __init__(self, root, prompt, input_size=224, dynamic_image_size=Fa...
    method __len__ (line 53) | def __len__(self):
    method __getitem__ (line 56) | def __getitem__(self, idx):
  class InferenceSampler (line 98) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 100) | def __init__(self, size):
    method _get_local_indices (line 108) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 117) | def __iter__(self):
    method __len__ (line 120) | def __len__(self):
  function post_process (line 124) | def post_process(pred, option):
  function evaluate_chat_model (line 139) | def evaluate_chat_model():

FILE: internvl_chat/eval/mpdocvqa/evaluate_vqa.py
  function collate_fn (line 32) | def collate_fn(batches, tokenizer):
  class VQADataset (line 42) | class VQADataset(torch.utils.data.Dataset):
    method __init__ (line 44) | def __init__(self, root, test, prompt, input_size=224, dynamic_image_s...
    method __len__ (line 56) | def __len__(self):
    method __getitem__ (line 59) | def __getitem__(self, idx):
  class InferenceSampler (line 101) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 103) | def __init__(self, size):
    method _get_local_indices (line 111) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 120) | def __iter__(self):
    method __len__ (line 123) | def __len__(self):
  function evaluate_chat_model (line 127) | def evaluate_chat_model():

FILE: internvl_chat/eval/mpdocvqa/infographicsvqa_eval.py
  function save_json (line 17) | def save_json(file_path, data):
  function levenshtein_distance (line 22) | def levenshtein_distance(s1, s2):
  function validate_data (line 38) | def validate_data(gtFilePath, submFilePath):
  function evaluate_method (line 91) | def evaluate_method(gtFilePath, submFilePath, evaluationParams):
  function display_results (line 204) | def display_results(results, show_answer_types):

FILE: internvl_chat/eval/mvbench/evaluate_mvbench.py
  function collate_fn (line 52) | def collate_fn(batches, tokenizer):
  class MVBenchDataset (line 61) | class MVBenchDataset(torch.utils.data.Dataset):
    method __init__ (line 63) | def __init__(self, data_dir, data_list, prompt, question_prompt, num_s...
    method __len__ (line 91) | def __len__(self):
    method __str__ (line 94) | def __str__(self):
    method get_index (line 116) | def get_index(self, bound, fps, max_frame, first_idx=0):
    method read_video (line 130) | def read_video(self, video_path, bound=None):
    method read_gif (line 143) | def read_gif(self, video_path, bound=None, fps=25):
    method read_frame (line 157) | def read_frame(self, video_path, bound=None, fps=3):
    method qa_template (line 167) | def qa_template(self, data):
    method __getitem__ (line 180) | def __getitem__(self, idx):
  class InferenceSampler (line 220) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 222) | def __init__(self, size):
    method _get_local_indices (line 230) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 239) | def __iter__(self):
    method __len__ (line 242) | def __len__(self):
  function check_ans (line 246) | def check_ans(pred, gt):
  function evaluate_chat_model (line 265) | def evaluate_chat_model():

FILE: internvl_chat/eval/pope/eval_pope.py
  function eval_pope (line 6) | def eval_pope(answers, label_file):

FILE: internvl_chat/eval/pope/evaluate_pope.py
  function extract_answer (line 38) | def extract_answer(text):
  function collate_fn (line 45) | def collate_fn(batches, tokenizer):
  class VQADataset (line 54) | class VQADataset(torch.utils.data.Dataset):
    method __init__ (line 56) | def __init__(self, root, data, prompt, input_size=224, dynamic_image_s...
    method __len__ (line 67) | def __len__(self):
    method __getitem__ (line 70) | def __getitem__(self, idx):
  class InferenceSampler (line 98) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 100) | def __init__(self, size):
    method _get_local_indices (line 108) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 117) | def __iter__(self):
    method __len__ (line 120) | def __len__(self):
  function evaluate_chat_model (line 124) | def evaluate_chat_model():

FILE: internvl_chat/eval/refcoco/evaluate_grounding.py
  function box_iou (line 29) | def box_iou(boxes1, boxes2):
  function collate_fn (line 45) | def collate_fn(batches, tokenizer):
  class RefCOCODataset (line 53) | class RefCOCODataset(torch.utils.data.Dataset):
    method __init__ (line 55) | def __init__(self, test, prompt, input_size=224, dynamic_image_size=Fa...
    method __len__ (line 65) | def __len__(self):
    method __getitem__ (line 68) | def __getitem__(self, idx):
  class InferenceSampler (line 94) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 96) | def __init__(self, size):
    method _get_local_indices (line 104) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 113) | def __iter__(self):
    method __len__ (line 116) | def __len__(self):
  function evaluate_chat_model (line 120) | def evaluate_chat_model():

FILE: internvl_chat/eval/scienceqa/evaluate_scienceqa.py
  function extract_answer (line 41) | def extract_answer(text):
  function collate_fn (line 48) | def collate_fn(batches, tokenizer):
  class ScienceQADataset (line 57) | class ScienceQADataset(torch.utils.data.Dataset):
    method __init__ (line 59) | def __init__(self, root, prompt, input_size=224, dynamic_image_size=Fa...
    method __len__ (line 70) | def __len__(self):
    method __getitem__ (line 73) | def __getitem__(self, idx):
  class InferenceSampler (line 114) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 116) | def __init__(self, size):
    method _get_local_indices (line 124) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 133) | def __iter__(self):
    method __len__ (line 136) | def __len__(self):
  function post_process (line 140) | def post_process(pred, option):
  function evaluate_chat_model (line 161) | def evaluate_chat_model():

FILE: internvl_chat/eval/seed/calculation.py
  function is_integer_string (line 16) | def is_integer_string(s):
  function filter_questions (line 24) | def filter_questions(data, task='all'):

FILE: internvl_chat/eval/seed/evaluate_seed.py
  function collate_fn (line 25) | def collate_fn(batches, tokenizer):
  class MultipleChoiceDataset (line 33) | class MultipleChoiceDataset(torch.utils.data.Dataset):
    method __init__ (line 35) | def __init__(self, root, annotation, input_size=224, dynamic_image_siz...
    method __len__ (line 46) | def __len__(self):
    method __getitem__ (line 49) | def __getitem__(self, idx):
  class InferenceSampler (line 71) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 73) | def __init__(self, size):
    method _get_local_indices (line 81) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 90) | def __iter__(self):
    method __len__ (line 93) | def __len__(self):
  function post_process (line 97) | def post_process(pred, option):
  function evaluate_chat_model (line 112) | def evaluate_chat_model():

FILE: internvl_chat/eval/tiny_lvlm/calculate_score.py
  function parse_args (line 10) | def parse_args():
  function main (line 17) | def main(args):

FILE: internvl_chat/eval/tiny_lvlm/evaluate_lvlm.py
  function collate_fn (line 24) | def collate_fn(batches, tokenizer):
  class VQADataset (line 33) | class VQADataset(torch.utils.data.Dataset):
    method __init__ (line 35) | def __init__(self, root, prompt, input_size=224, dynamic_image_size=Fa...
    method __len__ (line 56) | def __len__(self):
    method __getitem__ (line 59) | def __getitem__(self, idx):
  class InferenceSampler (line 82) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 84) | def __init__(self, size):
    method _get_local_indices (line 92) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 101) | def __iter__(self):
    method __len__ (line 104) | def __len__(self):
  function evaluate_chat_model (line 108) | def evaluate_chat_model():

FILE: internvl_chat/eval/tiny_lvlm/tools.py
  function remove_special_chars (line 5) | def remove_special_chars(s):
  function has_word (line 11) | def has_word(sentence, word):
  class VQAEval (line 20) | class VQAEval:
    method __init__ (line 21) | def __init__(self):
    method evaluate (line 186) | def evaluate(self, answer, gt_answers):
    method evaluate_MRR (line 213) | def evaluate_MRR(self, answer, gt_answers):
    method processPunctuation (line 231) | def processPunctuation(self, inText):
    method processDigitArticle (line 243) | def processDigitArticle(self, inText):

FILE: internvl_chat/eval/vqa/evaluate_vqa.py
  function relaxed_correctness (line 144) | def relaxed_correctness(target: str,
  function evaluate_relaxed_accuracy (line 186) | def evaluate_relaxed_accuracy(entries):
  function evaluate_exact_match_accuracy (line 199) | def evaluate_exact_match_accuracy(entries):
  function collate_fn (line 213) | def collate_fn(batches, tokenizer):
  class VQADataset (line 222) | class VQADataset(torch.utils.data.Dataset):
    method __init__ (line 224) | def __init__(self, train, test, prompt, few_shot, input_size=224, dyna...
    method __len__ (line 237) | def __len__(self):
    method __getitem__ (line 240) | def __getitem__(self, idx):
  class InferenceSampler (line 273) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 275) | def __init__(self, size):
    method _get_local_indices (line 283) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 292) | def __iter__(self):
    method __len__ (line 295) | def __len__(self):
  function post_process (line 299) | def post_process(response):
  function evaluate_chat_model (line 318) | def evaluate_chat_model():

FILE: internvl_chat/eval/vqa/infographicsvqa_eval.py
  function save_json (line 17) | def save_json(file_path, data):
  function levenshtein_distance (line 22) | def levenshtein_distance(s1, s2):
  function validate_data (line 38) | def validate_data(gtFilePath, submFilePath):
  function evaluate_method (line 91) | def evaluate_method(gtFilePath, submFilePath, evaluationParams):
  function display_results (line 204) | def display_results(results, show_answer_types):

FILE: internvl_chat/eval/vqa/textvqa_eval.py
  class EvalAIAnswerProcessor (line 8) | class EvalAIAnswerProcessor:
    method __init__ (line 179) | def __init__(self, *args, **kwargs):
    method word_tokenize (line 182) | def word_tokenize(self, word):
    method process_punctuation (line 187) | def process_punctuation(self, in_text):
    method process_digit_article (line 199) | def process_digit_article(self, in_text):
    method __call__ (line 214) | def __call__(self, item):
  class TextVQAAccuracyEvaluator (line 222) | class TextVQAAccuracyEvaluator:
    method __init__ (line 223) | def __init__(self):
    method _compute_answer_scores (line 226) | def _compute_answer_scores(self, raw_answers):
    method eval_pred_list (line 249) | def eval_pred_list(self, pred_list, disable_tqdm=False):
  class STVQAAccuracyEvaluator (line 261) | class STVQAAccuracyEvaluator:
    method __init__ (line 262) | def __init__(self):
    method eval_pred_list (line 265) | def eval_pred_list(self, pred_list):
  class STVQAANLSEvaluator (line 277) | class STVQAANLSEvaluator:
    method __init__ (line 278) | def __init__(self):
    method get_anls (line 283) | def get_anls(self, s1, s2):
    method eval_pred_list (line 290) | def eval_pred_list(self, pred_list):
  class TextCapsBleu4Evaluator (line 302) | class TextCapsBleu4Evaluator:
    method __init__ (line 303) | def __init__(self):
    method eval_pred_list (line 322) | def eval_pred_list(self, pred_list):

FILE: internvl_chat/internvl/conversation.py
  class SeparatorStyle (line 13) | class SeparatorStyle(IntEnum):
  class Conversation (line 37) | class Conversation:
    method get_prompt (line 61) | def get_prompt(self) -> str:
    method set_system_message (line 251) | def set_system_message(self, system_message: str):
    method append_message (line 255) | def append_message(self, role: str, message: str):
    method update_last_message (line 259) | def update_last_message(self, message: str):
    method to_gradio_chatbot (line 267) | def to_gradio_chatbot(self):
    method to_openai_api_messages (line 277) | def to_openai_api_messages(self):
    method copy (line 289) | def copy(self):
    method dict (line 304) | def dict(self):
  function register_conv_template (line 318) | def register_conv_template(template: Conversation, override: bool = False):
  function get_conv_template (line 328) | def get_conv_template(name: str) -> Conversation:

FILE: internvl_chat/internvl/dist_utils.py
  function _find_free_port (line 14) | def _find_free_port():
  function _is_free_port (line 25) | def _is_free_port(port):
  function init_dist (line 32) | def init_dist(launcher, backend='nccl', **kwargs):
  function _init_dist_pytorch (line 45) | def _init_dist_pytorch(backend, **kwargs):
  function _init_dist_mpi (line 54) | def _init_dist_mpi(backend, **kwargs):
  function _init_dist_slurm (line 67) | def _init_dist_slurm(backend, port=None):

FILE: internvl_chat/internvl/model/__init__.py
  function split_model (line 14) | def split_model(num_layers, vit_alpha=0.5):
  function load_model_and_tokenizer (line 39) | def load_model_and_tokenizer(args):

FILE: internvl_chat/internvl/model/internlm2/configuration_internlm2.py
  class InternLM2Config (line 27) | class InternLM2Config(PretrainedConfig):
    method __init__ (line 77) | def __init__(  # pylint: disable=W0102
    method _rope_scaling_validation (line 131) | def _rope_scaling_validation(self):

FILE: internvl_chat/internvl/model/internlm2/modeling_internlm2.py
  function _import_flash_attn (line 65) | def _import_flash_attn():
  function _get_unpad_data (line 83) | def _get_unpad_data(attention_mask):
  function _make_causal_mask (line 96) | def _make_causal_mask(
  function _expand_mask (line 114) | def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Option...
  class InternLM2RMSNorm (line 129) | class InternLM2RMSNorm(nn.Module):
    method __init__ (line 130) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 138) | def forward(self, hidden_states):
  class InternLM2RotaryEmbedding (line 161) | class InternLM2RotaryEmbedding(nn.Module):
    method __init__ (line 162) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
    method _set_cos_sin_cache (line 176) | def _set_cos_sin_cache(self, seq_len, device, dtype):
    method forward (line 186) | def forward(self, x, seq_len=None):
  class InternLM2LinearScalingRotaryEmbedding (line 198) | class InternLM2LinearScalingRotaryEmbedding(InternLM2RotaryEmbedding):
    method __init__ (line 201) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
    method _set_cos_sin_cache (line 205) | def _set_cos_sin_cache(self, seq_len, device, dtype):
  class InternLM2DynamicNTKScalingRotaryEmbedding (line 218) | class InternLM2DynamicNTKScalingRotaryEmbedding(InternLM2RotaryEmbedding):
    method __init__ (line 223) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
    method _set_cos_sin_cache (line 227) | def _set_cos_sin_cache(self, seq_len, device, dtype):
  function rotate_half (line 247) | def rotate_half(x):
  function apply_rotary_pos_emb (line 255) | def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
  class InternLM2MLP (line 264) | class InternLM2MLP(nn.Module):
    method __init__ (line 265) | def __init__(self, config):
    method forward (line 275) | def forward(self, x):
  function repeat_kv (line 282) | def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
  class InternLM2Attention (line 295) | class InternLM2Attention(nn.Module):
    method __init__ (line 298) | def __init__(self, config: InternLM2Config):
    method _init_rope (line 324) | def _init_rope(self):
    method _shape (line 352) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
    method forward (line 355) | def forward(
  class InternLM2FlashAttention2 (line 444) | class InternLM2FlashAttention2(InternLM2Attention):
    method forward (line 451) | def forward(
    method _flash_attention_forward (line 523) | def _flash_attention_forward(
    method _unpad_input (line 577) | def _unpad_input(self, query_layer, key_layer, value_layer, attention_...
  class InternLM2DecoderLayer (line 624) | class InternLM2DecoderLayer(nn.Module):
    method __init__ (line 625) | def __init__(self, config: InternLM2Config):
    method forward (line 635) | def forward(
  class InternLM2PreTrainedModel (line 720) | class InternLM2PreTrainedModel(PreTrainedModel):
    method _init_weights (line 728) | def _init_weights(self, module):
  class InternLM2Model (line 810) | class InternLM2Model(InternLM2PreTrainedModel):
    method __init__ (line 820) | def __init__(self, config: InternLM2Config):
    method get_input_embeddings (line 838) | def get_input_embeddings(self):
    method set_input_embeddings (line 841) | def set_input_embeddings(self, value):
    method _prepare_decoder_attention_mask (line 844) | def _prepare_decoder_attention_mask(self, attention_mask, input_shape,...
    method forward (line 868) | def forward(
  class InternLM2ForCausalLM (line 1002) | class InternLM2ForCausalLM(InternLM2PreTrainedModel):
    method __init__ (line 1007) | def __init__(self, config):
    method get_input_embeddings (line 1016) | def get_input_embeddings(self):
    method set_input_embeddings (line 1019) | def set_input_embeddings(self, value):
    method get_output_embeddings (line 1022) | def get_output_embeddings(self):
    method set_output_embeddings (line 1025) | def set_output_embeddings(self, new_embeddings):
    method set_decoder (line 1028) | def set_decoder(self, decoder):
    method get_decoder (line 1031) | def get_decoder(self):
    method forward (line 1036) | def forward(
    method prepare_inputs_for_generation (line 1126) | def prepare_inputs_for_generation(
    method _reorder_cache (line 1166) | def _reorder_cache(past_key_values, beam_idx):
    method build_inputs (line 1174) | def build_inputs(self, tokenizer, query: str, history: List[Tuple[str,...
    method chat (line 1187) | def chat(
    method stream_chat (line 1223) | def stream_chat(
  class InternLM2ForSequenceClassification (line 1325) | class InternLM2ForSequenceClassification(InternLM2PreTrainedModel):
    method __init__ (line 1326) | def __init__(self, config):
    method get_input_embeddings (line 1335) | def get_input_embeddings(self):
    method set_input_embeddings (line 1338) | def set_input_embeddings(self, value):
    method forward (line 1342) | def forward(

FILE: internvl_chat/internvl/model/internlm2/tokenization_internlm2.py
  class InternLM2Tokenizer (line 34) | class InternLM2Tokenizer(PreTrainedTokenizer):
    method __init__ (line 48) | def __init__(
    method no_prefix_space_tokens (line 80) | def no_prefix_space_tokens(self):
    method vocab_size (line 87) | def vocab_size(self):
    method bos_token_id (line 92) | def bos_token_id(self) -> Optional[int]:
    method eos_token_id (line 96) | def eos_token_id(self) -> Optional[int]:
    method get_vocab (line 99) | def get_vocab(self):
    method _tokenize (line 105) | def _tokenize(self, text):
    method _convert_token_to_id (line 109) | def _convert_token_to_id(self, token):
    method _convert_id_to_token (line 113) | def _convert_id_to_token(self, index):
    method _maybe_add_prefix_space (line 118) | def _maybe_add_prefix_space(self, tokens, decoded):
    method convert_tokens_to_string (line 124) | def convert_tokens_to_string(self, tokens):
    method save_vocabulary (line 145) | def save_vocabulary(self, save_directory, filename_prefix: Optional[st...
    method build_inputs_with_special_tokens (line 172) | def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=No...
    method get_special_tokens_mask (line 188) | def get_special_tokens_mask(
    method create_token_type_ids_from_sequences (line 215) | def create_token_type_ids_from_sequences(

FILE: internvl_chat/internvl/model/internlm2/tokenization_internlm2_fast.py
  class InternLM2Converter (line 38) | class InternLM2Converter(SpmConverter):
    method vocab (line 41) | def vocab(self, proto):
    method unk_id (line 50) | def unk_id(self, proto):
    method decoder (line 54) | def decoder(self, replacement, add_prefix_space):
    method tokenizer (line 64) | def tokenizer(self, proto):
    method normalizer (line 92) | def normalizer(self, proto):
    method pre_tokenizer (line 99) | def pre_tokenizer(self, replacement, add_prefix_space):
  class InternLM2TokenizerFast (line 107) | class InternLM2TokenizerFast(PreTrainedTokenizerFast):
    method __init__ (line 114) | def __init__(
    method can_save_slow_tokenizer (line 147) | def can_save_slow_tokenizer(self) -> bool:
    method update_post_processor (line 150) | def update_post_processor(self):
    method add_eos_token (line 177) | def add_eos_token(self):
    method add_bos_token (line 181) | def add_bos_token(self):
    method add_eos_token (line 185) | def add_eos_token(self, value):
    method add_bos_token (line 190) | def add_bos_token(self, value):
    method save_vocabulary (line 194) | def save_vocabulary(self, save_directory: str, filename_prefix: Option...

FILE: internvl_chat/internvl/model/internvl_chat/configuration_intern_vit.py
  class InternVisionConfig (line 16) | class InternVisionConfig(PretrainedConfig):
    method __init__ (line 64) | def __init__(
    method from_pretrained (line 108) | def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os....

FILE: internvl_chat/internvl/model/internvl_chat/configuration_internvl_chat.py
  class InternVLChatConfig (line 20) | class InternVLChatConfig(PretrainedConfig):
    method __init__ (line 24) | def __init__(
    method to_dict (line 86) | def to_dict(self):

FILE: internvl_chat/internvl/model/internvl_chat/modeling_intern_vit.py
  class FlashAttention (line 35) | class FlashAttention(nn.Module):
    method __init__ (line 46) | def __init__(self, softmax_scale=None, attention_dropout=0.0, device=N...
    method forward (line 51) | def forward(self, qkv, key_padding_mask=None, causal=False, cu_seqlens...
  class InternRMSNorm (line 99) | class InternRMSNorm(nn.Module):
    method __init__ (line 100) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 105) | def forward(self, hidden_states):
  class InternVisionEmbeddings (line 133) | class InternVisionEmbeddings(nn.Module):
    method __init__ (line 134) | def __init__(self, config: InternVisionConfig):
    method _get_pos_embed (line 154) | def _get_pos_embed(self, pos_embed, H, W):
    method forward (line 162) | def forward(self, pixel_values: torch.FloatTensor) -> torch.Tensor:
  class InternAttention (line 177) | class InternAttention(nn.Module):
    method __init__ (line 180) | def __init__(self, config: InternVisionConfig):
    method _naive_attn (line 210) | def _naive_attn(self, x):
    method _flash_attn (line 229) | def _flash_attn(self, x, key_padding_mask=None, need_weights=False):
    method forward (line 246) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class InternMLP (line 251) | class InternMLP(nn.Module):
    method __init__ (line 252) | def __init__(self, config: InternVisionConfig):
    method forward (line 259) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class InternVisionEncoderLayer (line 266) | class InternVisionEncoderLayer(nn.Module):
    method __init__ (line 267) | def __init__(self, config: InternVisionConfig, drop_path_rate: float):
    method forward (line 283) | def forward(
  class InternVisionEncoder (line 298) | class InternVisionEncoder(nn.Module):
    method __init__ (line 308) | def __init__(self, config: InternVisionConfig):
    method forward (line 317) | def forward(
  class InternVisionModel (line 364) | class InternVisionModel(PreTrainedModel):
    method __init__ (line 371) | def __init__(self, config: InternVisionConfig):
    method resize_pos_embeddings (line 378) | def resize_pos_embeddings(self, old_size, new_size, patch_size):
    method get_input_embeddings (line 390) | def get_input_embeddings(self):
    method forward (line 393) | def forward(

FILE: internvl_chat/internvl/model/internvl_chat/modeling_internvl_chat.py
  function version_cmp (line 31) | def version_cmp(v1, v2, op='eq'):
  class InternVLChatModel (line 39) | class InternVLChatModel(PreTrainedModel):
    method __init__ (line 48) | def __init__(self, config: InternVLChatConfig, vision_model=None, lang...
    method wrap_backbone_lora (line 111) | def wrap_backbone_lora(self, r=128, lora_alpha=256, lora_dropout=0.05):
    method wrap_llm_lora (line 121) | def wrap_llm_lora(self, r=128, lora_alpha=256, lora_dropout=0.05):
    method forward (line 143) | def forward(
    method pixel_shuffle (line 257) | def pixel_shuffle(self, x, scale_factor=0.5):
    method extract_feature (line 273) | def extract_feature(self, pixel_values):
    method batch_chat (line 293) | def batch_chat(self, tokenizer, pixel_values, questions, generation_co...
    method chat (line 343) | def chat(self, tokenizer, pixel_values, question, generation_config, h...
    method generate (line 401) | def generate(
    method lm_head (line 443) | def lm_head(self):
    method get_input_embeddings (line 446) | def get_input_embeddings(self):
    method get_output_embeddings (line 449) | def get_output_embeddings(self):

FILE: internvl_chat/internvl/model/phi3/configuration_phi3.py
  class Phi3Config (line 29) | class Phi3Config(PretrainedConfig):
    method __init__ (line 115) | def __init__(
    method _rope_scaling_validation (line 173) | def _rope_scaling_validation(self):

FILE: internvl_chat/internvl/model/phi3/modeling_phi3.py
  class Phi3RMSNorm (line 78) | class Phi3RMSNorm(nn.Module):
    method __init__ (line 79) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 87) | def forward(self, hidden_states):
  function _get_unpad_data (line 96) | def _get_unpad_data(attention_mask):
  class Phi3RotaryEmbedding (line 109) | class Phi3RotaryEmbedding(nn.Module):
    method __init__ (line 110) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
    method forward (line 119) | def forward(self, x, position_ids, seq_len=None):
  class Phi3SuScaledRotaryEmbedding (line 139) | class Phi3SuScaledRotaryEmbedding(Phi3RotaryEmbedding):
    method __init__ (line 140) | def __init__(self, dim, config, device=None):
    method forward (line 148) | def forward(self, x, position_ids, seq_len=None):
  class Phi3YarnScaledRotaryEmbedding (line 180) | class Phi3YarnScaledRotaryEmbedding(Phi3RotaryEmbedding):
    method __init__ (line 181) | def __init__(self, dim, config, device=None):
    method forward (line 189) | def forward(self, x, position_ids, seq_len=None):
  function rotate_half (line 222) | def rotate_half(x):
  function apply_rotary_pos_emb (line 230) | def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_di...
  class Phi3MLP (line 257) | class Phi3MLP(nn.Module):
    method __init__ (line 258) | def __init__(self, config):
    method forward (line 267) | def forward(self, hidden_states: torch.FloatTensor) -> torch.FloatTensor:
  function repeat_kv (line 277) | def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
  class Phi3Attention (line 289) | class Phi3Attention(nn.Module):
    method __init__ (line 292) | def __init__(self, config: Phi3Config, layer_idx: Optional[int] = None):
    method _init_rope (line 326) | def _init_rope(self):
    method forward (line 342) | def forward(
  class Phi3FlashAttention2 (line 424) | class Phi3FlashAttention2(Phi3Attention):
    method __init__ (line 432) | def __init__(self, *args, **kwargs):
    method forward (line 440) | def forward(
    method _flash_attention_forward (line 589) | def _flash_attention_forward(
    method _upad_input (line 690) | def _upad_input(self, query_layer, key_layer, value_layer, attention_m...
  class Phi3SdpaAttention (line 735) | class Phi3SdpaAttention(Phi3Attention):
    method forward (line 743) | def forward(
  class Phi3DecoderLayer (line 831) | class Phi3DecoderLayer(nn.Module):
    method __init__ (line 832) | def __init__(self, config: Phi3Config, layer_idx: int):
    method forward (line 845) | def forward(
  class Phi3PreTrainedModel (line 930) | class Phi3PreTrainedModel(PreTrainedModel):
    method __init__ (line 942) | def __init__(self, config: Phi3Config):
    method _init_weights (line 948) | def _init_weights(self, module):
  class Phi3Model (line 1034) | class Phi3Model(Phi3PreTrainedModel):
    method __init__ (line 1042) | def __init__(self, config: Phi3Config):
    method get_input_embeddings (line 1060) | def get_input_embeddings(self):
    method set_input_embeddings (line 1063) | def set_input_embeddings(self, value):
    method forward (line 1067) | def forward(
  class Phi3ForCausalLM (line 1204) | class Phi3ForCausalLM(Phi3PreTrainedModel):
    method __init__ (line 1208) | def __init__(self, config):
    method get_input_embeddings (line 1218) | def get_input_embeddings(self):
    method set_input_embeddings (line 1222) | def set_input_embeddings(self, value):
    method get_output_embeddings (line 1226) | def get_output_embeddings(self):
    method set_output_embeddings (line 1230) | def set_output_embeddings(self, new_embeddings):
    method set_decoder (line 1234) | def set_decoder(self, decoder):
    method get_decoder (line 1238) | def get_decoder(self):
    method forward (line 1244) | def forward(
    method prepare_inputs_for_generation (line 1332) | def prepare_inputs_for_generation(
    method _reorder_cache (line 1390) | def _reorder_cache(past_key_values, beam_idx):
  class Phi3ForSequenceClassification (line 1415) | class Phi3ForSequenceClassification(Phi3PreTrainedModel):
    method __init__ (line 1416) | def __init__(self, config):
    method get_input_embeddings (line 1425) | def get_input_embeddings(self):
    method set_input_embeddings (line 1428) | def set_input_embeddings(self, value):
    method forward (line 1432) | def forward(
  class Phi3ForTokenClassification (line 1531) | class Phi3ForTokenClassification(Phi3PreTrainedModel):
    method __init__ (line 1532) | def __init__(self, config: Phi3Config):
    method forward (line 1555) | def forward(

FILE: internvl_chat/internvl/patch/internlm2_packed_training_patch.py
  class InternLM2FlashAttention2ForPackedTraining (line 15) | class InternLM2FlashAttention2ForPackedTraining(InternLM2FlashAttention2):
    method _flash_attention_forward (line 17) | def _flash_attention_forward(
  function replace_internlm2_attention_class (line 72) | def replace_internlm2_attention_class():

FILE: internvl_chat/internvl/patch/internvit_liger_monkey_patch.py
  function apply_liger_kernel_to_internvit (line 7) | def apply_liger_kernel_to_internvit() -> None:

FILE: internvl_chat/internvl/patch/llama2_flash_attn_monkey_patch.py
  function apply_rotary_pos_emb (line 17) | def apply_rotary_pos_emb(q, k, cos_sin, position_ids):
  function forward (line 31) | def forward(
  function _prepare_decoder_attention_mask (line 107) | def _prepare_decoder_attention_mask(
  function replace_llama2_attn_with_flash_attn (line 131) | def replace_llama2_attn_with_flash_attn():
  function test (line 143) | def test():

FILE: internvl_chat/internvl/patch/llama_flash_attn_monkey_patch.py
  function forward (line 17) | def forward(
  function _prepare_decoder_attention_mask (line 114) | def _prepare_decoder_attention_mask(
  function forward_2 (line 121) | def forward_2(
  function replace_llama_attn_with_flash_attn (line 215) | def replace_llama_attn_with_flash_attn():

FILE: internvl_chat/internvl/patch/llama_packed_training_patch.py
  class LlamaFlashAttention2ForPackedTraining (line 14) | class LlamaFlashAttention2ForPackedTraining(LlamaFlashAttention2):
    method _flash_attention_forward (line 16) | def _flash_attention_forward(
  function replace_llama_attention_class (line 104) | def replace_llama_attention_class():

FILE: internvl_chat/internvl/patch/llama_rmsnorm_monkey_patch.py
  function replace_llama_rmsnorm_with_fused_rmsnorm (line 10) | def replace_llama_rmsnorm_with_fused_rmsnorm():

FILE: internvl_chat/internvl/patch/pad_data_collator.py
  function pad_data_collator (line 13) | def pad_data_collator(features, pad_id=0):
  function concat_pad_data_collator (line 57) | def concat_pad_data_collator(features, max_item_length=None, pad_id=0):
  function dpo_concat_pad_data_collator (line 119) | def dpo_concat_pad_data_collator(features, pad_id=0):

FILE: internvl_chat/internvl/patch/phi3_packed_training_patch.py
  class Phi3FlashAttention2ForPackedTraining (line 13) | class Phi3FlashAttention2ForPackedTraining(Phi3FlashAttention2):
    method _flash_attention_forward (line 15) | def _flash_attention_forward(
  function replace_phi3_attention_class (line 103) | def replace_phi3_attention_class():

FILE: internvl_chat/internvl/patch/qwen2_packed_training_patch.py
  class Qwen2FlashAttention2ForPackedTraining (line 14) | class Qwen2FlashAttention2ForPackedTraining(Qwen2FlashAttention2):
    method _flash_attention_forward (line 16) | def _flash_attention_forward(
  function replace_qwen2_attention_class (line 104) | def replace_qwen2_attention_class():

FILE: internvl_chat/internvl/patch/train_dataloader_patch.py
  function get_train_dataloader (line 14) | def get_train_dataloader(self) -> DataLoader:
  function replace_train_dataloader (line 51) | def replace_train_dataloader():

FILE: internvl_chat/internvl/patch/train_sampler_patch.py
  function split_to_even_chunks (line 19) | def split_to_even_chunks(indices, lengths, num_chunks):
  function get_length_grouped_indices (line 42) | def get_length_grouped_indices(lengths, batch_size, world_size, generato...
  class LengthGroupedSampler (line 54) | class LengthGroupedSampler(Sampler):
    method __init__ (line 60) | def __init__(
    method __len__ (line 93) | def __len__(self):
    method __iter__ (line 96) | def __iter__(self):
  function _get_train_sampler (line 102) | def _get_train_sampler(self) -> Optional[torch.utils.data.Sampler]:
  function replace_train_sampler (line 123) | def replace_train_sampler():

FILE: internvl_chat/internvl/train/dataset.py
  function calculate_ngram_repetition (line 43) | def calculate_ngram_repetition(text, n):
  function check_conversations_repetition (line 52) | def check_conversations_repetition(conversations, repeat_threshold=0.4, ...
  function get_frame_indices (line 61) | def get_frame_indices(num_frames, vlen, sample='rand', fix_start=None, i...
  function read_frames_gif (line 102) | def read_frames_gif(
  function read_frames_decord (line 126) | def read_frames_decord(
  function extract_frame_number (line 158) | def extract_frame_number(filename):
  function sort_frames (line 164) | def sort_frames(frame_paths):
  function read_frames_folder (line 169) | def read_frames_folder(
  class WeightedConcatDataset (line 199) | class WeightedConcatDataset(ConcatDataset):
    method __init__ (line 200) | def __init__(self, datasets, weights):
    method __iter__ (line 206) | def __iter__(self):
    method __len__ (line 209) | def __len__(self):
  function pil_loader (line 213) | def pil_loader(img_str):
  class TCSLoader (line 219) | class TCSLoader(object):
    method __init__ (line 221) | def __init__(self, conf_path, sc_config_key='sensecore'):
    method __call__ (line 228) | def __call__(self, fn, image_type='image', max_num_frames=-1, min_num_...
  function expand2square (line 247) | def expand2square(pil_img, background_color):
  function simulate_jpeg_degradation (line 261) | def simulate_jpeg_degradation(quality):
  function build_transform (line 276) | def build_transform(is_train, input_size, pad2square=False, normalize_ty...
  function preprocess (line 313) | def preprocess(
  function preprocess_mpt (line 418) | def preprocess_mpt(
  function preprocess_phi3 (line 512) | def preprocess_phi3(
  function preprocess_internlm (line 621) | def preprocess_internlm(
  function preprocess_internvl2_5 (line 711) | def preprocess_internvl2_5(
  function find_closest_aspect_ratio (line 813) | def find_closest_aspect_ratio(aspect_ratio, target_ratios, width, height...
  function dynamic_preprocess (line 830) | def dynamic_preprocess(image, min_num=1, max_num=6, image_size=448, use_...

FILE: internvl_chat/internvl/train/dataset_packed.py
  function is_dist_avail_and_initialized (line 26) | def is_dist_avail_and_initialized():
  function get_world_size (line 34) | def get_world_size():
  function get_rank (line 40) | def get_rank():
  class PackedDataset (line 46) | class PackedDataset(IterableDataset):
    method __init__ (line 47) | def __init__(
    method load_state_dict (line 142) | def load_state_dict(self, state_dict, custom_infos=None):
    method _should_log (line 154) | def _should_log(self):
    method next_data (line 163) | def next_data(self, current_dataset_idx):
    method find_buffer (line 210) | def find_buffer(self, buffer_list, new_sample):
    method update_buffer (line 234) | def update_buffer(self, buffer, new_sample):
    method check_valid (line 247) | def check_valid(sample_to_check, min_active_tokens_ratio=1/256):
    method split_buffer (line 253) | def split_buffer(buffer, max_tokens, img_start_token_id, img_token_id,...
    method update_buffer_list (line 339) | def update_buffer_list(self, buffer_list, buffer_max_len_list, buffer):
    method pad_buffer (line 376) | def pad_buffer(self, buffer):
    method postprocess_buffer (line 392) | def postprocess_buffer(self, buffer, custom_infos=None):
    method print_log (line 399) | def print_log(self, iter_idx, buffer_list):
    method __iter__ (line 408) | def __iter__(self):
    method get_cu_seqlens_and_indexes (line 517) | def get_cu_seqlens_and_indexes(
  function packed_collate_fn (line 551) | def packed_collate_fn(

FILE: internvl_chat/internvl/train/internvl_chat_finetune.py
  class ModelArguments (line 88) | class ModelArguments:
  class DataTrainingArguments (line 163) | class DataTrainingArguments:
  class LazySupervisedDataset (line 269) | class LazySupervisedDataset(Dataset):
    method __init__ (line 272) | def __init__(
    method __len__ (line 384) | def __len__(self):
    method get_preprocess_function (line 387) | def get_preprocess_function(self):
    method load_image (line 401) | def load_image(self, image_path):
    method get_image_path (line 407) | def get_image_path(self, image_path):
    method get_transform (line 414) | def get_transform(self):
    method multi_modal_get_item (line 420) | def multi_modal_get_item(self, data_item):
    method multi_modal_multi_image_get_item (line 475) | def multi_modal_multi_image_get_item(self, data_item):
    method video_get_item (line 525) | def video_get_item(self, data_item):
    method pure_text_get_item (line 581) | def pure_text_get_item(self, data_item):
    method _enable_worker_distributed (line 624) | def _enable_worker_distributed(self):
    method __getitem__ (line 634) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
    method __iter__ (line 681) | def __iter__(self):
  function build_datasets (line 701) | def build_datasets(
  function len2weight (line 786) | def len2weight(x, loss_reduction):
  function main (line 798) | def main():

FILE: internvl_chat/internvl/train/internvl_chat_mpo.py
  class ModelArguments (line 89) | class ModelArguments:
  class DataTrainingArguments (line 164) | class DataTrainingArguments:
  class DPOConfig (line 238) | class DPOConfig(DPOConfigTRL):
  class LazySupervisedDataset (line 245) | class LazySupervisedDataset(Dataset):
    method __init__ (line 248) | def __init__(
    method __len__ (line 338) | def __len__(self):
    method get_preprocess_function (line 341) | def get_preprocess_function(self):
    method load_image (line 355) | def load_image(self, image_path):
    method get_image_path (line 361) | def get_image_path(self, image_path):
    method get_transform (line 368) | def get_transform(self):
    method get_longest_common_prefix_index (line 375) | def get_longest_common_prefix_index(tensor1, tensor2):
    method multi_modal_get_item (line 384) | def multi_modal_get_item(self, data_item):
    method multi_modal_multi_image_get_item (line 456) | def multi_modal_multi_image_get_item(self, data_item):
    method video_get_item (line 527) | def video_get_item(self, data_item):
    method pure_text_get_item (line 606) | def pure_text_get_item(self, data_item):
    method __getitem__ (line 670) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
  function build_datasets (line 712) | def build_datasets(
  function main (line 772) | def main():

FILE: internvl_chat/internvl/train/internvl_chat_pretrain.py
  class ModelArguments (line 88) | class ModelArguments:
  class DataTrainingArguments (line 163) | class DataTrainingArguments:
  class LazySupervisedDataset (line 269) | class LazySupervisedDataset(Dataset):
    method __init__ (line 272) | def __init__(
    method __len__ (line 424) | def __len__(self):
    method get_preprocess_function (line 430) | def get_preprocess_function(self):
    method load_image (line 444) | def load_image(self, image_path):
    method get_image_path (line 450) | def get_image_path(self, image_path):
    method get_transform (line 457) | def get_transform(self):
    method multi_modal_get_item (line 463) | def multi_modal_get_item(self, data_item):
    method multi_modal_multi_image_get_item (line 518) | def multi_modal_multi_image_get_item(self, data_item):
    method video_get_item (line 568) | def video_get_item(self, data_item):
    method pure_text_get_item (line 624) | def pure_text_get_item(self, data_item):
    method _enable_worker_distributed (line 667) | def _enable_worker_distributed(self):
    method __getitem__ (line 678) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
    method __iter__ (line 725) | def __iter__(self):
  function build_datasets (line 745) | def build_datasets(
  function len2weight (line 830) | def len2weight(x, loss_reduction):
  function main (line 842) | def main():

FILE: internvl_chat/internvl/train/trainer_dpo.py
  function _map (line 18) | def _map(self, *args, **kwargs):
  class MultimodalDPOTrainer (line 25) | class MultimodalDPOTrainer(DPOTrainer):
    method __init__ (line 26) | def __init__(self, *args, **kwargs):
    method concatenated_inputs (line 33) | def concatenated_inputs(
    method concatenated_forward (line 99) | def concatenated_forward(
    method _prepare_deepspeed_orig (line 178) | def _prepare_deepspeed_orig(self, model):
    method _prepare_deepspeed (line 191) | def _prepare_deepspeed(self, model):
    method get_batch_loss_metrics (line 206) | def get_batch_loss_metrics(

FILE: internvl_chat/tools/extract_video_frames.py
  function transform_video (line 22) | def transform_video(buffer):
  function get_index (line 37) | def get_index(num_frames, num_segments):
  function fetch_images (line 52) | def fetch_images(qa_item):
  function fetch_images_parallel (line 99) | def fetch_images_parallel(qa_item):

FILE: internvl_chat/tools/images_stitching.py
  function custom_image (line 11) | def custom_image(img_paths, save_path, image_size=448):
  function get_images (line 54) | def get_images(ann_file):

FILE: internvl_chat/tools/internvl_custom2hf.py
  function compute_l2_distance (line 13) | def compute_l2_distance(model1, model2):
  function convert_keys_to_hf (line 41) | def convert_keys_to_hf(custom_state_dict):

FILE: internvl_chat/tools/internvl_hf2custom.py
  function compute_l2_distance (line 11) | def compute_l2_distance(model1, model2):
  function convert_keys_back (line 39) | def convert_keys_back(hf_state_dict):

FILE: internvl_chat/tools/reasoning_data_pipeline/mmpr_data_pipeline_correctness.py
  function collate_fn (line 49) | def collate_fn(batches):
  class VQADataset (line 67) | class VQADataset(torch.utils.data.Dataset):
    method __init__ (line 68) | def __init__(
    method __len__ (line 86) | def __len__(self):
    method multi_modal_get_item (line 89) | def multi_modal_get_item(self, item):
    method pure_text_get_item (line 122) | def pure_text_get_item(self, item):
    method __getitem__ (line 136) | def __getitem__(self, idx):
  function evaluate_chat_model (line 143) | def evaluate_chat_model():

FILE: internvl_chat/tools/reasoning_data_pipeline/mmpr_data_pipeline_correctness_postprocess.py
  function _build_items_based_on_correctness (line 22) | def _build_items_based_on_correctness(lines, mode):
  function build_neg_based_on_correctness (line 70) | def build_neg_based_on_correctness(lines, mode):
  function _build_pair_based_on_pos_neg (line 96) | def _build_pair_based_on_pos_neg(item_pos, item_neg):
  function build_pairs_based_on_pos_neg (line 125) | def build_pairs_based_on_pos_neg(pos_id2item, neg_id2item, allow_entailm...
  function save_items (line 164) | def save_items(items, save_path, question_only=False, all_incorrect_keys...
  function save_pairs (line 202) | def save_pairs(pairs, save_path):
  function main (line 269) | def main(args):

FILE: internvl_chat/tools/reasoning_data_pipeline/mmpr_data_pipeline_dropout_ntp.py
  function collate_fn (line 46) | def collate_fn(batches):
  class VQADataset (line 61) | class VQADataset(torch.utils.data.Dataset):
    method __init__ (line 62) | def __init__(
    method __len__ (line 80) | def __len__(self):
    method _truncate_prefix (line 83) | def _truncate_prefix(self, prefix):
    method __getitem__ (line 89) | def __getitem__(self, idx):
  function evaluate_chat_model (line 121) | def evaluate_chat_model():

FILE: internvl_chat/tools/reasoning_data_pipeline/utils/accuracy_reward.py
  class EvalAIAnswerProcessor (line 9) | class EvalAIAnswerProcessor:
    method __init__ (line 180) | def __init__(self, *args, **kwargs):
    method word_tokenize (line 183) | def word_tokenize(self, word):
    method process_punctuation (line 188) | def process_punctuation(self, in_text):
    method process_digit_article (line 200) | def process_digit_article(self, in_text):
    method __call__ (line 215) | def __call__(self, item):
  class TextVQAAccuracyEvaluator (line 223) | class TextVQAAccuracyEvaluator:
    method __init__ (line 224) | def __init__(self):
    method _compute_answer_scores (line 227) | def _compute_answer_scores(self, raw_answers):
    method eval_pred_list (line 250) | def eval_pred_list(self, pred_list, disable_tqdm=False):
  function isfloat (line 267) | def isfloat(x):
  function math_score (line 275) | def math_score(prediction: str, target: str, max_relative_change: float ...
  function relaxed_correctness (line 301) | def relaxed_correctness(target: str,
  function levenshtein_distance (line 347) | def levenshtein_distance(s1, s2):
  function multi_choice_score (line 363) | def multi_choice_score(answer_pred, answer_gt):
  function parse_answer (line 378) | def parse_answer(response, prompt_version):
  function extract_answer_from_mpo (line 395) | def extract_answer_from_mpo(response, version):
  function extract_answer_from_box (line 419) | def extract_answer_from_box(ans):
  function check_cot_format (line 446) | def check_cot_format(response: str):
  function check_r1_format (line 450) | def check_r1_format(response: str):
  function check_answer (line 467) | def check_answer(answer_pred, answer_gt, mode):
  function fix_answer (line 531) | def fix_answer(response, answer_pred, answer_gt):
  function contain_keywords (line 561) | def contain_keywords(ds_name, keywords):
  function post_process (line 568) | def post_process(pred):
  function get_mode (line 583) | def get_mode(ds_name):
  function use_latex_score (line 599) | def use_latex_score(x):
  function validate_latex (line 612) | def validate_latex(pred, gt, easy_mode=False):
  function latex_score (line 650) | def latex_score(prediction, target):

FILE: internvl_chat/tools/reasoning_data_pipeline/utils/utils.py
  function localtime (line 11) | def localtime():
  function init_distributed_mode (line 15) | def init_distributed_mode():
  function init_dist (line 51) | def init_dist(args):
  function get_global_min (line 90) | def get_global_min(value):
  function save_outputs (line 98) | def save_outputs(outputs, results_file):
  function load_outputs (line 120) | def load_outputs(results_file):
  class InferenceSampler (line 127) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 129) | def __init__(self, size):
    method _get_local_indices (line 137) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 146) | def __iter__(self):
    method __len__ (line 149) | def __len__(self):

FILE: internvl_chat/tools/reasoning_data_pipeline/visualprm_data_pieline.py
  function collate_fn (line 53) | def collate_fn(batches):
  class VQADataset (line 63) | class VQADataset(torch.utils.data.Dataset):
    method __init__ (line 64) | def __init__(
    method __len__ (line 78) | def __len__(self):
    method __getitem__ (line 81) | def __getitem__(self, idx):
  function split_response (line 116) | def split_response(response, sep='\n\n', max_steps=None):
  function join_steps (line 129) | def join_steps(steps, sep='\n\n'):
  function build_responses (line 133) | def build_responses(inputs, num_return_sequences=1, prefixes=None):
  function build_mc_scores (line 174) | def build_mc_scores(inputs, response_list, items, num_return_sequences):
  function build_process_supervision (line 254) | def build_process_supervision(inputs, items, num_return_sequences):
  function print_process_supervision (line 272) | def print_process_supervision(output):
  function evaluate_chat_model (line 287) | def evaluate_chat_model():

FILE: internvl_chat/tools/reasoning_data_pipeline/visualprm_data_pipeline_postprocess.py
  function save_outputs (line 10) | def save_outputs(outputs, results_file):
  function item2conv_prm (line 20) | def item2conv_prm(item):
  function item2conv_orm (line 47) | def item2conv_orm(item):
  function main (line 74) | def main():

FILE: internvl_chat_gpt_oss/internvl/dist_utils.py
  function _find_free_port (line 14) | def _find_free_port():
  function _is_free_port (line 25) | def _is_free_port(port):
  function init_dist (line 32) | def init_dist(launcher, backend='nccl', **kwargs):
  function _init_dist_pytorch (line 45) | def _init_dist_pytorch(backend, **kwargs):
  function _init_dist_mpi (line 61) | def _init_dist_mpi(backend, **kwargs):
  function _init_dist_slurm (line 74) | def _init_dist_slurm(backend, port=None):

FILE: internvl_chat_gpt_oss/internvl/model/internvl_chat/configuration_intern_vit.py
  class InternVisionConfig (line 15) | class InternVisionConfig(PretrainedConfig):
    method __init__ (line 63) | def __init__(
    method from_pretrained (line 107) | def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os....

FILE: internvl_chat_gpt_oss/internvl/model/internvl_chat/configuration_internvl_chat.py
  class InternVLChatConfig (line 18) | class InternVLChatConfig(PretrainedConfig):
    method __init__ (line 22) | def __init__(
    method to_dict (line 95) | def to_dict(self):

FILE: internvl_chat_gpt_oss/internvl/model/internvl_chat/conversation.py
  class SeparatorStyle (line 15) | class SeparatorStyle(IntEnum):
  class Conversation (line 40) | class Conversation:
    method get_prompt (line 64) | def get_prompt(self) -> str:
    method set_system_message (line 271) | def set_system_message(self, system_message: str):
    method append_message (line 275) | def append_message(self, role: str, message: str):
    method update_last_message (line 279) | def update_last_message(self, message: str):
    method to_gradio_chatbot (line 287) | def to_gradio_chatbot(self):
    method to_openai_api_messages (line 297) | def to_openai_api_messages(self):
    method copy (line 309) | def copy(self):
    method dict (line 324) | def dict(self):
  function register_conv_template (line 338) | def register_conv_template(template: Conversation, override: bool = False):
  function get_conv_template (line 348) | def get_conv_template(name: str) -> Conversation:

FILE: internvl_chat_gpt_oss/internvl/model/internvl_chat/modeling_intern_vit.py
  class FlashAttention (line 35) | class FlashAttention(nn.Module):
    method __init__ (line 46) | def __init__(self, softmax_scale=None, attention_dropout=0.0, device=N...
    method forward (line 51) | def forward(self, qkv, key_padding_mask=None, causal=False, cu_seqlens...
  class InternRMSNorm (line 99) | class InternRMSNorm(nn.Module):
    method __init__ (line 100) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 105) | def forward(self, hidden_states):
  class InternVisionEmbeddings (line 133) | class InternVisionEmbeddings(nn.Module):
    method __init__ (line 134) | def __init__(self, config: InternVisionConfig):
    method _get_pos_embed (line 154) | def _get_pos_embed(self, pos_embed, H, W):
    method forward (line 162) | def forward(self, pixel_values: torch.FloatTensor) -> torch.Tensor:
  class InternAttention (line 177) | class InternAttention(nn.Module):
    method __init__ (line 180) | def __init__(self, config: InternVisionConfig):
    method _naive_attn (line 210) | def _naive_attn(self, x):
    method _flash_attn (line 229) | def _flash_attn(self, x, key_padding_mask=None, need_weights=False):
    method forward (line 246) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class InternMLP (line 251) | class InternMLP(nn.Module):
    method __init__ (line 252) | def __init__(self, config: InternVisionConfig):
    method forward (line 259) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class InternVisionEncoderLayer (line 266) | class InternVisionEncoderLayer(nn.Module):
    method __init__ (line 267) | def __init__(self, config: InternVisionConfig, drop_path_rate: float):
    method forward (line 283) | def forward(
  class InternVisionEncoder (line 298) | class InternVisionEncoder(nn.Module):
    method __init__ (line 307) | def __init__(self, config: InternVisionConfig):
    method forward (line 316) | def forward(
  class InternVisionModel (line 363) | class InternVisionModel(PreTrainedModel):
    method __init__ (line 370) | def __init__(self, config: InternVisionConfig):
    method resize_pos_embeddings (line 377) | def resize_pos_embeddings(self, old_size, new_size, patch_size):
    method get_input_embeddings (line 389) | def get_input_embeddings(self):
    method forward (line 392) | def forward(

FILE: internvl_chat_gpt_oss/internvl/model/internvl_chat/modeling_internvl_chat.py
  function version_cmp (line 29) | def version_cmp(v1, v2, op='eq'):
  class InternVLChatModel (line 37) | class InternVLChatModel(PreTrainedModel):
    method __init__ (line 56) | def __init__(self, config: InternVLChatConfig, vision_model=None, lang...
    method forward (line 100) | def forward(
    method pixel_shuffle (line 278) | def pixel_shuffle(self, x, scale_factor=0.5):
    method extract_feature (line 294) | def extract_feature(self, pixel_values):
    method batch_chat (line 314) | def batch_chat(self, tokenizer, pixel_values, questions, generation_co...
    method chat (line 366) | def chat(self, tokenizer, pixel_values, question, generation_config, h...
    method generate (line 426) | def generate(
    method lm_head (line 468) | def lm_head(self):
    method get_output_embeddings (line 471) | def get_output_embeddings(self):
    method get_input_embeddings (line 474) | def get_input_embeddings(self):
    method set_input_embeddings (line 477) | def set_input_embeddings(self, value):
    method set_output_embeddings (line 480) | def set_output_embeddings(self, value):

FILE: internvl_chat_gpt_oss/internvl/patch/flash_sink_attn/flash_attn_with_sink.py
  class FlashAttentionWithSink (line 14) | class FlashAttentionWithSink(torch.autograd.Function):
    method forward (line 17) | def forward(
    method backward (line 77) | def backward(ctx, grad_output):
  function flash_attn_with_sink_func (line 186) | def flash_attn_with_sink_func(

FILE: internvl_chat_gpt_oss/internvl/patch/flash_sink_attn/flash_sink_attn.py
  function _bwd_preprocess_do_o_dot (line 19) | def _bwd_preprocess_do_o_dot(
  function init_to_zero (line 77) | def init_to_zero(name):
  function _fwd_kernel (line 82) | def _fwd_kernel(
  function _bwd_kernel (line 169) | def _bwd_kernel(
  function _flash_attn_forward (line 263) | def _flash_attn_forward(
  function _flash_attn_backward (line 314) | def _flash_attn_backward(
  class FlashSinkAttention (line 377) | class FlashSinkAttention(torch.autograd.Function):
    method forward (line 380) | def forward(
    method backward (line 404) | def backward(ctx, do):

FILE: internvl_chat_gpt_oss/internvl/patch/flash_sink_attn/flash_sink_attn_gpt_oss.py
  function _bwd_preprocess_do_o_dot (line 19) | def _bwd_preprocess_do_o_dot(
  function init_to_zero (line 77) | def init_to_zero(name):
  function _fwd_kernel (line 82) | def _fwd_kernel(
  function _bwd_kernel (line 200) | def _bwd_kernel(
  function _flash_attn_forward (line 333) | def _flash_attn_forward(
  function _flash_attn_backward (line 383) | def _flash_attn_backward(
  class FlashSinkAttention (line 446) | class FlashSinkAttention(torch.autograd.Function):
    method forward (line 449) | def forward(
    method backward (line 473) | def backward(ctx, do):

FILE: internvl_chat_gpt_oss/internvl/patch/flash_sink_attn/flash_sink_varlen_attn_gpt_oss.py
  function _bwd_preprocess_do_o_dot (line 18) | def _bwd_preprocess_do_o_dot(
  function init_to_zero (line 62) | def init_to_zero(name):
  function _fwd_kernel (line 67) | def _fwd_kernel(
  function _bwd_kernel (line 185) | def _bwd_kernel(
  function _flash_attn_forward (line 314) | def _flash_attn_forward(
  function _flash_attn_backward (line 363) | def _flash_attn_backward(
  class FlashSinkVarlenAttention (line 426) | class FlashSinkVarlenAttention(torch.autograd.Function):
    method forward (line 428) | def forward(
    method backward (line 454) | def backward(ctx, do):

FILE: internvl_chat_gpt_oss/internvl/patch/flash_sink_attn/sliding_cache.py
  class SlidingCacheManager (line 12) | class SlidingCacheManager:
    method __init__ (line 16) | def __init__(
    method reset (line 25) | def reset(self):
    method update (line 30) | def update(self, key, val):

FILE: internvl_chat_gpt_oss/internvl/patch/flash_sink_attn_monkey_patch.py
  function _forward_gpt_oss (line 17) | def _forward_gpt_oss(
  function _forward_gpt_oss_with_varlen (line 64) | def _forward_gpt_oss_with_varlen(
  function replace_gpt_oss_with_flash_sink_attn (line 125) | def replace_gpt_oss_with_flash_sink_attn(model, use_varlen=False):

FILE: internvl_chat_gpt_oss/internvl/patch/pad_data_collator.py
  function pad_data_collator (line 13) | def pad_data_collator(features, pad_id=0):
  function concat_pad_data_collator (line 57) | def concat_pad_data_collator(features, max_item_length=None, pad_id=0):
  function dpo_concat_pad_data_collator (line 119) | def dpo_concat_pad_data_collator(features, pad_id=0):

FILE: internvl_chat_gpt_oss/internvl/patch/qwen3_flash_monkey_patch.py
  function _forward_qwen3 (line 16) | def _forward_qwen3(
  function replace_qwen3_attention_class (line 91) | def replace_qwen3_attention_class(model):

FILE: internvl_chat_gpt_oss/internvl/patch/train_dataloader_patch.py
  function _get_dataloader (line 16) | def _get_dataloader(
  function replace_train_dataloader (line 67) | def replace_train_dataloader():

FILE: internvl_chat_gpt_oss/internvl/train/dataset.py
  function calculate_ngram_repetition (line 38) | def calculate_ngram_repetition(text, n):
  function check_conversations_repetition (line 47) | def check_conversations_repetition(conversations, repeat_threshold=0.4, ...
  function get_frame_indices (line 56) | def get_frame_indices(num_frames, vlen, sample='rand', fix_start=None, i...
  function read_frames_gif (line 97) | def read_frames_gif(
  function read_frames_decord (line 121) | def read_frames_decord(
  function extract_frame_number (line 153) | def extract_frame_number(filename):
  function sort_frames (line 159) | def sort_frames(frame_paths):
  function read_frames_folder (line 164) | def read_frames_folder(
  class WeightedConcatDataset (line 194) | class WeightedConcatDataset(ConcatDataset):
    method __init__ (line 195) | def __init__(self, datasets, weights):
    method __iter__ (line 201) | def __iter__(self):
    method __len__ (line 204) | def __len__(self):
  function pil_loader (line 208) | def pil_loader(img_str):
  class TCSLoader (line 214) | class TCSLoader(object):
    method __init__ (line 216) | def __init__(self, conf_path, sc_config_key='sensecore'):
    method __call__ (line 223) | def __call__(self, fn, image_type='image', max_num_frames=-1, min_num_...
  function expand2square (line 242) | def expand2square(pil_img, background_color):
  function simulate_jpeg_degradation (line 256) | def simulate_jpeg_degradation(quality):
  function build_transform (line 271) | def build_transform(is_train, input_size, pad2square=False, normalize_ty...
  function preprocess_pretrain (line 308) | def preprocess_pretrain(
  function preprocess_internvl2_5 (line 389) | def preprocess_internvl2_5(
  function preprocess_internvl3_5_gpt_oss (line 491) | def preprocess_internvl3_5_gpt_oss(
  function preprocess_internvl3_5_gpt_oss_with_think (line 597) | def preprocess_internvl3_5_gpt_oss_with_think(
  function find_closest_aspect_ratio (line 712) | def find_closest_aspect_ratio(aspect_ratio, target_ratios, width, height...
  function dynamic_preprocess (line 729) | def dynamic_preprocess(image, min_num=1, max_num=6, image_size=448, use_...

FILE: internvl_chat_gpt_oss/internvl/train/dataset_packed.py
  function is_dist_avail_and_initialized (line 26) | def is_dist_avail_and_initialized():
  function get_rank (line 34) | def get_rank():
  class PackedDataset (line 40) | class PackedDataset(IterableDataset):
    method __init__ (line 41) | def __init__(
    method load_state_dict (line 136) | def load_state_dict(self, state_dict, custom_infos=None):
    method _should_log (line 148) | def _should_log(self):
    method next_data (line 157) | def next_data(self, current_dataset_idx):
    method find_buffer (line 204) | def find_buffer(self, buffer_list, new_sample):
    method update_buffer (line 228) | def update_buffer(self, buffer, new_sample):
    method check_valid (line 241) | def check_valid(sample_to_check, min_active_tokens_ratio=1/256):
    method split_buffer (line 247) | def split_buffer(buffer, max_tokens, img_start_token_id, img_token_id,...
    method update_buffer_list (line 333) | def update_buffer_list(self, buffer_list, buffer_max_len_list, buffer):
    method pad_buffer (line 370) | def pad_buffer(self, buffer):
    method postprocess_buffer (line 386) | def postprocess_buffer(self, buffer, custom_infos=None):
    method print_log (line 393) | def print_log(self, iter_idx, buffer_list):
    method __iter__ (line 402) | def __iter__(self):
    method get_cu_seqlens_and_indexes (line 511) | def get_cu_seqlens_and_indexes(
  function packed_collate_fn (line 548) | def packed_collate_fn(

FILE: internvl_chat_gpt_oss/internvl/train/internvl_chat_finetune.py
  class ModelArguments (line 76) | class ModelArguments:
  class DataTrainingArguments (line 155) | class DataTrainingArguments:
  class LazySupervisedDataset (line 257) | class LazySupervisedDataset(Dataset):
    method __init__ (line 260) | def __init__(
    method __len__ (line 391) | def __len__(self):
    method get_preprocess_function (line 394) | def get_preprocess_function(self, use_pretrain=False):
    method load_image (line 407) | def load_image(self, image_path):
    method get_image_path (line 413) | def get_image_path(self, image_path):
    method get_transform (line 418) | def get_transform(self):
    method multi_modal_get_item (line 428) | def multi_modal_get_item(self, data_item):
    method multi_modal_multi_image_get_item (line 485) | def multi_modal_multi_image_get_item(self, data_item):
    method video_get_item (line 541) | def video_get_item(self, data_item):
    method pure_text_get_item (line 599) | def pure_text_get_item(self, data_item):
    method fake_data_get_item (line 642) | def fake_data_get_item(self):
    method _enable_worker_distributed (line 690) | def _enable_worker_distributed(self):
    method __getitem__ (line 700) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
    method __iter__ (line 749) | def __iter__(self):
  function build_datasets (line 769) | def build_datasets(
  function len2weight (line 849) | def len2weight(x, loss_reduction):
  function main (line 861) | def main():

FILE: internvl_chat_gpt_oss/internvl/train/internvl_chat_mpo.py
  class ModelArguments (line 95) | class ModelArguments:
  class DataTrainingArguments (line 162) | class DataTrainingArguments:
  class LazySupervisedDataset (line 236) | class LazySupervisedDataset(Dataset):
    method __init__ (line 239) | def __init__(
    method __len__ (line 343) | def __len__(self):
    method get_preprocess_function (line 346) | def get_preprocess_function(self, use_pretrain=False):
    method load_image (line 359) | def load_image(self, image_path):
    method get_image_path (line 365) | def get_image_path(self, image_path):
    method get_transform (line 370) | def get_transform(self):
    method multi_modal_get_item (line 380) | def multi_modal_get_item(self, data_item):
    method multi_modal_multi_image_get_item (line 483) | def multi_modal_multi_image_get_item(self, data_item):
    method video_get_item (line 585) | def video_get_item(self, data_item):
    method pure_text_get_item (line 696) | def pure_text_get_item(self, data_item):
    method __getitem__ (line 791) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
  function build_datasets (line 837) | def build_datasets(
  function main (line 890) | def main():

FILE: internvl_chat_gpt_oss/internvl/train/trainer_dpo.py
  function _map (line 17) | def _map(self, *args, **kwargs):
  class InternVLDPOTrainer (line 25) | class InternVLDPOTrainer(DPOTrainer):
    method concatenated_inputs (line 27) | def concatenated_inputs(
    method concatenated_forward (line 37) | def concatenated_forward(

FILE: internvl_chat_gpt_oss/internvl/utils/s3_config.py
  function _value_to_str (line 30) | def _value_to_str(d):
  class GetterMixin (line 40) | class GetterMixin(object):
    method get (line 45) | def get(self, key, default=_UNSET):
    method has_option (line 54) | def has_option(self, key):
    method get_boolean (line 61) | def get_boolean(self, key, default=_UNSET):
    method get_int (line 68) | def get_int(self, key, default=_UNSET):
    method get_log_level (line 74) | def get_log_level(self, key, default=_UNSET):
  class _my_dict (line 81) | class _my_dict(configparser._default_dict):
  class Config (line 85) | class Config(GetterMixin):
    method __init__ (line 86) | def __init__(self, conf_path, *args, **kwargs):
    method __getitem__ (line 109) | def __getitem__(self, key):
    method update (line 115) | def update(self, other: dict):
    method default (line 119) | def default(self):
    method items (line 122) | def items(self):
  class Section (line 129) | class Section(GetterMixin):
    method __init__ (line 131) | def __init__(self, conf: dict):
    method __getitem__ (line 135) | def __getitem__(self, key):
    method update (line 141) | def update(self, other):

FILE: internvl_chat_gpt_oss/internvl/utils/s3_exception.py
  class Error (line 4) | class Error(Exception):
    method __str__ (line 7) | def __str__(self):
  class RetriableError (line 13) | class RetriableError(Error):
  class ConfigError (line 20) | class ConfigError(Error):
  class InvalidConfigError (line 24) | class InvalidConfigError(ConfigError):
  class ConfigFileNotFoundError (line 28) | class ConfigFileNotFoundError(ConfigError):
  class ConfigItemNotFoundError (line 32) | class ConfigItemNotFoundError(ConfigError):
  class ConfigKeyNotFoundError (line 36) | class ConfigKeyNotFoundError(ConfigItemNotFoundError):
  class ConfigSectionNotFoundError (line 40) | class ConfigSectionNotFoundError(ConfigItemNotFoundError):
  class ConfigKeyTypeError (line 44) | class ConfigKeyTypeError(ConfigError):
  class ConfigKeyValueError (line 48) | class ConfigKeyValueError(ConfigError):
  class UnSupprotAddressStyle (line 52) | class UnSupprotAddressStyle(ConfigError):
  class ClientError (line 59) | class ClientError(Error):
  class ContentTypeError (line 63) | class ContentTypeError(ClientError):
  class S3ClientError (line 67) | class S3ClientError(ClientError):
  class InvalidAccessKeyError (line 71) | class InvalidAccessKeyError(S3ClientError):
  class SignatureNotMatchError (line 75) | class SignatureNotMatchError(S3ClientError):
  class NetworkConnectionError (line 79) | class NetworkConnectionError(S3ClientError):
  class ResourceNotFoundError (line 83) | class ResourceNotFoundError(S3ClientError):
  class AccessDeniedError (line 87) | class AccessDeniedError(ClientError):
  class RangeError (line 91) | class RangeError(ClientError):
  class MultipartError (line 95) | class MultipartError(ClientError):
  class ObjectNotFoundError (line 99) | class ObjectNotFoundError(ClientError):
  class S3ObjectNotFoundError (line 103) | class S3ObjectNotFoundError(ObjectNotFoundError):
  class NoSuchBucketError (line 107) | class NoSuchBucketError(S3ObjectNotFoundError):
  class NoSuchKeyError (line 111) | class NoSuchKeyError(S3ObjectNotFoundError):
  class CacheError (line 118) | class CacheError(ClientError):
  class McClientError (line 122) | class McClientError(CacheError):
  class McObjectNotFoundError (line 126) | class McObjectNotFoundError(ObjectNotFoundError, McClientError):
  class McTimeoutOccur (line 130) | class McTimeoutOccur(McClientError, RetriableError):
  class McConnFailed (line 134) | class McConnFailed(McClientError, RetriableError):
  class McServerFailed (line 138) | class McServerFailed(McClientError, RetriableError):
  class McServerDisable (line 142) | class McServerDisable(McClientError):
  class McServerDead (line 146) | class McServerDead(McClientError):
  class McBadKeyProvided (line 150) | class McBadKeyProvided(McClientError):
  class McKeySizeExceed (line 154) | class McKeySizeExceed(McClientError):
  class McObjectSizeExceed (line 158) | class McObjectSizeExceed(McClientError):
  class InvalidUriError (line 165) | class InvalidUriError(Error):
  class InvalidS3UriError (line 169) | class InvalidS3UriError(InvalidUriError):
  class InvalidBucketUriError (line 173) | class InvalidBucketUriError(InvalidS3UriError):
  class InvalidDfsUriError (line 177) | class InvalidDfsUriError(InvalidUriError):
  class InvalidMcUriError (line 181) | class InvalidMcUriError(InvalidUriError):
  class InvalidClusterNameError (line 185) | class InvalidClusterNameError(InvalidUriError):
  class NoDefaultClusterNameError (line 189) | class NoDefaultClusterNameError(InvalidUriError):

FILE: internvl_chat_gpt_oss/internvl/utils/s3_fileio.py
  class S3Backend (line 26) | class S3Backend(BaseStorageBackend):
    method __init__ (line 45) | def __init__(self,
    method _map_path (line 76) | def _map_path(self, filepath: Union[str, Path]) -> str:
    method _format_path (line 89) | def _format_path(self, filepath: str) -> str:
    method _parse_path (line 102) | def _parse_path(self, filepath: Union[str, Path]) -> Tuple[str, str]:
    method _check_bucket (line 121) | def _check_bucket(self, bucket: str) -> bool:
    method _check_object (line 136) | def _check_object(self, bucket: str, obj_name: str) -> bool:
    method get (line 151) | def get(self, filepath: str) -> bytes:
    method get_text (line 170) | def get_text(self, filepath, encoding='utf-8') -> str:
    method put (line 188) | def put(self, obj: bytes, filepath: Union[str, Path]) -> None:
    method put_text (line 201) | def put_text(self,
    method remove (line 215) | def remove(self, filepath: Union[str, Path]) -> None:
    method exists (line 224) | def exists(self, filepath: Union[str, Path]) -> bool:
    method isdir (line 235) | def isdir(self, filepath: Union[str, Path]) -> bool:
    method isfile (line 251) | def isfile(self, filepath: Union[str, Path]) -> bool:
    method get_local_path (line 267) | def get_local_path(
    method list (line 297) | def list(self,
  class MixedClient (line 421) | class MixedClient(object):
    method __init__ (line 422) | def __init__(self, conf_path, **kwargs):
    method parse_uri (line 443) | def parse_uri(uri, ceph_dict, default_cluster=None):
    method get_with_info (line 470) | def get_with_info(self, uri, **kwargs):
    method list (line 480) | def list(self, uri, **kwargs):
  class Client (line 487) | class Client(object):
    method __init__ (line 489) | def __init__(self, conf_path='petreloss.conf', *args, **kwargs):
    method _get_local_client (line 497) | def _get_local_client(self):
    method get_with_info (line 513) | def get_with_info(self, uri, **kwargs):
    method get (line 516) | def get(self, *args, **kwargs):
    method list (line 520) | def list(self, *args, **kwargs):

FILE: internvl_chat_llava/llava/conversation.py
  class SeparatorStyle (line 6) | class SeparatorStyle(Enum):
  class Conversation (line 18) | class Conversation:
    method get_prompt (line 35) | def get_prompt(self):
    method append_message (line 133) | def append_message(self, role, message):
    method get_images (line 136) | def get_images(self, return_pil=False, return_org=False):
    method to_gradio_chatbot (line 197) | def to_gradio_chatbot(self):
    method copy (line 228) | def copy(self):
    method dict (line 242) | def dict(self):

FILE: internvl_chat_llava/llava/eval/eval_gpt_review.py
  function get_eval (line 13) | def get_eval(content: str, max_tokens: int):
  function parse_score (line 39) | def parse_score(review):

FILE: internvl_chat_llava/llava/eval/eval_gpt_review_bench.py
  function get_eval (line 11) | def get_eval(content: str, max_tokens: int):
  function parse_score (line 36) | def parse_score(review):

FILE: internvl_chat_llava/llava/eval/eval_gpt_review_visual.py
  function get_eval (line 11) | def get_eval(content: str, max_tokens: int):
  function parse_score (line 36) | def parse_score(review):

FILE: internvl_chat_llava/llava/eval/eval_pope.py
  function eval_pope (line 5) | def eval_pope(answers, label_file):

FILE: internvl_chat_llava/llava/eval/eval_science_qa.py
  function get_args (line 8) | def get_args():
  function convert_caps (line 19) | def convert_caps(results):
  function get_pred_idx (line 28) | def get_pred_idx(prediction, choices, options):

FILE: internvl_chat_llava/llava/eval/eval_science_qa_gpt4.py
  function get_args (line 9) | def get_args():
  function convert_caps (line 19) | def convert_caps(results):
  function get_pred_idx (line 28) | def get_pred_idx(prediction, choices, options):

FILE: internvl_chat_llava/llava/eval/eval_science_qa_gpt4_requery.py
  function get_args (line 9) | def get_args():
  function convert_caps (line 21) | def convert_caps(results):
  function get_pred_idx (line 30) | def get_pred_idx(prediction, choices, options):

FILE: internvl_chat_llava/llava/eval/eval_textvqa.py
  function get_args (line 9) | def get_args():
  function prompt_processor (line 17) | def prompt_processor(prompt):
  function eval_single (line 35) | def eval_single(annotation_file, result_file):

FILE: internvl_chat_llava/llava/eval/generate_webpage_data_from_table.py
  function read_jsonl (line 10) | def read_jsonl(path: str, key: str=None):
  function trim_hanging_lines (line 23) | def trim_hanging_lines(s: str, n: int) -> str:

FILE: internvl_chat_llava/llava/eval/m4c_evaluator.py
  class EvalAIAnswerProcessor (line 7) | class EvalAIAnswerProcessor:
    method __init__ (line 178) | def __init__(self, *args, **kwargs):
    method word_tokenize (line 181) | def word_tokenize(self, word):
    method process_punctuation (line 186) | def process_punctuation(self, in_text):
    method process_digit_article (line 198) | def process_digit_article(self, in_text):
    method __call__ (line 213) | def __call__(self, item):
  class TextVQAAccuracyEvaluator (line 221) | class TextVQAAccuracyEvaluator:
    method __init__ (line 222) | def __init__(self):
    method _compute_answer_scores (line 225) | def _compute_answer_scores(self, raw_answers):
    method eval_pred_list (line 248) | def eval_pred_list(self, pred_list):
  class STVQAAccuracyEvaluator (line 260) | class STVQAAccuracyEvaluator:
    method __init__ (line 261) | def __init__(self):
    method eval_pred_list (line 264) | def eval_pred_list(self, pred_list):
  class STVQAANLSEvaluator (line 276) | class STVQAANLSEvaluator:
    method __init__ (line 277) | def __init__(self):
    method get_anls (line 282) | def get_anls(self, s1, s2):
    method eval_pred_list (line 289) | def eval_pred_list(self, pred_list):
  class TextCapsBleu4Evaluator (line 301) | class TextCapsBleu4Evaluator:
    method __init__ (line 302) | def __init__(self):
    method eval_pred_list (line 321) | def eval_pred_list(self, pred_list):

FILE: internvl_chat_llava/llava/eval/model_qa.py
  class KeywordsStoppingCriteria (line 14) | class KeywordsStoppingCriteria(StoppingCriteria):
    method __init__ (line 15) | def __init__(self, keywords, tokenizer, input_ids):
    method __call__ (line 21) | def __call__(self, output_ids: torch.LongTensor, scores: torch.FloatTe...
  function eval_model (line 33) | def eval_model(model_name, questions_file, answers_file):

FILE: internvl_chat_llava/llava/eval/model_vqa.py
  function split_list (line 18) | def split_list(lst, n):
  function get_chunk (line 24) | def get_chunk(lst, n, k):
  function eval_model (line 29) | def eval_model(args):

FILE: internvl_chat_llava/llava/eval/model_vqa_loader.py
  function split_list (line 19) | def split_list(lst, n):
  function get_chunk (line 25) | def get_chunk(lst, n, k):
  class CustomDataset (line 31) | class CustomDataset(Dataset):
    method __init__ (line 32) | def __init__(self, questions, image_folder, tokenizer, image_processor...
    method __getitem__ (line 39) | def __getitem__(self, index):
    method __len__ (line 60) | def __len__(self):
  function create_data_loader (line 65) | def create_data_loader(questions, image_folder, tokenizer, image_process...
  function eval_model (line 72) | def eval_model(args):

FILE: internvl_chat_llava/llava/eval/model_vqa_mmbench.py
  function split_list (line 22) | def split_list(lst, n):
  function get_chunk (line 28) | def get_chunk(lst, n, k):
  function is_none (line 33) | def is_none(value):
  function get_options (line 44) | def get_options(row, options):
  function eval_model (line 54) | def eval_model(args):

FILE: internvl_chat_llava/llava/eval/model_vqa_science.py
  function split_list (line 18) | def split_list(lst, n):
  function get_chunk (line 24) | def get_chunk(lst, n, k):
  function eval_model (line 29) | def eval_model(args):

FILE: internvl_chat_llava/llava/eval/qa_baseline_gpt35.py
  function get_answer (line 16) | def get_answer(question_id: int, question: str, max_tokens: int):

FILE: internvl_chat_llava/llava/eval/run_llava.py
  function load_image (line 17) | def load_image(image_file):
  function eval_model (line 26) | def eval_model(args):

FILE: internvl_chat_llava/llava/eval/summarize_gpt_review.py
  function parse_args (line 9) | def parse_args():

FILE: internvl_chat_llava/llava/eval/webpage/script.js
  function text2Markdown (line 35) | function text2Markdown(text) {
  function capitalizeFirstChar (line 41) | function capitalizeFirstChar(str) {
  function updateQuestionSelect (line 48) | function updateQuestionSelect(question_id) {
  function updateModelSelect (line 64) | function updateModelSelect() {
  function populateModels (line 70) | function populateModels(models) {
  function populateQuestions (line 81) | function populateQuestions(questions) {
  function displayQuestion (line 110) | function displayQuestion(index) {
  function displayAnswers (line 116) | function displayAnswers(index) {
  function switchQuestionAndCategory (line 203) | function switchQuestionAndCategory() {
  function updateExpandButtonVisibility (line 226) | function updateExpandButtonVisibility(card) {

FILE: internvl_chat_llava/llava/mm_utils.py
  function load_image_from_base64 (line 10) | def load_image_from_base64(image):
  function expand2square (line 14) | def expand2square(pil_img, background_color):
  function process_images (line 28) | def process_images(images, image_processor, model_cfg):
  function tokenizer_image_token (line 43) | def tokenizer_image_token(prompt, tokenizer, image_token_index=IMAGE_TOK...
  function get_model_name_from_path (line 71) | def get_model_name_from_path(model_path):
  class KeywordsStoppingCriteria (line 82) | class KeywordsStoppingCriteria(StoppingCriteria):
    method __init__ (line 83) | def __init__(self, keywords, tokenizer, input_ids):
    method __call__ (line 97) | def __call__(self, output_ids: torch.LongTensor, scores: torch.FloatTe...

FILE: internvl_chat_llava/llava/model/apply_delta.py
  function apply_delta (line 13) | def apply_delta(base_model_path, target_model_path, delta_path):

FILE: internvl_chat_llava/llava/model/builder.py
  function load_pretrained_model (line 26) | def load_pretrained_model(model_path, model_base, model_name, load_8bit=...

FILE: internvl_chat_llava/llava/model/consolidate.py
  function consolidate_ckpt (line 13) | def consolidate_ckpt(src_path, dst_path):

FILE: internvl_chat_llava/llava/model/language_model/llava_llama.py
  class LlavaConfig (line 30) | class LlavaConfig(LlamaConfig):
  class LlavaLlamaModel (line 34) | class LlavaLlamaModel(LlavaMetaModel, LlamaModel):
    method __init__ (line 37) | def __init__(self, config: LlamaConfig):
  class LlavaLlamaForCausalLM (line 41) | class LlavaLlamaForCausalLM(LlamaForCausalLM, LlavaMetaForCausalLM):
    method __init__ (line 44) | def __init__(self, config):
    method get_model (line 53) | def get_model(self):
    method forward (line 56) | def forward(
    method prepare_inputs_for_generation (line 117) | def prepare_inputs_for_generation(

FILE: internvl_chat_llava/llava/model/language_model/llava_mpt.py
  class LlavaMptConfig (line 25) | class LlavaMptConfig(MptConfig):
  class LlavaMptModel (line 29) | class LlavaMptModel(LlavaMetaModel, MptModel):
    method __init__ (line 32) | def __init__(self, config: MptConfig):
    method embed_tokens (line 36) | def embed_tokens(self, x):
  class LlavaMptForCausalLM (line 40) | class LlavaMptForCausalLM(MptForCausalLM, LlavaMetaForCausalLM):
    method __init__ (line 44) | def __init__(self, config):
    method get_model (line 53) | def get_model(self):
    method _set_gradient_checkpointing (line 56) | def _set_gradient_checkpointing(self, module, value=False):
    method forward (line 60) | def forward(
    method prepare_inputs_for_generation (line 87) | def prepare_inputs_for_generation(self, input_ids, past_key_values=Non...

FILE: internvl_chat_llava/llava/model/language_model/mpt/adapt_tokenizer.py
  function adapt_tokenizer_for_denoising (line 6) | def adapt_tokenizer_for_denoising(tokenizer: Tokenizer):
  class AutoTokenizerForMOD (line 25) | class AutoTokenizerForMOD(AutoTokenizer):
    method from_pretrained (line 37) | def from_pretrained(cls, *args, **kwargs):

FILE: internvl_chat_llava/llava/model/language_model/mpt/attention.py
  function _reset_is_causal (line 12) | def _reset_is_causal(num_query_tokens: int, num_key_tokens: int, origina...
  function scaled_multihead_dot_product_attention (line 20) | def scaled_multihead_dot_product_attention(query, key, value, n_heads, p...
  function check_valid_inputs (line 64) | def check_valid_inputs(*tensors, valid_dtypes=[torch.float16, torch.bflo...
  function flash_attn_fn (line 71) | def flash_attn_fn(query, key, value, n_heads, past_key_value=None, softm...
  function triton_flash_attn_fn (line 107) | def triton_flash_attn_fn(query, key, value, n_heads, past_key_value=None...
  class MultiheadAttention (line 151) | class MultiheadAttention(nn.Module):
    method __init__ (line 158) | def __init__(self, d_model: int, n_heads: int, attn_impl: str='triton'...
    method forward (line 191) | def forward(self, x, past_key_value=None, attn_bias=None, attention_ma...
  class MultiQueryAttention (line 204) | class MultiQueryAttention(nn.Module):
    method __init__ (line 211) | def __init__(self, d_model: int, n_heads: int, attn_impl: str='triton'...
    method forward (line 245) | def forward(self, x, past_key_value=None, attn_bias=None, attention_ma...
  function attn_bias_shape (line 258) | def attn_bias_shape(attn_impl, n_heads, seq_len, alibi, prefix_lm, causa...
  function build_attn_bias (line 272) | def build_attn_bias(attn_impl, attn_bias, n_heads, seq_len, causal=False...
  function gen_slopes (line 283) | def gen_slopes(n_heads, alibi_bias_max=8, device=None):
  function build_alibi_bias (line 292) | def build_alibi_bias(n_heads, seq_len, full=False, alibi_bias_max=8, dev...

FILE: internvl_chat_llava/llava/model/language_model/mpt/blocks.py
  class MPTMLP (line 8) | class MPTMLP(nn.Module):
    method __init__ (line 10) | def __init__(self, d_model: int, expansion_ratio: int, device: Optiona...
    method forward (line 17) | def forward(self, x):
  class MPTBlock (line 20) | class MPTBlock(nn.Module):
    method __init__ (line 22) | def __init__(self, d_model: int, n_heads: int, expansion_ratio: int, a...
    method forward (line 34) | def forward(self, x: torch.Tensor, past_key_value: Optional[Tuple[torc...

FILE: internvl_chat_llava/llava/model/language_model/mpt/configuration_mpt.py
  class MPTConfig (line 7) | class MPTConfig(PretrainedConfig):
    method __init__ (line 10) | def __init__(self, d_model: int=2048, n_heads: int=16, n_layers: int=2...
    method _set_config_defaults (line 90) | def _set_config_defaults(self, config, config_defaults):
    method _validate_config (line 96) | def _validate_config(self):

FILE: internvl_chat_llava/llava/model/language_model/mpt/custom_embedding.py
  class SharedEmbedding (line 6) | class SharedEmbedding(nn.Embedding):
    method forward (line 8) | def forward(self, input: Tensor, unembed: bool=False) -> Tensor:

FILE: internvl_chat_llava/llava/model/language_model/mpt/flash_attn_triton.py
  function _fwd_kernel (line 51) | def _fwd_kernel(Q, K, V, Bias, Out, Lse, TMP, softmax_scale, stride_qb, ...
  function _bwd_preprocess_do_o_dot (line 155) | def _bwd_preprocess_do_o_dot(Out, DO, Delta, stride_ob, stride_oh, strid...
  function _bwd_store_dk_dv (line 168) | def _bwd_store_dk_dv(dk_ptrs, dv_ptrs, dk, dv, offs_n, offs_d, seqlen_k,...
  function _bwd_kernel_one_col_block (line 184) | def _bwd_kernel_one_col_block(start_n, Q, K, V, Bias, DO, DQ, DK, DV, LS...
  function init_to_zero (line 300) | def init_to_zero(name):
  function _bwd_kernel (line 306) | def _bwd_kernel(Q, K, V, Bias, DO, DQ, DK, DV, LSE, D, softmax_scale, st...
  function _flash_attn_forward (line 329) | def _flash_attn_forward(q, k, v, bias=None, causal=False, softmax_scale=...
  function _flash_attn_backward (line 366) | def _flash_attn_backward(do, q, k, v, o, lse, dq, dk, dv, bias=None, cau...
  class FlashAttnQKVPackedFunc (line 401) | class FlashAttnQKVPackedFunc(torch.autograd.Function):
    method forward (line 404) | def forward(ctx, qkv, bias=None, causal=False, softmax_scale=None):
    method backward (line 419) | def backward(ctx, do):
  class FlashAttnKVPackedFunc (line 428) | class FlashAttnKVPackedFunc(torch.autograd.Function):
    method forward (line 431) | def forward(ctx, q, kv, bias=None, causal=False, softmax_scale=None):
    method backward (line 446) | def backward(ctx, do):
  class FlashAttnFunc (line 457) | class FlashAttnFunc(torch.autograd.Function):
    method forward (line 460) | def forward(ctx, q, k, v, bias=None, causal=False, softmax_scale=None):
    method backward (line 475) | def backward(ctx, do):

FILE: internvl_chat_llava/llava/model/language_model/mpt/hf_prefixlm_converter.py
  function _convert_gpt_causal_lm_to_prefix_lm (line 29) | def _convert_gpt_causal_lm_to_prefix_lm(model: CAUSAL_GPT_TYPES) -> CAUS...
  function _convert_bloom_causal_lm_to_prefix_lm (line 113) | def _convert_bloom_causal_lm_to_prefix_lm(model: BloomForCausalLM) -> Bl...
  function _convert_opt_causal_lm_to_prefix_lm (line 269) | def _convert_opt_causal_lm_to_prefix_lm(model: OPTForCausalLM) -> OPTFor...
  function convert_hf_causal_lm_to_prefix_lm (line 335) | def convert_hf_causal_lm_to_prefix_lm(model: CAUSAL_LM_TYPES) -> CAUSAL_...
  function add_bidirectional_mask_if_missing (line 401) | def add_bidirectional_mask_if_missing(batch: Dict[str, Any]):

FILE: internvl_chat_llava/llava/model/language_model/mpt/meta_init_context.py
  function init_empty_weights (line 6) | def init_empty_weights(include_buffers: bool=False):
  function init_on_device (line 37) | def init_on_device(device: torch.device, include_buffers: bool=False):

FILE: internvl_chat_llava/llava/model/language_model/mpt/modeling_mpt.py
  class MPTPreTrainedModel (line 28) | class MPTPreTrainedModel(PreTrainedModel):
  class MPTModel (line 33) | class MPTModel(MPTPreTrainedModel):
    method __init__ (line 35) | def __init__(self, config: MPTConfig):
    method get_input_embeddings (line 81) | def get_input_embeddings(self):
    method set_input_embeddings (line 84) | def set_input_embeddings(self, value):
    method _attn_bias (line 88) | def _attn_bias(self, device, dtype, attention_mask: Optional[torch.Byt...
    method _apply_prefix_mask (line 119) | def _apply_prefix_mask(self, attn_bias: torch.Tensor, prefix_mask: tor...
    method _apply_sequence_id (line 134) | def _apply_sequence_id(self, attn_bias: torch.Tensor, sequence_id: tor...
    method forward (line 144) | def forward(self, input_ids: torch.LongTensor, past_key_values: Option...
    method param_init_fn (line 222) | def param_init_fn(self, module):
    method fsdp_wrap_fn (line 226) | def fsdp_wrap_fn(self, module):
    method activation_checkpointing_fn (line 229) | def activation_checkpointing_fn(self, module):
  class MPTForCausalLM (line 232) | class MPTForCausalLM(MPTPreTrainedModel):
    method __init__ (line 234) | def __init__(self, config: MPTConfig):
    method get_input_embeddings (line 255) | def get_input_embeddings(self):
    method set_input_embeddings (line 258) | def set_input_embeddings(self, value):
    method get_output_embeddings (line 261) | def get_output_embeddings(self):
    method set_output_embeddings (line 264) | def set_output_embeddings(self, new_embeddings):
    method set_decoder (line 267) | def set_decoder(self, decoder):
    method get_decoder (line 270) | def get_decoder(self):
    method forward (line 273) | def forward(self, input_ids: torch.LongTensor, past_key_values: Option...
    method param_init_fn (line 291) | def param_init_fn(self, module):
    method fsdp_wrap_fn (line 295) | def fsdp_wrap_fn(self, module):
    method activation_checkpointing_fn (line 298) | def activation_checkpointing_fn(self, module):
    method prepare_inputs_for_generation (line 301) | def prepare_inputs_for_generation(self, input_ids, past_key_values=Non...
    method _reorder_cache (line 322) | def _reorder_cache(past_key_values, beam_idx):

FILE: internvl_chat_llava/llava/model/language_model/mpt/norm.py
  function _cast_if_autocast_enabled (line 3) | def _cast_if_autocast_enabled(tensor):
  class LPLayerNorm (line 14) | class LPLayerNorm(torch.nn.LayerNorm):
    method __init__ (line 16) | def __init__(self, normalized_shape, eps=1e-05, elementwise_affine=Tru...
    method forward (line 19) | def forward(self, x):
  function rms_norm (line 27) | def rms_norm(x, weight=None, eps=1e-05):
  class RMSNorm (line 33) | class RMSNorm(torch.nn.Module):
    method __init__ (line 35) | def __init__(self, normalized_shape, eps=1e-05, weight=True, dtype=Non...
    method forward (line 43) | def forward(self, x):
  class LPRMSNorm (line 46) | class LPRMSNorm(RMSNorm):
    method __init__ (line 48) | def __init__(self, normalized_shape, eps=1e-05, weight=True, dtype=Non...
    method forward (line 51) | def forward(self, x):

FILE: internvl_chat_llava/llava/model/language_model/mpt/param_init_fns.py
  function torch_default_param_init_fn_ (line 10) | def torch_default_param_init_fn_(module: nn.Module, verbose: int=0, **kw...
  function fused_init_helper_ (line 17) | def fused_init_helper_(module: nn.Module, init_fn_):
  function generic_param_init_fn_ (line 28) | def generic_param_init_fn_(module: nn.Module, init_fn_, n_layers: int, d...
  function _normal_init_ (line 121) | def _normal_init_(std, mean=0.0):
  function _normal_param_init_fn_ (line 124) | def _normal_param_init_fn_(module: nn.Module, std: float, n_layers: int,...
  function baseline_param_init_fn_ (line 131) | def baseline_param_init_fn_(module: nn.Module, init_std: float, n_layers...
  function small_param_init_fn_ (line 137) | def small_param_init_fn_(module: nn.Module, n_layers: int, d_model: int,...
  function neox_param_init_fn_ (line 142) | def neox_param_init_fn_(module: nn.Module, n_layers: int, d_model: int, ...
  function kaiming_uniform_param_init_fn_ (line 155) | def kaiming_uniform_param_init_fn_(module: nn.Module, n_layers: int, d_m...
  function kaiming_normal_param_init_fn_ (line 162) | def kaiming_normal_param_init_fn_(module: nn.Module, n_layers: int, d_mo...
  function xavier_uniform_param_init_fn_ (line 169) | def xavier_uniform_param_init_fn_(module: nn.Module, n_layers: int, d_mo...
  function xavier_normal_param_init_fn_ (line 176) | def xavier_normal_param_init_fn_(module: nn.Module, n_layers: int, d_mod...

FILE: internvl_chat_llava/llava/model/llava_arch.py
  class LlavaMetaModel (line 27) | class LlavaMetaModel:
    method __init__ (line 29) | def __init__(self, config):
    method get_vision_tower (line 36) | def get_vision_tower(self):
    method initialize_vision_modules (line 42) | def initialize_vision_modules(self, model_args, fsdp=None):
  class LlavaMetaForCausalLM (line 90) | class LlavaMetaForCausalLM(ABC):
    method get_model (line 93) | def get_model(self):
    method get_vision_tower (line 96) | def get_vision_tower(self):
    method encode_images (line 99) | def encode_images(self, images):
    method prepare_inputs_labels_for_multimodal (line 104) | def prepare_inputs_labels_for_multimodal(
    method initialize_vision_tokenizer (line 223) | def initialize_vision_tokenizer(self, model_args, tokenizer):

FILE: internvl_chat_llava/llava/model/make_delta.py
  function make_delta (line 13) | def make_delta(base_model_path, target_model_path, delta_path, hub_repo_...

FILE: internvl_chat_llava/llava/model/multimodal_encoder/builder.py
  function build_vision_tower (line 5) | def build_vision_tower(vision_tower_cfg, **kwargs):

FILE: internvl_chat_llava/llava/model/multimodal_encoder/clip_encoder.py
  function is_intern_vit_6b_model (line 15) | def is_intern_vit_6b_model(vision_tower_name):
  function is_internvl_14b_model (line 20) | def is_internvl_14b_model(vision_tower_name):
  class CLIPVisionTower (line 25) | class CLIPVisionTower(nn.Module):
    method __init__ (line 26) | def __init__(self, vision_tower, args, delay_load=False):
    method load_model (line 47) | def load_model(self):
    method feature_select (line 72) | def feature_select(self, image_forward_outs):
    method forward (line 83) | def forward(self, images):
    method dummy_feature (line 107) | def dummy_feature(self):
    method dtype (line 111) | def dtype(self):
    method device (line 115) | def device(self):
    method config (line 119) | def config(self):
    method hidden_size (line 126) | def hidden_size(self):
    method num_patches (line 130) | def num_patches(self):

FILE: internvl_chat_llava/llava/model/multimodal_encoder/eva_clip/configuration_evaclip.py
  class EvaCLIPTextConfig (line 36) | class EvaCLIPTextConfig(PretrainedConfig):
    method __init__ (line 90) | def __init__(
    method from_pretrained (line 133) | def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os....
  class EvaCLIPVisionConfig (line 151) | class EvaCLIPVisionConfig(PretrainedConfig):
    method __init__ (line 204) | def __init__(
    method from_pretrained (line 246) | def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os....
  class EvaCLIPConfig (line 264) | class EvaCLIPConfig(PretrainedConfig):
    method __init__ (line 313) | def __init__(
    method from_text_vision_configs (line 402) | def from_text_vision_configs(cls, text_config: EvaCLIPTextConfig, visi...
    method to_dict (line 413) | def to_dict(self):

FILE: internvl_chat_llava/llava/model/multimodal_encoder/eva_clip/modeling_evaclip.py
  function _expand_mask (line 48) | def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Option...
  function contrastive_loss (line 64) | def contrastive_loss(logits: torch.Tensor) -> torch.Tensor:
  function clip_loss (line 68) | def clip_loss(similarity: torch.Tensor) -> torch.Tensor:
  class EvaCLIPVisionModelOutput (line 75) | class EvaCLIPVisionModelOutput(ModelOutput):
  class EvaCLIPTextModelOutput (line 104) | class EvaCLIPTextModelOutput(ModelOutput):
  class EvaCLIPOutput (line 133) | class EvaCLIPOutput(ModelOutput):
    method to_tuple (line 162) | def to_tuple(self) -> Tuple[Any]:
  class EvaCLIPVisionEmbeddings (line 169) | class EvaCLIPVisionEmbeddings(nn.Module):
    method __init__ (line 170) | def __init__(self, config: EvaCLIPVisionConfig):
    method forward (line 192) | def forward(self, pixel_values: torch.FloatTensor) -> torch.Tensor:
  class EvaCLIPTextEmbeddings (line 203) | class EvaCLIPTextEmbeddings(nn.Module):
    method __init__ (line 204) | def __init__(self, config: EvaCLIPTextConfig):
    method forward (line 214) | def forward(
  class EvaCLIPAttention (line 234) | class EvaCLIPAttention(nn.Module):
    method __init__ (line 237) | def __init__(self, config):
    method _shape (line 255) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
    method forward (line 258) | def forward(
  class EvaCLIPTextAttention (line 336) | class EvaCLIPTextAttention(nn.Module):
    method __init__ (line 339) | def __init__(self, config):
    method _shape (line 357) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
    method forward (line 360) | def forward(
  class EvaCLIPMLP (line 438) | class EvaCLIPMLP(nn.Module):
    method __init__ (line 439) | def __init__(self, config):
    method forward (line 446) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class EvaCLIPEncoderLayer (line 453) | class EvaCLIPEncoderLayer(nn.Module):
    method __init__ (line 454) | def __init__(self, config: EvaCLIPConfig):
    method forward (line 464) | def forward(
  class EvaCLIPPreTrainedModel (line 510) | class EvaCLIPPreTrainedModel(PreTrainedModel):
    method _init_weights (line 521) | def _init_weights(self, module):
    method _set_gradient_checkpointing (line 574) | def _set_gradient_checkpointing(self, module, value=False):
  class EvaCLIPEncoder (line 679) | class EvaCLIPEncoder(nn.Module):
    method __init__ (line 688) | def __init__(self, config: EvaCLIPConfig):
    method forward (line 694) | def forward(
  class EvaCLIPTextTransformer (line 782) | class EvaCLIPTextTransformer(nn.Module):
    method __init__ (line 783) | def __init__(self, config: EvaCLIPTextConfig):
    method forward (line 793) | def forward(
    method _build_causal_attention_mask (line 861) | def _build_causal_attention_mask(self, bsz, seq_len, dtype):
  class EvaCLIPTextModel (line 875) | class EvaCLIPTextModel(EvaCLIPPreTrainedModel):
    method __init__ (line 880) | def __init__(self, config: EvaCLIPTextConfig):
    method get_input_embeddings (line 886) | def get_input_embeddings(self) -> nn.Module:
    method set_input_embeddings (line 889) | def set_input_embeddings(self, value):
    method forward (line 894) | def forward(
  class EvaCLIPVisionTransformer (line 932) | class EvaCLIPVisionTransformer(nn.Module):
    method __init__ (line 933) | def __init__(self, config: EvaCLIPVisionConfig):
    method forward (line 944) | def forward(
  class EvaCLIPVisionModel (line 992) | class EvaCLIPVisionModel(EvaCLIPPreTrainedModel):
    method __init__ (line 996) | def __init__(self, config: EvaCLIPVisionConfig):
    method get_input_embeddings (line 1002) | def get_input_embeddings(self) -> nn.Module:
    method forward (line 1007) | def forward(
  class EvaCLIPModel (line 1047) | class EvaCLIPModel(EvaCLIPPreTrainedModel):
    method __init__ (line 1050) | def __init__(self, config: EvaCLIPConfig):
    method get_text_features (line 1083) | def get_text_features(
    method get_image_features (line 1130) | def get_image_features(
    method forward (line 1180) | def forward(
  class EvaCLIPTextModelWithProjection (line 1278) | class EvaCLIPTextModelWithProjection(EvaCLIPPreTrainedModel):
    method __init__ (line 1283) | def __init__(self, config: EvaCLIPTextConfig):
    method get_input_embeddings (line 1293) | def get_input_embeddings(self) -> nn.Module:
    method set_input_embeddings (line 1296) | def set_input_embeddings(self, value):
    method forward (line 1301) | def forward(
  class EvaCLIPVisionModelWithProjection (line 1359) | class EvaCLIPVisionModelWithProjection(EvaCLIPPreTrainedModel):
    method __init__ (line 1363) | def __init__(self, config: EvaCLIPVisionConfig):
    method get_input_embeddings (line 1373) | def get_input_embeddings(self) -> nn.Module:
    method forward (line 1378) | def forward(

FILE: internvl_chat_llava/llava/model/multimodal_encoder/intern_vit_6b/configuration_intern_vit.py
  class InternVisionConfig (line 15) | class InternVisionConfig(PretrainedConfig):
    method __init__ (line 63) | def __init__(
    method from_pretrained (line 105) | def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os....

FILE: internvl_chat_llava/llava/model/multimodal_encoder/intern_vit_6b/flash_attention.py
  class FlashAttention (line 14) | class FlashAttention(nn.Module):
    method __init__ (line 25) | def __init__(self, softmax_scale=None, attention_dropout=0.0, device=N...
    method forward (line 30) | def forward(self, qkv, key_padding_mask=None, causal=False, cu_seqlens...

FILE: internvl_chat_llava/llava/model/multimodal_encoder/intern_vit_6b/modeling_intern_vit.py
  class InternRMSNorm (line 33) | class InternRMSNorm(nn.Module):
    method __init__ (line 34) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 39) | def forward(self, hidden_states):
  class InternVisionEmbeddings (line 61) | class InternVisionEmbeddings(nn.Module):
    method __init__ (line 62) | def __init__(self, config: InternVisionConfig):
    method _get_pos_embed (line 82) | def _get_pos_embed(self, pos_embed, H, W):
    method forward (line 90) | def forward(self, pixel_values: torch.FloatTensor) -> torch.Tensor:
  class InternAttention (line 105) | class InternAttention(nn.Module):
    method __init__ (line 108) | def __init__(self, config: InternVisionConfig):
    method _naive_attn (line 138) | def _naive_attn(self, x):
    method _flash_attn (line 157) | def _flash_attn(self, x, key_padding_mask=None, need_weights=False):
    method forward (line 174) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class InternMLP (line 179) | class InternMLP(nn.Module):
    method __init__ (line 180) | def __init__(self, config: InternVisionConfig):
    method forward (line 187) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class InternVisionEncoderLayer (line 194) | class InternVisionEncoderLayer(nn.Module):
    method __init__ (line 195) | def __init__(self, config: InternVisionConfig, drop_path_rate: float):
    method forward (line 210) | def forward(
  class InternVisionEncoder (line 225) | class InternVisionEncoder(nn.Module):
    method __init__ (line 235) | def __init__(self, config: InternVisionConfig):
    method forward (line 244) | def forward(
  class InternVisionModel (line 291) | class InternVisionModel(PreTrainedModel):
    method __init__ (line 295) | def __init__(self, config: InternVisionConfig):
    method resize_pos_embeddings (line 302) | def resize_pos_embeddings(self, old_size, new_size, patch_size):
    method get_input_embeddings (line 313) | def get_input_embeddings(self):
    method forward (line 316) | def forward(

FILE: internvl_chat_llava/llava/model/multimodal_encoder/internvl_14b/__init__.py
  class InternVLTokenizer (line 23) | class InternVLTokenizer(nn.Module):
    method __init__ (line 24) | def __init__(self, model_path):
    method forward (line 30) | def forward(self, text, prefix='summarize:'):
  function build_transform (line 39) | def build_transform(task, image_size=224, mean=[0.485, 0.456, 0.406], st...
  function load_internvl_c_huggingface (line 56) | def load_internvl_c_huggingface(ckpt_path, device, task):
  function load_internvl_g_huggingface (line 73) | def load_internvl_g_huggingface(ckpt_path, device, task):

FILE: internvl_chat_llava/llava/model/multimodal_encoder/internvl_14b/configuration_intern_vit.py
  class InternVisionConfig (line 15) | class InternVisionConfig(PretrainedConfig):
    method __init__ (line 63) | def __init__(
    method from_pretrained (line 105) | def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os....

FILE: internvl_chat_llava/llava/model/multimodal_encoder/internvl_14b/configuration_internvl.py
  class InternVLConfig (line 17) | class InternVLConfig(PretrainedConfig):
    method __init__ (line 57) | def __init__(
    method to_dict (line 97) | def to_dict(self):

FILE: internvl_chat_llava/llava/model/multimodal_encoder/internvl_14b/flash_attention.py
  class FlashAttention (line 15) | class FlashAttention(nn.Module):
    method __init__ (line 26) | def __init__(self, softmax_scale=None, attention_dropout=0.0, device=N...
    method forward (line 31) | def forward(self, qkv, key_padding_mask=None, causal=False, cu_seqlens...

FILE: internvl_chat_llava/llava/model/multimodal_encoder/internvl_14b/modeling_intern_vit.py
  class InternRMSNorm (line 33) | class InternRMSNorm(nn.Module):
    method __init__ (line 34) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 39) | def forward(self, hidden_states):
  class InternVisionEmbeddings (line 61) | class InternVisionEmbeddings(nn.Module):
    method __init__ (line 62) | def __init__(self, config: InternVisionConfig):
    method _get_pos_embed (line 82) | def _get_pos_embed(self, pos_embed, H, W):
    method forward (line 90) | def forward(self, pixel_values: torch.FloatTensor) -> torch.Tensor:
  class InternAttention (line 105) | class InternAttention(nn.Module):
    method __init__ (line 108) | def __init__(self, config: InternVisionConfig):
    method _naive_attn (line 138) | def _naive_attn(self, x):
    method _flash_attn (line 157) | def _flash_attn(self, x, key_padding_mask=None, need_weights=False):
    method forward (line 174) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class InternMLP (line 179) | class InternMLP(nn.Module):
    method __init__ (line 180) | def __init__(self, config: InternVisionConfig):
    method forward (line 187) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class InternVisionEncoderLayer (line 194) | class InternVisionEncoderLayer(nn.Module):
    method __init__ (line 195) | def __init__(self, config: InternVisionConfig, drop_path_rate: float):
    method forward (line 210) | def forward(
  class InternVisionEncoder (line 225) | class InternVisionEncoder(nn.Module):
    method __init__ (line 235) | def __init__(self, config: InternVisionConfig):
    method forward (line 244) | def forward(
  class InternVisionModel (line 291) | class InternVisionModel(PreTrainedModel):
    method __init__ (line 295) | def __init__(self, config: InternVisionConfig):
    method resize_pos_embeddings (line 302) | def resize_pos_embeddings(self, old_size, new_size, patch_size):
    method get_input_embeddings (line 313) | def get_input_embeddings(self):
    method forward (line 316) | def forward(

FILE: internvl_chat_llava/llava/model/multimodal_encoder/internvl_14b/modeling_internvl.py
  class InternVLPreTrainedModel (line 33) | class InternVLPreTrainedModel(PreTrainedModel):
    method _init_weights (line 49) | def _init_weights(self, module):
    method _set_gradient_checkpointing (line 67) | def _set_gradient_checkpointing(self, module, value=False):
  class CrossAttention (line 74) | class CrossAttention(nn.Module):
    method __init__ (line 75) | def __init__(
    method forward (line 106) | def forward(self, x, k=None, v=None):
  class AttentiveBlock (line 139) | class AttentiveBlock(nn.Module):
    method __init__ (line 141) | def __init__(self, dim, num_heads, qkv_bias=False, qk_scale=None, drop...
    method forward (line 154) | def forward(self, x_q, x_kv, pos_q, pos_k, bool_masked_pos, rel_pos_bi...
  class AttentionPoolingBlock (line 163) | class AttentionPoolingBlock(AttentiveBlock):
    method forward (line 165) | def forward(self, x):
  class InternVLModel (line 173) | class InternVLModel(InternVLPreTrainedModel):
    method __init__ (line 177) | def __init__(self, config: InternVLConfig):
    method wrap_backbone_lora (line 218) | def wrap_backbone_lora(self, r=128, lora_alpha=256, lora_dropout=0.05):
    method wrap_qllama_lora (line 228) | def wrap_qllama_lora(self, r=128, lora_alpha=256, lora_dropout=0.05):
    method get_input_embeddings (line 239) | def get_input_embeddings(self):
    method set_input_embeddings (line 242) | def set_input_embeddings(self, value):
    method set_output_embeddings (line 245) | def set_output_embeddings(self, new_embeddings):
    method get_output_embeddings (line 248) | def get_output_embeddings(self) -> nn.Module:
    method generate (line 252) | def generate(
    method get_text_features (line 287) | def get_text_features(
    method get_image_features (line 336) | def get_image_features(
    method encode_image (line 382) | def encode_image(self, image, mode):
    method encode_text (line 406) | def encode_text(self, text):
    method forward (line 419) | def forward(self, pixel_values: torch.FloatTensor,
  class InternVL_C (line 461) | class InternVL_C(InternVLModel):
    method encode_image (line 463) | def encode_image(self, image):
    method encode_text (line 472) | def encode_text(self, text):
    method forward (line 485) | def forward(self, image, text):
  class InternVL_G (line 501) | class InternVL_G(InternVLModel):
    method encode_image (line 503) | def encode_image(self, image):
    method encode_text (line 517) | def encode_text(self, text):
    method forward (line 530) | def forward(self, image, text):

FILE: internvl_chat_llava/llava/model/multimodal_encoder/internvl_14b/modeling_qllama.py
  function _make_causal_mask (line 42) | def _make_causal_mask(
  function _expand_mask (line 60) | def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Option...
  class LlamaRMSNorm (line 74) | class LlamaRMSNorm(nn.Module):
    method __init__ (line 75) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 83) | def forward(self, hidden_states):
  class LlamaRotaryEmbedding (line 109) | class LlamaRotaryEmbedding(torch.nn.Module):
    method __init__ (line 110) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
    method forward (line 124) | def forward(self, x, seq_len=None):
  class FixedLlamaRotaryEmbedding (line 141) | class FixedLlamaRotaryEmbedding(torch.nn.Module):
    method __init__ (line 142) | def __init__(self, dim, max_position_embeddings=2048, base=10000, devi...
    method _set_cos_sin_cache (line 155) | def _set_cos_sin_cache(self, seq_len, device, dtype):
    method forward (line 165) | def forward(self, x, seq_len=None):
  function rotate_half (line 179) | def rotate_half(x):
  function apply_rotary_pos_emb (line 186) | def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
  class LlamaMLP (line 196) | class LlamaMLP(nn.Module):
    method __init__ (line 197) | def __init__(
    method forward (line 209) | def forward(self, x):
  class LlamaAttention (line 213) | class LlamaAttention(nn.Module):
    method __init__ (line 216) | def __init__(self, config: LlamaConfig):
    method _shape (line 235) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
    method forward (line 238) | def forward(
  class LlamaCrossAttention (line 304) | class LlamaCrossAttention(nn.Module):
    method __init__ (line 307) | def __init__(self, config: LlamaConfig):
    method _shape (line 329) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
    method forward (line 332) | def forward(
  class LlamaDecoderLayer (line 408) | class LlamaDecoderLayer(nn.Module):
    method __init__ (line 409) | def __init__(self, config: LlamaConfig, use_cross_attn: bool):
    method forward (line 423) | def forward(
  class LlamaPreTrainedModel (line 520) | class LlamaPreTrainedModel(PreTrainedModel):
    method _init_weights (line 527) | def _init_weights(self, module):
    method _set_gradient_checkpointing (line 538) | def _set_gradient_checkpointing(self, module, value=False):
  class LlamaModel (line 613) | class LlamaModel(LlamaPreTrainedModel):
    method __init__ (line 621) | def __init__(self, config: LlamaConfig):
    method get_input_embeddings (line 636) | def get_input_embeddings(self):
    method set_input_embeddings (line 639) | def set_input_embeddings(self, value):
    method _prepare_decoder_attention_mask (line 643) | def _prepare_decoder_attention_mask(self, attention_mask, input_shape,...
    method forward (line 667) | def forward(
    method forward_train (line 783) | def forward_train(
  class LlamaForCausalLM (line 915) | class LlamaForCausalLM(LlamaPreTrainedModel):
    method __init__ (line 916) | def __init__(self, config):
    method get_input_embeddings (line 925) | def get_input_embeddings(self):
    method set_input_embeddings (line 928) | def set_input_embeddings(self, value):
    method get_output_embeddings (line 931) | def get_output_embeddings(self):
    method set_output_embeddings (line 934) | def set_output_embeddings(self, new_embeddings):
    method set_decoder (line 937) | def set_decoder(self, decoder):
    method get_decoder (line 940) | def get_decoder(self):
    method forward (line 945) | def forward(
    method prepare_inputs_for_generation (line 1035) | def prepare_inputs_for_generation(
    method _reorder_cache (line 1069) | def _reorder_cache(past_key_values, beam_idx):

FILE: internvl_chat_llava/llava/model/multimodal_projector/builder.py
  class IdentityMap (line 6) | class IdentityMap(nn.Module):
    method __init__ (line 7) | def __init__(self):
    method forward (line 10) | def forward(self, x, *args, **kwargs):
    method config (line 14) | def config(self):
  class SimpleResBlock (line 18) | class SimpleResBlock(nn.Module):
    method __init__ (line 19) | def __init__(self, channels):
    method forward (line 28) | def forward(self, x):
  class TwoMLP (line 33) | class TwoMLP(nn.Module):
    method __init__ (line 34) | def __init__(self, config):
    method forward (line 48) | def forward(self, inputs):
  function build_vision_projector (line 58) | def build_vision_projector(config, delay_load=False, **kwargs):

FILE: internvl_chat_llava/llava/model/utils.py
  function auto_upgrade (line 4) | def auto_upgrade(config):

FILE: internvl_chat_llava/llava/serve/cli.py
  function load_image (line 18) | def load_image(image_file):
  function main (line 27) | def main(args):

FILE: internvl_chat_llava/llava/serve/controller.py
  class DispatchMethod (line 28) | class DispatchMethod(Enum):
    method from_str (line 33) | def from_str(cls, name):
  class WorkerInfo (line 43) | class WorkerInfo:
  function heart_beat_controller (line 51) | def heart_beat_controller(controller):
  class Controller (line 57) | class Controller:
    method __init__ (line 58) | def __init__(self, dispatch_method: str):
    method register_worker (line 69) | def register_worker(self, worker_name: str, check_heart_beat: bool,
    method get_worker_status (line 88) | def get_worker_status(self, worker_name: str):
    method remove_worker (line 101) | def remove_worker(self, worker_name: str):
    method refresh_all_workers (line 104) | def refresh_all_workers(self):
    method list_models (line 112) | def list_models(self):
    method get_worker_address (line 120) | def get_worker_address(self, model_name: str):
    method receive_heart_beat (line 173) | def receive_heart_beat(self, worker_name: str, queue_length: int):
    method remove_stable_workers_by_expiration (line 183) | def remove_stable_workers_by_expiration(self):
    method worker_api_generate_stream (line 193) | def worker_api_generate_stream(self, params):
    method worker_api_get_status (line 220) | def worker_api_get_status(self):
  function register_worker (line 243) | async def register_worker(request: Request):
  function refresh_all_workers (line 251) | async def refresh_all_workers():
  function list_models (line 256) | async def list_models():
  function get_worker_address (line 262) | async def get_worker_address(request: Request):
  function receive_heart_beat (line 269) | async def receive_heart_beat(request: Request):
  function worker_api_generate_stream (line 277) | async def worker_api_generate_stream(request: Request):
  function worker_api_get_status (line 284) | async def worker_api_get_status(request: Request):

FILE: internvl_chat_llava/llava/serve/gradio_web_server.py
  function get_conv_log_filename (line 32) | def get_conv_log_filename():
  function sort_models (line 38) | def sort_models(models):
  function get_model_list (line 58) | def get_model_list():
  function load_demo (line 79) | def load_demo(url_params, request: gr.Request):
  function load_demo_refresh_model_list (line 93) | def load_demo_refresh_model_list(request: gr.Request):
  function vote_last_response (line 104) | def vote_last_response(state, vote_type, model_selector, request: gr.Req...
  function upvote_last_response (line 116) | def upvote_last_response(state, model_selector, request: gr.Request):
  function downvote_last_response (line 122) | def downvote_last_response(state, model_selector, request: gr.Request):
  function flag_last_response (line 128) | def flag_last_response(state, model_selector, request: gr.Request):
  function regenerate (line 134) | def regenerate(state, image_process_mode, request: gr.Request):
  function clear_history (line 144) | def clear_history(request: gr.Request):
  function add_text (line 150) | def add_text(state, text, image, image_process_mode, request: gr.Request):
  function http_bot (line 174) | def http_bot(state, model_selector, temperature, top_p, max_new_tokens, ...
  function build_demo (line 344) | def build_demo(embed_mode):

FILE: internvl_chat_llava/llava/serve/model_worker.py
  function heart_beat_worker (line 37) | def heart_beat_worker(controller):
  class ModelWorker (line 44) | class ModelWorker:
    method __init__ (line 45) | def __init__(self, controller_addr, worker_addr,
    method register_to_controller (line 75) | def register_to_controller(self):
    method send_heart_beat (line 87) | def send_heart_beat(self):
    method get_queue_length (line 108) | def get_queue_length(self):
    method get_status (line 115) | def get_status(self):
    method generate_stream (line 123) | def generate_stream(self, params):
    method generate_stream_gate (line 194) | def generate_stream_gate(self, params):
  function release_model_semaphore (line 224) | def release_model_semaphore(fn=None):
  function generate_stream (line 231) | async def generate_stream(request: Request):
  function get_status (line 247) | async def get_status(request: Request):

FILE: internvl_chat_llava/llava/serve/test_message.py
  function main (line 9) | def main():

FILE: internvl_chat_llava/llava/train/dist_utils.py
  function _find_free_port (line 13) | def _find_free_port():
  function _is_free_port (line 24) | def _is_free_port(port):
  function init_dist (line 31) | def init_dist(launcher, backend='nccl', **kwargs):
  function _init_dist_pytorch (line 44) | def _init_dist_pytorch(backend, **kwargs):
  function _init_dist_mpi (line 52) | def _init_dist_mpi(backend, **kwargs):
  function _init_dist_slurm (line 65) | def _init_dist_slurm(backend, port=None):

FILE: internvl_chat_llava/llava/train/llama_flash_attn_monkey_patch.py
  function forward (line 16) | def forward(
  function _prepare_decoder_attention_mask (line 98) | def _prepare_decoder_attention_mask(
  function replace_llama_attn_with_flash_attn (line 105) | def replace_llama_attn_with_flash_attn():

FILE: internvl_chat_llava/llava/train/llava_trainer.py
  function maybe_zero_3 (line 13) | def maybe_zero_3(param, ignore_status=False, name=None):
  function get_mm_adapter_state_maybe_zero_3 (line 27) | def get_mm_adapter_state_maybe_zero_3(named_params, keys_to_match):
  function split_to_even_chunks (line 33) | def split_to_even_chunks(indices, lengths, num_chunks):
  function get_modality_length_grouped_indices (line 55) | def get_modality_length_grouped_indices(lengths, batch_size, world_size,...
  function get_length_grouped_indices (line 87) | def get_length_grouped_indices(lengths, batch_size, world_size, generato...
  class LengthGroupedSampler (line 98) | class LengthGroupedSampler(Sampler):
    method __init__ (line 104) | def __init__(
    method __len__ (line 121) | def __len__(self):
    method __iter__ (line 124) | def __iter__(self):
  class LLaVATrainer (line 132) | class LLaVATrainer(Trainer):
    method _get_train_sampler (line 134) | def _get_train_sampler(self) -> Optional[torch.utils.data.Sampler]:
    method _save_checkpoint (line 150) | def _save_checkpoint(self, model, trial, metrics=None):
    method _save (line 176) | def _save(self, output_dir: Optional[str] = None, state_dict=None):

FILE: internvl_chat_llava/llava/train/train.py
  function rank0_print (line 44) | def rank0_print(*args):
  class ModelArguments (line 50) | class ModelArguments:
  class DataArguments (line 66) | class DataArguments:
  class TrainingArguments (line 77) | class TrainingArguments(transformers.TrainingArguments):
  function maybe_zero_3 (line 112) | def maybe_zero_3(param, ignore_status=False, name=None):
  function get_peft_state_maybe_zero_3 (line 127) | def get_peft_state_maybe_zero_3(named_params, bias):
  function get_peft_state_non_lora_maybe_zero_3 (line 152) | def get_peft_state_non_lora_maybe_zero_3(named_params, require_grad_only...
  function get_mm_adapter_state_maybe_zero_3 (line 160) | def get_mm_adapter_state_maybe_zero_3(named_params, keys_to_match):
  function find_all_linear_names (line 166) | def find_all_linear_names(model):
  function safe_save_model_for_hf_trainer (line 180) | def safe_save_model_for_hf_trainer(trainer: transformers.Trainer,
  function smart_tokenizer_and_embedding_resize (line 224) | def smart_tokenizer_and_embedding_resize(
  function _tokenize_fn (line 249) | def _tokenize_fn(strings: Sequence[str],
  function _mask_targets (line 276) | def _mask_targets(target, tokenized_lens, speakers):
  function _add_speaker_and_signal (line 287) | def _add_speaker_and_signal(header, source, get_conversation=True):
  function preprocess_multimodal (line 308) | def preprocess_multimodal(
  function preprocess_llama_2 (line 332) | def preprocess_llama_2(
  function preprocess_v1 (line 414) | def preprocess_v1(
  function preprocess_mpt (line 505) | def preprocess_mpt(
  function preprocess_plain (line 571) | def preprocess_plain(
  function preprocess (line 593) | def preprocess(
  class LazySupervisedDataset (line 641) | class LazySupervisedDataset(Dataset):
    method __init__ (line 644) | def __init__(self, data_path: str,
    method __len__ (line 655) | def __len__(self):
    method lengths (line 659) | def lengths(self):
    method modality_lengths (line 667) | def modality_lengths(self):
    method __getitem__ (line 675) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
  class DataCollatorForSupervisedDataset (line 733) | class DataCollatorForSupervisedDataset(object):
    method __call__ (line 738) | def __call__(self, instances: Sequence[Dict]) -> Dict[str, torch.Tensor]:
  function make_supervised_data_module (line 766) | def make_supervised_data_module(tokenizer: transformers.PreTrainedTokeni...
  function train (line 778) | def train(attn_implementation=None):

FILE: internvl_chat_llava/llava/train/train_custom.py
  function pil_loader (line 52) | def pil_loader(img_str):
  class TCSLoader (line 58) | class TCSLoader(object):
    method __init__ (line 60) | def __init__(self, conf_path):
    method __call__ (line 66) | def __call__(self, fn):
  function rank0_print (line 75) | def rank0_print(*args):
  class ModelArguments (line 81) | class ModelArguments:
  class DataArguments (line 97) | class DataArguments:
  class TrainingArguments (line 108) | class TrainingArguments(transformers.TrainingArguments):
  function maybe_zero_3 (line 143) | def maybe_zero_3(param, ignore_status=False, name=None):
  function get_peft_state_maybe_zero_3 (line 158) | def get_peft_state_maybe_zero_3(named_params, bias):
  function get_peft_state_non_lora_maybe_zero_3 (line 183) | def get_peft_state_non_lora_maybe_zero_3(named_params, require_grad_only...
  function get_mm_adapter_state_maybe_zero_3 (line 191) | def get_mm_adapter_state_maybe_zero_3(named_params, keys_to_match):
  function find_all_linear_names (line 197) | def find_all_linear_names(model):
  function safe_save_model_for_hf_trainer (line 211) | def safe_save_model_for_hf_trainer(trainer: transformers.Trainer,
  function smart_tokenizer_and_embedding_resize (line 255) | def smart_tokenizer_and_embedding_resize(
  function _tokenize_fn (line 280) | def _tokenize_fn(strings: Sequence[str],
  function _mask_targets (line 307) | def _mask_targets(target, tokenized_lens, speakers):
  function _add_speaker_and_signal (line 318) | def _add_speaker_and_signal(header, source, get_conversation=True):
  function preprocess_multimodal (line 339) | def preprocess_multimodal(
  function preprocess_llama_2 (line 363) | def preprocess_llama_2(
  function preprocess_v1 (line 445) | def preprocess_v1(
  function preprocess_mpt (line 536) | def preprocess_mpt(
  function preprocess_plain (line 602) | def preprocess_plain(
  function preprocess (line 624) | def preprocess(
  class WeightedConcatDataset (line 675) | class WeightedConcatDataset(ConcatDataset):
    method __init__ (line 676) | def __init__(self, datasets, weights):
    method __iter__ (line 682) | def __iter__(self):
    method __len__ (line 685) | def __len__(self):
  class LazySupervisedDataset (line 689) | class LazySupervisedDataset(Dataset):
    method __init__ (line 692) | def __init__(self, meta,
    method __len__ (line 708) | def __len__(self):
    method lengths (line 712) | def lengths(self):
    method modality_lengths (line 720) | def modality_lengths(self):
    method __getitem__ (line 728) | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
  class DataCollatorForSupervisedDataset (line 788) | class DataCollatorForSupervisedDataset(object):
    method __call__ (line 793) | def __call__(self, instances: Sequence[Dict]) -> Dict[str, torch.Tensor]:
  function make_supervised_data_module (line 821) | def make_supervised_data_module(tokenizer: transformers.PreTrainedTokeni...
  function train (line 850) | def train(attn_implementation=None):

FILE: internvl_chat_llava/llava/utils.py
  function build_logger (line 17) | def build_logger(logger_name, logger_filename):
  class StreamToLogger (line 60) | class StreamToLogger(object):
    method __init__ (line 64) | def __init__(self, logger, log_level=logging.INFO):
    method __getattr__ (line 70) | def __getattr__(self, attr):
    method write (line 73) | def write(self, buf):
    method flush (line 87) | def flush(self):
  function disable_torch_init (line 93) | def disable_torch_init():
  function violates_moderation (line 102) | def violates_moderation(text):
  function pretty_print_semaphore (line 123) | def pretty_print_semaphore(semaphore):

FILE: internvl_chat_llava/scripts/convert_mmbench_for_submission.py
  function get_args (line 6) | def get_args():

FILE: internvl_chat_llava/scripts/convert_seed_for_submission.py
  function get_args (line 6) | def get_args():
  function eval_single (line 14) | def eval_single(result_file, eval_only_type=None):

FILE: internvl_chat_llava/scripts/convert_sqa_to_llava.py
  function convert_to_llava (line 8) | def convert_to_llava(base_dir, split, prompt_format="QCM-LEA"):
  function convert_to_jsonl (line 49) | def convert_to_jsonl(base_dir, split, prompt_format="QCM-LEPA"):
  function main (line 83) | def main(task, **kwargs):

FILE: internvl_chat_llava/scripts/convert_sqa_to_llava_base_prompt.py
  function get_question_text (line 1) | def get_question_text(problem):
  function get_context_text (line 6) | def get_context_text(problem, use_caption):
  function get_choice_text (line 15) | def get_choice_text(probelm, options):
  function get_answer (line 25) | def get_answer(problem, options):
  function get_lecture_text (line 29) | def get_lecture_text(problem):
  function get_solution_text (line 35) | def get_solution_text(problem):
  function create_one_example_chatbot (line 41) | def create_one_example_chatbot(format, question, context, choice, answer...
  function create_one_example (line 106) | def create_one_example(format, question, context, choice, answer, lectur...
  function create_one_example_gpt4 (line 162) | def create_one_example_gpt4(format, question, context, choice, answer, l...
  function build_prompt_chatbot (line 221) | def build_prompt_chatbot(problems, shot_qids, prompt_format, use_caption...
  function build_prompt (line 244) | def build_prompt(problems, shot_qids, test_qid, args):
  function build_prompt_gpt4 (line 291) | def build_prompt_gpt4(problems, shot_qids, test_qid, args):

FILE: internvl_chat_llava/scripts/convert_vizwiz_for_submission.py
  function parse_args (line 8) | def parse_args():

FILE: internvl_chat_llava/scripts/convert_vqav2_for_submission.py
  function parse_args (line 8) | def parse_args():

FILE: internvl_chat_llava/scripts/merge_lora_weights.py
  function merge_lora (line 6) | def merge_lora(args):

FILE: internvl_g/eval/evaluate_caption.py
  class CaptionDataset (line 36) | class CaptionDataset(torch.utils.data.Dataset):
    method __init__ (line 38) | def __init__(self, name, root, annotation, prompt, input_size=224):
    method __len__ (line 53) | def __len__(self):
    method __getitem__ (line 56) | def __getitem__(self, idx):
  function collate_fn (line 76) | def collate_fn(inputs, tokenizer):
  class InferenceSampler (line 85) | class InferenceSampler(torch.utils.data.sampler.Sampler):
    method __init__ (line 87) | def __init__(self, size):
    method _get_local_indices (line 95) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 104) | def __iter__(self):
    method __len__ (line 107) | def __len__(self):
  function evaluate_qllama_model (line 111) | def evaluate_qllama_model():

FILE: internvl_g/internvl/dist_utils.py
  function _find_free_port (line 13) | def _find_free_port():
  function _is_free_port (line 24) | def _is_free_port(port):
  function init_dist (line 31) | def init_dist(launcher, backend='nccl', **kwargs):
  function _init_dist_pytorch (line 44) | def _init_dist_pytorch(backend, **kwargs):
  function _init_dist_mpi (line 52) | def _init_dist_mpi(backend, **kwargs):
  function _init_dist_slurm (line 65) | def _init_dist_slurm(backend, port=None):

FILE: internvl_g/internvl/model/internvl_stage2/__init__.py
  class InternVLTokenizer (line 23) | class InternVLTokenizer(nn.Module):
    method __init__ (line 24) | def __init__(self, model_path):
    method forward (line 30) | def forward(self, text, prefix='summarize:'):
  function build_transform (line 39) | def build_transform(task, image_size=224, mean=[0.485, 0.456, 0.406], st...
  function load_internvl_c_huggingface (line 56) | def load_internvl_c_huggingface(ckpt_path, device, task):
  function load_internvl_g_huggingface (line 73) | def load_internvl_g_huggingface(ckpt_path, device, task):

FILE: internvl_g/internvl/model/internvl_stage2/configuration_intern_vit.py
  class InternVisionConfig (line 15) | class InternVisionConfig(PretrainedConfig):
    method __init__ (line 63) | def __init__(
    method from_pretrained (line 105) | def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os....

FILE: internvl_g/internvl/model/internvl_stage2/configuration_internvl.py
  class InternVLConfig (line 17) | class InternVLConfig(PretrainedConfig):
    method __init__ (line 57) | def __init__(
    method to_dict (line 97) | def to_dict(self):

FILE: internvl_g/internvl/model/internvl_stage2/flash_attention.py
  class FlashAttention (line 15) | class FlashAttention(nn.Module):
    method __init__ (line 26) | def __init__(self, softmax_scale=None, attention_dropout=0.0, device=N...
    method forward (line 31) | def forward(self, qkv, key_padding_mask=None, causal=False, cu_seqlens...

FILE: internvl_g/internvl/model/internvl_stage2/modeling_intern_vit.py
  class InternRMSNorm (line 33) | class InternRMSNorm(nn.Module):
    method __init__ (line 34) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 39) | def forward(self, hidden_states):
  class InternVisionEmbeddings (line 61) | class InternVisionEmbeddings(nn.Module):
    method __init__ (line 62) | def __init__(self, config: InternVisionConfig):
    method forward (line 82) | def forward(self, pixel_values: torch.FloatTensor) -> torch.Tensor:
  class InternAttention (line 93) | class InternAttention(nn.Module):
    method __init__ (line 96) | def __init__(self, config: InternVisionConfig):
    method _naive_attn (line 126) | def _naive_attn(self, x):
    method _flash_attn (line 145) | def _flash_attn(self, x, key_padding_mask=None, need_weights=False):
    method forward (line 162) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class InternMLP (line 167) | class InternMLP(nn.Module):
    method __init__ (line 168) | def __init__(self, config: InternVisionConfig):
    method forward (line 175) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class InternVisionEncoderLayer (line 182) | class InternVisionEncoderLayer(nn.Module):
    method __init__ (line 183) | def __init__(self, config: InternVisionConfig, drop_path_rate: float):
    method forward (line 198) | def forward(
  class InternVisionEncoder (line 213) | class InternVisionEncoder(nn.Module):
    method __init__ (line 223) | def __init__(self, config: InternVisionConfig):
    method forward (line 232) | def forward(
  class InternVisionModel (line 279) | class InternVisionModel(PreTrainedModel):
    method __init__ (line 283) | def __init__(self, config: InternVisionConfig):
    method resize_pos_embeddings (line 290) | def resize_pos_embeddings(self, old_size, new_size, patch_size):
    method get_input_embeddings (line 301) | def get_input_embeddings(self):
    method forward (line 304) | def forward(

FILE: internvl_g/internvl/model/internvl_stage2/modeling_internvl.py
  class InternVLPreTrainedModel (line 35) | class InternVLPreTrainedModel(PreTrainedModel):
    method _set_gradient_checkpointing (line 69) | def _set_gradient_checkpointing(self, module, value=False):
  class CrossAttention (line 76) | class CrossAttention(nn.Module):
    method __init__ (line 77) | def __init__(
    method forward (line 108) | def forward(self, x, k=None, v=None):
  class AttentiveBlock (line 141) | class AttentiveBlock(nn.Module):
    method __init__ (line 143) | def __init__(self, dim, num_heads, qkv_bias=False, qk_scale=None, drop...
    method forward (line 156) | def forward(self, x_q, x_kv, pos_q, pos_k, bool_masked_pos, rel_pos_bi...
  class AttentionPoolingBlock (line 165) | class AttentionPoolingBlock(AttentiveBlock):
    method forward (line 167) | def forward(self, x):
  class InternVLModelOutput (line 176) | class InternVLModelOutput(ModelOutput):
    method to_tuple (line 186) | def to_tuple(self) -> Tuple[Any]:
  class GatherLayer (line 195) | class GatherLayer(torch.autograd.Function):
    method forward (line 200) | def forward(ctx, input):
    method backward (line 207) | def backward(ctx, grads):
  class InternVLModel (line 215) | class InternVLModel(InternVLPreTrainedModel):
    method __init__ (line 219) | def __init__(self, config: InternVLConfig):
    method wrap_backbone_lora (line 261) | def wrap_backbone_lora(self, r=128, lora_alpha=256, lora_dropout=0.05):
    method wrap_qllama_lora (line 271) | def wrap_qllama_lora(self, r=128, lora_alpha=256, lora_dropout=0.05):
    method get_input_embeddings (line 282) | def get_input_embeddings(self):
    method set_input_embeddings (line 285) | def set_input_embeddings(self, value):
    method set_ou

Condensed preview — 864 files, each showing path, character count, and a content snippet. Download the .json file for the full structured content (31,193K chars).

[
  {
    "path": ".flake8",
    "chars": 218,
    "preview": "[flake8]\nignore = E501, F403, C901, W504, W605, E251, E122, E126, E127, E722, W503, E128, E741, E731, E701\nselect = E1, "
  },
  {
    "path": ".github/CONTRIBUTING.md",
    "chars": 11248,
    "preview": "## Contributing to InternLM\n\nWelcome to the InternLM community, all kinds of contributions are welcomed, including but n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/1-bug-report.yml",
    "chars": 2031,
    "preview": "name: 🐞 Bug report\ndescription: Create a report to help us reproduce and fix the bug\ntitle: \"[Bug] \"\nlabels: ['Bug']\n\nbo"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/2-feature-request.yml",
    "chars": 1172,
    "preview": "name: 🚀 Feature request\ndescription: Suggest an idea for this project\ntitle: \"[Feature] \"\n\nbody:\n- type: markdown\n  attr"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/3-documentation.yml",
    "chars": 547,
    "preview": "name: 📚 Documentation\ndescription: Report an issue related to the documentation.\nlabels: \"kind/doc,status/unconfirmed\"\nt"
  },
  {
    "path": ".gitignore",
    "chars": 3242,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
  },
  {
    "path": ".isort.cfg",
    "chars": 697,
    "preview": "[isort]\nline-length = 180\nmulti_line_output = 0\nextra_standard_library = setuptools\nknown_third_party = PIL,asynctest,ci"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 853,
    "preview": "exclude: ^internvl_chat_llava/\nrepos:\n  - repo: https://github.com/PyCQA/flake8\n    rev: 5.0.4\n    hooks:\n      - id: fl"
  },
  {
    "path": "INSTALLATION.md",
    "chars": 2167,
    "preview": "## 🛠️ Installation\n\n- Clone this repository:\n\n  ```bash\n  git clone https://github.com/OpenGVLab/InternVL.git\n  ```\n\n- C"
  },
  {
    "path": "LICENSE",
    "chars": 1066,
    "preview": "MIT License\n\nCopyright (c) 2023 OpenGVLab\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\n"
  },
  {
    "path": "README.md",
    "chars": 70069,
    "preview": "<div align=\"center\">\n\n# InternVL Family: Closing the Gap to Commercial Multimodal Models with Open-Source Suites —— A Pi"
  },
  {
    "path": "README_zh.md",
    "chars": 60886,
    "preview": "<div align=\"center\">\n\n# InternVL家族：通过开源组件缩小与商业多模态模型的差距 —— GPT-5的开源替代方案\n\n<div align=\"center\">\n  <img width=\"500\" alt=\"ima"
  },
  {
    "path": "classification/README.md",
    "chars": 10168,
    "preview": "# InternViT-6B for Image Classification\n\nThis folder contains the implementation of the InternViT-6B for image classific"
  },
  {
    "path": "classification/config.py",
    "chars": 10125,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2022 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu.yaml",
    "chars": 739,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  TRANSFORM: 'build_transform_for_linear_probe'\n  "
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_a.yaml",
    "chars": 762,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_a'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_r.yaml",
    "chars": 762,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_r'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_real.yaml",
    "chars": 766,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet-real'\n  TRANSFORM: 'build_tra"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_sketch.yaml",
    "chars": 772,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_sketch'\n  TRANSFORM: 'build_t"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenetv2.yaml",
    "chars": 762,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenetv2'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu.yaml",
    "chars": 755,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  TRANSFORM: 'build_transform_for_linear_probe'\n  "
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_a.yaml",
    "chars": 778,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_a'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_r.yaml",
    "chars": 778,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_r'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_real.yaml",
    "chars": 782,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet-real'\n  TRANSFORM: 'build_tra"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_sketch.yaml",
    "chars": 788,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_sketch'\n  TRANSFORM: 'build_t"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenetv2.yaml",
    "chars": 778,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenetv2'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu.yaml",
    "chars": 760,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  TRANSFORM: 'build_transform_for_linear_probe'\n  "
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_a.yaml",
    "chars": 783,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_a'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_r.yaml",
    "chars": 783,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_r'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_real.yaml",
    "chars": 787,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet-real'\n  TRANSFORM: 'build_tra"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_sketch.yaml",
    "chars": 793,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_sketch'\n  TRANSFORM: 'build_t"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenetv2.yaml",
    "chars": 783,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenetv2'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu.yaml",
    "chars": 760,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  TRANSFORM: 'build_transform_for_linear_probe'\n  "
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_a.yaml",
    "chars": 783,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_a'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_r.yaml",
    "chars": 783,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_r'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_real.yaml",
    "chars": 787,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet-real'\n  TRANSFORM: 'build_tra"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_sketch.yaml",
    "chars": 793,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_sketch'\n  TRANSFORM: 'build_t"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenetv2.yaml",
    "chars": 783,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenetv2'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu.yaml",
    "chars": 760,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  TRANSFORM: 'build_transform_for_linear_probe'\n  "
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_a.yaml",
    "chars": 783,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_a'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_r.yaml",
    "chars": 783,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_r'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_real.yaml",
    "chars": 787,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet-real'\n  TRANSFORM: 'build_tra"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_sketch.yaml",
    "chars": 793,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_sketch'\n  TRANSFORM: 'build_t"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenetv2.yaml",
    "chars": 783,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenetv2'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu.yaml",
    "chars": 760,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  TRANSFORM: 'build_transform_for_linear_probe'\n  "
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_a.yaml",
    "chars": 783,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_a'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_r.yaml",
    "chars": 783,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_r'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_real.yaml",
    "chars": 787,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet-real'\n  TRANSFORM: 'build_tra"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_sketch.yaml",
    "chars": 793,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_sketch'\n  TRANSFORM: 'build_t"
  },
  {
    "path": "classification/configs/attn_pooling_probing/attn_pooling_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenetv2.yaml",
    "chars": 783,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenetv2'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/intern_vit_6b_1k_224.yaml",
    "chars": 716,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 128\n  TRANSFORM: 'build_transform_for_linear_probe'\n  DATA_PATH: './data/imag"
  },
  {
    "path": "classification/configs/intern_vit_6b_1k_224_test_imagenet_a.yaml",
    "chars": 739,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 128\n  DATASET: 'imagenet_a'\n  TRANSFORM: 'build_transform_for_linear_probe'\n "
  },
  {
    "path": "classification/configs/intern_vit_6b_1k_224_test_imagenet_r.yaml",
    "chars": 739,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 128\n  DATASET: 'imagenet_r'\n  TRANSFORM: 'build_transform_for_linear_probe'\n "
  },
  {
    "path": "classification/configs/intern_vit_6b_1k_224_test_imagenet_real.yaml",
    "chars": 743,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 128\n  DATASET: 'imagenet-real'\n  TRANSFORM: 'build_transform_for_linear_probe"
  },
  {
    "path": "classification/configs/intern_vit_6b_1k_224_test_imagenet_sketch.yaml",
    "chars": 749,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 128\n  DATASET: 'imagenet_sketch'\n  TRANSFORM: 'build_transform_for_linear_pro"
  },
  {
    "path": "classification/configs/intern_vit_6b_1k_224_test_imagenetv2.yaml",
    "chars": 739,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 128\n  DATASET: 'imagenetv2'\n  TRANSFORM: 'build_transform_for_linear_probe'\n "
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_224px_in1k_224_64gpu.yaml",
    "chars": 738,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  TRANSFORM: 'build_transform_for_linear_probe'\n  "
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_a.yaml",
    "chars": 761,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_a'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_r.yaml",
    "chars": 761,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_r'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_real.yaml",
    "chars": 765,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet-real'\n  TRANSFORM: 'build_tra"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenet_sketch.yaml",
    "chars": 771,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_sketch'\n  TRANSFORM: 'build_t"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_224px_in1k_224_64gpu_imagenetv2.yaml",
    "chars": 761,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenetv2'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu.yaml",
    "chars": 754,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  TRANSFORM: 'build_transform_for_linear_probe'\n  "
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_a.yaml",
    "chars": 777,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_a'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_r.yaml",
    "chars": 777,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_r'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_real.yaml",
    "chars": 781,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet-real'\n  TRANSFORM: 'build_tra"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenet_sketch.yaml",
    "chars": 787,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_sketch'\n  TRANSFORM: 'build_t"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_224px_in1k_224to448_64gpu_imagenetv2.yaml",
    "chars": 777,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenetv2'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu.yaml",
    "chars": 759,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  TRANSFORM: 'build_transform_for_linear_probe'\n  "
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_a.yaml",
    "chars": 782,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_a'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_r.yaml",
    "chars": 782,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_r'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_real.yaml",
    "chars": 786,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet-real'\n  TRANSFORM: 'build_tra"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenet_sketch.yaml",
    "chars": 792,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_sketch'\n  TRANSFORM: 'build_t"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_0_in1k_448_64gpu_imagenetv2.yaml",
    "chars": 782,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenetv2'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu.yaml",
    "chars": 759,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  TRANSFORM: 'build_transform_for_linear_probe'\n  "
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_a.yaml",
    "chars": 782,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_a'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_r.yaml",
    "chars": 782,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_r'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_real.yaml",
    "chars": 786,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet-real'\n  TRANSFORM: 'build_tra"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenet_sketch.yaml",
    "chars": 792,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_sketch'\n  TRANSFORM: 'build_t"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_2_in1k_448_64gpu_imagenetv2.yaml",
    "chars": 782,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenetv2'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu.yaml",
    "chars": 759,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  TRANSFORM: 'build_transform_for_linear_probe'\n  "
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_a.yaml",
    "chars": 782,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_a'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_r.yaml",
    "chars": 782,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_r'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_real.yaml",
    "chars": 786,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet-real'\n  TRANSFORM: 'build_tra"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenet_sketch.yaml",
    "chars": 792,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_sketch'\n  TRANSFORM: 'build_t"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v1_5_in1k_448_64gpu_imagenetv2.yaml",
    "chars": 782,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenetv2'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu.yaml",
    "chars": 759,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  TRANSFORM: 'build_transform_for_linear_probe'\n  "
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_a.yaml",
    "chars": 782,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_a'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_r.yaml",
    "chars": 782,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_r'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_real.yaml",
    "chars": 786,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet-real'\n  TRANSFORM: 'build_tra"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenet_sketch.yaml",
    "chars": 792,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenet_sketch'\n  TRANSFORM: 'build_t"
  },
  {
    "path": "classification/configs/linear_probing/linear_probing_intern_vit_6b_448px_v2_5_in1k_448_64gpu_imagenetv2.yaml",
    "chars": 782,
    "preview": "DATA:\n  IMG_ON_MEMORY: False\n  BATCH_SIZE: 16 # single GPU batch size\n  DATASET: 'imagenetv2'\n  TRANSFORM: 'build_transf"
  },
  {
    "path": "classification/dataset/__init__.py",
    "chars": 267,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/dataset/build.py",
    "chars": 13228,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/dataset/cached_image_folder.py",
    "chars": 18325,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/dataset/imagenet_a_r_indices.py",
    "chars": 19652,
    "preview": "\"\"\"Code from https://github.com/baaivision/EVA/blob/master/EVA-02/asuka/imagenet_a_r_indices.py\nThanks to the authors of"
  },
  {
    "path": "classification/dataset/imagenet_real.py",
    "chars": 2025,
    "preview": "# --------------------------------------------------------\n# EVA: Exploring the Limits of Masked Visual Representation L"
  },
  {
    "path": "classification/dataset/imagenetv2.py",
    "chars": 2865,
    "preview": "\"\"\"Code from https://github.com/mlfoundations/wise-ft/blob/master/src/datasets/imagenetv2.py\nThanks to the authors of wi"
  },
  {
    "path": "classification/dataset/samplers.py",
    "chars": 3858,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/dataset/zipreader.py",
    "chars": 3292,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/ddp_hooks.py",
    "chars": 7618,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2022 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/gflops.py",
    "chars": 4049,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/hf2pytorch.py",
    "chars": 2869,
    "preview": "import argparse\nimport os\n\nimport torch\nfrom safetensors.torch import load_file as safetensors_load_file\n\n# Parse comman"
  },
  {
    "path": "classification/logger.py",
    "chars": 1483,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2022 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/lr_scheduler.py",
    "chars": 3633,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2022 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/main.py",
    "chars": 30523,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/meta_data/22k_class_to_idx.json",
    "chars": 425711,
    "preview": "{\"n00004475\": 0, \"n00005787\": 1, \"n00006024\": 2, \"n00006484\": 3, \"n00007846\": 4, \"n00015388\": 5, \"n00017222\": 6, \"n00021"
  },
  {
    "path": "classification/meta_data/imagenet_classes.json",
    "chars": 21895,
    "preview": "{\n    \"n01440764\": 0,\n    \"n01443537\": 1,\n    \"n01484850\": 2,\n    \"n01491361\": 3,\n    \"n01494475\": 4,\n    \"n01496331\": 5"
  },
  {
    "path": "classification/meta_data/map22kto1k.txt",
    "chars": 5193,
    "preview": "359\n368\n460\n475\n486\n492\n496\n514\n516\n525\n547\n548\n556\n563\n575\n641\n648\n723\n733\n765\n801\n826\n852\n858\n878\n896\n900\n905\n908\n910\n"
  },
  {
    "path": "classification/meta_data/real.json",
    "chars": 388479,
    "preview": "[[], [970, 795], [230, 231], [809], [516, 850], [57], [334], [700], [674], [332], [109], [286], [370], [757], [595], [14"
  },
  {
    "path": "classification/models/__init__.py",
    "chars": 251,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/models/build.py",
    "chars": 2258,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/models/clip_vit.py",
    "chars": 6934,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2024 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/models/flash_attention.py",
    "chars": 3370,
    "preview": "import torch\nimport torch.nn as nn\nfrom einops import rearrange\n\ntry:  # v1\n    from flash_attn.flash_attn_interface imp"
  },
  {
    "path": "classification/models/intern_vit_6b.py",
    "chars": 18837,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/optimizer.py",
    "chars": 5904,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2022 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/train_in1k.sh",
    "chars": 613,
    "preview": "#!/usr/bin/env bash\n\nset -x\n\nPARTITION=$1\nJOB_NAME=$2\nCONFIG=$3\nGPUS=${GPUS:-8}\nGPUS_PER_NODE=${GPUS_PER_NODE:-8}\nCPUS_P"
  },
  {
    "path": "classification/utils.py",
    "chars": 15597,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2022 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "classification/work_dirs/intern_vit_6b_1k_224/log_rank0.txt",
    "chars": 535243,
    "preview": "[2023-11-09 22:22:39 intern_vit_6b_1k_224_cls_patch_sgd_lr0.1] (main.py 663): INFO Full config saved to work_dirs/intern"
  },
  {
    "path": "clip_benchmark/AUTHORS.rst",
    "chars": 122,
    "preview": "=======\nCredits\n=======\n\n* `Mehdi Cherti <https://github.com/mehdidc>`_\n* `Romain Beaumont <https://github.com/rom1504>`"
  },
  {
    "path": "clip_benchmark/CONTRIBUTING.rst",
    "chars": 3607,
    "preview": ".. highlight:: shell\n\n============\nContributing\n============\n\nContributions are welcome, and they are greatly appreciate"
  },
  {
    "path": "clip_benchmark/HISTORY.rst",
    "chars": 852,
    "preview": "## History\n\n### 1.4.0\n\n* Fix silent webdataset error-handling\n* Added support for wds/voc2007_multilabel\n* default to fl"
  },
  {
    "path": "clip_benchmark/LICENSE",
    "chars": 1070,
    "preview": "MIT License\n\nCopyright (c) 2022, Mehdi Cherti\n\nPermission is hereby granted, free of charge, to any person obtaining a c"
  },
  {
    "path": "clip_benchmark/MANIFEST.in",
    "chars": 289,
    "preview": "include AUTHORS.rst\ninclude CONTRIBUTING.rst\ninclude HISTORY.rst\ninclude LICENSE\ninclude README.rst\n\nrecursive-include t"
  },
  {
    "path": "clip_benchmark/Makefile",
    "chars": 2480,
    "preview": ".PHONY: clean clean-build clean-pyc clean-test coverage dist docs help install lint lint/flake8\n.DEFAULT_GOAL := help\n\nd"
  },
  {
    "path": "clip_benchmark/README.md",
    "chars": 58761,
    "preview": "# InternVL for Zero-Shot Image Classification & Image-Text Retrieval\n\nThis folder contains the implementation of InternV"
  },
  {
    "path": "clip_benchmark/benchmark/README.md",
    "chars": 1988,
    "preview": "# Benchmark\n\nthe benchmark results are available in [benchmark.csv](benchmark.csv).\nYou can visualize the results in the"
  },
  {
    "path": "clip_benchmark/benchmark/benchmark.csv",
    "chars": 68985,
    "preview": "acc1,acc5,mean_per_class_recall,dataset,model,pretrained,task,mean_average_precision,image_retrieval_recall@5,text_retri"
  },
  {
    "path": "clip_benchmark/benchmark/dataset_type.csv",
    "chars": 1026,
    "preview": "dataset,type\nimagenet1k,natural\nimagenetv2,natural\nimagenet-r,natural\nimagenet_sketch,specialized\nobjectnet,natural\nimag"
  },
  {
    "path": "clip_benchmark/benchmark/datasets.txt",
    "chars": 582,
    "preview": "mscoco_captions\nflickr8k\nflickr30k\nimagenet1k\nimagenetv2\nimagenet_sketch\nimagenet-a\nimagenet-r\nobjectnet\nfer2013\nvoc2007"
  },
  {
    "path": "clip_benchmark/benchmark/datasets_multilingual.txt",
    "chars": 326,
    "preview": "multilingual_mscoco_captions,es\nmultilingual_mscoco_captions,it\nmultilingual_mscoco_captions,ko\nmultilingual_mscoco_capt"
  },
  {
    "path": "clip_benchmark/benchmark/models.txt",
    "chars": 308,
    "preview": "ViT-B-32,openai\nViT-B-16,openai\nViT-L-14,openai\nViT-L-14-336,openai\nViT-B-32-quickgelu,laion400m_e32\nViT-B-32,laion2b_e1"
  },
  {
    "path": "clip_benchmark/benchmark/results.ipynb",
    "chars": 1892068,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"id\": \"b3edae65-ec1c-4318-b825-1ea20cea1f3c\",\n   \""
  },
  {
    "path": "clip_benchmark/benchmark/webdatasets.txt",
    "chars": 788,
    "preview": "wds/mscoco_captions\nwds/flickr8k\nwds/flickr30k\nwds/imagenet1k\nwds/imagenetv2\nwds/imagenet_sketch\nwds/imagenet-a\nwds/imag"
  },
  {
    "path": "clip_benchmark/clip_benchmark/__init__.py",
    "chars": 135,
    "preview": "\"\"\"Top-level package for CLIP Benchmark.\"\"\"\n\n__author__ = \"\"\"Mehdi Cherti\"\"\"\n__email__ = 'mehdicherti@gmail.com'\n__versi"
  },
  {
    "path": "clip_benchmark/clip_benchmark/cli.py",
    "chars": 16207,
    "preview": "\"\"\"Console script for clip_benchmark.\"\"\"\nimport argparse\nimport csv\nimport json\nimport os\nimport sys\nfrom copy import co"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/ar_classnames.json",
    "chars": 76069,
    "preview": "{\n  \"imagenet1k\": [\n    \"\\u0633\\u0645\\u0643 \\u0627\\u0644\\u062a\\u0646\\u0634\",\n    \"\\u0627\\u0644\\u0633\\u0645\\u0643\\u0629 \\"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/ar_zeroshot_classification_templates.json",
    "chars": 4623,
    "preview": "{\n  \"imagenet1k\": [\n    \"{c}\",\n    \"\\u0635\\u0648\\u0631\\u0629 \\u0633\\u064a\\u0626\\u0629 \\u0644\\u0640 {c}\",\n    \"\\u0635\\u06"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/birdsnap.py",
    "chars": 10006,
    "preview": "import concurrent.futures\nimport csv\nimport hashlib\nimport os\nfrom pathlib import Path\n\nimport torch\nfrom PIL import Ima"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/builder.py",
    "chars": 65146,
    "preview": "import json\nimport os\nimport sys\nimport warnings\nfrom subprocess import call\n\nimport torch\nfrom torch.utils.data import "
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/caltech101.py",
    "chars": 9151,
    "preview": "\"\"\"\nCode adapted from https://github.com/pytorch/vision/blob/main/torchvision/datasets/caltech.py\nModification of caltec"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/cn_classnames.json",
    "chars": 27709,
    "preview": "{\n  \"imagenet1k\": [\n    \"\\u4e01\\u9cb7\",\n    \"\\u91d1\\u9c7c\",\n    \"\\u5927\\u767d\\u9ca8\",\n    \"\\u864e\\u9ca8\",\n    \"\\u9524\\u5"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/cn_zeroshot_classification_templates.json",
    "chars": 3747,
    "preview": "{\n  \"imagenet1k\": [\n    \"{c}\\u7684\\u7167\\u7247\\u3002\",\n    \"\\u8d28\\u91cf\\u5dee\\u7684{c}\\u7684\\u7167\\u7247\\u3002\",\n    \"\\"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/cupl_prompts.json",
    "chars": 10172605,
    "preview": "{\n    \"imagenet1k\": {\n        \"tench\": [\n            \"A tench is a freshwater fish of the carp family.\",\n            \"A "
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/en_classnames.json",
    "chars": 32702,
    "preview": "{\n  \"flowers\": [\n    \"pink primrose\",\n    \"hard-leaved pocket orchid\",\n    \"canterbury bells\",\n    \"sweet pea\",\n    \"eng"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/en_zeroshot_classification_templates.json",
    "chars": 7927,
    "preview": "{\n  \"cifar10\": [\n    \"a photo of a {c}.\",\n    \"a blurry photo of a {c}.\",\n    \"a black and white photo of a {c}.\",\n    \""
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/flickr.py",
    "chars": 1820,
    "preview": "\"\"\"\nAdapted from https://github.com/pytorch/vision/blob/main/torchvision/datasets/flickr.py\nThanks to the authors of tor"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/imagenetv2.py",
    "chars": 4628,
    "preview": "\"\"\"\nCode from https://github.com/mlfoundations/wise-ft/blob/master/src/datasets/imagenetv2.py\nThanks to the authors of w"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/it_classnames.json",
    "chars": 23002,
    "preview": "{\n  \"imagenet1k\": [\n    \"una tinca\",\n    \"un pesce rosso\",\n    \"un grande squalo bianco\",\n    \"uno squalo tigre\",\n    \"u"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/it_zeroshot_classification_templates.json",
    "chars": 1479,
    "preview": "{\n  \"imagenet1k\": [\n    \"una brutta foto di {c}\",\n    \"una scultura di {c}\",\n    \"una foto di {c} difficilmente visibile"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/jp_classnames.json",
    "chars": 38852,
    "preview": "{\n  \"imagenet1k\": [\n    \"\\u30c6\\u30f3\\u30c1\",\n    \"\\u91d1\\u9b5a\",\n    \"\\u30db\\u30db\\u30b8\\u30ed\\u30b6\\u30e1\",\n    \"\\u30a"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/jp_zeroshot_classification_templates.json",
    "chars": 1714,
    "preview": "{\n  \"imagenet1k\": [\n    \"{c}\\u306e\\u60aa\\u3044\\u5199\\u771f\",\n    \"\\u591a\\u304f\\u306e{c}\\u306e\\u5199\\u771f\",\n    \"{c}\\u30"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/kitti.py",
    "chars": 7109,
    "preview": "# Copyright 2019 Google LLC.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this "
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/multilingual_mscoco.py",
    "chars": 3991,
    "preview": "import json\nimport os\nfrom subprocess import call\n\nfrom PIL import Image\nfrom torchvision.datasets import VisionDataset\n"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/objectnet.py",
    "chars": 2648,
    "preview": "\"\"\"\nCode adapted from https://github.com/mlfoundations/wise-ft/blob/master/src/datasets/objectnet.py\nThanks to the autho"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/tfds.py",
    "chars": 1939,
    "preview": "import torch\nfrom PIL import Image\n\n\ndef download_tfds_dataset(name, data_dir=None):\n    import tensorflow_datasets as t"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/tools.py",
    "chars": 651,
    "preview": "import re\n\n\ndef process_single_caption(caption, max_words=50):\n    caption = re.sub(r\"([.!\\\"()*#:;~])\", ' ', caption.low"
  },
  {
    "path": "clip_benchmark/clip_benchmark/datasets/voc2007.py",
    "chars": 8861,
    "preview": "# Code from https://github.com/SsnL/dataset-distillation/blob/master/datasets/pascal_voc.py , thanks to the authors\n\"\"\"D"
  },
  {
    "path": "clip_benchmark/clip_benchmark/metrics/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "clip_benchmark/clip_benchmark/metrics/linear_probe.py",
    "chars": 9494,
    "preview": "import os\nimport time\nfrom contextlib import suppress\n\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\nf"
  },
  {
    "path": "clip_benchmark/clip_benchmark/metrics/mscoco_generative.py",
    "chars": 1261,
    "preview": "import json\n\nfrom open_clip.tokenizer import _tokenizer\nfrom pycocoevalcap.eval import COCOEvalCap\nfrom tqdm.auto import"
  },
  {
    "path": "clip_benchmark/clip_benchmark/metrics/zeroshot_classification.py",
    "chars": 7564,
    "preview": "\"\"\"\nCode adapated from https://github.com/mlfoundations/open_clip/blob/main/src/training/zero_shot.py\nThanks to the auth"
  },
  {
    "path": "clip_benchmark/clip_benchmark/metrics/zeroshot_retrieval.py",
    "chars": 5662,
    "preview": "from contextlib import suppress\n\nimport torch\nimport torch.nn.functional as F\nfrom tqdm import tqdm\n\n\ndef evaluate(model"
  },
  {
    "path": "clip_benchmark/clip_benchmark/model_collection.py",
    "chars": 935,
    "preview": "import open_clip\n\n\ndef get_model_collection_from_file(path):\n    return [l.strip().split(',') for l in open(path).readli"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/__init__.py",
    "chars": 761,
    "preview": "from typing import Union\n\nimport torch\n\nfrom .internvl import load_internvl\nfrom .japanese_clip import load_japanese_cli"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/intern_vit_6b/configuration_intern_vit.py",
    "chars": 5478,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/intern_vit_6b/flash_attention.py",
    "chars": 3370,
    "preview": "import torch\nimport torch.nn as nn\nfrom einops import rearrange\n\ntry:  # v1\n    from flash_attn.flash_attn_interface imp"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/intern_vit_6b/modeling_intern_vit.py",
    "chars": 13993,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl.py",
    "chars": 1238,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl_c_pytorch/__init__.py",
    "chars": 2769,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl_c_pytorch/chinese_alpaca_lora_7b/config.json",
    "chars": 538,
    "preview": "{\n  \"architectures\": [\n    \"LlamaForCausalLM\"\n  ],\n  \"bos_token_id\": 1,\n  \"eos_token_id\": 2,\n  \"hidden_act\": \"silu\",\n  \""
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl_c_pytorch/chinese_alpaca_lora_7b/generation_config.json",
    "chars": 137,
    "preview": "{\n  \"_from_model_config\": true,\n  \"bos_token_id\": 1,\n  \"eos_token_id\": 2,\n  \"pad_token_id\": 0,\n  \"transformers_version\":"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl_c_pytorch/chinese_alpaca_lora_7b/pytorch_model.bin.index.json",
    "chars": 26788,
    "preview": "{\n  \"metadata\": {\n    \"total_size\": 13770997760\n  },\n  \"weight_map\": {\n    \"lm_head.weight\": \"pytorch_model-00002-of-000"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl_c_pytorch/chinese_alpaca_lora_7b/special_tokens_map.json",
    "chars": 96,
    "preview": "{\n  \"bos_token\": \"<s>\",\n  \"eos_token\": \"</s>\",\n  \"pad_token\": \"[PAD]\",\n  \"unk_token\": \"<unk>\"\n}\n"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl_c_pytorch/chinese_alpaca_lora_7b/tokenizer_config.json",
    "chars": 806,
    "preview": "{\n  \"add_bos_token\": true,\n  \"add_eos_token\": false,\n  \"bos_token\": {\n    \"__type\": \"AddedToken\",\n    \"content\": \"<s>\",\n"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl_c_pytorch/flash_attention.py",
    "chars": 3459,
    "preview": "# https://github.com/Dao-AILab/flash-attention/blob/v0.2.8/flash_attn/flash_attention.py\nimport torch\nimport torch.nn as"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl_c_pytorch/internvl_c.py",
    "chars": 16066,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl_huggingface/__init__.py",
    "chars": 3596,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl_huggingface/configuration_intern_vit.py",
    "chars": 5478,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl_huggingface/configuration_internvl.py",
    "chars": 4802,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl_huggingface/flash_attention.py",
    "chars": 3459,
    "preview": "# https://github.com/Dao-AILab/flash-attention/blob/v0.2.8/flash_attn/flash_attention.py\nimport torch\nimport torch.nn as"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl_huggingface/modeling_intern_vit.py",
    "chars": 13993,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl_huggingface/modeling_internvl.py",
    "chars": 21163,
    "preview": "# --------------------------------------------------------\n# InternVL\n# Copyright (c) 2023 OpenGVLab\n# Licensed under Th"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/internvl_huggingface/modeling_qllama.py",
    "chars": 48012,
    "preview": "# Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rights reserved.\n#\n# This code is based on EleutherAI's G"
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/japanese_clip.py",
    "chars": 1710,
    "preview": "from typing import Dict\n\nimport torch\n\n\nclass DictTensor:\n    \"\"\"\n    enable to do `tokenizer(texts).to(device)`\n    \"\"\""
  },
  {
    "path": "clip_benchmark/clip_benchmark/models/open_clip.py",
    "chars": 410,
    "preview": "import open_clip\n\n\ndef load_open_clip(model_name: str = 'ViT-B-32-quickgelu', pretrained: str = 'laion400m_e32', cache_d"
  },
  {
    "path": "clip_benchmark/clip_benchmark/webdataset_builder.py",
    "chars": 11155,
    "preview": "# Convert CLIP_benchmark datasets to webdataset format\n\nimport argparse\nimport io\nimport os\nimport sys\n\nimport torch\nimp"
  },
  {
    "path": "clip_benchmark/data/birdsnap/test_images_valid.txt",
    "chars": 51030,
    "preview": "path\nCoopers_Hawk/0561.jpg\nCoopers_Hawk/0629.jpg\nCoopers_Hawk/0717.jpg\nCoopers_Hawk/1847.jpg\nNorthern_Goshawk/2629.jpg\nN"
  },
  {
    "path": "clip_benchmark/data/flickr30k/flickr30k_cn_test.txt",
    "chars": 179306,
    "preview": "image,caption\n1009692167.jpg,在警车前，一条训练有素的警犬坐在它的警官身旁。\n1009692167.jpg,一名警察站着，身边有一只德国牧羊犬\n1009692167.jpg,一位安保人员带着他的狗正在寻找某些东西"
  },
  {
    "path": "clip_benchmark/data/mscoco_captions/coco-cn_test.json",
    "chars": 231071,
    "preview": "{\"images\": [{\"id\": 573854, \"file_name\": \"train2014/COCO_train2014_000000573854.jpg\"}, {\"id\": 412975, \"file_name\": \"val20"
  },
  {
    "path": "clip_benchmark/probe_benchmark/PROBES.md",
    "chars": 565,
    "preview": "Steps to run.\n\n1. Navigate to `CLIP_benchmark`.\n2. Run `export PYTHONPATH=$PWD`.\n3. (Optional) To re-run the experiments"
  },
  {
    "path": "clip_benchmark/probe_benchmark/build_df_scaling_experiments.py",
    "chars": 5292,
    "preview": "import json\nimport os\n\nimport pandas as pd\n\nif __name__ == '__main__':\n\n    compute_df = pd.read_csv('probe_benchmark/cl"
  },
  {
    "path": "clip_benchmark/probe_benchmark/clip_table_2.csv",
    "chars": 2063,
    "preview": "model,image_size,image_width,text_width,embed_dim,gmacs,macts,mparams,image_gmacs,image_macts,image_mparams,text_gmacs,t"
  },
  {
    "path": "clip_benchmark/probe_benchmark/generate_table.py",
    "chars": 2383,
    "preview": "import pandas as pd\n\n# make a new version of vtab\n\nif __name__ == '__main__':\n    df_full = pd.read_json('probe_benchmar"
  },
  {
    "path": "clip_benchmark/probe_benchmark/laion5b_fewshot_experiments.py",
    "chars": 2376,
    "preview": "import os\n\nfrom clip_benchmark.cli import get_parser_args, run\n\n# /private/home/mitchellw/miniconda3/envs/cb/bin/python "
  },
  {
    "path": "clip_benchmark/probe_benchmark/openclip_results.csv",
    "chars": 274304,
    "preview": "model_fullname,arch,samples_seen,gmacs_total,gmacs,upstream_dataset,downstream_dataset,acc1,acc5,mean_per_class_recall,i"
  },
  {
    "path": "clip_benchmark/probe_benchmark/process_vtab.py",
    "chars": 1440,
    "preview": "import json\n\nimport pandas as pd\n\n# make a new version of vtab\n\nif __name__ == '__main__':\n    df = pd.read_json('probe_"
  },
  {
    "path": "clip_benchmark/probe_benchmark/scaling_experiment_data2.json",
    "chars": 2818083,
    "preview": "[\n  {\n    \"k\": 10,\n    \"lr\": 0.1,\n    \"bs\": 256,\n    \"epochs\": 10,\n    \"model\": \"ViT-B-32\",\n    \"pretrained\": \"laion400m"
  },
  {
    "path": "clip_benchmark/probe_benchmark/scaling_experiment_data_vtab.json",
    "chars": 3043,
    "preview": "[\n  {\n    \"dataset\": \"vtab\",\n    \"lp_acc1\": 0.7272385796110142,\n    \"fewshot_k\": -1,\n    \"model\": \"ViT-B-16\",\n    \"pretr"
  }
]

// ... and 664 more files (download for full content)

About this extraction

This page contains the full source code of the OpenGVLab/InternVL GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 864 files (28.3 MB), approximately 7.5M tokens, and a symbol index with 2771 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
