Full Code of shenyunhang/APE for AI

main 8c4920e2014d cached
501 files
5.7 MB
1.5M tokens
1314 symbols
1 request
Download .txt
Showing preview only (6,120K chars total). Download the full file or copy to clipboard to get everything.
Repository: shenyunhang/APE
Branch: main
Commit: 8c4920e2014d
Files: 501
Total size: 5.7 MB

Directory structure:
gitextract_oqcovuov/

├── .gitignore
├── LICENSE
├── README.md
├── ape/
│   ├── __init__.py
│   ├── checkpoint/
│   │   ├── __init__.py
│   │   └── detection_checkpoint.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── build.py
│   │   ├── build_copypaste.py
│   │   ├── build_multi_dataset.py
│   │   ├── build_multi_dataset_copypaste.py
│   │   ├── common_copypaste.py
│   │   ├── dataset_mapper.py
│   │   ├── dataset_mapper_copypaste.py
│   │   ├── dataset_mapper_detr_instance.py
│   │   ├── dataset_mapper_detr_instance_exp.py
│   │   ├── dataset_mapper_detr_panoptic.py
│   │   ├── dataset_mapper_detr_panoptic_copypaste.py
│   │   ├── dataset_mapper_detr_semantic.py
│   │   ├── datasets/
│   │   │   ├── __init__.py
│   │   │   ├── coco.py
│   │   │   ├── d_cube.py
│   │   │   ├── flickr30k.py
│   │   │   ├── gqa.py
│   │   │   ├── grit.py
│   │   │   ├── inst_categories.py
│   │   │   ├── lvis_coco.py
│   │   │   ├── lvis_coco_panoptic.py
│   │   │   ├── lvis_v1_coco_category_image_count.py
│   │   │   ├── objects365.py
│   │   │   ├── odinw_categories.py
│   │   │   ├── odinw_instance.py
│   │   │   ├── odinw_prompts.py
│   │   │   ├── oid.py
│   │   │   ├── openimages_v6_category_image_count.py
│   │   │   ├── pascal_voc_external.py
│   │   │   ├── phrasecut.py
│   │   │   ├── refcoco.py
│   │   │   ├── register_bdd100k_panoseg.py
│   │   │   ├── register_bdd100k_semseg.py
│   │   │   ├── register_pascal_context.py
│   │   │   ├── register_voc_seg.py
│   │   │   ├── sa1b.py
│   │   │   ├── seginw_categories.py
│   │   │   ├── seginw_instance.py
│   │   │   ├── visualgenome.py
│   │   │   └── visualgenome_categories.py
│   │   ├── detection_utils.py
│   │   ├── mapper_utils.py
│   │   ├── samplers/
│   │   │   ├── __init__.py
│   │   │   └── distributed_sampler_multi_dataset.py
│   │   └── transforms/
│   │       ├── __init__.py
│   │       ├── augmentation_aa.py
│   │       └── augmentation_lsj.py
│   ├── engine/
│   │   ├── __init__.py
│   │   ├── defaults.py
│   │   └── train_loop.py
│   ├── evaluation/
│   │   ├── __init__.py
│   │   ├── d3_evaluation.py
│   │   ├── evaluator.py
│   │   ├── instance_evaluation.py
│   │   ├── lvis_evaluation.py
│   │   ├── multi_dataset_evaluator.py
│   │   ├── oideval.py
│   │   ├── refcoco_evaluation.py
│   │   └── refcocoeval.py
│   ├── layers/
│   │   ├── __init__.py
│   │   ├── csrc/
│   │   │   ├── MsDeformAttn/
│   │   │   │   ├── ms_deform_attn.h
│   │   │   │   ├── ms_deform_attn_cpu.cpp
│   │   │   │   ├── ms_deform_attn_cpu.h
│   │   │   │   ├── ms_deform_attn_cuda.cu
│   │   │   │   ├── ms_deform_attn_cuda.h
│   │   │   │   └── ms_deform_im2col_cuda.cuh
│   │   │   ├── cuda_version.cu
│   │   │   └── vision.cpp
│   │   ├── fuse_helper.py
│   │   ├── multi_scale_deform_attn.py
│   │   ├── vision_language_align.py
│   │   ├── vision_language_fusion.py
│   │   └── zero_shot_fc.py
│   ├── model_zoo/
│   │   ├── __init__.py
│   │   └── model_zoo.py
│   ├── modeling/
│   │   ├── __init__.py
│   │   ├── ape_deta/
│   │   │   ├── __init__.py
│   │   │   ├── ape_deta.py
│   │   │   ├── assigner.py
│   │   │   ├── deformable_criterion.py
│   │   │   ├── deformable_detr.py
│   │   │   ├── deformable_detr_segm.py
│   │   │   ├── deformable_detr_segm_vl.py
│   │   │   ├── deformable_transformer.py
│   │   │   ├── deformable_transformer_vl.py
│   │   │   ├── fast_rcnn.py
│   │   │   ├── misc.py
│   │   │   └── segmentation.py
│   │   ├── backbone/
│   │   │   ├── __init__.py
│   │   │   ├── utils_eva.py
│   │   │   ├── utils_eva02.py
│   │   │   ├── vit.py
│   │   │   ├── vit_eva.py
│   │   │   ├── vit_eva02.py
│   │   │   └── vit_eva_clip.py
│   │   ├── deta/
│   │   │   ├── __init__.py
│   │   │   ├── assigner.py
│   │   │   ├── deformable_criterion.py
│   │   │   ├── deformable_detr.py
│   │   │   ├── deformable_detr_segm.py
│   │   │   ├── deformable_transformer.py
│   │   │   ├── misc.py
│   │   │   └── segmentation.py
│   │   └── text/
│   │       ├── __init__.py
│   │       ├── bert_wrapper.py
│   │       ├── clip_wrapper.py
│   │       ├── clip_wrapper_eva01.py
│   │       ├── clip_wrapper_eva02.py
│   │       ├── clip_wrapper_open.py
│   │       ├── eva01_clip/
│   │       │   ├── README.md
│   │       │   ├── __init__.py
│   │       │   ├── clip.py
│   │       │   ├── eva_clip.py
│   │       │   ├── eva_model.py
│   │       │   ├── model.py
│   │       │   ├── simple_tokenizer.py
│   │       │   └── vit_model.py
│   │       ├── eva02_clip/
│   │       │   ├── __init__.py
│   │       │   ├── constants.py
│   │       │   ├── eva_vit_model.py
│   │       │   ├── factory.py
│   │       │   ├── hf_configs.py
│   │       │   ├── hf_model.py
│   │       │   ├── loss.py
│   │       │   ├── model.py
│   │       │   ├── modified_resnet.py
│   │       │   ├── openai.py
│   │       │   ├── pretrained.py
│   │       │   ├── rope.py
│   │       │   ├── timm_model.py
│   │       │   ├── tokenizer.py
│   │       │   ├── transform.py
│   │       │   ├── transformer.py
│   │       │   └── utils.py
│   │       ├── llama2_wrapper.py
│   │       ├── t5_wrapper.py
│   │       ├── text_encoder.py
│   │       └── utils.py
│   └── utils/
│       ├── __init__.py
│       ├── box_ops.py
│       ├── misc.py
│       └── plot_utils.py
├── configs/
│   ├── ADE20kFull_SemanticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── ADE20k_PanopticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_160k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── ADE20k_SemanticSegmentation/
│   │   ├── ape_deta/
│   │   │   ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1024.py
│   │   │   ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │   │   └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   │   └── deformable_deta/
│   │       └── deformable_deta_segm_r50_160k.py
│   ├── BDD10k_PanopticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── BDD10k_SemanticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── COCO_Detection/
│   │   ├── deformable_deta/
│   │   │   ├── deformable_deta_r50_12ep.py
│   │   │   ├── deformable_deta_r50_24ep.py
│   │   │   ├── deformable_deta_vitb_clip_openai_lsj1024_cp_12ep.py
│   │   │   ├── deformable_deta_vitb_lsj1024_12ep.py
│   │   │   ├── deformable_deta_vitg_eva_lsj1024_12ep.py
│   │   │   ├── deformable_deta_vitg_eva_lsj1024_cp_12ep.py
│   │   │   ├── deformable_deta_vitl_eva02_lsj1024_cp_12ep.py
│   │   │   ├── deformable_deta_vitl_eva_lsj1024_cp_12ep.py
│   │   │   ├── deformable_deta_vitl_lsj1024_12ep.py
│   │   │   └── models/
│   │   │       └── deformable_deta_r50.py
│   │   └── deformable_detr/
│   │       ├── deformable_detr_r50_50ep.py
│   │       ├── deformable_detr_r50_two_stage_50ep.py
│   │       ├── deformable_detr_r50_with_box_refinement_50ep.py
│   │       ├── improved_deformable_detr_r50_12ep.py
│   │       ├── improved_deformable_detr_r50_50ep.py
│   │       ├── improved_deformable_detr_r50_two_stage_12ep.py
│   │       ├── improved_deformable_detr_r50_two_stage_50ep.py
│   │       └── models/
│   │           ├── deformable_detr_r50.py
│   │           └── improved_deformable_detr_r50.py
│   ├── COCO_InstanceSegmentation/
│   │   ├── ape_deta/
│   │   │   ├── ape_deta_r50_12ep.py
│   │   │   ├── ape_deta_r50_vlf_12ep.py
│   │   │   ├── ape_deta_vite_eva02_clip_lsj1024_cp_12ep_fsdp.py
│   │   │   ├── ape_deta_vite_eva02_clip_lsj1024_cp_32x90k_fsdp.py
│   │   │   ├── ape_deta_vitg_eva01_clip_lsj1536_cp_128x45k.py
│   │   │   ├── ape_deta_vitg_eva01_clip_lsj1536_cp_64x90k.py
│   │   │   ├── ape_deta_vitg_eva01_lsj1536_cp_64x90k.py
│   │   │   ├── ape_deta_vitl_eva02_clip_lsj1024_cp_12ep.py
│   │   │   ├── ape_deta_vitl_eva02_clip_lsj1024_cp_12ep_fsdp.py
│   │   │   ├── ape_deta_vitl_eva02_clip_lsj1536_cp_128x45k.py
│   │   │   ├── ape_deta_vitl_eva02_clip_lsj1536_cp_64x90k.py
│   │   │   ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_12ep.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1024_cp_12ep.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1536_cp_128x90k.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1536_cp_12ep.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1536_cp_64x90k.py
│   │   │   ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_12ep.py
│   │   │   ├── ape_deta_vitl_lsj1024_cp_12ep.py
│   │   │   ├── ape_deta_vitt_eva02_lsj1024_cp_12ep.py
│   │   │   ├── ape_deta_vitt_eva02_vlf_lsj1024_cp_12ep.py
│   │   │   └── models/
│   │   │       └── ape_deta_r50.py
│   │   └── deformable_deta/
│   │       ├── deformable_deta_segm_r50_12ep.py
│   │       ├── deformable_deta_segm_r50_24ep.py
│   │       ├── deformable_deta_segm_vitl_eva02_lsj1024_cp_12ep.py
│   │       └── models/
│   │           └── deformable_deta_segm_r50.py
│   ├── COCO_PanopticSegmentation/
│   │   ├── ape_deta/
│   │   │   ├── ape_deta_r50_12ep.py
│   │   │   ├── ape_deta_r50_12ep_separated.py
│   │   │   ├── ape_deta_r50_24ep.py
│   │   │   ├── ape_deta_r50_lsj1024.py
│   │   │   ├── ape_deta_r50_vlf_lsj1024.py
│   │   │   ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1024.py
│   │   │   ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │   │   └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   │   └── deformable_deta/
│   │       ├── deformable_deta_segm_r50_12ep.py
│   │       ├── deformable_deta_segm_r50_24ep.py
│   │       ├── deformable_deta_segm_r50_36ep.py
│   │       └── deformable_deta_segm_r50_50ep.py
│   ├── COCO_REFCOCO/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_12ep.py
│   │       ├── ape_deta_r50_24ep.py
│   │       ├── ape_deta_r50_36ep.py
│   │       ├── ape_deta_r50_vlf_12ep.py
│   │       ├── ape_deta_r50_vlf_36ep.py
│   │       ├── ape_deta_r50_vlf_bert_36ep.py
│   │       ├── ape_deta_vitl_eva02_lsj1024_12ep.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_36ep.py
│   │       └── ape_deta_vitl_lsj1024_12ep.py
│   ├── COCO_SA1B_InstanceSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_24ep.py
│   │       └── ape_deta_r50_24ep_mp.py
│   ├── COCO_SA1B_PanopticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_24ep.py
│   │       ├── ape_deta_r50_24ep_lp.py
│   │       └── ape_deta_r50_24ep_vlf_lp.py
│   ├── COCO_SemanticSegmentation/
│   │   ├── ape_deta/
│   │   │   ├── ape_deta_r50_12ep.py
│   │   │   ├── ape_deta_r50_vlf_lsj1024_12ep.py
│   │   │   └── ape_deta_vitl_eva02_lsj1024_12ep.py
│   │   └── deformable_deta/
│   │       └── deformable_deta_segm_r50_12ep.py
│   ├── Cityscapes_PanopticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── D3_InstanceSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── Flickr30k_VisualGrounding/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_12ep.py
│   │       ├── ape_deta_r50_vlf_12ep.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       └── ape_deta_vitl_eva02_vlf_lsj1024.py
│   ├── GQA_VisualGrounding/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_12ep.py
│   │       ├── ape_deta_r50_12ep_eval_odinw13.py
│   │       ├── ape_deta_r50_12ep_eval_odinw35.py
│   │       ├── ape_deta_r50_vlf_12ep.py
│   │       ├── ape_deta_r50_vlf_12ep_eval_odinw13.py
│   │       └── ape_deta_r50_vlf_12ep_eval_odinw35.py
│   ├── GRIT_SA1B_VisualGrounding/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_24ep.py
│   │       └── ape_deta_r50_vlf_24ep.py
│   ├── GRIT_VisualGrounding/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_400k.py
│   │       ├── ape_deta_r50_vlf_400k.py
│   │       └── ape_deta_r50_vlf_lsj224_256x50k.py
│   ├── LVISCOCOCOCOSTUFF_O365_OID_VG/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_lsj1024_cp_50ep.py
│   │       ├── ape_deta_vitl_eva02_lsj1024_cp_180k.py
│   │       ├── ape_deta_vitl_eva02_lsj1024_cp_720k.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_180k.py
│   │       └── ape_deta_vitl_eva02_vlf_lsj1024_cp_720k.py
│   ├── LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_lsj1024_cp_180k.py
│   │       ├── ape_deta_vitl_eva02_lsj1024_cp_720k.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_180k.py
│   │       └── ape_deta_vitl_eva02_vlf_lsj1024_cp_720k.py
│   ├── LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py
│   │       └── ape_deta_vitl_eva02_vlf_lsj1024_cp_2160k.py
│   ├── LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/
│   │   └── ape_deta/
│   │       ├── ape_deta_vite_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py
│   │       ├── ape_deta_vite_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl_fsdp.py
│   │       ├── ape_deta_vite_eva02_clip_vlf_lsj1024_cp_32x2_540k_mdl_fsdp.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_08x8x270k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_1080k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl_llama2.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4x270k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4x270k_mdl.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4x270k_mdl_llama2.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4x337k_mdl.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_32x2x270k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_48x2x270k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_64x1x270k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1536_cp_08x8x270k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1536_cp_32x2x270k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1536_cp_64x270k.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py
│   │       ├── ape_deta_vitt_eva02_vlf_lsj1024_cp_16x4_1080k.py
│   │       ├── ape_deta_vitt_eva02_vlf_lsj1024_cp_16x4_1080k_mdl.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024_cp_64x1_270k_mdl.py
│   ├── LVISCOCOCOCOSTUFF_PanopticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_lsj1024_cp_50ep.py
│   │       └── ape_deta_vitl_eva02_lsj1024_cp_24ep.py
│   ├── LVISCOCOCOCOSTUFF_REFCOCO/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_lsj1024_50ep.py
│   │       ├── ape_deta_r50_lsj1024_cp_50ep.py
│   │       ├── ape_deta_r50_vlf_lsj1024_cp_50ep.py
│   │       ├── ape_deta_r50_vlf_lsj1024_cp_bert_50ep.py
│   │       ├── ape_deta_vitl_eva02_lsj1024_24ep.py
│   │       └── ape_deta_vitl_eva02_vlf_lsj1024_50ep.py
│   ├── LVISCOCO_COCOSTUFF_O365_OID_VG_REFCOCO/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_180k.py
│   │       └── ape_deta_vitl_eva02_vlf_lsj1024_cp_720k.py
│   ├── LVISCOCO_COCOSTUFF_PanopticSegmentation/
│   │   └── ape_deta/
│   │       └── ape_deta_r50_lsj1024_cp_50ep.py
│   ├── LVIS_Detection/
│   │   └── deformable_deta/
│   │       ├── deformable_deta_r50_lsj1024_24ep.py
│   │       ├── deformable_deta_vitb_lsj1024_24ep.py
│   │       ├── deformable_deta_vitg_eva_lsj1024_24ep.py
│   │       ├── deformable_deta_vitg_eva_lsj1024_cp_24ep.py
│   │       ├── deformable_deta_vitl_eva02_lsj1024_cp_24ep.py
│   │       ├── deformable_deta_vitl_eva_lsj1024_cp_24ep.py
│   │       └── deformable_deta_vitl_lsj1024_24ep.py
│   ├── LVIS_InstanceSegmentation/
│   │   ├── ape_deta/
│   │   │   ├── ape_deta_r50_24ep.py
│   │   │   ├── ape_deta_r50_vlf_24ep.py
│   │   │   ├── ape_deta_vite_eva02_clip_lsj1024_cp_24ep_fsdp.py
│   │   │   ├── ape_deta_vitl_eva02_clip_lsj1024_cp_24ep.py
│   │   │   ├── ape_deta_vitl_eva02_clip_lsj1536_cp_64x90k.py
│   │   │   ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_24ep.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1024_cp_24ep.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1536_cp_64x90k.py
│   │   │   ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_24ep.py
│   │   │   ├── ape_deta_vitt_eva02_lsj1024_cp_24ep.py
│   │   │   └── ape_deta_vitt_eva02_vlf_lsj1024_cp_24ep.py
│   │   └── deformable_deta/
│   │       ├── deformable_deta_segm_vitl_eva02_4scale_lsj1024_cp_24ep.py
│   │       └── deformable_deta_segm_vitl_eva02_lsj1024_cp_24ep.py
│   ├── LVIS_SA1B_InstanceSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_50ep.py
│   │       ├── ape_deta_r50_50ep_eval_odinw13.py
│   │       ├── ape_deta_r50_50ep_eval_odinw35.py
│   │       ├── ape_deta_r50_50ep_eval_seginw.py
│   │       ├── ape_deta_r50_50ep_iouloss_lp.py
│   │       └── ape_deta_r50_50ep_mp.py
│   ├── ODinW_Detection/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_13.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_35.py
│   │       ├── ape_deta_vitl_eva02_lsj1024_13.py
│   │       ├── ape_deta_vitl_eva02_lsj1024_35.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_13.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_35.py
│   │       ├── ape_deta_vitt_eva02_vlf_lsj1024_13.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024_35.py
│   ├── PascalContext459_SemanticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── PascalContext59_SemanticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── PascalVOC20_SemanticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── PascalVOCParts_PanopticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_12ep.py
│   │       └── ape_deta_r50_vlf_12ep.py
│   ├── PhraseCut_VisualGrounding/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_12ep.py
│   │       ├── ape_deta_r50_vlf_12ep.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       └── ape_deta_vitl_eva02_vlf_lsj1024.py
│   ├── REFCOCO_VisualGrounding/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_12ep.py
│   │       ├── ape_deta_r50_bert_vlf_12ep.py
│   │       ├── ape_deta_r50_vlf_12ep.py
│   │       ├── ape_deta_vitl_eva02_clip_lsj1024_12ep.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_12ep.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_12ep.py
│   │       └── ape_deta_vitl_lsj1024_12ep.py
│   ├── Roboflow_Detection/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── SegInW_InstanceSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── VisualGenome_VisualGrounding/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_12ep.py
│   │       ├── ape_deta_r50_12ep_eval_odinw13.py
│   │       ├── ape_deta_r50_12ep_eval_odinw35.py
│   │       ├── ape_deta_r50_vlf_12ep.py
│   │       ├── ape_deta_r50_vlf_12ep_eval_odinw13.py
│   │       └── ape_deta_r50_vlf_12ep_eval_odinw35.py
│   └── common/
│       ├── backbone/
│       │   ├── vite_eva02_clip_1024.py
│       │   ├── vite_eva02_clip_1536.py
│       │   ├── vitg_eva01.py
│       │   ├── vitg_eva01_1536.py
│       │   ├── vitg_eva01_clip_1024.py
│       │   ├── vitg_eva01_clip_1536.py
│       │   ├── vitl_eva02.py
│       │   ├── vitl_eva02_1536.py
│       │   ├── vitl_eva02_clip.py
│       │   ├── vitl_eva02_clip_1536.py
│       │   └── vitt_eva02.py
│       └── data/
│           ├── ade20k_panoptic.py
│           ├── ade20k_panoptic_lsj1024.py
│           ├── ade20k_semantic.py
│           ├── ade20k_semantic_lsj1024.py
│           ├── ade20kfull_semantic_lsj1024.py
│           ├── bdd10k_panoptic_lsj1024.py
│           ├── bdd10k_semantic_lsj1024.py
│           ├── cityscapes_panoptic_lsj1024.py
│           ├── cityscapes_semantic_lsj1024.py
│           ├── coco_instance.py
│           ├── coco_instance_lsj1024.py
│           ├── coco_instance_lsj1024_cp.py
│           ├── coco_instance_lsj1536_cp.py
│           ├── coco_panoptic.py
│           ├── coco_panoptic_lsj1024.py
│           ├── coco_panoptic_separated.py
│           ├── coco_refcoco_instance.py
│           ├── coco_refcoco_instance_lsj1024.py
│           ├── coco_sa1b_instance.py
│           ├── coco_sa1b_panoptic.py
│           ├── coco_semantic.py
│           ├── coco_semantic_lsj1024.py
│           ├── constants.py
│           ├── d3_instance_lsj1024.py
│           ├── flickr30k_instance.py
│           ├── flickr30k_instance_lsj1024.py
│           ├── gqa_region_instance.py
│           ├── grit_instance.py
│           ├── grit_instance_lsj224.py
│           ├── grit_sa1b_instance.py
│           ├── lvis_instance_lsj1024_cp.py
│           ├── lvis_instance_lsj1536_cp.py
│           ├── lvis_sa1b_instance.py
│           ├── lviscoco_cocostuff_o365_oid_vg_refcoco_panoptic_lsj1024_cp.py
│           ├── lviscoco_cocostuff_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_refcoco_panoptic_lsj1024.py
│           ├── lviscocococostuff_o365_oid_refcoco_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vg_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vg_refcoco_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_refcoco_group_by_image_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_refcoco_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_gqa_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_gqa_panoptic_lsj1536_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_gqa_phrasecut_flickr30k_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_gqa_phrasecut_flickr30k_panoptic_lsj1024_cp_mdl.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_gqa_phrasecut_flickr30k_panoptic_lsj1536_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_gqa_phrasecut_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_gqa_phrasecut_panoptic_lsj1536_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_refcoco_group_by_image_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_refcoco_panoptic_lsj1024.py
│           ├── lviscocococostuff_refcoco_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_sa1b_panoptic.py
│           ├── o365_instance_lsj1024.py
│           ├── odinw13_instance.py
│           ├── odinw13_instance_lsj1024.py
│           ├── odinw13_instance_lsj1536.py
│           ├── odinw35_instance.py
│           ├── odinw35_instance_lsj1024.py
│           ├── odinw35_instance_lsj1536.py
│           ├── odinwvoc_instance_lsj1024.py
│           ├── pascalcontext459_semantic_lsj1024.py
│           ├── pascalcontext59_semantic_lsj1024.py
│           ├── pascalvoc20_semantic_lsj1024.py
│           ├── pascalvocpart_panoptic.py
│           ├── phrasecut_instance.py
│           ├── phrasecut_instance_lsj1024.py
│           ├── refcoco_group_by_image_instance.py
│           ├── refcoco_group_by_image_instance_lsj1024.py
│           ├── refcoco_instance.py
│           ├── refcoco_instance_lsj1024.py
│           ├── roboflow100_instance_lsj1024.py
│           ├── seginw_instance.py
│           ├── seginw_instance_lsj1024.py
│           ├── seginw_instance_lsj1536.py
│           └── vgregion_instance.py
├── datasets/
│   ├── README.md
│   ├── prepare_ade20k_full_sem_seg.py
│   ├── prepare_coco_semantic_annos_from_panoptic_annos.py
│   ├── prepare_pascal_context.py
│   └── prepare_voc_sem_seg.py
├── demo/
│   ├── .gitattributes
│   ├── README.md
│   ├── app.py
│   ├── demo_lazy.py
│   ├── pre-requirements.txt
│   ├── predictor_lazy.py
│   └── requirements.txt
├── requirements.txt
├── scripts/
│   ├── eval_APE-L_A.sh
│   ├── eval_APE-L_B.sh
│   ├── eval_APE-L_C.sh
│   ├── eval_APE-L_D.sh
│   ├── eval_APE-Ti.sh
│   ├── eval_flops.sh
│   └── eval_time.sh
├── setup.py
└── tools/
    ├── analyze_model.py
    ├── eva_interpolate_patch_14to16.py
    ├── train_net.py
    ├── train_net_fsdp.py
    └── visualize_json_results.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
# output dir
output
instant_test_output
inference_test_output


*.png
*.json
*.diff
*.jpg
!/projects/DensePose/doc/images/*.jpg

# compilation and distribution
__pycache__
_ext
*.pyc
*.pyd
*.so
*.dll
*.egg-info/
build/
dist/
wheels/

# pytorch/python/numpy formats
*.pth
*.pkl
*.npy
*.ts
model_ts*.txt

# ipython/jupyter notebooks
*.ipynb
**/.ipynb_checkpoints/

# Editor temporaries
*.swn
*.swo
*.swp
*~

# editor settings
.idea
.vscode
_darcs

# project dirs
/ape/model_zoo/configs
/datasets/*
!/datasets/*.*
/projects/*/datasets
/models
/snippet


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: README.md
================================================
# APE: Aligning and Prompting Everything All at Once for Universal Visual Perception


<!-- 
<a href='https://github.com/shenyunhang/APE'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
<a href='https://arxiv.org/abs/2312.02153'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
<a href='https://huggingface.co/spaces/shenyunhang/APE'><img src='https://img.shields.io/badge/%F0%9F%A4%97-Demo-yellow'></a>
<a href='https://huggingface.co/shenyunhang/APE'><img src='https://img.shields.io/badge/%F0%9F%A4%97-Model-yellow'></a>
[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/LICENSE)
-->

<p align="center">
    <img src="./.asset/ape.png" width="96%" height="96%">
</p>


<font size=7><div align='center' > :grapes: \[[Read our arXiv Paper](https://arxiv.org/abs/2312.02153)\] &nbsp; :apple: \[[Try our Online Demo](https://huggingface.co/spaces/shenyunhang/APE)\] </div></font>


---

<p align="center">
    <img src="./.asset/example_1.png" width="96%" height="96%">
</p>


## :bulb: Highlight

- **High Performance.**  SotA (or competitive) performance on **160** datasets with only one model.
- **Perception in the Wild.** Detect and segment **everything** with thousands of vocabularies or language descriptions all at once.
- **Flexible.** Support both foreground objects and background stuff for instance segmentation and semantic segmentation.

## :fire: News
* **`2024.04.07`** Release checkpoints for APE-Ti with only 6M backbone!
* **`2024.02.27`** APE has been accepted to CVPR 2024!
* **`2023.12.05`** Release training codes!
* **`2023.12.05`** Release checkpoints for APE-L!
* **`2023.12.05`** Release inference codes and demo!

## :label: TODO 

- [x] Release inference code and demo.
- [x] Release checkpoints.
- [x] Release training codes.
- [ ] Add clean docs.


## :hammer_and_wrench: Install 

1. Clone the APE repository from GitHub:

```bash
git clone https://github.com/shenyunhang/APE
cd APE
```

2. Install the required dependencies and APE:

```bash
pip3 install -r requirements.txt
python3 -m pip install -e .
```


## :arrow_forward: Demo Locally

**Web UI demo**
```
pip3 install gradio
cd APE/demo
python3 app.py
```
This demo automatically detects GPUs and runs on a single GPU if one is available.

Please feel free to try our [Online Demo](https://huggingface.co/spaces/shenyunhang/APE)!

<p align="center">
<img src="./.asset/demo.png" width="96%" height="96%">
</p>


## :books: Data Prepare
Following [here](https://github.com/shenyunhang/APE/blob/main/datasets/README.md) to prepare the following datasets:

|  Name |   COCO  |   LVIS  |  Objects365 | Openimages | VisualGenome |  SA-1B  |   RefCOCO  |   GQA   | PhraseCut | Flickr30k |         |
|:-----:|:-------:|:-------:|:-----------:|:----------:|:------------:|:-------:|:----------:|:-------:|:---------:|:---------:|:-------:|
| Train | &check; | &check; |   &check;   |   &check;  |    &check;   | &check; |   &check;  | &check; |  &check;  |  &check;  |         |
|  Test | &check; | &check; |   &check;   |   &check;  |    &cross;   | &cross; |   &check;  | &cross; |  &cross;  |  &cross;  |         |
|       |         |         |             |            |              |         |            |         |           |           |         |
| Name  |  ODinW  |  SegInW | Roboflow100 |   ADE20k   |   ADE-full   |  BDD10k | Cityscapes |  PC459  |    PC59   |    VOC    |    D3   |
| Train | &cross; | &cross; |   &cross;   |   &cross;  |    &cross;   | &cross; |   &cross;  | &cross; |  &cross;  |  &cross;  | &cross; |
|  Test | &check; | &check; |   &check;   |   &check;  |    &check;   | &check; |   &check;  | &check; |  &check;  |  &check;  | &check; |

Note that we do not use `coco_2017_train` for training.

Instead, we augment `lvis_v1_train` with annotations from coco, and keep the image set unchanged.

And we register it as `lvis_v1_train+coco` for instance segmentation and `lvis_v1_train+coco_panoptic_separated` for panoptic segmentation.


## :test_tube: Inference

### Infer on 160+ datasets
We provide several scripts to evaluate all models.

It is necessary to adjust the checkpoint location and GPU number in the scripts before running them.

```bash
scripts/eval_APE-L_D.sh
scripts/eval_APE-L_C.sh
scripts/eval_APE-L_B.sh
scripts/eval_APE-L_A.sh
scripts/eval_APE-Ti.sh
```

### Infer on images or videos

APE-L_D
```
python3 demo/demo_lazy.py \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py \
--input image1.jpg image2.jpg image3.jpg \
--output /path/to/output/dir \
--confidence-threshold 0.1 \
--text-prompt 'person,car,chess piece of horse head' \
--with-box \
--with-mask \
--with-sseg \
--opts \
train.init_checkpoint=/path/to/APE-D/checkpoint \
model.model_language.cache_dir="" \
model.model_vision.select_box_nums_for_evaluation=500 \
model.model_vision.text_feature_bank_reset=True \
```

To disable `xformers`, add the following option:
```
model.model_vision.backbone.net.xattn=False \
```

To use `pytorch` version of `MultiScaleDeformableAttention`, add the following option:
```
model.model_vision.transformer.encoder.pytorch_attn=True \
model.model_vision.transformer.decoder.pytorch_attn=True \
```


## :train: Training

### Prepare backbone and language models
```bash
git lfs install
git clone https://huggingface.co/QuanSun/EVA-CLIP models/QuanSun/EVA-CLIP/
git clone https://huggingface.co/BAAI/EVA models/BAAI/EVA/
git clone https://huggingface.co/Yuxin-CV/EVA-02 models/Yuxin-CV/EVA-02/
```

Resize patch size:
```bash
python3 tools/eva_interpolate_patch_14to16.py --input models/QuanSun/EVA-CLIP/EVA02_CLIP_E_psz14_plus_s9B.pt --output models/QuanSun/EVA-CLIP/EVA02_CLIP_E_psz14to16_plus_s9B.pt --image_size 224
python3 tools/eva_interpolate_patch_14to16.py --input models/QuanSun/EVA-CLIP/EVA01_CLIP_g_14_plus_psz14_s11B.pt --output models/QuanSun/EVA-CLIP/EVA01_CLIP_g_14_plus_psz14to16_s11B.pt --image_size 224
python3 tools/eva_interpolate_patch_14to16.py --input models/QuanSun/EVA-CLIP/EVA02_CLIP_L_336_psz14_s6B.pt --output models/QuanSun/EVA-CLIP/EVA02_CLIP_L_336_psz14to16_s6B.pt --image_size 336
python3 tools/eva_interpolate_patch_14to16.py --input models/Yuxin-CV/EVA-02/eva02/pt/eva02_Ti_pt_in21k_p14.pt --output models/Yuxin-CV/EVA-02/eva02/pt/eva02_Ti_pt_in21k_p14to16.pt --image_size 224
```

### Train APE-L_D

Single node:
```bash
python3 tools/train_net.py \
--num-gpus 8 \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl_`date +'%Y%m%d_%H%M%S'`
```

Multiple nodes:
```bash
python3 tools/train_net.py \
--dist-url="tcp://${MASTER_IP}:${MASTER_PORT}" \
--num-gpus ${HOST_GPU_NUM} \
--num-machines ${HOST_NUM} \
--machine-rank ${INDEX} \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl_`date +'%Y%m%d_%H'`0000
```

### Train APE-L_C

Single node:
```bash
python3 tools/train_net.py \
--num-gpus 8 \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k_`date +'%Y%m%d_%H%M%S'`
```

Multiple nodes:
```bash
python3 tools/train_net.py \
--dist-url="tcp://${MASTER_IP}:${MASTER_PORT}" \
--num-gpus ${HOST_GPU_NUM} \
--num-machines ${HOST_NUM} \
--machine-rank ${INDEX} \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k_`date +'%Y%m%d_%H'`0000
```

### Train APE-L_B

Single node:
```bash
python3 tools/train_net.py \
--num-gpus 8 \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k_`date +'%Y%m%d_%H%M%S'`
```

Multiple nodes:
```bash
python3 tools/train_net.py \
--dist-url="tcp://${MASTER_IP}:${MASTER_PORT}" \
--num-gpus ${HOST_GPU_NUM} \
--num-machines ${HOST_NUM} \
--machine-rank ${INDEX} \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k_`date +'%Y%m%d_%H'`0000
```

### Train APE-L_A

Single node:
```bash
python3 tools/train_net.py \
--num-gpus 8 \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj1024_cp_720k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj1024_cp_720k_`date +'%Y%m%d_%H%M%S'`
```

Multiple nodes:
```bash
python3 tools/train_net.py \
--dist-url="tcp://${MASTER_IP}:${MASTER_PORT}" \
--num-gpus ${HOST_GPU_NUM} \
--num-machines ${HOST_NUM} \
--machine-rank ${INDEX} \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj1024_cp_720k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj1024_cp_720k_`date +'%Y%m%d_%H'`0000
```

### Train APE-Ti

Single node:
```bash
python3 tools/train_net.py \
--num-gpus 8 \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitt_eva02_vlf_lsj1024_cp_16x4_1080k_mdl.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitt_eva02_vlf_lsj1024_cp_16x4_1080k_mdl_`date +'%Y%m%d_%H%M%S'`
```

Multiple nodes:
```bash
python3 tools/train_net.py \
--dist-url="tcp://${MASTER_IP}:${MASTER_PORT}" \
--num-gpus ${HOST_GPU_NUM} \
--num-machines ${HOST_NUM} \
--machine-rank ${INDEX} \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitt_eva02_vlf_lsj1024_cp_16x4_1080k_mdl.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitt_eva02_vlf_lsj1024_cp_16x4_1080k_mdl_`date +'%Y%m%d_%H'`0000
```


## :luggage: Checkpoints

```
git lfs install
git clone https://huggingface.co/shenyunhang/APE
```

<!-- insert a table -->
<table>
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>name</th>
      <th>Checkpoint</th>
      <th>Config</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>1</th>
      <td>APE-L_A</td>
      <td><a href="https://huggingface.co/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj_cp_720k_20230504_002019/model_final.pth">HF link</a></td>
      <td><a href="https://github.com/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj1024_cp_720k.py">link</a></td>
    </tr>
    <tr>
      <th>2</th>
      <td>APE-L_B</td>
      <td><a href="https://huggingface.co/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj_cp_1080k_20230702_225418/model_final.pth">HF link</a> 
      <td><a href="https://github.com/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py">link</a></td>
    </tr>
    <tr>
      <th>3</th>
      <td>APE-L_C</td>
      <td><a href="https://huggingface.co/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj_cp_1080k_20230702_210950/model_final.pth">HF link</a> 
      <td><a href="https://github.com/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py">link</a></td>
    </tr>
    <tr>
      <th>4</th>
      <td>APE-L_D</td>
      <td><a href="https://huggingface.co/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl_20230829_162438/model_final.pth">HF link</a> 
      <td><a href="https://github.com/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl.py">link</a></td>
    </tr>
    <tr>
      <th>5</th>
      <td>APE-Ti</td>
      <td><a href="https://huggingface.co/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitt_eva02_vlf_lsj1024_cp_16x4_1080k_mdl_20240203_230000/model_final.pth">HF link</a> 
      <td><a href="https://github.com/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitt_eva02_vlf_lsj1024_cp_16x4_1080k_mdl.py">link</a></td>
    </tr>
  </tbody>
</table>


## :medal_military: Results

<img src=".asset/radar.png" alt="radar" width="100%">


## :black_nib: Citation

If you find our work helpful for your research, please consider citing the following BibTeX entry.   

```bibtex
@inproceedings{APE,
  title={Aligning and Prompting Everything All at Once for Universal Visual Perception},
  author={Shen, Yunhang and Fu, Chaoyou and Chen, Peixian and Zhang, Mengdan and Li, Ke and Sun, Xing and Wu, Yunsheng and Lin, Shaohui and Ji, Rongrong},
  booktitle={CVPR},
  year={2024}
}
```


================================================
FILE: ape/__init__.py
================================================
from .data import *

# This line will be programmatically read/written by setup.py.
# Leave them at the bottom of this file and don't touch them.
__version__ = "0.0"


================================================
FILE: ape/checkpoint/__init__.py
================================================
# -*- coding: utf-8 -*-


from .detection_checkpoint import DetectionCheckpointer
from .detection_checkpoint import FSDPDetectionCheckpointer

# Both checkpointers are imported for re-export; keep __all__ in sync with
# the imports above so that `from ape.checkpoint import *` exposes both.
__all__ = ["DetectionCheckpointer", "FSDPDetectionCheckpointer"]


================================================
FILE: ape/checkpoint/detection_checkpoint.py
================================================
# Copyright (c) Facebook, Inc. and its affiliates.
import logging
import os
import pickle
from typing import IO, Any, Dict, Iterable, List, NamedTuple, Optional, Tuple, cast

import numpy as np
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import StateDictType
from torch.distributed.fsdp import FullStateDictConfig

from detectron2.checkpoint import DetectionCheckpointer as DetectionCheckpointer_d2


class DetectionCheckpointer(DetectionCheckpointer_d2):
    """Checkpointer that tolerates non-array entries in a checkpoint.

    The detectron2 base class raises ``ValueError`` when a state-dict value
    is neither a numpy array nor a tensor; this variant logs a warning and
    drops the offending key instead, so loading proceeds with the
    remaining weights.
    """

    def _convert_ndarray_to_tensor(self, state_dict: Dict[str, Any]) -> None:
        """
        In-place convert all numpy arrays in the state_dict to torch tensor.

        Args:
            state_dict (dict): a state-dict to be loaded to the model.
                Will be modified.
        """
        logger = logging.getLogger(__name__)
        # model could be an OrderedDict with _metadata attribute
        # (as returned by Pytorch's state_dict()). We should preserve these
        # properties.
        for k in list(state_dict.keys()):
            v = state_dict[k]
            if not isinstance(v, np.ndarray) and not isinstance(v, torch.Tensor):
                # Drop (rather than fail on) entries that are neither numpy
                # arrays nor tensors, e.g. metadata pickled into the file.
                # (The original code had an unreachable `raise ValueError`
                # after this `continue`; it has been removed as dead code.)
                logger.warning("Unsupported type found in checkpoint! {}: {}".format(k, type(v)))
                state_dict.pop(k)
                continue
            if not isinstance(v, torch.Tensor):
                state_dict[k] = torch.from_numpy(v)


class FSDPDetectionCheckpointer(DetectionCheckpointer):
    """Checkpointer for models wrapped in FullyShardedDataParallel (FSDP).

    Overrides :meth:`save` so that the sharded parameters are gathered
    into a full (unsharded) state dict before serialization.
    """

    def save(self, name: str, **kwargs: Any) -> None:
        """
        Dump model and checkpointables to a file.

        Args:
            name (str): name of the file.
            kwargs (dict): extra arbitrary data to save.
        """
        data = {}

        # Gather the full, unsharded state dict, offloaded to CPU and
        # materialized only on rank 0. Every rank must enter this context
        # (it is a collective), so it runs *before* the save_to_disk
        # early-exit below.
        gather_cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
        with FSDP.state_dict_type(self.model, StateDictType.FULL_STATE_DICT, gather_cfg):
            data["model"] = self.model.state_dict()

        if not self.save_dir or not self.save_to_disk:
            return

        # Collect optimizer/scheduler/etc. states plus any extra payload.
        for key, checkpointable in self.checkpointables.items():
            data[key] = checkpointable.state_dict()
        data.update(kwargs)

        basename = "{}.pth".format(name)
        save_file = os.path.join(self.save_dir, basename)
        assert os.path.basename(save_file) == basename, basename
        self.logger.info("Saving checkpoint to {}".format(save_file))
        with self.path_manager.open(save_file, "wb") as f:
            # pyre-fixme[22]: The cast is redundant.
            torch.save(data, cast(IO[bytes], f))
        self.tag_last_checkpoint(basename)



================================================
FILE: ape/data/__init__.py
================================================
from . import datasets
from .build_copypaste import (
    build_detection_train_loader_copypaste,
    get_detection_dataset_dicts_copypaste,
)
from .build_multi_dataset import (
    build_detection_train_loader_multi_dataset,
    get_detection_dataset_dicts_multi_dataset,
)
from .build_multi_dataset_copypaste import (
    build_detection_train_loader_multi_dataset_copypaste,
    get_detection_dataset_dicts_multi_dataset_copypaste,
)
from .build import build_detection_test_loader
from .dataset_mapper import DatasetMapper_ape
from .dataset_mapper_copypaste import DatasetMapper_copypaste
from .dataset_mapper_detr_instance import DatasetMapper_detr_instance
from .dataset_mapper_detr_instance_exp import DatasetMapper_detr_instance_exp
from .dataset_mapper_detr_panoptic import DatasetMapper_detr_panoptic
from .dataset_mapper_detr_panoptic_copypaste import DatasetMapper_detr_panoptic_copypaste
from .dataset_mapper_detr_semantic import DatasetMapper_detr_semantic


================================================
FILE: ape/data/build.py
================================================
# Copyright (c) Facebook, Inc. and its affiliates.
import itertools
import logging
import numpy as np
import operator
import pickle
from typing import Any, Callable, Dict, List, Optional, Union
import torch
import torch.utils.data as torchdata
from tabulate import tabulate
from termcolor import colored

from detectron2.config import configurable
from detectron2.structures import BoxMode
from detectron2.utils.comm import get_world_size
from detectron2.utils.env import seed_all_rng
from detectron2.utils.file_io import PathManager
from detectron2.utils.logger import _log_api_usage, log_first_n

from detectron2.data.build import get_detection_dataset_dicts, trivial_batch_collator

from detectron2.data.common import AspectRatioGroupedDataset, DatasetFromList, MapDataset, ToIterableDataset
from detectron2.data.dataset_mapper import DatasetMapper
from detectron2.data.detection_utils import check_metadata_consistency
from detectron2.data.samplers import (
    RandomSubsetTrainingSampler,
    RepeatFactorTrainingSampler,
    TrainingSampler,
)

from .samplers import (
    InferenceSampler,
)

"""
This file contains the default logic to build a dataloader for training or testing.
"""

__all__ = [
    "build_detection_test_loader",
]


def _test_loader_from_config(cfg, dataset_name, mapper=None):
    """
    Uses the given `dataset_name` argument (instead of the names in cfg), because the
    standard practice is to evaluate each test set individually (not combining them).
    """
    if isinstance(dataset_name, str):
        dataset_name = [dataset_name]

    # Precomputed proposals are optional; look each dataset up by its
    # position in cfg.DATASETS.TEST to find the matching proposal file.
    if cfg.MODEL.LOAD_PROPOSALS:
        test_names = list(cfg.DATASETS.TEST)
        proposal_files = [
            cfg.DATASETS.PROPOSAL_FILES_TEST[test_names.index(name)] for name in dataset_name
        ]
    else:
        proposal_files = None

    dataset = get_detection_dataset_dicts(
        dataset_name,
        filter_empty=False,
        proposal_files=proposal_files,
    )

    if mapper is None:
        mapper = DatasetMapper(cfg, False)

    # Iterable datasets drive their own sharding, so no sampler is used.
    if isinstance(dataset, torchdata.IterableDataset):
        sampler = None
    else:
        sampler = InferenceSampler(len(dataset))

    return {
        "dataset": dataset,
        "mapper": mapper,
        "num_workers": cfg.DATALOADER.NUM_WORKERS,
        "sampler": sampler,
    }


@configurable(from_config=_test_loader_from_config)
def build_detection_test_loader(
    dataset: Union[List[Any], torchdata.Dataset],
    *,
    mapper: Callable[[Dict[str, Any]], Any],
    sampler: Optional[torchdata.Sampler] = None,
    batch_size: int = 1,
    num_workers: int = 0,
    collate_fn: Optional[Callable[[List[Any]], Any]] = None,
) -> torchdata.DataLoader:
    """
    Build a DataLoader for evaluation, analogous to `build_detection_train_loader`
    but with batch size 1 by default and an :class:`InferenceSampler` that
    partitions the exact set of samples across all workers.

    Args:
        dataset: a list of dataset dicts, or a pytorch dataset (map-style or
            iterable), e.g. from :func:`DatasetCatalog.get` or
            :func:`get_detection_dataset_dicts`.
        mapper: a callable that takes a sample (dict) from the dataset and
            returns the format consumed by the model. With a cfg, the default
            is ``DatasetMapper(cfg, is_train=False)``.
        sampler: produces indices applied to ``dataset``. Defaults to
            :class:`InferenceSampler`. Must be None for iterable datasets.
        batch_size: batch size of the created data loader. Defaults to 1
            image per worker, the standard when reporting inference time.
        num_workers: number of parallel data loading workers.
        collate_fn: same as in `torch.utils.data.DataLoader`. Defaults to no
            collation (a list of data is returned).

    Returns:
        DataLoader: a torch DataLoader that loads the given detection
        dataset with test-time transformation and batching.

    Examples:
    ::
        data_loader = build_detection_test_loader(
            DatasetRegistry.get("my_test"),
            mapper=DatasetMapper(...))

        # or, instantiate with a CfgNode:
        data_loader = build_detection_test_loader(cfg, "my_test")
    """
    # Normalize the dataset into a (possibly mapped) torch Dataset.
    if isinstance(dataset, list):
        dataset = DatasetFromList(dataset, copy=False)
    if mapper is not None:
        dataset = MapDataset(dataset, mapper)

    # Iterable datasets are incompatible with samplers; map-style datasets
    # fall back to an InferenceSampler when none was provided.
    if isinstance(dataset, torchdata.IterableDataset):
        assert sampler is None, "sampler must be None if dataset is IterableDataset"
    elif sampler is None:
        sampler = InferenceSampler(len(dataset))

    if collate_fn is None:
        collate_fn = trivial_batch_collator

    return torchdata.DataLoader(
        dataset,
        batch_size=batch_size,
        sampler=sampler,
        drop_last=False,
        num_workers=num_workers,
        collate_fn=collate_fn,
    )


================================================
FILE: ape/data/build_copypaste.py
================================================
# Copyright (c) Facebook, Inc. and its affiliates.
import itertools
import logging

import torch.utils.data as torchdata

from detectron2.config import configurable
from detectron2.data.build import (
    build_batch_data_loader,
    filter_images_with_few_keypoints,
    filter_images_with_only_crowd_annotations,
    get_detection_dataset_dicts,
    load_proposals_into_dataset,
    print_instances_class_histogram,
)
from detectron2.data.catalog import DatasetCatalog, MetadataCatalog
from detectron2.data.common import DatasetFromList
from detectron2.data.detection_utils import check_metadata_consistency
from detectron2.data.samplers import (
    RandomSubsetTrainingSampler,
    RepeatFactorTrainingSampler,
    TrainingSampler,
)
from detectron2.utils.logger import _log_api_usage

from .common_copypaste import MapDataset_coppaste
from .dataset_mapper_copypaste import DatasetMapper_copypaste

"""
This file contains the default logic to build a dataloader for training or testing.
"""

__all__ = [
    "build_detection_train_loader_copypaste",
]


def get_detection_dataset_dicts_copypaste(
    names,
    filter_empty=True,
    min_keypoints=0,
    proposal_files=None,
    check_consistency=True,
    copypastes=(True,),
):
    """
    Load and prepare dataset dicts for instance detection/segmentation and
    semantic segmentation, and mark every record with a per-dataset
    ``"copypaste"`` flag consumed by the copy-paste mapper.

    Args:
        names (str or list[str]): a dataset name or a list of dataset names
        filter_empty (bool): whether to filter out images without instance annotations
        min_keypoints (int): filter out images with fewer keypoints than
            `min_keypoints`. Set to 0 to do nothing.
        proposal_files (list[str]): if given, a list of object proposal files
            that match each dataset in `names`.
        check_consistency (bool): whether to check if datasets have consistent metadata.
        copypastes (sequence[bool]): one flag per dataset in ``names``; every
            record of dataset ``i`` gets ``d["copypaste"] = copypastes[i]``.
            The default is a tuple rather than a list to avoid the shared
            mutable-default pitfall.

    Returns:
        list[dict]: a list of dicts following the standard dataset dict format.
    """
    if isinstance(names, str):
        names = [names]
    assert len(names), names
    # zip() below would silently truncate on a length mismatch, leaving some
    # records without a "copypaste" key — fail loudly instead.
    assert len(copypastes) == len(names), (len(copypastes), len(names))
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in names]
    for dataset_name, dicts in zip(names, dataset_dicts):
        assert len(dicts), "Dataset '{}' is empty!".format(dataset_name)

    # Tag each record with its dataset's copy-paste flag.
    for copypaste, dicts in zip(copypastes, dataset_dicts):
        for d in dicts:
            d["copypaste"] = copypaste

    if proposal_files is not None:
        assert len(names) == len(proposal_files)
        # load precomputed proposals from proposal files
        dataset_dicts = [
            load_proposals_into_dataset(dataset_i_dicts, proposal_file)
            for dataset_i_dicts, proposal_file in zip(dataset_dicts, proposal_files)
        ]

    # Torch datasets cannot be flattened/filtered below; concatenate and return.
    if isinstance(dataset_dicts[0], torchdata.Dataset):
        return torchdata.ConcatDataset(dataset_dicts)

    dataset_dicts = list(itertools.chain.from_iterable(dataset_dicts))

    has_instances = "annotations" in dataset_dicts[0]
    if filter_empty and has_instances:
        dataset_dicts = filter_images_with_only_crowd_annotations(dataset_dicts)
    if min_keypoints > 0 and has_instances:
        dataset_dicts = filter_images_with_few_keypoints(dataset_dicts, min_keypoints)

    if check_consistency and has_instances:
        try:
            class_names = MetadataCatalog.get(names[0]).thing_classes
            check_metadata_consistency("thing_classes", names)
            print_instances_class_histogram(dataset_dicts, class_names)
        except AttributeError:  # class names are not available for this dataset
            pass

    assert len(dataset_dicts), "No valid data found in {}.".format(",".join(names))
    return dataset_dicts


def _train_loader_from_config(cfg, mapper=None, *, dataset=None, sampler=None):
    """
    Translate a CfgNode into the argument dict of
    :func:`build_detection_train_loader_copypaste`: the foreground dataset,
    the background (paste-target) dataset, a sampler for each, and the mapper.
    """
    assert len(cfg.DATASETS.TRAIN) == len(cfg.DATASETS.COPYPASTE.COPYPASTE)

    logger = logging.getLogger(__name__)

    if dataset is None:
        dataset = get_detection_dataset_dicts_copypaste(
            cfg.DATASETS.TRAIN,
            filter_empty=cfg.DATALOADER.FILTER_EMPTY_ANNOTATIONS,
            min_keypoints=cfg.MODEL.ROI_KEYPOINT_HEAD.MIN_KEYPOINTS_PER_IMAGE
            if cfg.MODEL.KEYPOINT_ON
            else 0,
            proposal_files=cfg.DATASETS.PROPOSAL_FILES_TRAIN if cfg.MODEL.LOAD_PROPOSALS else None,
            copypastes=cfg.DATASETS.COPYPASTE.COPYPASTE,
        )
        _log_api_usage("dataset." + cfg.DATASETS.TRAIN[0])

    # The background pool is always rebuilt from config (the original dead
    # `if True:` scaffolding around this was removed; behavior is unchanged).
    dataset_bg = get_detection_dataset_dicts(
        cfg.DATASETS.COPYPASTE.BG,
        filter_empty=cfg.DATALOADER.FILTER_EMPTY_ANNOTATIONS,
        min_keypoints=cfg.MODEL.ROI_KEYPOINT_HEAD.MIN_KEYPOINTS_PER_IMAGE
        if cfg.MODEL.KEYPOINT_ON
        else 0,
        proposal_files=cfg.DATASETS.PROPOSAL_FILES_TRAIN if cfg.MODEL.LOAD_PROPOSALS else None,
    )
    # NOTE(review): this logs the TRAIN dataset name again rather than
    # cfg.DATASETS.COPYPASTE.BG — looks like a copy-paste slip; confirm intent
    # before changing the telemetry key.
    _log_api_usage("dataset." + cfg.DATASETS.TRAIN[0])

    if mapper is None:
        mapper = DatasetMapper_copypaste(cfg, True)

    if sampler is None:
        sampler_name = cfg.DATALOADER.SAMPLER_TRAIN
        logger.info("Using training sampler {}".format(sampler_name))
        if sampler_name == "TrainingSampler":
            sampler = TrainingSampler(len(dataset))
        elif sampler_name == "RepeatFactorTrainingSampler":
            repeat_factors = RepeatFactorTrainingSampler.repeat_factors_from_category_frequency(
                dataset, cfg.DATALOADER.REPEAT_THRESHOLD
            )
            sampler = RepeatFactorTrainingSampler(repeat_factors)
        elif sampler_name == "RandomSubsetTrainingSampler":
            sampler = RandomSubsetTrainingSampler(len(dataset), cfg.DATALOADER.RANDOM_SUBSET_RATIO)
        else:
            raise ValueError("Unknown training sampler: {}".format(sampler_name))

    # Background sampler: same dispatch, driven by the COPYPASTE config keys
    # (dead `if True:` wrapper removed here as well).
    sampler_name = cfg.DATALOADER.COPYPASTE.SAMPLER_TRAIN
    logger.info("Using training sampler {}".format(sampler_name))
    if sampler_name == "TrainingSampler":
        sampler_bg = TrainingSampler(len(dataset_bg))
    elif sampler_name == "RepeatFactorTrainingSampler":
        repeat_factors = RepeatFactorTrainingSampler.repeat_factors_from_category_frequency(
            dataset_bg, cfg.DATALOADER.COPYPASTE.REPEAT_THRESHOLD
        )
        sampler_bg = RepeatFactorTrainingSampler(repeat_factors)
    elif sampler_name == "RandomSubsetTrainingSampler":
        sampler_bg = RandomSubsetTrainingSampler(
            len(dataset_bg), cfg.DATALOADER.COPYPASTE.RANDOM_SUBSET_RATIO
        )
    else:
        raise ValueError("Unknown training sampler: {}".format(sampler_name))

    return {
        "dataset": dataset,
        "dataset_bg": dataset_bg,
        "sampler": sampler,
        "sampler_bg": sampler_bg,
        "mapper": mapper,
        "total_batch_size": cfg.SOLVER.IMS_PER_BATCH,
        "aspect_ratio_grouping": cfg.DATALOADER.ASPECT_RATIO_GROUPING,
        "num_workers": cfg.DATALOADER.NUM_WORKERS,
    }


@configurable(from_config=_train_loader_from_config)
def build_detection_train_loader_copypaste(
    dataset,
    dataset_bg,
    *,
    mapper,
    sampler=None,
    sampler_bg=None,
    total_batch_size,
    aspect_ratio_grouping=True,
    num_workers=0,
    collate_fn=None,
):
    """
    Build a dataloader for copy-paste-augmented detection training.
    This interface is experimental.

    The mapper is wired to a background dataset (``dataset_bg``) plus a
    sampler over it, so each foreground sample can be pasted onto a randomly
    drawn background image.

    Args:
        dataset (list or torch.utils.data.Dataset): foreground dataset dicts,
            or a pytorch dataset (either map-style or iterable). It can be
            obtained by :func:`DatasetCatalog.get` or
            :func:`get_detection_dataset_dicts`.
        dataset_bg (list or torch.utils.data.Dataset): background pool used by
            the copy-paste mapper.
        mapper (callable): a callable which takes a sample (dict) from dataset
            and returns the format to be consumed by the model.
            When using cfg, the default choice is ``DatasetMapper_copypaste(cfg, True)``.
        sampler (torch.utils.data.sampler.Sampler or None): a sampler that
            produces indices to be applied on ``dataset``. Defaults to
            :class:`TrainingSampler`. Must be None if ``dataset`` is iterable.
        sampler_bg (torch.utils.data.sampler.Sampler or None): same, for
            ``dataset_bg``.
        total_batch_size (int): total batch size across all workers. Batching
            simply puts data into a list.
        aspect_ratio_grouping (bool): whether to group images with similar
            aspect ratio for efficiency. When enabled, it requires each
            element in dataset be a dict with keys "width" and "height".
        num_workers (int): number of parallel data loading workers
        collate_fn: same as the argument of `torch.utils.data.DataLoader`.
            Defaults to do no collation and return a list of data.

    Returns:
        torch.utils.data.DataLoader:
            a dataloader. Each output from it is a ``list[mapped_element]`` of length
            ``total_batch_size / num_workers``, where ``mapped_element`` is produced
            by the ``mapper``.
    """
    if isinstance(dataset_bg, list):
        dataset_bg = DatasetFromList(dataset_bg, copy=False)

    if isinstance(dataset_bg, torchdata.IterableDataset):
        assert sampler_bg is None, "sampler must be None if dataset is IterableDataset"
    else:
        if sampler_bg is None:
            # BUGFIX: the default background sampler must index dataset_bg;
            # the original used len(dataset), giving a wrong index range
            # whenever the foreground and background pools differ in size.
            sampler_bg = TrainingSampler(len(dataset_bg))
        assert isinstance(
            sampler_bg, torchdata.Sampler
        ), f"Expect a Sampler but got {type(sampler_bg)}"  # was type(sampler): wrong variable in message

    if isinstance(dataset, list):
        dataset = DatasetFromList(dataset, copy=False)
    if mapper is not None:
        # MapDataset_coppaste draws background images via (dataset_bg, sampler_bg).
        dataset = MapDataset_coppaste(dataset, mapper, dataset_bg, sampler_bg)

    if isinstance(dataset, torchdata.IterableDataset):
        assert sampler is None, "sampler must be None if dataset is IterableDataset"
    else:
        if sampler is None:
            sampler = TrainingSampler(len(dataset))
        assert isinstance(sampler, torchdata.Sampler), f"Expect a Sampler but got {type(sampler)}"
    return build_batch_data_loader(
        dataset,
        sampler,
        total_batch_size,
        aspect_ratio_grouping=aspect_ratio_grouping,
        num_workers=num_workers,
        collate_fn=collate_fn,
    )


================================================
FILE: ape/data/build_multi_dataset.py
================================================
# Copyright (c) Facebook, Inc. and its affiliates.
import itertools
import logging
import operator
import time
from collections import defaultdict
from typing import Callable, Optional

import numpy as np
import torch
import torch.utils.data as torchdata
from termcolor import colored
from torch.utils.data.sampler import Sampler

from detectron2.config import configurable
from detectron2.data.build import (
    filter_images_with_few_keypoints,
    filter_images_with_only_crowd_annotations,
    get_detection_dataset_dicts,
    load_proposals_into_dataset,
    trivial_batch_collator,
    worker_init_reset_seed,
)
from detectron2.data.catalog import DatasetCatalog, MetadataCatalog
from detectron2.data.common import DatasetFromList, MapDataset, ToIterableDataset
from detectron2.data.detection_utils import check_metadata_consistency
from detectron2.data.samplers import (
    RandomSubsetTrainingSampler,
    RepeatFactorTrainingSampler,
    TrainingSampler,
)
from detectron2.utils import comm
from detectron2.utils.comm import get_world_size
from detectron2.utils.logger import _log_api_usage, log_first_n
from tabulate import tabulate

from .dataset_mapper import DatasetMapper_ape
from .samplers import MultiDatasetTrainingSampler

"""
This file contains the default logic to build a dataloader for training or testing.
"""

__all__ = [
    "build_detection_train_loader_multi_dataset",
]


def print_instances_class_histogram(dataset_dicts, class_names):
    """
    Log a per-category instance-count histogram for a dataset.

    Unlike the detectron2 upstream version, category ids >= num_classes are
    tolerated: they are counted into a separate "total out" row instead of
    failing an assertion (see the commented-out check below).

    Args:
        dataset_dicts (list[dict]): list of dataset dicts.
        class_names (list[str]): list of class names (zero-indexed).
    """
    num_classes = len(class_names)
    hist_bins = np.arange(num_classes + 1)
    # BUGFIX: `np.int` was deprecated in NumPy 1.20 and removed in 1.24;
    # the builtin `int` is the equivalent dtype.
    histogram = np.zeros((num_classes,), dtype=int)
    total_num_out_of_class = 0
    for entry in dataset_dicts:
        annos = entry["annotations"]
        # Crowd annotations are excluded from the histogram.
        classes = np.asarray(
            [x["category_id"] for x in annos if not x.get("iscrowd", 0)], dtype=int
        )
        if len(classes):
            assert classes.min() >= 0, f"Got an invalid category_id={classes.min()}"
            # assert (
            #     classes.max() < num_classes
            # ), f"Got an invalid category_id={classes.max()} for a dataset of {num_classes} classes"
        histogram += np.histogram(classes, bins=hist_bins)[0]

        # Ids outside [0, num_classes) fall into the open top bin of
        # np.histogram; track them separately for the "total out" row.
        total_num_out_of_class += sum(classes >= num_classes)

    N_COLS = min(6, len(class_names) * 2)

    def short_name(x):
        # make long class names shorter. useful for lvis
        if len(x) > 13:
            return x[:11] + ".."
        return x

    # Flatten to [name0, count0, name1, count1, ...] then fold into N_COLS columns.
    data = list(
        itertools.chain(*[[short_name(class_names[i]), int(v)] for i, v in enumerate(histogram)])
    )
    total_num_instances = sum(data[1::2])
    data.extend([None] * (N_COLS - (len(data) % N_COLS)))
    if num_classes > 1:
        data.extend(["total", total_num_instances])
    if total_num_out_of_class > 0:
        data.extend(["total out", total_num_out_of_class])
    data = itertools.zip_longest(*[data[i::N_COLS] for i in range(N_COLS)])
    table = tabulate(
        data,
        headers=["category", "#instances"] * (N_COLS // 2),
        tablefmt="pipe",
        numalign="left",
        stralign="center",
    )
    log_first_n(
        logging.INFO,
        "Distribution of instances among all {} categories:\n".format(num_classes)
        + colored(table, "cyan"),
        key="message",
    )


def DatasetCatalog_get(dataset_name, reduce_memory, reduce_memory_size):
    """
    Fetch dataset dicts from the DatasetCatalog, logging process RSS before
    and after; for very large datasets, optionally strip heavy per-annotation
    fields (bbox, bbox_mode, segmentation, phrase) in place to save memory.

    Args:
        dataset_name (str): name registered in DatasetCatalog.
        reduce_memory (bool): enable the field-stripping pass.
        reduce_memory_size (int/float): stripping only runs when the dataset
            has at least this many records.

    Returns:
        list[dict]: the (possibly stripped) dataset dicts.
    """
    import os, psutil

    logger = logging.getLogger(__name__)

    def _log_rss():
        # Resident set size of this process, in GB.
        gb = psutil.Process(os.getpid()).memory_info().rss / 1024**3
        logger.info("Current memory usage: {} GB".format(gb))

    _log_rss()
    dataset_dicts = DatasetCatalog.get(dataset_name)
    _log_rss()

    # Small datasets, or stripping disabled: return untouched.
    if not reduce_memory or len(dataset_dicts) < reduce_memory_size:
        return dataset_dicts

    logger.info("Reducing memory usage further...")

    # NOTE(review): mutates the catalog's dicts in place — confirm no caller
    # needs the removed fields afterwards.
    for record in dataset_dicts:
        if "annotations" not in record.keys():
            continue
        for anno in record["annotations"]:
            for heavy_key in ("bbox", "bbox_mode", "segmentation", "phrase"):
                anno.pop(heavy_key, None)

    _log_rss()
    return dataset_dicts


def get_detection_dataset_dicts_multi_dataset(
    names,
    filter_empty=True,
    min_keypoints=0,
    proposal_files=None,
    check_consistency=True,
    filter_emptys=[True],
    dataloader_id=None,
    reduce_memory=False,
    reduce_memory_size=1e6,
):
    """
    Load and prepare dataset dicts for several datasets at once, tagging every
    record with the index of its source dataset so downstream samplers and
    mappers can tell the sources apart.

    Args:
        names (str or list[str]): a dataset name or a list of dataset names
        filter_empty (bool): kept for interface compatibility; the global
            filtering branch below is explicitly disabled (``and False``) in
            favor of the per-dataset ``filter_emptys`` flags.
        min_keypoints (int): filter out images with fewer keypoints than
            `min_keypoints`. Set to 0 to do nothing.
        proposal_files (list[str]): must be None; precomputed proposals are
            not supported on this multi-dataset path (asserted below).
        check_consistency (bool): whether to check if datasets have consistent metadata.
        filter_emptys (list[bool]): per-dataset switch for dropping images that
            have only crowd annotations.
        dataloader_id (int or None): if given, stored on every record as
            ``d["dataloader_id"]``.
        reduce_memory (bool): forwarded to :func:`DatasetCatalog_get`; strips
            heavy per-annotation fields from very large datasets.
        reduce_memory_size (int/float): minimum dataset size for that stripping.

    Returns:
        list[dict]: a list of dicts following the standard dataset dict format
        (or a torchdata.Dataset / ConcatDataset when the catalog returns one).
    """
    if isinstance(names, str):
        names = [names]
    assert len(names), names
    dataset_dicts = [
        DatasetCatalog_get(dataset_name, reduce_memory, reduce_memory_size)
        for dataset_name in names
    ]

    # Torch datasets skip all tagging/filtering below.
    if isinstance(dataset_dicts[0], torchdata.Dataset):
        if len(dataset_dicts) > 1:
            # ConcatDataset does not work for iterable style dataset.
            # We could support concat for iterable as well, but it's often
            # not a good idea to concat iterables anyway.
            return torchdata.ConcatDataset(dataset_dicts)
        return dataset_dicts[0]

    for dataset_name, dicts in zip(names, dataset_dicts):
        assert len(dicts), "Dataset '{}' is empty!".format(dataset_name)

    # Tag every record with its source-dataset index (and the dataloader id,
    # if provided), and run the metadata check per dataset.
    for dataset_id, (dataset_name, dicts) in enumerate(zip(names, dataset_dicts)):
        for d in dicts:
            d["dataset_id"] = dataset_id
            if dataloader_id is not None:
                d["dataloader_id"] = dataloader_id

        has_instances = "annotations" in dicts[0]
        if not check_consistency or not has_instances:
            continue
        try:
            class_names = MetadataCatalog.get(dataset_name).thing_classes
            check_metadata_consistency("thing_classes", [dataset_name])
            print_instances_class_histogram(dicts, class_names)
        except AttributeError:  # class names are not available for this dataset
            pass

    # Precomputed proposals are unsupported here; the branch below is
    # unreachable unless asserts are stripped (python -O).
    assert proposal_files is None
    if proposal_files is not None:
        assert len(names) == len(proposal_files)
        # load precomputed proposals from proposal files
        dataset_dicts = [
            load_proposals_into_dataset(dataset_i_dicts, proposal_file)
            for dataset_i_dicts, proposal_file in zip(dataset_dicts, proposal_files)
        ]

    # Per-dataset crowd-only filtering. NOTE(review): zip() stops at the
    # shorter of (dataset_dicts, filter_emptys) — confirm callers always pass
    # one flag per dataset.
    dataset_dicts = [
        filter_images_with_only_crowd_annotations(dicts)
        if flag and "annotations" in dicts[0]
        else dicts
        for dicts, flag in zip(dataset_dicts, filter_emptys)
    ]

    dataset_dicts = list(itertools.chain.from_iterable(dataset_dicts))

    has_instances = "annotations" in dataset_dicts[0]
    # Global empty-image filtering is deliberately disabled ("and False"):
    # filtering already happened per dataset via `filter_emptys` above.
    if filter_empty and has_instances and False:
        dataset_dicts = filter_images_with_only_crowd_annotations(dataset_dicts)
    if min_keypoints > 0 and has_instances:
        dataset_dicts = filter_images_with_few_keypoints(dataset_dicts, min_keypoints)

    # Cross-dataset consistency check also deliberately disabled ("and False"):
    # class lists were already checked per dataset above.
    if check_consistency and has_instances and False:
        try:
            class_names = MetadataCatalog.get(names[0]).thing_classes
            check_metadata_consistency("thing_classes", names)
            print_instances_class_histogram(dataset_dicts, class_names)
        except AttributeError:  # class names are not available for this dataset
            pass

    assert len(dataset_dicts), "No valid data found in {}.".format(",".join(names))
    return dataset_dicts


def build_batch_data_loader_multi_dataset(
    dataset,
    sampler,
    total_batch_size,
    total_batch_size_list,
    *,
    aspect_ratio_grouping=False,
    num_workers=0,
    collate_fn=None,
    num_datasets=1,
):
    """
    Build a batched dataloader that groups samples both by aspect ratio and by
    source dataset, allowing a different per-GPU batch size for each dataset.

    Args:
        dataset (torch.utils.data.Dataset): a pytorch map-style or iterable dataset.
        sampler (torch.utils.data.sampler.Sampler or None): a sampler that produces indices.
            Must be provided iff. ``dataset`` is a map-style dataset.
        total_batch_size (int): fallback total batch size; pads
            ``total_batch_size_list`` when it has fewer than ``num_datasets`` entries.
        total_batch_size_list (list[int]): per-dataset total batch sizes across
            all workers; each must be divisible by the world size.
        aspect_ratio_grouping, num_workers, collate_fn: see
            :func:`build_detection_train_loader`. ``aspect_ratio_grouping`` must
            currently be True (asserted below).
        num_datasets (int): number of source datasets being mixed.

    Returns:
        iterable[list]. Length of each list is the batch size of the current
            GPU. Each element in the list comes from the dataset.
    """
    world_size = get_world_size()
    assert (
        total_batch_size > 0 and total_batch_size % world_size == 0
    ), "Total batch size ({}) must be divisible by the number of gpus ({}).".format(
        total_batch_size, world_size
    )

    # BUGFIX: copy before padding — the original `+=` mutated the caller's
    # list (e.g. cfg.SOLVER.IMS_PER_BATCH_LIST) in place on every call.
    total_batch_size_list = list(total_batch_size_list)
    if len(total_batch_size_list) < num_datasets:
        total_batch_size_list += [
            total_batch_size,
        ] * (num_datasets - len(total_batch_size_list))
    assert all([x > 0 for x in total_batch_size_list]) and all(
        [x % world_size == 0 for x in total_batch_size_list]
    ), "Total batch size ({}) must be divisible by the number of gpus ({}).".format(
        total_batch_size_list, world_size
    )
    # Per-GPU batch size for each dataset (the original also computed a dead
    # scalar batch_size here that was immediately overwritten).
    batch_size = [x // world_size for x in total_batch_size_list]

    if isinstance(dataset, torchdata.IterableDataset):
        assert sampler is None, "sampler must be None if dataset is IterableDataset"
    else:
        dataset = ToIterableDataset(dataset, sampler)

    assert aspect_ratio_grouping
    if aspect_ratio_grouping:
        data_loader = torchdata.DataLoader(
            dataset,
            num_workers=num_workers,
            collate_fn=operator.itemgetter(0),  # don't batch, but yield individual elements
            worker_init_fn=worker_init_reset_seed,
        )  # yield individual mapped dict
        data_loader = MultiDatasetAspectRatioGroupedDataset(
            data_loader, batch_size, num_datasets=num_datasets
        )
        if collate_fn is None:
            return data_loader
        return MapDataset(data_loader, collate_fn)
    else:
        # Unreachable unless asserts are stripped (-O); note `batch_size` is a
        # list here, which torchdata.DataLoader would reject.
        return torchdata.DataLoader(
            dataset,
            batch_size=batch_size,
            drop_last=True,
            num_workers=num_workers,
            collate_fn=trivial_batch_collator if collate_fn is None else collate_fn,
            worker_init_fn=worker_init_reset_seed,
        )


def _train_loader_from_config(cfg, mapper=None, *, dataset=None, sampler=None):
    """
    Translate a CfgNode into the keyword arguments of
    :func:`build_detection_train_loader_multi_dataset`.
    """
    # Every per-dataset MULTI_DATASET option must align with DATASETS.TRAIN.
    assert len(cfg.DATASETS.TRAIN) == len(cfg.MULTI_DATASET.NAMES)
    assert len(cfg.DATASETS.TRAIN) == len(cfg.MULTI_DATASET.ENTITIES)
    assert len(cfg.DATASETS.TRAIN) == len(cfg.MULTI_DATASET.NUM_CLASSES)
    assert len(cfg.DATASETS.TRAIN) == len(cfg.MULTI_DATASET.RATIOS)
    assert len(cfg.DATASETS.TRAIN) == len(cfg.MULTI_DATASET.USE_CAS)
    assert len(cfg.DATASETS.TRAIN) == len(cfg.MULTI_DATASET.USE_RFS)
    assert len(cfg.DATASETS.TRAIN) == len(cfg.MULTI_DATASET.FILTER_EMPTY_ANNOTATIONS)

    # Seeds shared across ranks so all workers draw consistent samples.
    seed1 = comm.shared_random_seed()
    seed2 = comm.shared_random_seed()
    logger = logging.getLogger(__name__)
    logger.info("rank {} seed1 {} seed2 {}".format(comm.get_local_rank(), seed1, seed2))

    # Stagger dataset loading: local ranks are split into two hard-coded
    # groups and the second group sleeps cfg.DATALOADER.GROUP_WAIT seconds —
    # presumably to cap peak memory/IO while annotations load; confirm.
    # (The original comment claimed a fixed 1200s wait, which was stale.)
    wait_group = 2
    wait_time = cfg.DATALOADER.GROUP_WAIT
    wait = comm.get_local_rank() % wait_group * wait_time
    logger.info("rank {} _train_loader_from_config sleep {}".format(comm.get_local_rank(), wait))
    time.sleep(wait)

    if dataset is None:
        dataset = get_detection_dataset_dicts_multi_dataset(
            cfg.DATASETS.TRAIN,
            filter_empty=cfg.DATALOADER.FILTER_EMPTY_ANNOTATIONS,
            min_keypoints=cfg.MODEL.ROI_KEYPOINT_HEAD.MIN_KEYPOINTS_PER_IMAGE
            if cfg.MODEL.KEYPOINT_ON
            else 0,
            proposal_files=cfg.DATASETS.PROPOSAL_FILES_TRAIN if cfg.MODEL.LOAD_PROPOSALS else None,
            filter_emptys=cfg.MULTI_DATASET.FILTER_EMPTY_ANNOTATIONS,
        )
        _log_api_usage("dataset." + cfg.DATASETS.TRAIN[0])

    if mapper is None:
        mapper = DatasetMapper_ape(cfg, True)

    if sampler is None:
        sampler_name = cfg.DATALOADER.SAMPLER_TRAIN
        if isinstance(dataset, torchdata.IterableDataset):
            logger.info("Not using any sampler since the dataset is IterableDataset.")
            sampler = None
        else:
            logger.info("Using training sampler {}".format(sampler_name))
            if sampler_name == "TrainingSampler":
                sampler = TrainingSampler(len(dataset), seed=seed1)
            elif sampler_name == "RepeatFactorTrainingSampler":
                repeat_factors = RepeatFactorTrainingSampler.repeat_factors_from_category_frequency(
                    dataset, cfg.DATALOADER.REPEAT_THRESHOLD
                )
                sampler = RepeatFactorTrainingSampler(repeat_factors, seed=seed1)
            elif sampler_name == "RandomSubsetTrainingSampler":
                sampler = RandomSubsetTrainingSampler(
                    len(dataset),
                    cfg.DATALOADER.RANDOM_SUBSET_RATIO,
                    seed_shuffle=seed1,
                    seed_subset=seed2,
                )
            elif sampler_name == "MultiDatasetSampler":
                # Typo fixed ("Despreted" -> "Deprecated"); the statements the
                # original had after this raise were unreachable and removed.
                raise ValueError("Deprecated training sampler: {}".format(sampler_name))
            elif sampler_name == "MultiDatasetTrainingSampler":
                repeat_factors = MultiDatasetTrainingSampler.get_repeat_factors(
                    dataset,
                    len(cfg.DATASETS.TRAIN),
                    cfg.MULTI_DATASET.RATIOS,
                    cfg.MULTI_DATASET.USE_RFS,
                    cfg.MULTI_DATASET.USE_CAS,
                    cfg.MULTI_DATASET.REPEAT_THRESHOLD,
                    cfg.MULTI_DATASET.CAS_LAMBDA,
                )
                sampler = MultiDatasetTrainingSampler(repeat_factors, seed=seed1)
            else:
                raise ValueError("Unknown training sampler: {}".format(sampler_name))

    return {
        "dataset": dataset,
        "sampler": sampler,
        "mapper": mapper,
        "total_batch_size": cfg.SOLVER.IMS_PER_BATCH,
        "total_batch_size_list": cfg.SOLVER.IMS_PER_BATCH_LIST,
        "aspect_ratio_grouping": cfg.DATALOADER.ASPECT_RATIO_GROUPING,
        "num_workers": cfg.DATALOADER.NUM_WORKERS,
        "num_datasets": len(cfg.DATASETS.TRAIN),
    }


@configurable(from_config=_train_loader_from_config)
def build_detection_train_loader_multi_dataset(
    dataset,
    *,
    mapper,
    sampler=None,
    total_batch_size,
    total_batch_size_list,
    aspect_ratio_grouping=True,
    num_workers=0,
    collate_fn=None,
    num_datasets=1,
):
    """
    Build a training dataloader over several source datasets.

    This wrapper resolves the sampler, wraps a list of dicts into a dataset,
    applies the mapper, and then delegates batching to
    :func:`build_batch_data_loader_multi_dataset`.

    Args:
        dataset (list or torch.utils.data.Dataset): a list of dataset dicts,
            or a pytorch dataset (map-style or iterable), e.g. from
            :func:`DatasetCatalog.get` or :func:`get_detection_dataset_dicts`.
        mapper (callable): maps one raw sample (dict) to the format consumed
            by the model; with cfg, defaults to ``DatasetMapper(cfg, is_train=True)``.
        sampler (Sampler, callable, or None): index sampler for a map-style
            dataset, or a factory that builds one from ``dataset``. Defaults
            to :class:`TrainingSampler`. Must be None for iterable datasets.
        total_batch_size (int): total batch size across all workers.
        total_batch_size_list (list[int]): per-dataset total batch sizes.
        aspect_ratio_grouping (bool): group images of similar aspect ratio;
            requires "width"/"height" keys on each record.
        num_workers (int): number of parallel data loading workers.
        collate_fn: batching function as in `torch.utils.data.DataLoader`;
            default performs no collation and yields a list of samples.
        num_datasets (int): number of source datasets being mixed.

    Returns:
        torch.utils.data.DataLoader: yields ``list[mapped_element]`` batches
        produced by ``mapper``.
    """
    # A sampler factory is resolved against the raw (unmapped) dataset.
    if isinstance(sampler, Callable):
        sampler = sampler(dataset)

    if isinstance(dataset, list):
        dataset = DatasetFromList(dataset, copy=False)
    if mapper is not None:
        dataset = MapDataset(dataset, mapper)

    if isinstance(dataset, torchdata.IterableDataset):
        assert sampler is None, "sampler must be None if dataset is IterableDataset"
    else:
        sampler = TrainingSampler(len(dataset)) if sampler is None else sampler
        assert isinstance(sampler, torchdata.Sampler), f"Expect a Sampler but got {type(sampler)}"

    return build_batch_data_loader_multi_dataset(
        dataset,
        sampler,
        total_batch_size,
        total_batch_size_list,
        aspect_ratio_grouping=aspect_ratio_grouping,
        num_workers=num_workers,
        collate_fn=collate_fn,
        num_datasets=num_datasets,
    )


class MultiDatasetSampler(Sampler):
    """
    Infinite sampler over several concatenated datasets.

    Each image receives a weight combining (a) a dataset-level term that
    normalizes every dataset to the size of the largest one and scales it by
    its configured ratio, and (b) an optional class-aware sampling (CAS)
    factor that up-weights images containing rare categories. Indices are
    drawn with replacement from the resulting multinomial distribution and
    sharded across distributed workers.
    """

    def __init__(self, cfg, dataset_dicts, sizes, seed: Optional[int] = None):
        """
        Args:
            cfg: config providing ``MULTI_DATASET.*`` and ``SOLVER.IMS_PER_BATCH``.
            dataset_dicts (list[dict]): concatenated dicts of all datasets,
                in the same order as ``sizes``.
            sizes (list[int]): number of images in each dataset.
            seed (int or None): shared random seed; generated when None.
        """
        self.sizes = sizes
        self.sample_epoch_size = cfg.MULTI_DATASET.SAMPLE_EPOCH_SIZE
        # The epoch size must split evenly into batches.
        assert (
            self.sample_epoch_size % cfg.SOLVER.IMS_PER_BATCH == 0
        ), "SAMPLE_EPOCH_SIZE ({}) must be divisible by IMS_PER_BATCH ({})".format(
            self.sample_epoch_size, cfg.SOLVER.IMS_PER_BATCH
        )
        if seed is None:
            seed = comm.shared_random_seed()
        self._seed = int(seed)

        self._rank = comm.get_rank()
        self._world_size = comm.get_world_size()

        dataset_ratio = cfg.MULTI_DATASET.RATIOS
        assert len(dataset_ratio) == len(
            sizes
        ), "length of dataset ratio {} should be equal to number of datasets {}".format(
            len(dataset_ratio), len(sizes)
        )
        # Dataset-level weight per image: max(sizes) / s equalizes dataset
        # sizes, r / sum(dataset_ratio) applies the normalized mixing ratio.
        dataset_weight = [
            torch.ones(s) * max(sizes) / s * r / sum(dataset_ratio)
            for r, s in zip(dataset_ratio, sizes)
        ]
        st = 0
        cas_factors = []
        for i, s in enumerate(sizes):
            if cfg.MULTI_DATASET.USE_CAS[i]:
                cas_factor = self._get_class_balance_factor_per_dataset(
                    dataset_dicts[st : st + s], l=cfg.MULTI_DATASET.CAS_LAMBDA
                )
                # Renormalize so the CAS factors of a dataset sum to its size,
                # leaving the dataset-level weighting unchanged in expectation.
                cas_factor = cas_factor * (s / cas_factor.sum())
            else:
                cas_factor = torch.ones(s)
            cas_factors.append(cas_factor)
            st = st + s
        cas_factors = torch.cat(cas_factors)
        dataset_weight = torch.cat(dataset_weight)
        self.weights = dataset_weight * cas_factors

    def __iter__(self):
        # Shard the infinite index stream across distributed workers.
        start = self._rank
        yield from itertools.islice(self._infinite_indices(), start, None, self._world_size)

    def _infinite_indices(self):
        g = torch.Generator()
        g.manual_seed(self._seed)
        while True:
            # Draw one "epoch" worth of indices with replacement,
            # proportional to the per-image weights.
            ids = torch.multinomial(
                self.weights, self.sample_epoch_size, generator=g, replacement=True
            )
            yield from ids

    def _get_class_balance_factor_per_dataset(self, dataset_dicts, l=1.0):
        """
        Return a per-image factor ``sum_c 1 / freq(c) ** l`` over the
        categories present in the image, where ``freq(c)`` is the number of
        images in this dataset containing category ``c``.
        """
        ret = []
        category_freq = defaultdict(int)
        for dataset_dict in dataset_dicts:  # For each image (without repeats)
            cat_ids = {ann["category_id"] for ann in dataset_dict["annotations"]}
            for cat_id in cat_ids:
                category_freq[cat_id] += 1
        for dataset_dict in dataset_dicts:
            cat_ids = {ann["category_id"] for ann in dataset_dict["annotations"]}
            ret.append(sum([1.0 / (category_freq[cat_id] ** l) for cat_id in cat_ids]))
        return torch.tensor(ret).float()


# class MultiDatasetTrainingSampler(Sampler):
#     def __init__(self, cfg, dataset_dicts, *, shuffle=True, seed=None):
#         sizes = [0 for _ in range(len(cfg.DATASETS.TRAIN))]
#         for d in dataset_dicts:
#             sizes[d["dataset_id"]] += 1

#         dataset_ratio = cfg.MULTI_DATASET.RATIOS
#         assert len(dataset_ratio) == len(
#             sizes
#         ), "length of dataset ratio {} should be equal to number if dataset {}".format(
#             len(dataset_ratio), len(sizes)
#         )
#         dataset_weight = [
#             torch.ones(s) * max(sizes) / s * r for i, (r, s) in enumerate(zip(dataset_ratio, sizes))
#         ]

#         logger = logging.getLogger(__name__)
#         logger.info(
#             "Training sampler dataset weight: {}".format(
#                 str([max(sizes) / s * r for i, (r, s) in enumerate(zip(dataset_ratio, sizes))])
#             )
#         )

#         st = 0
#         repeat_factors = []
#         for i, s in enumerate(sizes):
#             assert cfg.MULTI_DATASET.USE_RFS[i] * cfg.MULTI_DATASET.USE_CAS[i] == 0
#             if cfg.MULTI_DATASET.USE_RFS[i]:
#                 repeat_factor = RepeatFactorTrainingSampler.repeat_factors_from_category_frequency(
#                     dataset_dicts[st : st + s], cfg.MULTI_DATASET.REPEAT_THRESHOLD
#                 )
#             elif cfg.MULTI_DATASET.USE_CAS[i]:
#                 repeat_factor = MultiDatasetTrainingSampler.get_class_balance_factor_per_dataset(
#                     dataset_dicts[st : st + s], l=cfg.MULTI_DATASET.CAS_LAMBDA
#                 )
#                 repeat_factor = repeat_factor * (s / repeat_factor.sum())
#             else:
#                 repeat_factor = torch.ones(s)
#             repeat_factors.append(repeat_factor)
#             st = st + s
#         repeat_factors = torch.cat(repeat_factors)
#         dataset_weight = torch.cat(dataset_weight)
#         repeat_factors = dataset_weight * repeat_factors

#         self._shuffle = shuffle
#         if seed is None:
#             seed = comm.shared_random_seed()
#         self._seed = int(seed)

#         self._rank = comm.get_rank()
#         self._world_size = comm.get_world_size()

#         # Split into whole number (_int_part) and fractional (_frac_part) parts.
#         self._int_part = torch.trunc(repeat_factors)
#         self._frac_part = repeat_factors - self._int_part

#     @staticmethod
#     def get_class_balance_factor_per_dataset(dataset_dicts, l=1.0):
#         rep_factors = []
#         category_freq = defaultdict(int)
#         for dataset_dict in dataset_dicts:  # For each image (without repeats)
#             cat_ids = {ann["category_id"] for ann in dataset_dict["annotations"]}
#             for cat_id in cat_ids:
#                 category_freq[cat_id] += 1
#         for dataset_dict in dataset_dicts:
#             cat_ids = {ann["category_id"] for ann in dataset_dict["annotations"]}
#             rep_factor = sum([1.0 / (category_freq[cat_id] ** l) for cat_id in cat_ids])
#             rep_factors.append(rep_factor)

#         return torch.tensor(rep_factors, dtype=torch.float32)

#     def _get_epoch_indices(self, generator):
#         """
#         Create a list of dataset indices (with repeats) to use for one epoch.

#         Args:
#             generator (torch.Generator): pseudo random number generator used for
#                 stochastic rounding.

#         Returns:
#             torch.Tensor: list of dataset indices to use in one epoch. Each index
#                 is repeated based on its calculated repeat factor.
#         """
#         # Since repeat factors are fractional, we use stochastic rounding so
#         # that the target repeat factor is achieved in expectation over the
#         # course of training
#         rands = torch.rand(len(self._frac_part), generator=generator)
#         rep_factors = self._int_part + (rands < self._frac_part).float()
#         # Construct a list of indices in which we repeat images as specified
#         indices = []
#         for dataset_index, rep_factor in enumerate(rep_factors):
#             indices.extend([dataset_index] * int(rep_factor.item()))
#         return torch.tensor(indices, dtype=torch.int64)

#     def __iter__(self):
#         start = self._rank
#         yield from itertools.islice(self._infinite_indices(), start, None, self._world_size)

#     def _infinite_indices(self):
#         g = torch.Generator()
#         g.manual_seed(self._seed)
#         while True:
#             # Sample indices with repeats determined by stochastic rounding; each
#             # "epoch" may have a slightly different size due to the rounding.
#             indices = self._get_epoch_indices(g)
#             if self._shuffle:
#                 randperm = torch.randperm(len(indices), generator=g)
#                 yield from indices[randperm].tolist()
#             else:
#                 yield from indices.tolist()


class MultiDatasetAspectRatioGroupedDataset(torch.utils.data.IterableDataset):
    """
    Group samples with similar aspect ratios into batches, per dataset.

    Images from the same dataset whose aspect ratio falls on the same side of
    1 (landscape vs. portrait) are collected together, which reduces the
    padding needed to form a batch. Each yielded batch contains items from a
    single dataset only, batched to that dataset's own batch size.

    The underlying dataset must yield dicts carrying "width", "height" and
    "dataset_id" keys.
    """

    def __init__(self, dataset, batch_size, num_datasets):
        """
        Args:
            dataset: an iterable of dicts with "width", "height" and
                "dataset_id" keys.
            batch_size (list[int]): per-dataset batch size, indexed by
                "dataset_id".
            num_datasets (int): number of distinct datasets being mixed.
        """
        self.dataset = dataset
        self.batch_size = batch_size
        # Two orientation buckets (landscape / portrait) for every dataset.
        self._buckets = [[] for _ in range(2 * num_datasets)]

    def __iter__(self):
        for sample in self.dataset:
            dataset_id = sample["dataset_id"]
            # 0 for landscape (w > h), 1 for portrait (w <= h).
            portrait = sample["width"] <= sample["height"]
            bucket = self._buckets[2 * dataset_id + int(portrait)]
            bucket.append(sample)
            if len(bucket) == self.batch_size[dataset_id]:
                batch = list(bucket)
                # Empty the bucket before yielding: code after a yield is not
                # guaranteed to run if the consumer stops iterating.
                bucket.clear()
                yield batch


================================================
FILE: ape/data/build_multi_dataset_copypaste.py
================================================
# Copyright (c) Facebook, Inc. and its affiliates.
import itertools
import logging
import operator
import time
from collections import defaultdict
from typing import Callable, Optional

import numpy as np
import torch
import torch.utils.data as torchdata
from termcolor import colored
from torch.utils.data.sampler import Sampler

from detectron2.config import configurable
from detectron2.data.build import (
    filter_images_with_few_keypoints,
    filter_images_with_only_crowd_annotations,
    get_detection_dataset_dicts,
    load_proposals_into_dataset,
    trivial_batch_collator,
    worker_init_reset_seed,
)
from detectron2.data.catalog import DatasetCatalog, MetadataCatalog
from detectron2.data.common import DatasetFromList, MapDataset, ToIterableDataset
from detectron2.data.detection_utils import check_metadata_consistency
from detectron2.data.samplers import (
    RandomSubsetTrainingSampler,
    RepeatFactorTrainingSampler,
    TrainingSampler,
)
from detectron2.utils import comm
from detectron2.utils.comm import get_world_size
from detectron2.utils.logger import _log_api_usage, log_first_n
from tabulate import tabulate

from .common_copypaste import MapDataset_coppaste
from .dataset_mapper_copypaste import DatasetMapper_copypaste
from .samplers import MultiDatasetTrainingSampler

"""
This file contains the default logic to build a dataloader for training or testing.
"""

__all__ = [
    "build_detection_train_loader_multi_dataset_copypaste",
]


def print_instances_class_histogram(dataset_dicts, class_names):
    """
    Log a histogram of instance counts per category.

    Unlike the stock detectron2 version, category ids >= ``len(class_names)``
    are tolerated: they are excluded from the per-class histogram and reported
    as a separate "total out" count.

    Args:
        dataset_dicts (list[dict]): list of dataset dicts.
        class_names (list[str]): list of class names (zero-indexed).
    """
    num_classes = len(class_names)
    hist_bins = np.arange(num_classes + 1)
    # `np.int` was removed in NumPy 1.24; the builtin `int` is the documented
    # replacement and maps to the same default integer dtype.
    histogram = np.zeros((num_classes,), dtype=int)
    total_num_out_of_class = 0
    for entry in dataset_dicts:
        annos = entry["annotations"]
        classes = np.asarray(
            [x["category_id"] for x in annos if not x.get("iscrowd", 0)], dtype=int
        )
        if len(classes):
            assert classes.min() >= 0, f"Got an invalid category_id={classes.min()}"
            # Upper-bound check deliberately omitted: ids beyond num_classes
            # are counted in total_num_out_of_class below instead of raising.
        histogram += np.histogram(classes, bins=hist_bins)[0]

        total_num_out_of_class += sum(classes >= num_classes)

    N_COLS = min(6, len(class_names) * 2)

    def short_name(x):
        # make long class names shorter. useful for lvis
        if len(x) > 13:
            return x[:11] + ".."
        return x

    data = list(
        itertools.chain(*[[short_name(class_names[i]), int(v)] for i, v in enumerate(histogram)])
    )
    total_num_instances = sum(data[1::2])
    # Pad to a multiple of N_COLS; `-len(data) % N_COLS` is 0 when the last
    # row is already full (the old expression appended a whole empty row).
    data.extend([None] * (-len(data) % N_COLS))
    if num_classes > 1:
        data.extend(["total", total_num_instances])
    if total_num_out_of_class > 0:
        data.extend(["total out", total_num_out_of_class])
    data = itertools.zip_longest(*[data[i::N_COLS] for i in range(N_COLS)])
    table = tabulate(
        data,
        headers=["category", "#instances"] * (N_COLS // 2),
        tablefmt="pipe",
        numalign="left",
        stralign="center",
    )
    log_first_n(
        logging.INFO,
        "Distribution of instances among all {} categories:\n".format(num_classes)
        + colored(table, "cyan"),
        key="message",
    )


def DatasetCatalog_get(dataset_name, reduce_memory, reduce_memory_size):
    """
    Fetch dataset dicts from the DatasetCatalog, optionally stripping heavy
    per-annotation fields to cut the memory footprint of very large datasets.

    Args:
        dataset_name (str): name of a dataset registered in DatasetCatalog.
        reduce_memory (bool): whether stripping is enabled at all.
        reduce_memory_size (int): strip only when the dataset has at least
            this many records.

    Returns:
        list[dict]: the (possibly slimmed-down) dataset dicts.
    """
    import os
    import psutil

    logger = logging.getLogger(__name__)

    def _log_memory_usage():
        # Resident set size of this process, in GB.
        rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1024**3
        logger.info("Current memory usage: {} GB".format(rss_gb))

    _log_memory_usage()

    dataset_dicts = DatasetCatalog.get(dataset_name)

    _log_memory_usage()

    if not reduce_memory:
        return dataset_dicts
    if len(dataset_dicts) < reduce_memory_size:
        return dataset_dicts

    logger.info("Reducing memory usage further...")

    # Drop geometry/text fields that downstream consumers of a reduced
    # dataset do not need; image-level info and category ids are kept.
    for record in dataset_dicts:
        for anno in record.get("annotations", ()):
            anno.pop("bbox", None)
            anno.pop("bbox_mode", None)
            anno.pop("segmentation", None)
            anno.pop("phrase", None)

    _log_memory_usage()

    return dataset_dicts


def get_detection_dataset_dicts_multi_dataset_copypaste(
    names,
    filter_empty=True,
    min_keypoints=0,
    proposal_files=None,
    check_consistency=True,
    filter_emptys=(True,),
    copypastes=(True,),
    dataloader_id=None,
    reduce_memory=False,
    reduce_memory_size=1e6,
):
    """
    Load and prepare dataset dicts for instance detection/segmentation and
    semantic segmentation across several datasets, tagging each record with
    its dataset id and copy-paste flag.

    Args:
        names (str or list[str]): a dataset name or a list of dataset names
        filter_empty (bool): whether to filter out images without instance
            annotations globally (currently disabled; see below)
        min_keypoints (int): filter out images with fewer keypoints than
            `min_keypoints`. Set to 0 to do nothing.
        proposal_files (list[str]): if given, a list of object proposal files
            that match each dataset in `names`. Must currently be None.
        check_consistency (bool): whether to check if datasets have consistent metadata.
        filter_emptys (sequence[bool]): per-dataset flag for filtering images
            that contain only crowd annotations. Tuple default avoids the
            mutable-default-argument pitfall.
        copypastes (sequence[bool]): per-dataset flag stored on each record
            under the "copypaste" key.
        dataloader_id (int or None): if given, stored on each record.
        reduce_memory (bool): see :func:`DatasetCatalog_get`.
        reduce_memory_size (int): see :func:`DatasetCatalog_get`.

    Returns:
        list[dict]: a list of dicts following the standard dataset dict format.
    """
    if isinstance(names, str):
        names = [names]
    assert len(names), names
    dataset_dicts = [
        DatasetCatalog_get(dataset_name, reduce_memory, reduce_memory_size)
        for dataset_name in names
    ]

    if isinstance(dataset_dicts[0], torchdata.Dataset):
        if len(dataset_dicts) > 1:
            # ConcatDataset does not work for iterable style dataset.
            # We could support concat for iterable as well, but it's often
            # not a good idea to concat iterables anyway.
            return torchdata.ConcatDataset(dataset_dicts)
        return dataset_dicts[0]

    for dataset_name, dicts in zip(names, dataset_dicts):
        assert len(dicts), "Dataset '{}' is empty!".format(dataset_name)

    # Tag every record with its source dataset and copy-paste flag so the
    # sampler and mapper can treat datasets differently.
    for dataset_id, (dataset_name, copypaste, dicts) in enumerate(
        zip(names, copypastes, dataset_dicts)
    ):
        for d in dicts:
            d["dataset_id"] = dataset_id
            d["copypaste"] = copypaste
            if dataloader_id is not None:
                d["dataloader_id"] = dataloader_id

        has_instances = "annotations" in dicts[0]
        if not check_consistency or not has_instances:
            continue
        try:
            class_names = MetadataCatalog.get(dataset_name).thing_classes
            check_metadata_consistency("thing_classes", [dataset_name])
            print_instances_class_histogram(dicts, class_names)
        except AttributeError:  # class names are not available for this dataset
            pass

    # Precomputed proposals are not supported by this loader. The branch
    # below is kept because the assert is stripped under `python -O`.
    assert proposal_files is None
    if proposal_files is not None:
        assert len(names) == len(proposal_files)
        # load precomputed proposals from proposal files
        dataset_dicts = [
            load_proposals_into_dataset(dataset_i_dicts, proposal_file)
            for dataset_i_dicts, proposal_file in zip(dataset_dicts, proposal_files)
        ]

    # Per-dataset filtering of images with only crowd annotations.
    dataset_dicts = [
        filter_images_with_only_crowd_annotations(dicts)
        if flag and "annotations" in dicts[0]
        else dicts
        for dicts, flag in zip(dataset_dicts, filter_emptys)
    ]

    dataset_dicts = list(itertools.chain.from_iterable(dataset_dicts))

    has_instances = "annotations" in dataset_dicts[0]
    # Global empty-image filtering is intentionally disabled ("and False"):
    # filtering already happened per-dataset via `filter_emptys` above.
    if filter_empty and has_instances and False:
        dataset_dicts = filter_images_with_only_crowd_annotations(dataset_dicts)
    if min_keypoints > 0 and has_instances:
        dataset_dicts = filter_images_with_few_keypoints(dataset_dicts, min_keypoints)

    # Global class histogram is intentionally disabled ("and False"):
    # per-dataset histograms were already printed above.
    if check_consistency and has_instances and False:
        try:
            class_names = MetadataCatalog.get(names[0]).thing_classes
            check_metadata_consistency("thing_classes", names)
            print_instances_class_histogram(dataset_dicts, class_names)
        except AttributeError:  # class names are not available for this dataset
            pass

    assert len(dataset_dicts), "No valid data found in {}.".format(",".join(names))
    return dataset_dicts


def build_batch_data_loader_multi_dataset(
    dataset,
    sampler,
    total_batch_size,
    total_batch_size_list,
    *,
    aspect_ratio_grouping=False,
    num_workers=0,
    collate_fn=None,
    num_datasets=1,
):
    """
    Build a batched dataloader. The main differences from `torch.utils.data.DataLoader` are:
    1. support aspect ratio grouping options
    2. use no "batch collation", because this is common for detection training

    Args:
        dataset (torch.utils.data.Dataset): a pytorch map-style or iterable dataset.
        sampler (torch.utils.data.sampler.Sampler or None): a sampler that produces indices.
            Must be provided iff. ``dataset`` is a map-style dataset.
        total_batch_size (int): fallback per-dataset total batch size.
        total_batch_size_list (list[int]): per-dataset total batch sizes; padded
            with ``total_batch_size`` up to ``num_datasets`` entries.
        aspect_ratio_grouping, num_workers, collate_fn: see
            :func:`build_detection_train_loader`.
        num_datasets (int): number of datasets being mixed.

    Returns:
        iterable[list]. Length of each list is the batch size of the current
            GPU. Each element in the list comes from the dataset.
    """
    world_size = get_world_size()
    assert (
        total_batch_size > 0 and total_batch_size % world_size == 0
    ), "Total batch size ({}) must be divisible by the number of gpus ({}).".format(
        total_batch_size, world_size
    )

    # Copy before padding: `+=` on the original argument would mutate the
    # caller's list in place (often cfg.SOLVER.IMS_PER_BATCH_LIST).
    total_batch_size_list = list(total_batch_size_list)
    if len(total_batch_size_list) < num_datasets:
        total_batch_size_list += [
            total_batch_size,
        ] * (num_datasets - len(total_batch_size_list))
    assert all([x > 0 for x in total_batch_size_list]) and all(
        [x % world_size == 0 for x in total_batch_size_list]
    ), "Total batch size ({}) must be divisible by the number of gpus ({}).".format(
        total_batch_size_list, world_size
    )
    # Per-GPU batch size for each dataset.
    batch_size = [x // world_size for x in total_batch_size_list]

    if isinstance(dataset, torchdata.IterableDataset):
        assert sampler is None, "sampler must be None if dataset is IterableDataset"
    else:
        dataset = ToIterableDataset(dataset, sampler)

    # Only the aspect-ratio-grouped path is supported: `batch_size` is a list
    # here, which the plain DataLoader branch below cannot consume.
    assert aspect_ratio_grouping
    if aspect_ratio_grouping:
        data_loader = torchdata.DataLoader(
            dataset,
            num_workers=num_workers,
            collate_fn=operator.itemgetter(0),  # don't batch, but yield individual elements
            worker_init_fn=worker_init_reset_seed,
        )  # yield individual mapped dict
        data_loader = MultiDatasetAspectRatioGroupedDataset(
            data_loader, batch_size, num_datasets=num_datasets
        )
        if collate_fn is None:
            return data_loader
        return MapDataset(data_loader, collate_fn)
    else:
        return torchdata.DataLoader(
            dataset,
            batch_size=batch_size,
            drop_last=True,
            num_workers=num_workers,
            collate_fn=trivial_batch_collator if collate_fn is None else collate_fn,
            worker_init_fn=worker_init_reset_seed,
        )


def _train_loader_from_config(cfg, mapper=None, *, dataset=None, sampler=None):
    """
    Assemble the keyword arguments for
    :func:`build_detection_train_loader_multi_dataset_copypaste` from a config:
    the multi-dataset training dicts, the copy-paste background dataset, both
    samplers, the mapper, and the batching options.
    """
    # Every per-dataset config list must line up with DATASETS.TRAIN.
    assert len(cfg.DATASETS.TRAIN) == len(cfg.MULTI_DATASET.NAMES)
    assert len(cfg.DATASETS.TRAIN) == len(cfg.MULTI_DATASET.ENTITIES)
    assert len(cfg.DATASETS.TRAIN) == len(cfg.MULTI_DATASET.NUM_CLASSES)
    assert len(cfg.DATASETS.TRAIN) == len(cfg.MULTI_DATASET.RATIOS)
    assert len(cfg.DATASETS.TRAIN) == len(cfg.MULTI_DATASET.USE_CAS)
    assert len(cfg.DATASETS.TRAIN) == len(cfg.MULTI_DATASET.USE_RFS)
    assert len(cfg.DATASETS.TRAIN) == len(cfg.MULTI_DATASET.FILTER_EMPTY_ANNOTATIONS)
    # assert len(cfg.DATASETS.TRAIN) == len(cfg.SOLVER.IMS_PER_BATCH_LIST)
    # assert len(cfg.DATASETS.TRAIN) == len(cfg.SOLVER.AUGMENT_TYPE)
    assert len(cfg.DATASETS.TRAIN) == len(cfg.DATASETS.COPYPASTE.COPYPASTE)

    # Shared seeds so every rank constructs identical samplers.
    seed1 = comm.shared_random_seed()
    seed2 = comm.shared_random_seed()
    seed3 = comm.shared_random_seed()
    seed4 = comm.shared_random_seed()
    logger = logging.getLogger(__name__)
    logger.info("rank {} seed1 {} seed2 {}".format(comm.get_local_rank(), seed1, seed2))
    logger.info("rank {} seed3 {} seed4 {}".format(comm.get_local_rank(), seed3, seed4))

    # Hard-coded 2 sequent group and 1200s time wait.
    # Stagger dataset loading across local ranks to limit peak memory usage.
    wait_group = 2
    wait_time = cfg.DATALOADER.GROUP_WAIT
    wait = comm.get_local_rank() % wait_group * wait_time
    logger.info("rank {} _train_loader_from_config sleep {}".format(comm.get_local_rank(), wait))
    time.sleep(wait)

    if dataset is None:
        dataset = get_detection_dataset_dicts_multi_dataset_copypaste(
            cfg.DATASETS.TRAIN,
            filter_empty=cfg.DATALOADER.FILTER_EMPTY_ANNOTATIONS,
            min_keypoints=cfg.MODEL.ROI_KEYPOINT_HEAD.MIN_KEYPOINTS_PER_IMAGE
            if cfg.MODEL.KEYPOINT_ON
            else 0,
            proposal_files=cfg.DATASETS.PROPOSAL_FILES_TRAIN if cfg.MODEL.LOAD_PROPOSALS else None,
            filter_emptys=cfg.MULTI_DATASET.FILTER_EMPTY_ANNOTATIONS,
            copypastes=cfg.DATASETS.COPYPASTE.COPYPASTE,
        )
        _log_api_usage("dataset." + cfg.DATASETS.TRAIN[0])

    # Background images used as paste targets by the copy-paste augmentation.
    if True:
        dataset_bg = get_detection_dataset_dicts(
            cfg.DATASETS.COPYPASTE.BG,
            filter_empty=cfg.DATALOADER.FILTER_EMPTY_ANNOTATIONS,
            min_keypoints=cfg.MODEL.ROI_KEYPOINT_HEAD.MIN_KEYPOINTS_PER_IMAGE
            if cfg.MODEL.KEYPOINT_ON
            else 0,
            proposal_files=cfg.DATASETS.PROPOSAL_FILES_TRAIN if cfg.MODEL.LOAD_PROPOSALS else None,
        )
        _log_api_usage("dataset." + cfg.DATASETS.COPYPASTE.BG[0])

    if mapper is None:
        mapper = DatasetMapper_copypaste(cfg, True)

    # Sampler for the main (foreground) dataset.
    if sampler is None:
        sampler_name = cfg.DATALOADER.SAMPLER_TRAIN
        logger = logging.getLogger(__name__)
        if isinstance(dataset, torchdata.IterableDataset):
            logger.info("Not using any sampler since the dataset is IterableDataset.")
            sampler = None
        else:
            logger.info("Using training sampler {}".format(sampler_name))
            if sampler_name == "TrainingSampler":
                sampler = TrainingSampler(len(dataset), seed=seed1)
            elif sampler_name == "RepeatFactorTrainingSampler":
                repeat_factors = RepeatFactorTrainingSampler.repeat_factors_from_category_frequency(
                    dataset, cfg.DATALOADER.REPEAT_THRESHOLD
                )
                sampler = RepeatFactorTrainingSampler(repeat_factors, seed=seed1)
            elif sampler_name == "RandomSubsetTrainingSampler":
                sampler = RandomSubsetTrainingSampler(
                    len(dataset),
                    cfg.DATALOADER.RANDOM_SUBSET_RATIO,
                    seed_shuffle=seed1,
                    seed_subset=seed2,
                )
            elif sampler_name == "MultiDatasetSampler":
                # This sampler is no longer supported; use
                # MultiDatasetTrainingSampler instead.
                raise ValueError("Deprecated training sampler: {}".format(sampler_name))
            elif sampler_name == "MultiDatasetTrainingSampler":
                repeat_factors = MultiDatasetTrainingSampler.get_repeat_factors(
                    dataset,
                    len(cfg.DATASETS.TRAIN),
                    cfg.MULTI_DATASET.RATIOS,
                    cfg.MULTI_DATASET.USE_RFS,
                    cfg.MULTI_DATASET.USE_CAS,
                    cfg.MULTI_DATASET.REPEAT_THRESHOLD,
                    cfg.MULTI_DATASET.CAS_LAMBDA,
                )
                sampler = MultiDatasetTrainingSampler(repeat_factors, seed=seed1)
            else:
                raise ValueError("Unknown training sampler: {}".format(sampler_name))

    # Sampler for the copy-paste background dataset.
    if True:
        sampler_name = cfg.DATALOADER.COPYPASTE.SAMPLER_TRAIN
        logger = logging.getLogger(__name__)
        if isinstance(dataset_bg, torchdata.IterableDataset):
            logger.info("Not using any sampler since the dataset is IterableDataset.")
            # Fixed: this branch previously assigned `sampler = None`, which
            # clobbered the main sampler and left `sampler_bg` undefined
            # (NameError at the return below).
            sampler_bg = None
        else:
            logger.info("Using training sampler {}".format(sampler_name))
            if sampler_name == "TrainingSampler":
                sampler_bg = TrainingSampler(len(dataset_bg), seed=seed3)
            elif sampler_name == "RepeatFactorTrainingSampler":
                repeat_factors = RepeatFactorTrainingSampler.repeat_factors_from_category_frequency(
                    dataset_bg, cfg.DATALOADER.COPYPASTE.REPEAT_THRESHOLD
                )
                sampler_bg = RepeatFactorTrainingSampler(repeat_factors, seed=seed3)
            elif sampler_name == "RandomSubsetTrainingSampler":
                sampler_bg = RandomSubsetTrainingSampler(
                    len(dataset_bg),
                    cfg.DATALOADER.COPYPASTE.RANDOM_SUBSET_RATIO,
                    seed_shuffle=seed3,
                    seed_subset=seed4,
                )
            else:
                raise ValueError("Unknown training sampler: {}".format(sampler_name))

    return {
        "dataset": dataset,
        "dataset_bg": dataset_bg,
        "sampler": sampler,
        "sampler_bg": sampler_bg,
        "mapper": mapper,
        "total_batch_size": cfg.SOLVER.IMS_PER_BATCH,
        "total_batch_size_list": cfg.SOLVER.IMS_PER_BATCH_LIST,
        "aspect_ratio_grouping": cfg.DATALOADER.ASPECT_RATIO_GROUPING,
        "num_workers": cfg.DATALOADER.NUM_WORKERS,
        "num_datasets": len(cfg.DATASETS.TRAIN),
    }


@configurable(from_config=_train_loader_from_config)
def build_detection_train_loader_multi_dataset_copypaste(
    dataset,
    dataset_bg,
    *,
    mapper,
    sampler=None,
    sampler_bg=None,
    total_batch_size,
    total_batch_size_list,
    aspect_ratio_grouping=True,
    num_workers=0,
    collate_fn=None,
    num_datasets=1,
):
    """
    Build a dataloader for object detection with copy-paste augmentation.

    Args:
        dataset (list or torch.utils.data.Dataset): a list of dataset dicts,
            or a pytorch dataset (either map-style or iterable). It can be obtained
            by using :func:`DatasetCatalog.get` or :func:`get_detection_dataset_dicts`.
        dataset_bg: background dataset providing paste targets for the
            copy-paste mapper; same accepted forms as ``dataset``.
        mapper (callable): a callable which takes a sample (dict) from dataset and
            returns the format to be consumed by the model.
            When using cfg, the default choice is ``DatasetMapper(cfg, is_train=True)``.
        sampler (torch.utils.data.sampler.Sampler or None): a sampler that produces
            indices to be applied on ``dataset``.
            If ``dataset`` is map-style, the default sampler is a :class:`TrainingSampler`,
            which coordinates an infinite random shuffle sequence across all workers.
            Sampler must be None if ``dataset`` is iterable.
        sampler_bg: like ``sampler``, but applied to ``dataset_bg``.
        total_batch_size (int): total batch size across all workers.
        total_batch_size_list (list[int]): per-dataset total batch sizes.
        aspect_ratio_grouping (bool): whether to group images with similar
            aspect ratio for efficiency. When enabled, it requires each
            element in dataset be a dict with keys "width" and "height".
        num_workers (int): number of parallel data loading workers
        collate_fn: a function that determines how to do batching, same as the argument of
            `torch.utils.data.DataLoader`. Defaults to do no collation and return a list of
            data. No collation is OK for small batch size and simple data structures.
            If your batch size is large and each sample contains too many small tensors,
            it's more efficient to collate them in data loader.

    Returns:
        torch.utils.data.DataLoader:
            a dataloader. Each output from it is a ``list[mapped_element]`` of length
            ``total_batch_size / num_workers``, where ``mapped_element`` is produced
            by the ``mapper``.
    """
    # Factory-style samplers may be passed as callables taking the dataset.
    if isinstance(sampler_bg, Callable):
        sampler_bg = sampler_bg(dataset_bg)
    if isinstance(sampler, Callable):
        sampler = sampler(dataset)

    if isinstance(dataset_bg, list):
        dataset_bg = DatasetFromList(dataset_bg, copy=False)

    if isinstance(dataset_bg, torchdata.IterableDataset):
        assert sampler_bg is None, "sampler must be None if dataset is IterableDataset"
    else:
        if sampler_bg is None:
            sampler_bg = TrainingSampler(len(dataset_bg))
        # Fixed: the message previously reported type(sampler) instead of the
        # object actually being checked, type(sampler_bg).
        assert isinstance(
            sampler_bg, torchdata.Sampler
        ), f"Expect a Sampler but got {type(sampler_bg)}"

    if isinstance(dataset, list):
        dataset = DatasetFromList(dataset, copy=False)
    if mapper is not None:
        # The copy-paste map wrapper draws background images via sampler_bg.
        dataset = MapDataset_coppaste(dataset, mapper, dataset_bg, sampler_bg)

    if isinstance(dataset, torchdata.IterableDataset):
        assert sampler is None, "sampler must be None if dataset is IterableDataset"
    else:
        if sampler is None:
            sampler = TrainingSampler(len(dataset))
        assert isinstance(sampler, torchdata.Sampler), f"Expect a Sampler but got {type(sampler)}"
    return build_batch_data_loader_multi_dataset(
        dataset,
        sampler,
        total_batch_size,
        total_batch_size_list,
        aspect_ratio_grouping=aspect_ratio_grouping,
        num_workers=num_workers,
        collate_fn=collate_fn,
        num_datasets=num_datasets,
    )


class MultiDatasetSampler(Sampler):
    """
    Infinite stream of indices for training on several concatenated datasets.

    Each dataset contributes samples in proportion to ``cfg.MULTI_DATASET.RATIOS``
    (normalized by dataset size so the ratios express relative sampling
    frequency), optionally re-weighted per image by class-aware sampling (CAS)
    factors.  Indices are drawn with replacement via ``torch.multinomial`` and
    sharded across distributed ranks.
    """

    def __init__(self, cfg, dataset_dicts, sizes, seed: Optional[int] = None):
        """
        Args:
            cfg: config providing ``MULTI_DATASET.{SAMPLE_EPOCH_SIZE, RATIOS,
                USE_CAS, CAS_LAMBDA}`` and ``SOLVER.IMS_PER_BATCH``.
            dataset_dicts (list[dict]): dicts of all datasets, concatenated in
                the same order as ``sizes``.
            sizes (list[int]): number of images in each dataset.
            seed (int): shared random seed used by all ranks; generated with
                ``comm.shared_random_seed()`` if None.
        """
        self.sizes = sizes
        self.sample_epoch_size = cfg.MULTI_DATASET.SAMPLE_EPOCH_SIZE
        # The sampled "epoch" must split evenly into batches; the original
        # assert message was the boolean expression itself (always False on
        # failure), which was useless for debugging.
        assert self.sample_epoch_size % cfg.SOLVER.IMS_PER_BATCH == 0, (
            "SAMPLE_EPOCH_SIZE {} must be divisible by IMS_PER_BATCH {}".format(
                self.sample_epoch_size, cfg.SOLVER.IMS_PER_BATCH
            )
        )
        if seed is None:
            seed = comm.shared_random_seed()
        self._seed = int(seed)

        self._rank = comm.get_rank()
        self._world_size = comm.get_world_size()

        dataset_ratio = cfg.MULTI_DATASET.RATIOS
        assert len(dataset_ratio) == len(
            sizes
        ), "length of dataset ratio {} should be equal to number of datasets {}".format(
            len(dataset_ratio), len(sizes)
        )
        # Per-image weight such that dataset i as a whole is sampled with
        # probability proportional to ratio_i, independent of its size.
        dataset_weight = [
            torch.ones(s) * max(sizes) / s * r / sum(dataset_ratio)
            for r, s in zip(dataset_ratio, sizes)
        ]
        st = 0
        cas_factors = []
        for i, s in enumerate(sizes):
            if cfg.MULTI_DATASET.USE_CAS[i]:
                cas_factor = self._get_class_balance_factor_per_dataset(
                    dataset_dicts[st : st + s], l=cfg.MULTI_DATASET.CAS_LAMBDA
                )
                # Renormalize so CAS redistributes weight within the dataset
                # without changing the dataset's total sampling mass.
                cas_factor = cas_factor * (s / cas_factor.sum())
            else:
                cas_factor = torch.ones(s)
            cas_factors.append(cas_factor)
            st = st + s
        cas_factors = torch.cat(cas_factors)
        dataset_weight = torch.cat(dataset_weight)
        self.weights = dataset_weight * cas_factors

    def __iter__(self):
        # Shard the infinite stream across ranks: rank k takes every
        # world_size-th index starting at offset k.
        start = self._rank
        yield from itertools.islice(self._infinite_indices(), start, None, self._world_size)

    def _infinite_indices(self):
        g = torch.Generator()
        g.manual_seed(self._seed)
        while True:
            ids = torch.multinomial(
                self.weights, self.sample_epoch_size, generator=g, replacement=True
            )
            yield from ids

    def _get_class_balance_factor_per_dataset(self, dataset_dicts, l=1.0):
        """
        Per-image class-aware sampling factor: sum over the image's distinct
        categories of 1 / freq(category)**l, where freq is the number of images
        in which that category appears.
        """
        ret = []
        category_freq = defaultdict(int)
        for dataset_dict in dataset_dicts:  # For each image (without repeats)
            cat_ids = {ann["category_id"] for ann in dataset_dict["annotations"]}
            for cat_id in cat_ids:
                category_freq[cat_id] += 1
        for dataset_dict in dataset_dicts:
            cat_ids = {ann["category_id"] for ann in dataset_dict["annotations"]}
            ret.append(sum([1.0 / (category_freq[cat_id] ** l) for cat_id in cat_ids]))
        return torch.tensor(ret).float()


# class MultiDatasetTrainingSampler(Sampler):
#     def __init__(self, cfg, dataset_dicts, *, shuffle=True, seed=None):
#         sizes = [0 for _ in range(len(cfg.DATASETS.TRAIN))]
#         for d in dataset_dicts:
#             sizes[d["dataset_id"]] += 1

#         dataset_ratio = cfg.MULTI_DATASET.RATIOS
#         assert len(dataset_ratio) == len(
#             sizes
#         ), "length of dataset ratio {} should be equal to number if dataset {}".format(
#             len(dataset_ratio), len(sizes)
#         )
#         dataset_weight = [
#             torch.ones(s) * max(sizes) / s * r for i, (r, s) in enumerate(zip(dataset_ratio, sizes))
#         ]

#         logger = logging.getLogger(__name__)
#         logger.info(
#             "Training sampler dataset weight: {}".format(
#                 str([max(sizes) / s * r for i, (r, s) in enumerate(zip(dataset_ratio, sizes))])
#             )
#         )

#         st = 0
#         repeat_factors = []
#         for i, s in enumerate(sizes):
#             assert cfg.MULTI_DATASET.USE_RFS[i] * cfg.MULTI_DATASET.USE_CAS[i] == 0
#             if cfg.MULTI_DATASET.USE_RFS[i]:
#                 repeat_factor = RepeatFactorTrainingSampler.repeat_factors_from_category_frequency(
#                     dataset_dicts[st : st + s], cfg.MULTI_DATASET.REPEAT_THRESHOLD
#                 )
#             elif cfg.MULTI_DATASET.USE_CAS[i]:
#                 repeat_factor = MultiDatasetTrainingSampler.get_class_balance_factor_per_dataset(
#                     dataset_dicts[st : st + s], l=cfg.MULTI_DATASET.CAS_LAMBDA
#                 )
#                 repeat_factor = repeat_factor * (s / repeat_factor.sum())
#             else:
#                 repeat_factor = torch.ones(s)
#             repeat_factors.append(repeat_factor)
#             st = st + s
#         repeat_factors = torch.cat(repeat_factors)
#         dataset_weight = torch.cat(dataset_weight)
#         repeat_factors = dataset_weight * repeat_factors

#         self._shuffle = shuffle
#         if seed is None:
#             seed = comm.shared_random_seed()
#         self._seed = int(seed)

#         self._rank = comm.get_rank()
#         self._world_size = comm.get_world_size()

#         # Split into whole number (_int_part) and fractional (_frac_part) parts.
#         self._int_part = torch.trunc(repeat_factors)
#         self._frac_part = repeat_factors - self._int_part

#     @staticmethod
#     def get_class_balance_factor_per_dataset(dataset_dicts, l=1.0):
#         rep_factors = []
#         category_freq = defaultdict(int)
#         for dataset_dict in dataset_dicts:  # For each image (without repeats)
#             cat_ids = {ann["category_id"] for ann in dataset_dict["annotations"]}
#             for cat_id in cat_ids:
#                 category_freq[cat_id] += 1
#         for dataset_dict in dataset_dicts:
#             cat_ids = {ann["category_id"] for ann in dataset_dict["annotations"]}
#             rep_factor = sum([1.0 / (category_freq[cat_id] ** l) for cat_id in cat_ids])
#             rep_factors.append(rep_factor)

#         return torch.tensor(rep_factors, dtype=torch.float32)

#     def _get_epoch_indices(self, generator):
#         """
#         Create a list of dataset indices (with repeats) to use for one epoch.

#         Args:
#             generator (torch.Generator): pseudo random number generator used for
#                 stochastic rounding.

#         Returns:
#             torch.Tensor: list of dataset indices to use in one epoch. Each index
#                 is repeated based on its calculated repeat factor.
#         """
#         # Since repeat factors are fractional, we use stochastic rounding so
#         # that the target repeat factor is achieved in expectation over the
#         # course of training
#         rands = torch.rand(len(self._frac_part), generator=generator)
#         rep_factors = self._int_part + (rands < self._frac_part).float()
#         # Construct a list of indices in which we repeat images as specified
#         indices = []
#         for dataset_index, rep_factor in enumerate(rep_factors):
#             indices.extend([dataset_index] * int(rep_factor.item()))
#         return torch.tensor(indices, dtype=torch.int64)

#     def __iter__(self):
#         start = self._rank
#         yield from itertools.islice(self._infinite_indices(), start, None, self._world_size)

#     def _infinite_indices(self):
#         g = torch.Generator()
#         g.manual_seed(self._seed)
#         while True:
#             # Sample indices with repeats determined by stochastic rounding; each
#             # "epoch" may have a slightly different size due to the rounding.
#             indices = self._get_epoch_indices(g)
#             if self._shuffle:
#                 randperm = torch.randperm(len(indices), generator=g)
#                 yield from indices[randperm].tolist()
#             else:
#                 yield from indices.tolist()


class MultiDatasetAspectRatioGroupedDataset(torch.utils.data.IterableDataset):
    """
    Batch elements of similar aspect ratio together, per source dataset.

    Each element must be a dict carrying "width", "height" and "dataset_id".
    Elements are routed into one of two orientation buckets (w > h vs. w <= h)
    for each dataset; whenever a bucket reaches that dataset's batch size, it
    is emitted as one batch.  Grouping by aspect ratio reduces padding and
    therefore speeds up training.
    """

    def __init__(self, dataset, batch_size, num_datasets):
        """
        Args:
            dataset: an iterable of dicts with "width", "height", "dataset_id".
            batch_size: per-dataset batch sizes, indexed by "dataset_id".
            num_datasets (int): number of source datasets.
        """
        self.dataset = dataset
        self.batch_size = batch_size
        # Two orientation buckets (landscape / portrait) for every dataset.
        self._buckets = [[] for _ in range(2 * num_datasets)]

    def __iter__(self):
        for item in self.dataset:
            ds_id = item["dataset_id"]
            orientation = 0 if item["width"] > item["height"] else 1
            bucket = self._buckets[2 * ds_id + orientation]
            bucket.append(item)
            if len(bucket) == self.batch_size[ds_id]:
                batch = list(bucket)
                # Empty the bucket before yielding: code after `yield` is not
                # guaranteed to run if the consumer stops iterating.
                bucket.clear()
                yield batch


================================================
FILE: ape/data/common_copypaste.py
================================================
# Copyright (c) Facebook, Inc. and its affiliates.
import logging
import random

import numpy as np
import torch.utils.data as data

from detectron2.data.common import _MapIterableDataset
from detectron2.utils.serialize import PicklableWrapper

__all__ = ["MapDataset_coppaste"]


class MapDataset_coppaste(data.Dataset):
    """
    Map a function over pairs of (foreground, background) dataset elements.

    Used for copy-paste augmentation: for each index into ``dataset``, a
    background element is drawn from ``dataset_bg`` via ``sampler_bg`` and both
    are passed to ``map_func``.
    """

    def __init__(self, dataset, map_func, dataset_bg, sampler_bg):
        """
        Args:
            dataset: a map-style dataset where map function is applied.
                Iterable datasets are not supported (see ``__new__``).
            map_func: a callable mapping ``(element, bg_element)`` to the final
                training sample.  It can return None to skip the data (e.g. in
                case of errors); a random fallback index is then retried.
            dataset_bg: a map-style dataset of background elements.
            sampler_bg: sampler producing indices into ``dataset_bg``.
        """
        self._dataset = dataset
        self._map_func = PicklableWrapper(map_func)  # wrap so that a lambda will work

        self._rng = random.Random(42)
        self._fallback_candidates = set(range(len(dataset)))

        self._dataset_bg = dataset_bg
        self._sampler_bg = sampler_bg
        # Created lazily in __getitem__ so each dataloader worker gets its own
        # independently seeded iterator.
        self._sampler_bg_iter = None

    def __new__(cls, dataset, map_func, dataset_bg, sampler_bg):
        is_iterable = isinstance(dataset, data.IterableDataset)
        if is_iterable:
            # Iterable datasets are deliberately unsupported here.
            assert 0
            return _MapIterableDataset(dataset, map_func)
        else:
            return super().__new__(cls)

    def __getnewargs__(self):
        return self._dataset, self._map_func, self._dataset_bg, self._sampler_bg

    def __len__(self):
        return len(self._dataset)

    def __getitem__(self, idx):
        retry_count = 0
        cur_idx = int(idx)

        if self._sampler_bg_iter is None:
            # Reseed per worker process so workers draw different background
            # streams, then start the (infinite) background index iterator.
            self._sampler_bg._seed = np.random.randint(2**31)
            self._sampler_bg_iter = iter(self._sampler_bg)

        while True:
            cur_idx_bg = next(self._sampler_bg_iter)
            data = self._map_func(self._dataset[cur_idx], self._dataset_bg[cur_idx_bg])
            if data is not None:
                self._fallback_candidates.add(cur_idx)
                return data

            # _map_func fails for this idx, use a random new index from the pool.
            retry_count += 1
            self._fallback_candidates.discard(cur_idx)
            # random.sample() rejects sets on Python 3.11+ (deprecated since
            # 3.9), so sample from an ordered list copy instead.
            cur_idx = self._rng.sample(sorted(self._fallback_candidates), k=1)[0]

            if retry_count >= 3:
                logger = logging.getLogger(__name__)
                logger.warning(
                    "Failed to apply `_map_func` for idx: {}, retry count: {}".format(
                        idx, retry_count
                    )
                )


================================================
FILE: ape/data/dataset_mapper.py
================================================
# Copyright (c) Facebook, Inc. and its affiliates.
import logging

from detectron2.data import detection_utils as utils
from detectron2.data import transforms as T
from detectron2.data.dataset_mapper import DatasetMapper as DatasetMapper_d2

from . import detection_utils as utils_ape

"""
This file contains the default mapping that's applied to "dataset dicts".
"""

__all__ = ["DatasetMapper_ape"]


class DatasetMapper_ape(DatasetMapper_d2):
    """
    Dataset mapper identical to detectron2's default ``DatasetMapper``, except
    that the geometric augmentation pipeline is built by APE's own
    ``build_augmentation`` instead of detectron2's.

    Takes a dataset dict in Detectron2 Dataset format and maps it into the
    format consumed by the model (image read, transforms applied, annotations
    converted to :class:`Instances`); see :doc:`/tutorials/data_loading`.
    """

    def __init__(self, cfg, is_train: bool = True):
        super().__init__(cfg, is_train)
        # Replace the augmentation list installed by the parent constructor
        # with APE's pipeline.
        aug_list = utils_ape.build_augmentation(cfg, is_train)
        self.augmentations = T.AugmentationList(aug_list)

        mode = "training" if is_train else "inference"
        logging.getLogger(__name__).info(
            f"[DatasetMapper] Augmentations used in {mode}: {aug_list}"
        )


================================================
FILE: ape/data/dataset_mapper_copypaste.py
================================================
import copy
import logging
import os
import random
from typing import List, Optional, Union

import cv2
import numpy as np
import torch

import detectron2.utils.comm as comm
from detectron2.config import configurable
from detectron2.data import MetadataCatalog
from detectron2.data import detection_utils as utils
from detectron2.data import transforms as T
from detectron2.data.dataset_mapper import DatasetMapper as DatasetMapper_d2
from detectron2.data.detection_utils import convert_image_to_rgb
from detectron2.structures import BitMasks, Boxes, Instances

from . import detection_utils as utils_ape
from . import mapper_utils

"""
This file contains the default mapping that's applied to "dataset dicts".
"""

__all__ = ["DatasetMapper_copypaste"]


class DatasetMapper_copypaste(DatasetMapper_d2):
    """
    A callable which takes a dataset dict in Detectron2 Dataset format,
    and map it into a format used by the model, optionally blending the sample
    with a second "background" image (copy-paste augmentation).

    This is the default callable to be used to map your dataset dict into training data.
    You may need to follow it to implement your own one for customized logic,
    such as a different way to read or transform images.
    See :doc:`/tutorials/data_loading` for details.

    The callable currently does the following:

    1. Read the image from "file_name"
    2. Optionally copy-paste the sample onto a background image
    3. Applies cropping/geometric transforms to the image and annotations
    4. Prepare data and annotations to Tensor and :class:`Instances`
    """

    @configurable
    def __init__(
        self,
        is_train: bool,
        *,
        augmentations: List[Union[T.Augmentation, T.Transform]],
        augmentations_d2: List[Union[T.Augmentation, T.Transform]],
        augmentations_aa: List[Union[T.Augmentation, T.Transform]],
        augmentations_lsj: List[Union[T.Augmentation, T.Transform]],
        augmentations_type: List[str],
        image_format: str,
        use_instance_mask: bool = False,
        use_keypoint: bool = False,
        instance_mask_format: str = "polygon",
        keypoint_hflip_indices: Optional[np.ndarray] = None,
        precomputed_proposal_topk: Optional[int] = None,
        recompute_boxes: bool = False,
        copypaste_prob: float = 0.5,
        output_dir: Optional[str] = None,
        vis_period: int = 0,
        dataset_names: tuple = (),
    ):
        """
        NOTE: this interface is experimental.

        Args:
            is_train: whether it's used in training or inference
            augmentations: a list of augmentations or deterministic transforms to apply
            augmentations_d2: alternative augmentation pipeline ("D2" type)
            augmentations_aa: alternative augmentation pipeline ("AA" type)
            augmentations_lsj: alternative augmentation pipeline ("LSJ" type)
            augmentations_type: per training dataset, which pipeline to use
                ("D2", "AA", "LSJ"; anything else falls back to ``augmentations``)
            image_format: an image format supported by :func:`detection_utils.read_image`.
            use_instance_mask: whether to process instance segmentation annotations, if available
            use_keypoint: whether to process keypoint annotations if available
            instance_mask_format: one of "polygon" or "bitmask". Process instance segmentation
                masks into this format.
            keypoint_hflip_indices: see :func:`detection_utils.create_keypoint_hflip_indices`
            precomputed_proposal_topk: if given, will load pre-computed
                proposals from dataset_dict and keep the top k proposals for each image.
            recompute_boxes: whether to overwrite bounding box annotations
                by computing tight bounding boxes from instance mask annotations.
            copypaste_prob: probability of applying copy-paste to samples flagged
                with ``dataset_dict["copypaste"]``.
            output_dir: if given, visualizations are saved under ``<output_dir>/vis_mapper``.
            vis_period: visualize one sample every ``vis_period`` calls (0 disables).
            dataset_names: names of the training datasets, used to fetch metadata.
        """
        if recompute_boxes:
            assert use_instance_mask, "recompute_boxes requires instance masks"
        # fmt: off
        self.is_train               = is_train
        self.augmentations          = T.AugmentationList(augmentations)
        self.augmentations_d2       = T.AugmentationList(augmentations_d2)
        self.augmentations_aa       = T.AugmentationList(augmentations_aa)
        self.augmentations_lsj      = T.AugmentationList(augmentations_lsj)
        self.augmentations_type     = augmentations_type
        self.image_format           = image_format
        self.use_instance_mask      = use_instance_mask
        self.instance_mask_format   = instance_mask_format
        self.use_keypoint           = use_keypoint
        self.keypoint_hflip_indices = keypoint_hflip_indices
        self.proposal_topk          = precomputed_proposal_topk
        self.recompute_boxes        = recompute_boxes
        # fmt: on
        logger = logging.getLogger(__name__)
        mode = "training" if is_train else "inference"
        logger.info(f"[DatasetMapper] Augmentations used in {mode}: {augmentations}")
        logger.info(f"[DatasetMapper] D2 Augmentations D2 used in {mode}: {augmentations_d2}")
        logger.info(f"[DatasetMapper] AA Augmentations used in {mode}: {augmentations_aa}")
        logger.info(f"[DatasetMapper] LSJ Augmentations used in {mode}: {augmentations_lsj}")
        logger.info(f"[DatasetMapper] Type Augmentations used in {mode}: {augmentations_type}")

        if output_dir is not None:
            self.output_dir = os.path.join(output_dir, "vis_mapper")
            os.makedirs(self.output_dir, exist_ok=True)
        else:
            # Always define the attribute so visualize_training() can check it
            # instead of raising AttributeError when no output_dir was given.
            self.output_dir = None

        self.copypaste_prob = copypaste_prob
        self.vis_period = vis_period
        self.iter = 0  # number of __call__ invocations, used for vis_period
        self.dataset_names = dataset_names

        # NOTE: attribute keeps its historical (misspelled) name for
        # backward compatibility with any external users.
        self.metatada_list = []
        for dataset_name in self.dataset_names:
            metadata = MetadataCatalog.get(dataset_name)
            self.metatada_list.append(metadata)

    @classmethod
    def from_config(cls, cfg, is_train: bool = True):
        """Build constructor kwargs from a detectron2 CfgNode."""
        augs = utils_ape.build_augmentation(cfg, is_train)
        augs_d2 = utils.build_augmentation(cfg, is_train)
        augs_aa = utils_ape.build_augmentation_aa(cfg, is_train)
        augs_lsj = utils_ape.build_augmentation_lsj(cfg, is_train)
        if cfg.INPUT.CROP.ENABLED and is_train:
            raise NotImplementedError("cfg.INPUT.CROP.ENABLED is not supported yet")
            augs.insert(0, T.RandomCrop(cfg.INPUT.CROP.TYPE, cfg.INPUT.CROP.SIZE))
            recompute_boxes = cfg.MODEL.MASK_ON
        else:
            recompute_boxes = False

        if cfg.INPUT.MASK_FORMAT == "polygon":
            logger = logging.getLogger(__name__)
            logger.warning("Using polygon is slow, use bitmask instead")
        if cfg.INPUT.MASK_FORMAT == "bitmask":
            logger = logging.getLogger(__name__)
            logger.warning("Using bitmask may has bug, use polygon instead")
            assert (
                cfg.INPUT.SEG_PAD_VALUE == 0
            ), "PadTransform should pad bitmask with value 0. Please setting cfg.INPUT.SEG_PAD_VALUE to 0. \nNoted that cfg.INPUT.SEG_PAD_VALUE is also used to pad semantic segmentation. If semantic segmentation is used, Please set cfg.INPUT.FORMAT to polygon."

        ret = {
            "is_train": is_train,
            "augmentations": augs,
            "augmentations_d2": augs_d2,
            "augmentations_aa": augs_aa,
            "augmentations_lsj": augs_lsj,
            "augmentations_type": cfg.INPUT.AUGMENT_TYPE,
            "image_format": cfg.INPUT.FORMAT,
            "use_instance_mask": cfg.MODEL.MASK_ON,
            "instance_mask_format": cfg.INPUT.MASK_FORMAT,
            "use_keypoint": cfg.MODEL.KEYPOINT_ON,
            "recompute_boxes": recompute_boxes,
            "output_dir": cfg.OUTPUT_DIR,
            "copypaste_prob": cfg.DATASETS.COPYPASTE.PROB,
            "vis_period": cfg.VIS_PERIOD,
            "dataset_names": cfg.DATASETS.TRAIN,
        }

        if cfg.MODEL.KEYPOINT_ON:
            ret["keypoint_hflip_indices"] = utils.create_keypoint_hflip_indices(cfg.DATASETS.TRAIN)

        if cfg.MODEL.LOAD_PROPOSALS:
            ret["precomputed_proposal_topk"] = (
                cfg.DATASETS.PRECOMPUTED_PROPOSAL_TOPK_TRAIN
                if is_train
                else cfg.DATASETS.PRECOMPUTED_PROPOSAL_TOPK_TEST
            )
        return ret

    def _transform_annotations(self, dataset_dict, transforms, image_shape):
        """
        Apply ``transforms`` to dataset_dict["annotations"] (popped) and store
        the result as dataset_dict["instances"].  Preserves per-annotation
        "copypaste" flags and grounding "phrase" strings through filtering.
        """
        # USER: Modify this if you want to keep them for some reason.
        for anno in dataset_dict["annotations"]:
            if not self.use_instance_mask:
                anno.pop("segmentation", None)
            if not self.use_keypoint:
                anno.pop("keypoints", None)

        # Collected in the same (crowd-filtered) order as `annos` below, so
        # they can be re-aligned with instances after filtering.
        copypaste = [
            obj.get("copypaste", 0)
            for obj in dataset_dict["annotations"]
            if obj.get("iscrowd", 0) == 0
        ]

        phrases = [
            obj.get("phrase", "")
            for obj in dataset_dict["annotations"]
            if obj.get("iscrowd", 0) == 0
        ]

        # USER: Implement additional transformations if you have other types of data
        annos = [
            utils.transform_instance_annotations(
                obj, transforms, image_shape, keypoint_hflip_indices=self.keypoint_hflip_indices
            )
            for obj in dataset_dict.pop("annotations")
            if obj.get("iscrowd", 0) == 0
        ]
        instances = utils.annotations_to_instances(
            annos, image_shape, mask_format=self.instance_mask_format
        )

        instances.copypaste = torch.tensor(copypaste)

        # Track original positions so phrases survive filter_empty_instances.
        if sum([len(x) for x in phrases]) > 0:
            instances.phrase_idxs = torch.tensor(range(len(phrases)))

        # After transforms such as cropping are applied, the bounding box may no longer
        # tightly bound the object. As an example, imagine a triangle object
        # [(0,0), (2,0), (0,2)] cropped by a box [(1,0),(2,2)] (XYXY format). The tight
        # bounding box of the cropped triangle should be [(1,0),(2,1)], which is not equal to
        # the intersection of original bounding box and the cropping box.
        if self.recompute_boxes and instances.has("gt_masks"):
            instances.gt_boxes = instances.gt_masks.get_bounding_boxes()
        dataset_dict["instances"] = utils.filter_empty_instances(instances, box_threshold=10)

        if sum([len(x) for x in phrases]) > 0:
            phrases_filtered = []
            for x in dataset_dict["instances"].phrase_idxs.tolist():
                phrases_filtered.append(phrases[x])
            dataset_dict["instances"].phrases = mapper_utils.transform_phrases(
                phrases_filtered, transforms
            )
            dataset_dict["instances"].remove("phrase_idxs")

    def __call__(self, dataset_dict, dataset_dict_bg):
        """
        Args:
            dataset_dict (dict): Metadata of one image, in Detectron2 Dataset format.
            dataset_dict_bg (dict): metadata of a background image used as the
                copy-paste target.

        Returns:
            dict: a format that builtin models in detectron2 accept, or None on
            unrecoverable read/augmentation failures (callers retry other
            indices).
        """
        dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified by code below
        # USER: Write your own image loading if it's not from a file
        try:
            image = utils.read_image(dataset_dict["file_name"], format=self.image_format)
        except Exception as e:
            logger = logging.getLogger(__name__)
            logger.error(f"read_image fails: {dataset_dict['file_name']}")
            logger.error(f"read_image fails: {e}")
            return None
        utils.check_image_size(dataset_dict, image)

        # ------------------------------------------------------------------------------------
        # Lazily load annotations from file when the dict carries none (or
        # boxless ones), e.g. for datasets whose annotations live beside the
        # image files.
        if (
            self.is_train
            and "annotations" in dataset_dict
            and (
                len(dataset_dict["annotations"]) == 0
                or any(["bbox" not in anno for anno in dataset_dict["annotations"]])
            )
        ):
            if "dataset_id" in dataset_dict:
                dataset_id = dataset_dict["dataset_id"]
            else:
                dataset_id = 0
            metadata = self.metatada_list[dataset_id]
            if "sa1b" in self.dataset_names[dataset_id]:
                metadata = None
            dataset_dict = mapper_utils.maybe_load_annotation_from_file(dataset_dict, meta=metadata)

            for anno in dataset_dict["annotations"]:
                if "bbox" not in anno:
                    logger = logging.getLogger(__name__)
                    logger.warning(f"Box not found: {dataset_dict}")
                    return None
                if "category_id" not in anno:
                    anno["category_id"] = 0
        # ------------------------------------------------------------------------------------

        # ------------------------------------------------------------------------------------
        # Copy-paste augmentation: the "copypaste" flag is assumed to be set by
        # the copy-paste dataset builder on every training dict.
        if dataset_dict["copypaste"] and self.copypaste_prob > random.uniform(0, 1):
            image_cp, dataset_dict_cp = mapper_utils.copypaste(
                dataset_dict, dataset_dict_bg, self.image_format, self.instance_mask_format
            )

            if dataset_dict_cp is None or image_cp is None:
                pass  # copy-paste failed; keep the original sample
            else:
                # Carry over keys copypaste() did not produce.
                for key in dataset_dict.keys():
                    if key in dataset_dict_cp:
                        continue
                    dataset_dict_cp[key] = dataset_dict[key]
                dataset_dict = dataset_dict_cp
                image = image_cp
        # ------------------------------------------------------------------------------------

        # USER: Remove if you don't do semantic/panoptic segmentation.
        if "sem_seg_file_name" in dataset_dict:
            try:
                sem_seg_gt = utils.read_image(dataset_dict.pop("sem_seg_file_name"), "L").squeeze(2)
            except Exception as e:
                logger = logging.getLogger(__name__)
                logger.error(f"read_image fails: {e}")
                logger.error(f"read_image fails: {dataset_dict}")
                return None

            if "copypaste_mask" in dataset_dict:
                # assume thing class is 0
                sem_seg_gt = sem_seg_gt.copy()
                sem_seg_gt[dataset_dict["copypaste_mask"]] = 0
        else:
            sem_seg_gt = None

        # Pick the augmentation pipeline configured for this dataset;
        # out-of-range dataset_ids fall back to the default pipeline.
        aug_input = T.AugInput(image, sem_seg=sem_seg_gt)
        try:
            if "dataset_id" not in dataset_dict or dataset_dict["dataset_id"] >= len(
                self.augmentations_type
            ):
                transforms = self.augmentations(aug_input)
            elif self.augmentations_type[dataset_dict["dataset_id"]] == "D2":
                transforms = self.augmentations_d2(aug_input)
            elif self.augmentations_type[dataset_dict["dataset_id"]] == "AA":
                transforms = self.augmentations_aa(aug_input)
            elif self.augmentations_type[dataset_dict["dataset_id"]] == "LSJ":
                transforms = self.augmentations_lsj(aug_input)
            else:
                print("fall back to default augmentation")
                transforms = self.augmentations(aug_input)
            image, sem_seg_gt = aug_input.image, aug_input.sem_seg
        except Exception as e:
            logger = logging.getLogger(__name__)
            logger.error(f"augment fails: {dataset_dict['file_name']}")
            logger.error(f"augment fails: {e}")
            return None

        image_shape = image.shape[:2]  # h, w
        # Pytorch's dataloader is efficient on torch.Tensor due to shared-memory,
        # but not efficient on large generic data structures due to the use of pickle & mp.Queue.
        # Therefore it's important to use torch.Tensor.
        dataset_dict["image"] = torch.as_tensor(np.ascontiguousarray(image.transpose(2, 0, 1)))
        if sem_seg_gt is not None:
            dataset_dict["sem_seg"] = torch.as_tensor(sem_seg_gt.astype("long"))

        # USER: Remove if you don't use pre-computed proposals.
        # Most users would not need this feature.
        if self.proposal_topk is not None:
            utils.transform_proposals(
                dataset_dict, image_shape, transforms, proposal_topk=self.proposal_topk
            )

        if not self.is_train:
            # USER: Modify this if you want to keep them for some reason.
            dataset_dict.pop("annotations", None)
            dataset_dict.pop("sem_seg_file_name", None)
            return dataset_dict

        # Separate object boxes from grounding-phrase regions (isobject == 0):
        # the latter become a distinct "instances_phrase" field.
        if "annotations" in dataset_dict:
            annotations = []
            annotations_phrase = []
            for ann in dataset_dict.pop("annotations"):
                if ann.get("isobject", 1) == 0:
                    annotations_phrase.append(ann)
                else:
                    annotations.append(ann)
            if len(annotations_phrase) > 0:
                dataset_dict["annotations"] = annotations_phrase
                self._transform_annotations(dataset_dict, transforms, image_shape)
                dataset_dict["instances_phrase"] = dataset_dict.pop("instances")
            dataset_dict["annotations"] = annotations

        if "annotations" in dataset_dict:
            self._transform_annotations(dataset_dict, transforms, image_shape)

        # ------------------------------------------------------------------------------------
        if self.vis_period > 0 and self.iter % self.vis_period == 0:
            self.visualize_training(dataset_dict)
        # ------------------------------------------------------------------------------------
        self.iter += 1

        return dataset_dict

    def visualize_training(self, dataset_dict, prefix="", suffix=""):
        """
        Save a side-by-side visualization (ground truth boxes/masks, phrases,
        captions, semantic segmentation, raw image) of one mapped sample to
        ``self.output_dir``.  No-op when no output directory is configured.
        """
        if self.output_dir is None:
            return
        if dataset_dict is None:
            return
        from detectron2.utils.visualizer import Visualizer

        if "dataset_id" in dataset_dict:
            dataset_id = dataset_dict["dataset_id"]
        else:
            dataset_id = 0
        dataset_name = self.dataset_names[dataset_id]
        metadata = MetadataCatalog.get(dataset_name)
        class_names = metadata.get(
            "thing_classes",
            [
                "thing",
            ],
        )

        img = dataset_dict["image"]
        img = convert_image_to_rgb(img.permute(1, 2, 0), self.image_format)
        image_shape = img.shape[:2]  # h, w
        vis = Visualizer(img, metadata=metadata)
        if "instances" in dataset_dict:
            vis = vis.overlay_instances(
                boxes=dataset_dict["instances"].gt_boxes,
                masks=dataset_dict["instances"].gt_masks
                if dataset_dict["instances"].has("gt_masks")
                else None,
                labels=[class_names[i] for i in dataset_dict["instances"].gt_classes],
            )
        else:
            vis = vis.overlay_instances(
                boxes=None,
                masks=None,
                labels=None,
            )
        vis_gt = vis.get_image()

        if "instances_phrase" in dataset_dict:
            vis = Visualizer(img, metadata=metadata)
            vis = vis.overlay_instances(
                boxes=dataset_dict["instances_phrase"].gt_boxes,
                masks=dataset_dict["instances_phrase"].gt_masks
                if dataset_dict["instances_phrase"].has("gt_masks")
                else None,
                labels=dataset_dict["instances_phrase"].phrases,
            )
            vis_phrase = vis.get_image()
            vis_gt = np.concatenate((vis_gt, vis_phrase), axis=1)

        if "captions" in dataset_dict:
            # Draw captions as labels on nested frame-like boxes so several
            # captions stay readable.
            vis = Visualizer(img, metadata=metadata)
            vis = vis.overlay_instances(
                boxes=Boxes(
                    np.array(
                        [
                            [
                                0 + i * 20,
                                0 + i * 20,
                                image_shape[1] - 1 - i * 20,
                                image_shape[0] - 1 - i * 20,
                            ]
                            for i in range(len(dataset_dict["captions"]))
                        ]
                    )
                ),
                masks=None,
                labels=dataset_dict["captions"],
            )
            vis_cap = vis.get_image()
            vis_gt = np.concatenate((vis_gt, vis_cap), axis=1)

        if "sem_seg" in dataset_dict:
            vis = Visualizer(img, metadata=metadata)
            vis = vis.draw_sem_seg(dataset_dict["sem_seg"], area_threshold=0, alpha=0.5)
            vis_sem_gt = vis.get_image()
            vis_gt = np.concatenate((vis_gt, vis_sem_gt), axis=1)

        concat = np.concatenate((vis_gt, img), axis=1)

        image_name = os.path.basename(dataset_dict["file_name"]).split(".")[0]

        save_path = os.path.join(
            self.output_dir,
            prefix
            + str(self.iter)
            + "_"
            + image_name
            + "_g"
            + str(comm.get_rank())
            + suffix
            + ".png",
        )
        concat = cv2.cvtColor(concat, cv2.COLOR_RGB2BGR)
        cv2.imwrite(save_path, concat)


================================================
FILE: ape/data/dataset_mapper_detr_instance.py
================================================
import copy
import logging
from typing import List, Optional, Union

import numpy as np
import torch

from detectron2.config import configurable
from detectron2.data import MetadataCatalog
from detectron2.data import detection_utils as utils
from detectron2.data import transforms as T
from detectron2.layers import batched_nms

from . import mapper_utils

"""
This file contains the default mapping that's applied to "dataset dicts".
"""

__all__ = ["DatasetMapper_detr_instance"]


class DatasetMapper_detr_instance:
    """
    A callable which takes a dataset dict in Detectron2 Dataset format,
    and maps it into a format used by the model.

    This is the default callable to be used to map your dataset dict into training data.
    You may need to follow it to implement your own one for customized logic,
    such as a different way to read or transform images.
    See :doc:`/tutorials/data_loading` for details.

    The callable currently does the following:

    1. Read the image from "file_name"
    2. Applies cropping/geometric transforms to the image and annotations
    3. Prepare data and annotations to Tensor and :class:`Instances`

    In addition to the standard detectron2 mapper behavior, this variant carries
    per-annotation "phrase" strings and per-image "expressions" through the
    geometric transforms, and can subsample phrase-annotated instances via NMS
    or a random permutation (see ``max_num_phrase`` / ``nms_thresh_phrase``).
    """

    @configurable
    def __init__(
        self,
        is_train: bool,
        *,
        augmentations: List[Union[T.Augmentation, T.Transform]],
        augmentations_with_crop: List[Union[T.Augmentation, T.Transform]],
        image_format: str,
        use_instance_mask: bool = False,
        use_keypoint: bool = False,
        instance_mask_format: str = "polygon",
        keypoint_hflip_indices: Optional[np.ndarray] = None,
        precomputed_proposal_topk: Optional[int] = None,
        recompute_boxes: bool = False,
        dataset_names: tuple = (),
        max_num_phrase: int = 0,
        nms_thresh_phrase: float = 0.0,
    ):
        """
        NOTE: this interface is experimental.

        Args:
            is_train: whether it's used in training or inference
            augmentations: a list of augmentations or deterministic transforms to apply
            augmentations_with_crop: alternative augmentation list that includes
                cropping; at call time one of the two lists is chosen at random,
                unless cropping is disabled for the image (see ``__call__``)
            image_format: an image format supported by :func:`detection_utils.read_image`.
            use_instance_mask: whether to process instance segmentation annotations, if available
            use_keypoint: whether to process keypoint annotations if available
            instance_mask_format: one of "polygon" or "bitmask". Process instance segmentation
                masks into this format.
            keypoint_hflip_indices: see :func:`detection_utils.create_keypoint_hflip_indices`
            precomputed_proposal_topk: if given, will load pre-computed
                proposals from dataset_dict and keep the top k proposals for each image.
            recompute_boxes: whether to overwrite bounding box annotations
                by computing tight bounding boxes from instance mask annotations.
            dataset_names: names of the datasets this mapper serves; used to look
                up per-dataset metadata, indexed by ``dataset_dict["dataset_id"]``
            max_num_phrase: if > 0, keep at most this many phrase-annotated instances
            nms_thresh_phrase: if > 0, subsample phrase-annotated instances with
                NMS (random scores) at this IoU threshold instead of a plain
                random permutation
        """
        if recompute_boxes:
            assert use_instance_mask, "recompute_boxes requires instance masks"
        # fmt: off
        self.is_train               = is_train
        self.augmentations          = T.AugmentationList(augmentations)
        self.augmentations_with_crop = T.AugmentationList(augmentations_with_crop)
        self.image_format           = image_format
        self.use_instance_mask      = use_instance_mask
        self.instance_mask_format   = instance_mask_format
        self.use_keypoint           = use_keypoint
        self.keypoint_hflip_indices = keypoint_hflip_indices
        self.proposal_topk          = precomputed_proposal_topk
        self.recompute_boxes        = recompute_boxes
        # fmt: on
        logger = logging.getLogger(__name__)
        mode = "training" if is_train else "inference"
        logger.info(f"[DatasetMapper] Augmentations used in {mode}: {augmentations}")
        logger.info(f"[DatasetMapper] Augmentations used in {mode}: {augmentations_with_crop}")

        self.dataset_names = dataset_names

        # NOTE: "metatada" is a historical typo kept for backward compatibility.
        self.metatada_list = []
        for dataset_name in self.dataset_names:
            metadata = MetadataCatalog.get(dataset_name)
            self.metatada_list.append(metadata)

        self.max_num_phrase = max_num_phrase
        self.nms_thresh_phrase = nms_thresh_phrase

    @classmethod
    def from_config(cls, cfg, is_train: bool = True):
        # BUGFIX: this is a classmethod, so `self` is not defined here; the
        # original `NotImplementedError(self.__class__.__name__)` raised a
        # NameError instead of the intended NotImplementedError.
        raise NotImplementedError(cls.__name__)

    def _transform_annotations(self, dataset_dict, transforms, image_shape):
        """
        Replace dataset_dict["annotations"] with a transformed, filtered
        :class:`Instances` object stored under dataset_dict["instances"].
        Phrase strings of non-crowd annotations survive the filtering via a
        temporary ``phrase_idxs`` field that maps instances back to phrases.
        """
        # USER: Modify this if you want to keep them for some reason.
        for anno in dataset_dict["annotations"]:
            if not self.use_instance_mask:
                anno.pop("segmentation", None)
            if not self.use_keypoint:
                anno.pop("keypoints", None)

        # Phrases of non-crowd annotations, in the same order as `annos` below.
        phrases = [
            obj.get("phrase", "")
            for obj in dataset_dict["annotations"]
            if obj.get("iscrowd", 0) == 0
        ]

        # USER: Implement additional transformations if you have other types of data
        annos = [
            utils.transform_instance_annotations(
                obj, transforms, image_shape, keypoint_hflip_indices=self.keypoint_hflip_indices
            )
            for obj in dataset_dict.pop("annotations")
            if obj.get("iscrowd", 0) == 0
        ]
        instances = utils.annotations_to_instances(
            annos, image_shape, mask_format=self.instance_mask_format
        )

        # Track original indices so phrases can be re-aligned after filtering.
        if sum([len(x) for x in phrases]) > 0:
            instances.phrase_idxs = torch.tensor(range(len(phrases)))

        # After transforms such as cropping are applied, the bounding box may no longer
        # tightly bound the object. As an example, imagine a triangle object
        # [(0,0), (2,0), (0,2)] cropped by a box [(1,0),(2,2)] (XYXY format). The tight
        # bounding box of the cropped triangle should be [(1,0),(2,1)], which is not equal to
        # the intersection of original bounding box and the cropping box.
        if self.recompute_boxes and instances.has("gt_masks"):
            instances.gt_boxes = instances.gt_masks.get_bounding_boxes()
        dataset_dict["instances"] = utils.filter_empty_instances(instances)

        if sum([len(x) for x in phrases]) > 0:
            phrases_filtered = []
            for x in dataset_dict["instances"].phrase_idxs.tolist():
                phrases_filtered.append(phrases[x])
            dataset_dict["instances"].phrases = mapper_utils.transform_phrases(
                phrases_filtered, transforms
            )
            dataset_dict["instances"].remove("phrase_idxs")
            # dataset_dict["instances"].gt_classes = torch.tensor(range(len(phrases_filtered)))

    def __call__(self, dataset_dict):
        """
        Args:
            dataset_dict (dict): Metadata of one image, in Detectron2 Dataset format.

        Returns:
            dict: a format that builtin models in detectron2 accept, or None when
            the image cannot be read or a loaded annotation has no "bbox".
        """
        dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified by code below
        # USER: Write your own image loading if it's not from a file
        try:
            image = utils.read_image(dataset_dict["file_name"], format=self.image_format)
            # Trust the actual image size over whatever the dict claims.
            dataset_dict["width"] = image.shape[1]
            dataset_dict["height"] = image.shape[0]
        except Exception as e:
            logger = logging.getLogger(__name__)
            logger.error(f"read_image fails: {dataset_dict['file_name']}")
            logger.error(f"read_image fails: {e}")
            return None
        utils.check_image_size(dataset_dict, image)

        # ------------------------------------------------------------------------------------
        # Lazily load annotations from a side file when the dict carries none
        # (or carries box-less ones), using the metadata of the source dataset.
        if (
            self.is_train
            and "annotations" in dataset_dict
            and (
                len(dataset_dict["annotations"]) == 0
                or any(["bbox" not in anno for anno in dataset_dict["annotations"]])
            )
        ):
            if "dataset_id" in dataset_dict:
                dataset_id = dataset_dict["dataset_id"]
            else:
                dataset_id = 0
            metadata = self.metatada_list[dataset_id]
            if "sa1b" in self.dataset_names[dataset_id]:
                metadata = None
            dataset_dict = mapper_utils.maybe_load_annotation_from_file(dataset_dict, meta=metadata)

            for anno in dataset_dict["annotations"]:
                if "bbox" not in anno:
                    logger = logging.getLogger(__name__)
                    logger.warning(f"Box not found: {dataset_dict}")
                    return None
                if "category_id" not in anno:
                    anno["category_id"] = 0
        # ------------------------------------------------------------------------------------

        # USER: Remove if you don't do semantic/panoptic segmentation.
        if "sem_seg_file_name" in dataset_dict:
            sem_seg_gt = utils.read_image(dataset_dict.pop("sem_seg_file_name"), "L").squeeze(2)
        else:
            sem_seg_gt = None

        # ordinal numbers: cropping could remove the object a phrase/expression
        # like "the second dog" refers to, so disable cropping in that case.
        disable_crop = False
        if (
            "annotations" in dataset_dict
            and len(dataset_dict["annotations"]) > 0
            and "phrase" in dataset_dict["annotations"][0]
        ):
            disable_crop = disable_crop or mapper_utils.has_ordinal_num(
                [anno["phrase"] for anno in dataset_dict["annotations"]]
            )
        if "expressions" in dataset_dict:
            disable_crop = disable_crop or mapper_utils.has_ordinal_num(dataset_dict["expressions"])

        if self.augmentations_with_crop is None or disable_crop:
            augmentations = self.augmentations
        else:
            # 50/50 between the cropping and non-cropping augmentation lists.
            if np.random.rand() > 0.5:
                augmentations = self.augmentations
            else:
                augmentations = self.augmentations_with_crop

        aug_input = T.AugInput(image, sem_seg=sem_seg_gt)
        transforms = augmentations(aug_input)
        image, sem_seg_gt = aug_input.image, aug_input.sem_seg

        image_shape = image.shape[:2]  # h, w
        # Pytorch's dataloader is efficient on torch.Tensor due to shared-memory,
        # but not efficient on large generic data structures due to the use of pickle & mp.Queue.
        # Therefore it's important to use torch.Tensor.
        dataset_dict["image"] = torch.as_tensor(np.ascontiguousarray(image.transpose(2, 0, 1)))
        if sem_seg_gt is not None:
            dataset_dict["sem_seg"] = torch.as_tensor(sem_seg_gt.astype("long"))

        # USER: Remove if you don't use pre-computed proposals.
        # Most users would not need this feature.
        if self.proposal_topk is not None:
            utils.transform_proposals(
                dataset_dict, image_shape, transforms, proposal_topk=self.proposal_topk
            )

        if "expressions" in dataset_dict:
            dataset_dict["expressions"] = mapper_utils.transform_expressions(
                dataset_dict["expressions"], transforms
            )

        if not self.is_train:
            # USER: Modify this if you want to keep them for some reason.
            dataset_dict.pop("annotations", None)
            dataset_dict.pop("sem_seg_file_name", None)
            return dataset_dict

        if "annotations" in dataset_dict:
            self._transform_annotations(dataset_dict, transforms, image_shape)

        # Subsample phrase-annotated instances: either NMS with random scores
        # (diverse, non-overlapping boxes) or a plain random permutation.
        if "instances" in dataset_dict and dataset_dict["instances"].has("phrases"):
            num_instances = len(dataset_dict["instances"])

            if self.nms_thresh_phrase > 0:
                boxes = dataset_dict["instances"].gt_boxes.tensor
                scores = torch.rand(num_instances)
                classes = torch.zeros(num_instances)
                keep = batched_nms(boxes, scores, classes, self.nms_thresh_phrase)
            else:
                keep = torch.randperm(num_instances)

            if self.max_num_phrase > 0:
                keep = keep[: self.max_num_phrase]

            phrases = dataset_dict["instances"].phrases
            phrases_filtered = []
            for x in keep:
                phrases_filtered.append(phrases[x])

            # `phrases` is a plain list, so remove it before tensor-indexing the
            # Instances object, then re-attach the subsampled list.
            dataset_dict["instances"].remove("phrases")
            dataset_dict["instances"] = dataset_dict["instances"][keep]
            dataset_dict["instances"].phrases = phrases_filtered

        return dataset_dict


================================================
FILE: ape/data/dataset_mapper_detr_instance_exp.py
================================================
import copy
import logging
from typing import List, Optional, Union

import numpy as np
import torch

from detectron2.config import configurable
from detectron2.data import MetadataCatalog
from detectron2.data import detection_utils as utils
from detectron2.data import transforms as T

from . import mapper_utils

"""
This file contains the default mapping that's applied to "dataset dicts".
"""

__all__ = ["DatasetMapper_detr_instance_exp"]


class DatasetMapper_detr_instance_exp:
    """
    A callable which takes a dataset dict in Detectron2 Dataset format,
    and maps it into a format used by the model.

    This is the default callable to be used to map your dataset dict into training data.
    You may need to follow it to implement your own one for customized logic,
    such as a different way to read or transform images.
    See :doc:`/tutorials/data_loading` for details.

    The callable currently does the following:

    1. Read the image from "file_name"
    2. Applies cropping/geometric transforms to the image and annotations
    3. Prepare data and annotations to Tensor and :class:`Instances`

    This is the "exp" variant of ``DatasetMapper_detr_instance``: it transforms
    per-image "expressions" but does not carry per-annotation phrases, and it
    does not guard image reading with a try/except.
    """

    @configurable
    def __init__(
        self,
        is_train: bool,
        *,
        augmentations: List[Union[T.Augmentation, T.Transform]],
        augmentations_with_crop: List[Union[T.Augmentation, T.Transform]],
        image_format: str,
        use_instance_mask: bool = False,
        use_keypoint: bool = False,
        instance_mask_format: str = "polygon",
        keypoint_hflip_indices: Optional[np.ndarray] = None,
        precomputed_proposal_topk: Optional[int] = None,
        recompute_boxes: bool = False,
        dataset_names: tuple = (),
    ):
        """
        NOTE: this interface is experimental.

        Args:
            is_train: whether it's used in training or inference
            augmentations: a list of augmentations or deterministic transforms to apply
            augmentations_with_crop: alternative augmentation list that includes
                cropping; at call time one of the two lists is chosen at random,
                unless cropping is disabled for the image (see ``__call__``)
            image_format: an image format supported by :func:`detection_utils.read_image`.
            use_instance_mask: whether to process instance segmentation annotations, if available
            use_keypoint: whether to process keypoint annotations if available
            instance_mask_format: one of "polygon" or "bitmask". Process instance segmentation
                masks into this format.
            keypoint_hflip_indices: see :func:`detection_utils.create_keypoint_hflip_indices`
            precomputed_proposal_topk: if given, will load pre-computed
                proposals from dataset_dict and keep the top k proposals for each image.
            recompute_boxes: whether to overwrite bounding box annotations
                by computing tight bounding boxes from instance mask annotations.
            dataset_names: names of the datasets this mapper serves; used to look
                up per-dataset metadata, indexed by ``dataset_dict["dataset_id"]``
        """
        if recompute_boxes:
            assert use_instance_mask, "recompute_boxes requires instance masks"
        # fmt: off
        self.is_train               = is_train
        self.augmentations          = T.AugmentationList(augmentations)
        self.augmentations_with_crop = T.AugmentationList(augmentations_with_crop)
        self.image_format           = image_format
        self.use_instance_mask      = use_instance_mask
        self.instance_mask_format   = instance_mask_format
        self.use_keypoint           = use_keypoint
        self.keypoint_hflip_indices = keypoint_hflip_indices
        self.proposal_topk          = precomputed_proposal_topk
        self.recompute_boxes        = recompute_boxes
        # fmt: on
        logger = logging.getLogger(__name__)
        mode = "training" if is_train else "inference"
        logger.info(f"[DatasetMapper] Augmentations used in {mode}: {augmentations}")
        logger.info(f"[DatasetMapper] Augmentations used in {mode}: {augmentations_with_crop}")

        self.dataset_names = dataset_names

        # NOTE: "metatada" is a historical typo kept for backward compatibility.
        self.metatada_list = []
        for dataset_name in self.dataset_names:
            metadata = MetadataCatalog.get(dataset_name)
            self.metatada_list.append(metadata)

    @classmethod
    def from_config(cls, cfg, is_train: bool = True):
        # BUGFIX: this is a classmethod, so `self` is not defined here; the
        # original `NotImplementedError(self.__class__.__name__)` raised a
        # NameError instead of the intended NotImplementedError.
        raise NotImplementedError(cls.__name__)

    def _transform_annotations(self, dataset_dict, transforms, image_shape):
        """
        Replace dataset_dict["annotations"] with a transformed, filtered
        :class:`Instances` object stored under dataset_dict["instances"].
        Crowd annotations are dropped; empty instances are filtered out.
        """
        # USER: Modify this if you want to keep them for some reason.
        for anno in dataset_dict["annotations"]:
            if not self.use_instance_mask:
                anno.pop("segmentation", None)
            if not self.use_keypoint:
                anno.pop("keypoints", None)

        # USER: Implement additional transformations if you have other types of data
        annos = [
            utils.transform_instance_annotations(
                obj, transforms, image_shape, keypoint_hflip_indices=self.keypoint_hflip_indices
            )
            for obj in dataset_dict.pop("annotations")
            if obj.get("iscrowd", 0) == 0
        ]
        instances = utils.annotations_to_instances(
            annos, image_shape, mask_format=self.instance_mask_format
        )

        # After transforms such as cropping are applied, the bounding box may no longer
        # tightly bound the object. As an example, imagine a triangle object
        # [(0,0), (2,0), (0,2)] cropped by a box [(1,0),(2,2)] (XYXY format). The tight
        # bounding box of the cropped triangle should be [(1,0),(2,1)], which is not equal to
        # the intersection of original bounding box and the cropping box.
        if self.recompute_boxes and instances.has("gt_masks"):
            instances.gt_boxes = instances.gt_masks.get_bounding_boxes()
        dataset_dict["instances"] = utils.filter_empty_instances(instances)

    def __call__(self, dataset_dict):
        """
        Args:
            dataset_dict (dict): Metadata of one image, in Detectron2 Dataset format.

        Returns:
            dict: a format that builtin models in detectron2 accept, or None when
            a loaded annotation has no "bbox".
        """
        dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified by code below
        # USER: Write your own image loading if it's not from a file
        image = utils.read_image(dataset_dict["file_name"], format=self.image_format)
        utils.check_image_size(dataset_dict, image)

        # ------------------------------------------------------------------------------------
        # Lazily load annotations from a side file when the dict carries none
        # (or carries box-less ones), using the metadata of the source dataset.
        if (
            self.is_train
            and "annotations" in dataset_dict
            and (
                len(dataset_dict["annotations"]) == 0
                or any(["bbox" not in anno for anno in dataset_dict["annotations"]])
            )
        ):
            if "dataset_id" in dataset_dict:
                dataset_id = dataset_dict["dataset_id"]
            else:
                dataset_id = 0
            metadata = self.metatada_list[dataset_id]
            if "sa1b" in self.dataset_names[dataset_id]:
                metadata = None
            dataset_dict = mapper_utils.maybe_load_annotation_from_file(dataset_dict, meta=metadata)

            for anno in dataset_dict["annotations"]:
                if "bbox" not in anno:
                    logger = logging.getLogger(__name__)
                    logger.warning(f"Box not found: {dataset_dict}")
                    return None
                if "category_id" not in anno:
                    anno["category_id"] = 0
        # ------------------------------------------------------------------------------------

        # USER: Remove if you don't do semantic/panoptic segmentation.
        if "sem_seg_file_name" in dataset_dict:
            sem_seg_gt = utils.read_image(dataset_dict.pop("sem_seg_file_name"), "L").squeeze(2)
        else:
            sem_seg_gt = None

        # ordinal numbers: cropping could remove the object a phrase/expression
        # like "the second dog" refers to, so disable cropping in that case.
        disable_crop = False
        if (
            "annotations" in dataset_dict
            and len(dataset_dict["annotations"]) > 0
            and "phrase" in dataset_dict["annotations"][0]
        ):
            disable_crop = disable_crop or mapper_utils.has_ordinal_num(
                [anno["phrase"] for anno in dataset_dict["annotations"]]
            )
        if "expressions" in dataset_dict:
            disable_crop = disable_crop or mapper_utils.has_ordinal_num(dataset_dict["expressions"])

        if self.augmentations_with_crop is None or disable_crop:
            augmentations = self.augmentations
        else:
            # 50/50 between the cropping and non-cropping augmentation lists.
            if np.random.rand() > 0.5:
                augmentations = self.augmentations
            else:
                augmentations = self.augmentations_with_crop

        aug_input = T.AugInput(image, sem_seg=sem_seg_gt)
        transforms = augmentations(aug_input)
        image, sem_seg_gt = aug_input.image, aug_input.sem_seg

        image_shape = image.shape[:2]  # h, w
        # Pytorch's dataloader is efficient on torch.Tensor due to shared-memory,
        # but not efficient on large generic data structures due to the use of pickle & mp.Queue.
        # Therefore it's important to use torch.Tensor.
        dataset_dict["image"] = torch.as_tensor(np.ascontiguousarray(image.transpose(2, 0, 1)))
        if sem_seg_gt is not None:
            dataset_dict["sem_seg"] = torch.as_tensor(sem_seg_gt.astype("long"))

        # USER: Remove if you don't use pre-computed proposals.
        # Most users would not need this feature.
        if self.proposal_topk is not None:
            utils.transform_proposals(
                dataset_dict, image_shape, transforms, proposal_topk=self.proposal_topk
            )

        if "expressions" in dataset_dict:
            dataset_dict["expressions"] = mapper_utils.transform_expressions(
                dataset_dict["expressions"], transforms
            )

        if not self.is_train:
            # USER: Modify this if you want to keep them for some reason.
            dataset_dict.pop("annotations", None)
            dataset_dict.pop("sem_seg_file_name", None)
            return dataset_dict

        if "annotations" in dataset_dict:
            self._transform_annotations(dataset_dict, transforms, image_shape)

        return dataset_dict


================================================
FILE: ape/data/dataset_mapper_detr_panoptic.py
================================================
import copy
import logging
import re
from typing import List, Optional, Union

import numpy as np
import torch

from detectron2.config import configurable
from detectron2.data import MetadataCatalog
from detectron2.data import detection_utils as utils
from detectron2.data import transforms as T
from detectron2.structures import BitMasks, Boxes, Instances, PolygonMasks

from . import mapper_utils

"""
This file contains the default mapping that's applied to "dataset dicts".
"""

__all__ = ["DatasetMapper_detr_panoptic"]


class DatasetMapper_detr_panoptic:
    """
    A callable which takes a dataset dict in Detectron2 Dataset format,
    and map it into a format used by the model.

    This is the default callable to be used to map your dataset dict into training data.
    You may need to follow it to implement your own one for customized logic,
    such as a different way to read or transform images.
    See :doc:`/tutorials/data_loading` for details.

    The callable currently does the following:

    1. Read the image from "file_name"
    2. Applies cropping/geometric transforms to the image and annotations
    3. Prepare data and annotations to Tensor and :class:`Instances`
    """

    @configurable
    def __init__(
        self,
        is_train: bool,
        *,
        augmentations: List[Union[T.Augmentation, T.Transform]],
        augmentations_with_crop: List[Union[T.Augmentation, T.Transform]],
        image_format: str,
        use_instance_mask: bool = False,
        use_keypoint: bool = False,
        instance_mask_format: str = "polygon",
        keypoint_hflip_indices: Optional[np.ndarray] = None,
        precomputed_proposal_topk: Optional[int] = None,
        recompute_boxes: bool = False,
        ignore_label: int = 255,
        stuff_classes_offset: int = 80,
        stuff_classes_decomposition: bool = False,
        dataset_names: tuple = (),
    ):
        """
        NOTE: this interface is experimental.

        Args:
            is_train: whether it's used in training or inference
            augmentations: a list of augmentations or deterministic transforms to apply
            augmentations_with_crop: alternative augmentation list that includes cropping
            image_format: an image format supported by :func:`detection_utils.read_image`.
            use_instance_mask: whether to process instance segmentation annotations, if available
            use_keypoint: whether to process keypoint annotations if available
            instance_mask_format: one of "polygon" or "bitmask". Process instance segmentation
                masks into this format.
            keypoint_hflip_indices: see :func:`detection_utils.create_keypoint_hflip_indices`
            precomputed_proposal_topk: if given, will load pre-computed
                proposals from dataset_dict and keep the top k proposals for each image.
            recompute_boxes: whether to overwrite bounding box annotations
                by computing tight bounding boxes from instance mask annotations.
            ignore_label: semantic-segmentation label value to be ignored
            stuff_classes_offset: offset added to stuff class ids (after thing classes)
            stuff_classes_decomposition: whether stuff masks are decomposed elsewhere
                instead of being turned into per-category binary masks here
            dataset_names: names of the datasets this mapper serves; used to look
                up per-dataset metadata, indexed by ``dataset_dict["dataset_id"]``
        """
        if recompute_boxes:
            assert use_instance_mask, "recompute_boxes requires instance masks"

        self.is_train = is_train
        self.augmentations = T.AugmentationList(augmentations)
        self.augmentations_with_crop = T.AugmentationList(augmentations_with_crop)
        self.image_format = image_format
        self.use_instance_mask = use_instance_mask
        self.instance_mask_format = instance_mask_format
        self.use_keypoint = use_keypoint
        self.keypoint_hflip_indices = keypoint_hflip_indices
        self.proposal_topk = precomputed_proposal_topk
        self.recompute_boxes = recompute_boxes
        self.ignore_label = ignore_label
        self.stuff_classes_offset = stuff_classes_offset
        self.stuff_classes_decomposition = stuff_classes_decomposition

        logger = logging.getLogger(__name__)
        mode = "training" if is_train else "inference"
        logger.info(f"[DatasetMapper] Augmentations used in {mode}: {augmentations}")
        logger.info(f"[DatasetMapper] Augmentations used in {mode}: {augmentations_with_crop}")

        self.dataset_names = dataset_names
        # NOTE: "metatada" is a historical typo; kept because sibling methods read it.
        self.metatada_list = [MetadataCatalog.get(name) for name in self.dataset_names]

    @classmethod
    def from_config(cls, cfg, is_train: bool = True):
        """Not supported: this mapper must be constructed directly, not from a cfg."""
        # BUGFIX: this is a classmethod, so `self` is not defined here; the
        # original `NotImplementedError(self.__class__.__name__)` raised a
        # NameError instead of the intended NotImplementedError.
        raise NotImplementedError(cls.__name__)

    def _transform_annotations(self, dataset_dict, transforms, image_shape):
        """
        Pop dataset_dict["annotations"], apply ``transforms`` to the non-crowd
        ones, and store the result as an :class:`Instances` object under
        dataset_dict["instances"] (with empty instances filtered out).
        """
        raw_annos = dataset_dict.pop("annotations")

        # Strip annotation fields this mapper is not configured to consume.
        for raw in raw_annos:
            if not self.use_instance_mask:
                raw.pop("segmentation", None)
            if not self.use_keypoint:
                raw.pop("keypoints", None)

        transformed = []
        for raw in raw_annos:
            if raw.get("iscrowd", 0) != 0:
                continue  # crowd regions are excluded from training targets
            transformed.append(
                utils.transform_instance_annotations(
                    raw,
                    transforms,
                    image_shape,
                    keypoint_hflip_indices=self.keypoint_hflip_indices,
                )
            )

        instances = utils.annotations_to_instances(
            transformed, image_shape, mask_format=self.instance_mask_format
        )

        # Cropping may leave boxes that no longer tightly bound the (clipped)
        # object; optionally re-derive tight boxes from the transformed masks.
        if self.recompute_boxes and instances.has("gt_masks"):
            instances.gt_boxes = instances.gt_masks.get_bounding_boxes()

        dataset_dict["instances"] = utils.filter_empty_instances(instances)

    def __call__(self, dataset_dict):
        """
        Args:
            dataset_dict (dict): Metadata of one image, in Detectron2 Dataset format.

        Returns:
            dict: a format that builtin models in detectron2 accept
        """
        dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified by code below
        # USER: Write your own image loading if it's not from a file
        image = utils.read_image(dataset_dict["file_name"], format=self.image_format)
        utils.check_image_size(dataset_dict, image)

        # ------------------------------------------------------------------------------------
        if "dataset_id" in dataset_dict:
            dataset_id = dataset_dict["dataset_id"]
        else:
            dataset_id = 0
        metadata = self.metatada_list[dataset_id]
        if "sa1b" in self.dataset_names[dataset_id]:
            metadata = None
        if (
            self.is_train
            and "annotations" in dataset_dict
            and (
                len(dataset_dict["annotations"]) == 0
                or any(["bbox" not in anno for anno in dataset_dict["annotations"]])
            )
        ):
            dataset_dict = mapper_utils.maybe_load_annotation_from_file(dataset_dict, meta=metadata)

            for anno in dataset_dict["annotations"]:
                if "bbox" not in anno:
                    logger = logging.getLogger(__name__)
                    logger.warning(f"Box not found: {dataset_dict}")
                    return None
                if "category_id" not in anno:
                    anno["category_id"] = 0
        # ------------------------------------------------------------------------------------

        # USER: Remove if you don't do semantic/panoptic segmentation.
        if "sem_seg_file_name" in dataset_dict:
            sem_seg_gt = utils.read_image(dataset_dict.pop("sem_seg_file_name"), "L").squeeze(2)
        else:
            sem_seg_gt = None

        # ordinal numbers
        disable_crop = False
        if (
            "annotations" in dataset_dict
            and len(dataset_dict["annotations"]) > 0
            and "phrase" in dataset_dict["annotations"][0]
        ):
            disable_crop = disable_crop or mapper_utils.has_ordinal_num(
                [anno["phrase"] for anno in dataset_dict["annotations"]]
            )
        if "expressions" in dataset_dict:
            disable_crop = disable_crop or mapper_utils.has_ordinal_num(dataset_dict["expressions"])

        if self.augmentations_with_crop is None or disable_crop:
            augmentations = self.augmentations
        else:
            if np.random.rand() > 0.5:
                augmentations = self.augmentations
            else:
                augmentations = self.augmentations_with_crop

        aug_input = T.AugInput(image, sem_seg=sem_seg_gt)
        # transforms = self.augmentations(aug_input)
        transforms = augmentations(aug_input)
        image, sem_seg_gt = aug_input.image, aug_input.sem_seg

        image_shape = image.shape[:2]  # h, w
        # Pytorch's dataloader is efficient on torch.Tensor due to shared-memory,
        # but not efficient on large generic data structures due to the use of pickle & mp.Queue.
        # Therefore it's important to use torch.Tensor.
        dataset_dict["image"] = torch.as_tensor(np.ascontiguousarray(image.transpose(2, 0, 1)))
        if sem_seg_gt is not None:
            dataset_dict["sem_seg"] = torch.as_tensor(sem_seg_gt.astype("long"))

        # USER: Remove if you don't use pre-computed proposals.
        # Most users would not need this feature.
        if self.proposal_topk is not None:
            utils.transform_proposals(
                dataset_dict, image_shape, transforms, proposal_topk=self.proposal_topk
            )

        if "expressions" in dataset_dict:
            dataset_dict["expressions"] = mapper_utils.transform_expressions(
                dataset_dict["expressions"], transforms
            )

        if not self.is_train:
            # USER: Modify this if you want to keep them for some reason.
            dataset_dict.pop("annotations", None)
            dataset_dict.pop("sem_seg_file_name", None)
            dataset_dict.pop("pan_seg_file_name", None)
            dataset_dict.pop("segments_info", None)
            return dataset_dict

        if "annotations" in dataset_dict:
            self._transform_annotations(dataset_dict, transforms, image_shape)

            dataset_dict["instances"].is_thing = torch.tensor(
                [True for _ in range(len(dataset_dict["instances"]))], dtype=torch.bool
            )

        # Prepare per-category binary masks
        if sem_seg_gt is not None and not self.stuff_classes_decomposition:
            instances = Instances(image_shape)
            classes = np.unique(sem_seg_gt).astype(np.int64)
            # remove ignored region
            classes = classes[classes != self.ignore_label]

            if self.stuff_classes_offset > 0:
                classes = classes[classes != 0]
                instances.gt_classes = torch.tensor(
                    classes + self.stuff_classes_offset - 1, dtype=torch.int64
                )
            else:
                instances.gt_classes = torch.tensor(classes, dtype=torch.int64)

            masks = []
            for class_id in classes:
                masks.append(sem_seg_gt == class_id)

            if len(masks) == 0:
                # # Some image does not have annotation (all ignored)
                # instances.gt_masks = torch.zeros((0, sem_seg_gt.shape[-2], sem_seg_gt.shape[-1]))
                masks = BitMasks(torch.zeros((0, sem_seg_gt.shape[-2], sem_seg_gt.shape[-1])))
            else:
                masks = BitMasks(
                    torch.stack([torch.from_numpy(np.ascontiguousarray(x.copy())) for x in masks])
                )

            instances.gt_masks = masks
            instances.gt_boxes = masks.get_bounding_boxes()

            instances.is_thing = torch.tensor(
                [False for _ in range(len(instances))], dtype=torch.bool
            )

            if "instances" in dataset_dict and dataset_dict["instances"].has("copypaste"):
                instances.copypaste = torch.tensor([False for _ in range(len(instances))])

            if len(instances) > 0:
                if "instances" in dataset_dict and len(dataset_dict["instances"]) > 0:
                    dataset_dict["instances"] = Instances.cat(
                        [dataset_dict["instances"], instances]
                    )
                else:
                    dataset_dict["instances"] = instances

        # Prepare per-category binary masks
        if sem_seg_gt is not None and self.stuff_classes_decomposition:
            classes = np.unique(sem_seg_gt)
            # remove ignored region
            classes = classes[classes != self.ignore_label]

            if self.stuff_classes_offset > 0:
                classes = classes[classes != 0]

            gt_masks = []
            gt_classes = []
            for class_id in classes:
                bitmask = sem_seg_gt == class_id
                pygmask, _ = mapper_utils.mask_to_polygons_2(bitmask)
                for mask in pygmask:
                    gt_masks.append([mask])
                    gt_classes.append(class_id)

            # if len(gt_masks) == 0:
            #     return None

            instances = Instances(image_shape)
            instances.gt_classes = torch.tensor(gt_classes, dtype=torch.int64)
            if self.stuff_classes_offset > 0:
                instances.gt_classes += self.stuff_classes_offset - 1
            if self.instance_mask_format == "polygon":
                instances.gt_masks = PolygonMasks(gt_masks)
            else:
                assert self.instance_mask_format == "bitmask", self.instance_mask_format
                instances.gt_masks = BitMasks.from_polygon_masks(
                    gt_masks, image_shape[0], image_shape[1]
                )
            instances.gt_boxes = instances.gt_masks.get_bounding_boxes()

            if self.instance_mask_format == "polygon":
                area = instances.gt_masks.area()
            else:
                assert self.instance_mask_format == "bitmask", self.instance_mask_format
                area = instances.gt_masks.tensor.sum((1, 2))
            instances = instances[area > 8 * 8]

            instances.is_thing = torch.tensor(
                [False for _ in range(len(instances))], dtype=torch.bool
            )

            if "instances" in dataset_dict and dataset_dict["instances"].has("copypaste"):
                instances.copypaste = torch.tensor([False for _ in range(len(instances))])

            if len(instances) > 0:
                if "instances" in dataset_dict and len(dataset_dict["instances"]) > 0:
                    dataset_dict["instances"] = Instances.cat(
                        [dataset_dict["instances"], instances]
                    )
                else:
                    dataset_dict["instances"] = instances

        if "pan_seg_file_name" in dataset_dict and not self.stuff_classes_decomposition:
            pan_seg_gt = utils.read_image(dataset_dict.pop("pan_seg_file_name"), "RGB")
            segments_info = dataset_dict["segments_info"]

            # apply the same transformation to panoptic segmentation
            pan_seg_gt = transforms.apply_segmentation(pan_seg_gt)

            from panopticapi.utils import rgb2id

            pan_seg_gt = rgb2id(pan_seg_gt)

            instances = Instances(image_shape)
            classes = []
            masks = []
            for segment_info in segments_info:
                class_id = segment_info["category_id"]
                if not segment_info["iscrowd"]:
                    classes.append(class_id)
                    masks.append(pan_seg_gt == segment_info["id"])

            classes = np.array(classes)
            instances.gt_classes = torch.tensor(classes, dtype=torch.int64)
            if len(masks) == 0:
                # Some image does not have annotation (all ignored)
                instances.gt_masks = torch.zeros((0, pan_seg_gt.shape[-2], pan_seg_gt.shape[-1]))
                instances.gt_boxes = Boxes(torch.zeros((0, 4)))
            else:
                masks = BitMasks(
                    torch.stack([torch.from_numpy(np.ascontiguousarray(x.copy())) for x in masks])
                )
                instances.gt_masks = masks.tensor
                instances.gt_boxes = masks.get_bounding_boxes()

            if "instances" in dataset_dict and dataset_dict["instances"].has("copypaste"):
                instances.copypaste = torch.tensor([False for _ in range(len(instances))])

            dataset_dict["instances"] = instances

        if "pan_seg_file_name" in dataset_dict and self.stuff_classes_decomposition:
            pan_seg_gt = utils.read_image(dataset_dict.pop("pan_seg_file_name"), "RGB")
            segments_info = dataset_dict["segments_info"]

            # apply the same transformation to panoptic segmentation
            pan_seg_gt = transforms.apply_segmentation(pan_seg_gt)

            from panopticapi.utils import rgb2id

            pan_seg_gt = rgb2id(pan_seg_gt)

            instances = Instances(image_shape)
            classes = []
            masks = []
            for segment_info in segments_info:
                class_id = segment_info["category_id"]
                if not segment_info["iscrowd"]:
                    if class_id in metadata.thing_dataset_id_to_contiguous_id.values():
                        classes.append(class_id)
                        masks.append(pan_seg_gt == segment_info["id"])
                    else:
                        bitmask = pan_seg_gt == segment_info["id"]
                        pygmask, _ = mapper_utils.mask_to_polygons_2(bitmask)
                        for mask in pygmask:
                            mask = (
                                BitMasks.from_polygon_masks(
                                    [[mask]], image_shape[0], image_shape[1]
                                )
                                .tensor[0, ...]
                                .numpy()
                            )
                            classes.append(class_id)
                            masks.append(mask)

            classes = np.array(classes)
            instances.gt_classes = torch.tensor(classes, dtype=torch.int64)
            if len(masks) == 0:
                # Some image does not have annotation (all ignored)
                instances.gt_masks = torch.zeros((0, pan_seg_gt.shape[-2], pan_seg_gt.shape[-1]))
                instances.gt_boxes = Boxes(torch.zeros((0, 4)))
            else:
                masks = BitMasks(
                    torch.stack([torch.from_numpy(np.ascontiguousarray(x.copy())) for x in masks])
                )
                instances.gt_masks = masks.tensor
                instances.gt_boxes = masks.get_bounding_boxes()

            if "instances" in dataset_dict and dataset_dict["instances"].has("copypaste"):
                instances.copypaste = torch.tensor([False for _ in range(len(instances))])

            dataset_dict["instances"] = instances

        if "instances" in dataset_dict and len(dataset_dict["instances"]) > 0:
            pass
        else:
            return None

        return dataset_dict


================================================
FILE: ape/data/dataset_mapper_detr_panoptic_copypaste.py
================================================
import copy
import logging
import os
import random
from typing import List, Optional, Union

import cv2
import numpy as np
import torch

import detectron2.utils.comm as comm
from detectron2.config import configurable
from detectron2.data import MetadataCatalog
from detectron2.data import detection_utils as utils
from detectron2.data import transforms as T
from detectron2.data.detection_utils import convert_image_to_rgb
from detectron2.layers import batched_nms
from detectron2.structures import BitMasks, Boxes, Instances, PolygonMasks

from . import mapper_utils

"""
This file contains the default mapping that's applied to "dataset dicts".
"""

__all__ = ["DatasetMapper_detr_panoptic_copypaste"]


class DatasetMapper_detr_panoptic_copypaste:
    """
    A callable which takes a dataset dict in Detectron2 Dataset format,
    and map it into a format used by the model.

    This is the default callable to be used to map your dataset dict into training data.
    You may need to follow it to implement your own one for customized logic,
    such as a different way to read or transform images.
    See :doc:`/tutorials/data_loading` for details.

    The callable currently does the following:

    1. Read the image from "file_name"
    2. Applies cropping/geometric transforms to the image and annotations
    3. Prepare data and annotations to Tensor and :class:`Instances`
    """

    @configurable
    def __init__(
        self,
        is_train: bool,
        *,
        augmentations: List[Union[T.Augmentation, T.Transform]],
        augmentations_with_crop: List[Union[T.Augmentation, T.Transform]],
        image_format: str,
        use_instance_mask: bool = False,
        use_keypoint: bool = False,
        instance_mask_format: str = "polygon",
        keypoint_hflip_indices: Optional[np.ndarray] = None,
        precomputed_proposal_topk: Optional[int] = None,
        recompute_boxes: bool = False,
        ignore_label: int = 255,
        stuff_classes_offset: int = 80,
        stuff_classes_decomposition: bool = False,
        copypaste_prob: float = 0.5,
        output_dir: str = None,
        vis_period: int = 0,
        dataset_names: tuple = (),
        max_num_phrase: int = 0,
        nms_thresh_phrase: float = 0.0,
    ):
        """
        NOTE: this interface is experimental.

        Args:
            is_train: whether the mapper is used for training or inference
            augmentations: augmentations or deterministic transforms applied
                when the cropping variant is not selected
            augmentations_with_crop: alternative augmentation list that may
                include cropping; chosen at random during ``__call__``
            image_format: an image format supported by :func:`detection_utils.read_image`
            use_instance_mask: whether to keep instance segmentation annotations
            use_keypoint: whether to keep keypoint annotations
            instance_mask_format: "polygon" or "bitmask"; the representation
                instance masks are converted into
            keypoint_hflip_indices: see :func:`detection_utils.create_keypoint_hflip_indices`
            precomputed_proposal_topk: if given, load pre-computed proposals
                from the dataset dict and keep the top k per image
            recompute_boxes: whether to overwrite bounding boxes with tight
                boxes computed from the instance masks (requires
                ``use_instance_mask``)
            ignore_label: semantic-segmentation label treated as "ignore"
            stuff_classes_offset: offset added to stuff class ids (thing
                classes occupy the first ids); 0 disables the offset
            stuff_classes_decomposition: decompose each stuff region into
                connected components instead of one mask per class
            copypaste_prob: probability of applying copy-paste augmentation
            output_dir: if given, visualizations are written under
                ``<output_dir>/vis_mapper``
            vis_period: visualization period (0 disables)
            dataset_names: registered dataset names used to look up metadata
            max_num_phrase: keep at most this many phrase annotations (0 = all)
            nms_thresh_phrase: if > 0, subsample phrase boxes with NMS at this
                IoU threshold instead of a random permutation
        """
        if recompute_boxes:
            assert use_instance_mask, "recompute_boxes requires instance masks"

        self.is_train = is_train
        self.augmentations = T.AugmentationList(augmentations)
        self.augmentations_with_crop = T.AugmentationList(augmentations_with_crop)
        self.image_format = image_format
        self.use_instance_mask = use_instance_mask
        self.instance_mask_format = instance_mask_format
        self.use_keypoint = use_keypoint
        self.keypoint_hflip_indices = keypoint_hflip_indices
        self.proposal_topk = precomputed_proposal_topk
        self.recompute_boxes = recompute_boxes
        self.ignore_label = ignore_label
        self.stuff_classes_offset = stuff_classes_offset
        self.stuff_classes_decomposition = stuff_classes_decomposition

        logger = logging.getLogger(__name__)
        mode = "training" if is_train else "inference"
        logger.info(f"[DatasetMapper] Augmentations used in {mode}: {augmentations}")
        logger.info(f"[DatasetMapper] Augmentations used in {mode}: {augmentations_with_crop}")

        # The attribute is only created when a directory is provided;
        # NOTE(review): code reading self.output_dir unconditionally would
        # raise AttributeError when output_dir is None — confirm with callers.
        if output_dir is not None:
            self.output_dir = os.path.join(output_dir, "vis_mapper")
            os.makedirs(self.output_dir, exist_ok=True)

        self.copypaste_prob = copypaste_prob
        self.vis_period = vis_period
        self.iter = 0
        self.dataset_names = dataset_names

        # NOTE: "metatada" is a historical misspelling; other methods read the
        # attribute under this exact name, so it is kept unchanged.
        self.metatada_list = [MetadataCatalog.get(name) for name in self.dataset_names]

        self.max_num_phrase = max_num_phrase
        self.nms_thresh_phrase = nms_thresh_phrase

    @classmethod
    def from_config(cls, cfg, is_train: bool = True):
        """Config-based construction is not supported for this mapper.

        Always raises ``NotImplementedError`` naming the class. The previous
        implementation referenced the undefined name ``self`` inside a
        classmethod, so calling it raised ``NameError`` instead of the
        intended ``NotImplementedError``.
        """
        raise NotImplementedError(cls.__name__)

    def _transform_annotations(self, dataset_dict, transforms, image_shape):
        """Apply ``transforms`` to raw annotations and build ``Instances``.

        Pops ``dataset_dict["annotations"]`` and stores the result in
        ``dataset_dict["instances"]``, carrying over per-object ``copypaste``
        flags and (when present) ``phrase`` strings.

        Args:
            dataset_dict: dataset dict containing an "annotations" list;
                mutated in place.
            transforms: the TransformList produced by the augmentations.
            image_shape: (height, width) of the transformed image.
        """
        # Strip annotation fields this mapper is not configured to consume.
        for anno in dataset_dict["annotations"]:
            if not self.use_instance_mask:
                anno.pop("segmentation", None)
            if not self.use_keypoint:
                anno.pop("keypoints", None)

        valid_annos = [
            obj for obj in dataset_dict.pop("annotations") if obj.get("iscrowd", 0) == 0
        ]

        # Side-channel attributes collected before the geometric transform.
        copypaste = [obj.get("copypaste", 0) for obj in valid_annos]
        phrases = [obj.get("phrase", "") for obj in valid_annos]
        has_phrases = any(len(p) > 0 for p in phrases)

        transformed = [
            utils.transform_instance_annotations(
                obj, transforms, image_shape, keypoint_hflip_indices=self.keypoint_hflip_indices
            )
            for obj in valid_annos
        ]
        instances = utils.annotations_to_instances(
            transformed, image_shape, mask_format=self.instance_mask_format
        )
        instances.copypaste = torch.tensor(copypaste)

        # Remember each phrase's original index so phrases can be realigned
        # after empty instances are filtered out below.
        if has_phrases:
            instances.phrase_idxs = torch.tensor(range(len(phrases)))

        # After transforms such as cropping, a box may no longer tightly bound
        # its object (the tight box of a cropped shape is not the intersection
        # of the original box with the crop); optionally recompute from masks.
        if self.recompute_boxes and instances.has("gt_masks"):
            instances.gt_boxes = instances.gt_masks.get_bounding_boxes()
        dataset_dict["instances"] = utils.filter_empty_instances(instances)

        if has_phrases:
            kept = dataset_dict["instances"].phrase_idxs.tolist()
            dataset_dict["instances"].phrases = mapper_utils.transform_phrases(
                [phrases[i] for i in kept], transforms
            )
            dataset_dict["instances"].remove("phrase_idxs")

    def __call__(self, dataset_dict, dataset_dict_bg):
        """
        Args:
            dataset_dict (dict): Metadata of one image, in Detectron2 Dataset format.

        Returns:
            dict: a format that builtin models in detectron2 accept
        """
        dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified by code below
        # USER: Write your own image loading if it's not from a file
        try:
            image = utils.read_image(dataset_dict["file_name"], format=self.image_format)
        except Exception as e:
            logger = logging.getLogger(__name__)
            logger.error(f"read_image fails: {dataset_dict['file_name']}")
            logger.error(f"read_image fails: {e}")
            return None
        utils.check_image_size(dataset_dict, image)

        # ------------------------------------------------------------------------------------
        if "dataset_id" in dataset_dict:
            dataset_id = dataset_dict["dataset_id"]
        else:
            dataset_id = 0
        metadata = self.metatada_list[dataset_id]
        if "sa1b" in self.dataset_names[dataset_id]:
            metadata = None
        if (
            self.is_train
            and "annotations" in dataset_dict
            and (
                len(dataset_dict["annotations"]) == 0
                or any(["bbox" not in anno for anno in dataset_dict["annotations"]])
            )
        ):
            dataset_dict = mapper_utils.maybe_load_annotation_from_file(dataset_dict, meta=metadata)

            for anno in dataset_dict["annotations"]:
                if "bbox" not in anno:
                    logger = logging.getLogger(__name__)
                    logger.warning(f"Box not found: {dataset_dict}")
                    return None
                if "category_id" not in anno:
                    anno["category_id"] = 0
        # ------------------------------------------------------------------------------------

        # ------------------------------------------------------------------------------------
        if dataset_dict["copypaste"] and self.copypaste_prob > random.uniform(0, 1):
            image_cp, dataset_dict_cp = mapper_utils.copypaste(
                dataset_dict, dataset_dict_bg, self.image_format, self.instance_mask_format
            )

            if dataset_dict_cp is None or image_cp is None:
                pass
            else:
                for key in dataset_dict.keys():
                    if key in dataset_dict_cp:
                        continue
                    dataset_dict_cp[key] = dataset_dict[key]
                dataset_dict = dataset_dict_cp
                image = image_cp
        # ------------------------------------------------------------------------------------

        # USER: Remove if you don't do semantic/panoptic segmentation.
        if "sem_seg_file_name" in dataset_dict:
            try:
                sem_seg_gt = utils.read_image(dataset_dict.pop("sem_seg_file_name"), "L").squeeze(2)
            except Exception as e:
                logger = logging.getLogger(__name__)
                logger.error(f"read_image fails: {e}")
                logger.error(f"read_image fails: {dataset_dict}")
                return None

            if "copypaste_mask" in dataset_dict:
                # assume thing class is 0
                sem_seg_gt = sem_seg_gt.copy()
                sem_seg_gt[dataset_dict["copypaste_mask"]] = 0
        else:
            sem_seg_gt = None

        # ordinal numbers
        disable_crop = False
        if (
            "annotations" in dataset_dict
            and len(dataset_dict["annotations"]) > 0
            and "phrase" in dataset_dict["annotations"][0]
        ):
            disable_crop = disable_crop or mapper_utils.has_ordinal_num(
                [anno["phrase"] for anno in dataset_dict["annotations"]]
            )
        if "expressions" in dataset_dict:
            disable_crop = disable_crop or mapper_utils.has_ordinal_num(dataset_dict["expressions"])

        if self.augmentations_with_crop is None or disable_crop:
            augmentations = self.augmentations
        else:
            if np.random.rand() > 0.5:
                augmentations = self.augmentations
            else:
                augmentations = self.augmentations_with_crop

        aug_input = T.AugInput(image, sem_seg=sem_seg_gt)
        # transforms = self.augmentations(aug_input)
        transforms = augmentations(aug_input)
        image, sem_seg_gt = aug_input.image, aug_input.sem_seg

        image_shape = image.shape[:2]  # h, w
        # Pytorch's dataloader is efficient on torch.Tensor due to shared-memory,
        # but not efficient on large generic data structures due to the use of pickle & mp.Queue.
        # Therefore it's important to use torch.Tensor.
        dataset_dict["image"] = torch.as_tensor(np.ascontiguousarray(image.transpose(2, 0, 1)))
        if sem_seg_gt is not None:
            dataset_dict["sem_seg"] = torch.as_tensor(sem_seg_gt.astype("long"))

        # USER: Remove if you don't use pre-computed proposals.
        # Most users would not need this feature.
        if self.proposal_topk is not None:
            utils.transform_proposals(
                dataset_dict, image_shape, transforms, proposal_topk=self.proposal_topk
            )

        if "expressions" in dataset_dict:
            dataset_dict["expressions"] = mapper_utils.transform_expressions(
                dataset_dict["expressions"], transforms
            )

        if not self.is_train:
            # USER: Modify this if you want to keep them for some reason.
            dataset_dict.pop("annotations", None)
            dataset_dict.pop("sem_seg_file_name", None)
            dataset_dict.pop("pan_seg_file_name", None)
            dataset_dict.pop("segments_info", None)
            return dataset_dict

        if "annotations" in dataset_dict:
            self._transform_annotations(dataset_dict, transforms, image_shape)

            dataset_dict["instances"].is_thing = torch.tensor(
                [True for _ in range(len(dataset_dict["instances"]))], dtype=torch.bool
            )

        if "instances" in dataset_dict and dataset_dict["instances"].has("phrases"):
            num_instances = len(dataset_dict["instances"])

            if self.nms_thresh_phrase > 0:
                boxes = dataset_dict["instances"].gt_boxes.tensor
                scores = torch.rand(num_instances)
                classes = torch.zeros(num_instances)
                keep = batched_nms(boxes, scores, classes, self.nms_thresh_phrase)
            else:
                keep = torch.randperm(num_instances)

            if self.max_num_phrase > 0:
                keep = keep[: self.max_num_phrase]

            phrases = dataset_dict["instances"].phrases
            phrases_filtered = []
            for x in keep:
                phrases_filtered.append(phrases[x])

            dataset_dict["instances"].remove("phrases")
            dataset_dict["instances"] = dataset_dict["instances"][keep]
            dataset_dict["instances"].phrases = phrases_filtered

        # Prepare per-category binary masks
        if sem_seg_gt is not None and not self.stuff_classes_decomposition:
            instances = Instances(image_shape)
            classes = np.unique(sem_seg_gt).astype(np.int64)
            # remove ignored region
            classes = classes[classes != self.ignore_label]

            if self.stuff_classes_offset > 0:
                classes = classes[classes != 0]
                instances.gt_classes = torch.tensor(
                    classes + self.stuff_classes_offset - 1, dtype=torch.int64
                )
            else:
                instances.gt_classes = torch.tensor(classes, dtype=torch.int64)

            masks = []
            for class_id in classes:
                masks.append(sem_seg_gt == class_id)

            if len(masks) == 0:
                # # Some image does not have annotation (all ignored)
                # instances.gt_masks = torch.zeros((0, sem_seg_gt.shape[-2], sem_seg_gt.shape[-1]))
                masks = BitMasks(torch.zeros((0, sem_seg_gt.shape[-2], sem_seg_gt.shape[-1])))
            else:
                masks = BitMasks(
                    torch.stack([torch.from_numpy(np.ascontiguousarray(x.copy())) for x in masks])
                )

            instances.gt_masks = masks
            instances.gt_boxes = masks.get_bounding_boxes()

            instances.is_thing = torch.tensor(
                [False for _ in range(len(instances))], dtype=torch.bool
            )

            if "instances" in dataset_dict and dataset_dict["instances"].has("copypaste"):
                instances.copypaste = torch.tensor([False for _ in range(len(instances))])

            if len(instances) > 0:
                if "instances" in dataset_dict and len(dataset_dict["instances"]) > 0:
                    dataset_dict["instances"] = Instances.cat(
                        [dataset_dict["instances"], instances]
                    )
                else:
                    dataset_dict["instances"] = instances

        # Prepare per-category binary masks
        if sem_seg_gt is not None and self.stuff_classes_decomposition:
            classes = np.unique(sem_seg_gt)
            # remove ignored region
            classes = classes[classes != self.ignore_label]

            if self.stuff_classes_offset > 0:
                classes = classes[classes != 0]

            gt_masks = []
            gt_classes = []
            for class_id in classes:
                bitmask = sem_seg_gt == class_id
                pygmask, _ = mapper_utils.mask_to_polygons_2(bitmask)
                for mask in pygmask:
                    gt_masks.append([mask])
                    gt_classes.append(class_id)

            # if len(gt_masks) == 0:
            #     return None

            instances = Instances(image_shape)
            instances.gt_classes = torch.tensor(gt_classes, dtype=torch.int64)
            if self.stuff_classes_offset > 0:
                instances.gt_classes += self.stuff_classes_offset - 1
            if self.instance_mask_format == "polygon":
                instances.gt_masks = PolygonMasks(gt_masks)
            else:
                assert self.instance_mask_format == "bitmask", self.instance_mask_format
                instances.gt_masks = BitMasks.from_polygon_masks(
                    gt_masks, image_shape[0], image_shape[1]
                )
            instances.gt_boxes = instances.gt_masks.get_bounding_boxes()

            if self.instance_mask_format == "polygon":
                area = instances.gt_masks.area()
            else:
                assert self.instance_mask_format == "bitmask", self.instance_mask_format
                area = instances.gt_masks.tensor.sum((1, 2))
            instances = instances[area > 8 * 8]

            instances.is_thing = torch.tensor(
                [False for _ in range(len(instances))], dtype=torch.bool
            )

            if "instances" in dataset_dict and dataset_dict["instances"].has("copypaste"):
                instances.copypaste = torch.tensor([False for _ in range(len(instances))])

            if len(instances) > 0:
                if "instances" in dataset_dict and len(dataset_dict["instances"]) > 0:
                    dataset_dict["instances"] = Instances.cat(
                        [dataset_dict["instances"], instances]
                    )
                else:
                    dataset_dict["instances"] = instances

        if "pan_seg_file_name" in dataset_dict and not self.stuff_classes_decomposition:
            pan_seg_gt = utils.read_image(dataset_dict.pop("pan_seg_file_name"), "RGB")
            segments_info = dataset_dict["segments_info"]

            # apply the same transformation to panoptic segmentation
            pan_seg_gt = transforms.apply_segmentation(pan_seg_gt)

            from panopticapi.utils import rgb2id

            pan_seg_gt = rgb2id(pan_seg_gt)

            instances = Instances(image_shape)
            classes = []
            masks = []
            for segment_info in segments_info:
                class_id = segment_info["category_id"]
                if not segment_info["iscrowd"]:
                    classes.append(class_id)
                    masks.append(pan_seg_gt == segment_info["id"])

            classes = np.array(classes)
            instances.gt_classes = torch.tensor(classes, dtype=torch.int64)
            if len(masks) == 0:
                # Some image does not have annotation (all ignored)
                instances.gt_masks = torch.zeros((0, pan_seg_gt.shape[-2], pan_seg_gt.shape[-1]))
                instances.gt_boxes = Boxes(torch.zeros((0, 4)))
            else:
                masks = BitMasks(
                    torch.stack([torch.from_numpy(np.ascontiguousarray(x.copy())) for x in masks])
                )
                instances.gt_masks = masks.tensor
                instances.gt_boxes = masks.get_bounding_boxes()

            if "instances" in dataset_dict and dataset_dict["instances"].has("copypaste"):
                instances.copypaste = torch.tensor([False for _ in range(len(instances))])

            dataset_dict["instances"] = instances

        if "pan_seg_file_name" in dataset_dict and self.stuff_classes_decomposition:
            pan_seg_gt = utils.read_image(dataset_dict.pop("pan_seg_file_name"), "RGB")
            segments_info = dataset_dict["segments_info"]

            # apply the same transformation to panoptic segmentation
            pan_seg_gt = transforms.apply_segmentation(pan_seg_gt)

            from panopticapi.utils import rgb2id

            pan_seg_gt = rgb2id(pan_seg_gt)

            instances = Instances(image_shape)
            classes = []
            masks = []
            for segment_info in segments_info:
                class_id = segment_info["category_id"]
                if not segment_info["iscrowd"]:
                    if class_id in metadata.thing_dataset_id_to_contiguous_i
Download .txt
gitextract_oqcovuov/

├── .gitignore
├── LICENSE
├── README.md
├── ape/
│   ├── __init__.py
│   ├── checkpoint/
│   │   ├── __init__.py
│   │   └── detection_checkpoint.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── build.py
│   │   ├── build_copypaste.py
│   │   ├── build_multi_dataset.py
│   │   ├── build_multi_dataset_copypaste.py
│   │   ├── common_copypaste.py
│   │   ├── dataset_mapper.py
│   │   ├── dataset_mapper_copypaste.py
│   │   ├── dataset_mapper_detr_instance.py
│   │   ├── dataset_mapper_detr_instance_exp.py
│   │   ├── dataset_mapper_detr_panoptic.py
│   │   ├── dataset_mapper_detr_panoptic_copypaste.py
│   │   ├── dataset_mapper_detr_semantic.py
│   │   ├── datasets/
│   │   │   ├── __init__.py
│   │   │   ├── coco.py
│   │   │   ├── d_cube.py
│   │   │   ├── flickr30k.py
│   │   │   ├── gqa.py
│   │   │   ├── grit.py
│   │   │   ├── inst_categories.py
│   │   │   ├── lvis_coco.py
│   │   │   ├── lvis_coco_panoptic.py
│   │   │   ├── lvis_v1_coco_category_image_count.py
│   │   │   ├── objects365.py
│   │   │   ├── odinw_categories.py
│   │   │   ├── odinw_instance.py
│   │   │   ├── odinw_prompts.py
│   │   │   ├── oid.py
│   │   │   ├── openimages_v6_category_image_count.py
│   │   │   ├── pascal_voc_external.py
│   │   │   ├── phrasecut.py
│   │   │   ├── refcoco.py
│   │   │   ├── register_bdd100k_panoseg.py
│   │   │   ├── register_bdd100k_semseg.py
│   │   │   ├── register_pascal_context.py
│   │   │   ├── register_voc_seg.py
│   │   │   ├── sa1b.py
│   │   │   ├── seginw_categories.py
│   │   │   ├── seginw_instance.py
│   │   │   ├── visualgenome.py
│   │   │   └── visualgenome_categories.py
│   │   ├── detection_utils.py
│   │   ├── mapper_utils.py
│   │   ├── samplers/
│   │   │   ├── __init__.py
│   │   │   └── distributed_sampler_multi_dataset.py
│   │   └── transforms/
│   │       ├── __init__.py
│   │       ├── augmentation_aa.py
│   │       └── augmentation_lsj.py
│   ├── engine/
│   │   ├── __init__.py
│   │   ├── defaults.py
│   │   └── train_loop.py
│   ├── evaluation/
│   │   ├── __init__.py
│   │   ├── d3_evaluation.py
│   │   ├── evaluator.py
│   │   ├── instance_evaluation.py
│   │   ├── lvis_evaluation.py
│   │   ├── multi_dataset_evaluator.py
│   │   ├── oideval.py
│   │   ├── refcoco_evaluation.py
│   │   └── refcocoeval.py
│   ├── layers/
│   │   ├── __init__.py
│   │   ├── csrc/
│   │   │   ├── MsDeformAttn/
│   │   │   │   ├── ms_deform_attn.h
│   │   │   │   ├── ms_deform_attn_cpu.cpp
│   │   │   │   ├── ms_deform_attn_cpu.h
│   │   │   │   ├── ms_deform_attn_cuda.cu
│   │   │   │   ├── ms_deform_attn_cuda.h
│   │   │   │   └── ms_deform_im2col_cuda.cuh
│   │   │   ├── cuda_version.cu
│   │   │   └── vision.cpp
│   │   ├── fuse_helper.py
│   │   ├── multi_scale_deform_attn.py
│   │   ├── vision_language_align.py
│   │   ├── vision_language_fusion.py
│   │   └── zero_shot_fc.py
│   ├── model_zoo/
│   │   ├── __init__.py
│   │   └── model_zoo.py
│   ├── modeling/
│   │   ├── __init__.py
│   │   ├── ape_deta/
│   │   │   ├── __init__.py
│   │   │   ├── ape_deta.py
│   │   │   ├── assigner.py
│   │   │   ├── deformable_criterion.py
│   │   │   ├── deformable_detr.py
│   │   │   ├── deformable_detr_segm.py
│   │   │   ├── deformable_detr_segm_vl.py
│   │   │   ├── deformable_transformer.py
│   │   │   ├── deformable_transformer_vl.py
│   │   │   ├── fast_rcnn.py
│   │   │   ├── misc.py
│   │   │   └── segmentation.py
│   │   ├── backbone/
│   │   │   ├── __init__.py
│   │   │   ├── utils_eva.py
│   │   │   ├── utils_eva02.py
│   │   │   ├── vit.py
│   │   │   ├── vit_eva.py
│   │   │   ├── vit_eva02.py
│   │   │   └── vit_eva_clip.py
│   │   ├── deta/
│   │   │   ├── __init__.py
│   │   │   ├── assigner.py
│   │   │   ├── deformable_criterion.py
│   │   │   ├── deformable_detr.py
│   │   │   ├── deformable_detr_segm.py
│   │   │   ├── deformable_transformer.py
│   │   │   ├── misc.py
│   │   │   └── segmentation.py
│   │   └── text/
│   │       ├── __init__.py
│   │       ├── bert_wrapper.py
│   │       ├── clip_wrapper.py
│   │       ├── clip_wrapper_eva01.py
│   │       ├── clip_wrapper_eva02.py
│   │       ├── clip_wrapper_open.py
│   │       ├── eva01_clip/
│   │       │   ├── README.md
│   │       │   ├── __init__.py
│   │       │   ├── clip.py
│   │       │   ├── eva_clip.py
│   │       │   ├── eva_model.py
│   │       │   ├── model.py
│   │       │   ├── simple_tokenizer.py
│   │       │   └── vit_model.py
│   │       ├── eva02_clip/
│   │       │   ├── __init__.py
│   │       │   ├── constants.py
│   │       │   ├── eva_vit_model.py
│   │       │   ├── factory.py
│   │       │   ├── hf_configs.py
│   │       │   ├── hf_model.py
│   │       │   ├── loss.py
│   │       │   ├── model.py
│   │       │   ├── modified_resnet.py
│   │       │   ├── openai.py
│   │       │   ├── pretrained.py
│   │       │   ├── rope.py
│   │       │   ├── timm_model.py
│   │       │   ├── tokenizer.py
│   │       │   ├── transform.py
│   │       │   ├── transformer.py
│   │       │   └── utils.py
│   │       ├── llama2_wrapper.py
│   │       ├── t5_wrapper.py
│   │       ├── text_encoder.py
│   │       └── utils.py
│   └── utils/
│       ├── __init__.py
│       ├── box_ops.py
│       ├── misc.py
│       └── plot_utils.py
├── configs/
│   ├── ADE20kFull_SemanticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── ADE20k_PanopticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_160k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── ADE20k_SemanticSegmentation/
│   │   ├── ape_deta/
│   │   │   ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1024.py
│   │   │   ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │   │   └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   │   └── deformable_deta/
│   │       └── deformable_deta_segm_r50_160k.py
│   ├── BDD10k_PanopticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── BDD10k_SemanticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── COCO_Detection/
│   │   ├── deformable_deta/
│   │   │   ├── deformable_deta_r50_12ep.py
│   │   │   ├── deformable_deta_r50_24ep.py
│   │   │   ├── deformable_deta_vitb_clip_openai_lsj1024_cp_12ep.py
│   │   │   ├── deformable_deta_vitb_lsj1024_12ep.py
│   │   │   ├── deformable_deta_vitg_eva_lsj1024_12ep.py
│   │   │   ├── deformable_deta_vitg_eva_lsj1024_cp_12ep.py
│   │   │   ├── deformable_deta_vitl_eva02_lsj1024_cp_12ep.py
│   │   │   ├── deformable_deta_vitl_eva_lsj1024_cp_12ep.py
│   │   │   ├── deformable_deta_vitl_lsj1024_12ep.py
│   │   │   └── models/
│   │   │       └── deformable_deta_r50.py
│   │   └── deformable_detr/
│   │       ├── deformable_detr_r50_50ep.py
│   │       ├── deformable_detr_r50_two_stage_50ep.py
│   │       ├── deformable_detr_r50_with_box_refinement_50ep.py
│   │       ├── improved_deformable_detr_r50_12ep.py
│   │       ├── improved_deformable_detr_r50_50ep.py
│   │       ├── improved_deformable_detr_r50_two_stage_12ep.py
│   │       ├── improved_deformable_detr_r50_two_stage_50ep.py
│   │       └── models/
│   │           ├── deformable_detr_r50.py
│   │           └── improved_deformable_detr_r50.py
│   ├── COCO_InstanceSegmentation/
│   │   ├── ape_deta/
│   │   │   ├── ape_deta_r50_12ep.py
│   │   │   ├── ape_deta_r50_vlf_12ep.py
│   │   │   ├── ape_deta_vite_eva02_clip_lsj1024_cp_12ep_fsdp.py
│   │   │   ├── ape_deta_vite_eva02_clip_lsj1024_cp_32x90k_fsdp.py
│   │   │   ├── ape_deta_vitg_eva01_clip_lsj1536_cp_128x45k.py
│   │   │   ├── ape_deta_vitg_eva01_clip_lsj1536_cp_64x90k.py
│   │   │   ├── ape_deta_vitg_eva01_lsj1536_cp_64x90k.py
│   │   │   ├── ape_deta_vitl_eva02_clip_lsj1024_cp_12ep.py
│   │   │   ├── ape_deta_vitl_eva02_clip_lsj1024_cp_12ep_fsdp.py
│   │   │   ├── ape_deta_vitl_eva02_clip_lsj1536_cp_128x45k.py
│   │   │   ├── ape_deta_vitl_eva02_clip_lsj1536_cp_64x90k.py
│   │   │   ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_12ep.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1024_cp_12ep.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1536_cp_128x90k.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1536_cp_12ep.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1536_cp_64x90k.py
│   │   │   ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_12ep.py
│   │   │   ├── ape_deta_vitl_lsj1024_cp_12ep.py
│   │   │   ├── ape_deta_vitt_eva02_lsj1024_cp_12ep.py
│   │   │   ├── ape_deta_vitt_eva02_vlf_lsj1024_cp_12ep.py
│   │   │   └── models/
│   │   │       └── ape_deta_r50.py
│   │   └── deformable_deta/
│   │       ├── deformable_deta_segm_r50_12ep.py
│   │       ├── deformable_deta_segm_r50_24ep.py
│   │       ├── deformable_deta_segm_vitl_eva02_lsj1024_cp_12ep.py
│   │       └── models/
│   │           └── deformable_deta_segm_r50.py
│   ├── COCO_PanopticSegmentation/
│   │   ├── ape_deta/
│   │   │   ├── ape_deta_r50_12ep.py
│   │   │   ├── ape_deta_r50_12ep_separated.py
│   │   │   ├── ape_deta_r50_24ep.py
│   │   │   ├── ape_deta_r50_lsj1024.py
│   │   │   ├── ape_deta_r50_vlf_lsj1024.py
│   │   │   ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1024.py
│   │   │   ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │   │   └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   │   └── deformable_deta/
│   │       ├── deformable_deta_segm_r50_12ep.py
│   │       ├── deformable_deta_segm_r50_24ep.py
│   │       ├── deformable_deta_segm_r50_36ep.py
│   │       └── deformable_deta_segm_r50_50ep.py
│   ├── COCO_REFCOCO/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_12ep.py
│   │       ├── ape_deta_r50_24ep.py
│   │       ├── ape_deta_r50_36ep.py
│   │       ├── ape_deta_r50_vlf_12ep.py
│   │       ├── ape_deta_r50_vlf_36ep.py
│   │       ├── ape_deta_r50_vlf_bert_36ep.py
│   │       ├── ape_deta_vitl_eva02_lsj1024_12ep.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_36ep.py
│   │       └── ape_deta_vitl_lsj1024_12ep.py
│   ├── COCO_SA1B_InstanceSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_24ep.py
│   │       └── ape_deta_r50_24ep_mp.py
│   ├── COCO_SA1B_PanopticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_24ep.py
│   │       ├── ape_deta_r50_24ep_lp.py
│   │       └── ape_deta_r50_24ep_vlf_lp.py
│   ├── COCO_SemanticSegmentation/
│   │   ├── ape_deta/
│   │   │   ├── ape_deta_r50_12ep.py
│   │   │   ├── ape_deta_r50_vlf_lsj1024_12ep.py
│   │   │   └── ape_deta_vitl_eva02_lsj1024_12ep.py
│   │   └── deformable_deta/
│   │       └── deformable_deta_segm_r50_12ep.py
│   ├── Cityscapes_PanopticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── D3_InstanceSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── Flickr30k_VisualGrounding/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_12ep.py
│   │       ├── ape_deta_r50_vlf_12ep.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       └── ape_deta_vitl_eva02_vlf_lsj1024.py
│   ├── GQA_VisualGrounding/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_12ep.py
│   │       ├── ape_deta_r50_12ep_eval_odinw13.py
│   │       ├── ape_deta_r50_12ep_eval_odinw35.py
│   │       ├── ape_deta_r50_vlf_12ep.py
│   │       ├── ape_deta_r50_vlf_12ep_eval_odinw13.py
│   │       └── ape_deta_r50_vlf_12ep_eval_odinw35.py
│   ├── GRIT_SA1B_VisualGrounding/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_24ep.py
│   │       └── ape_deta_r50_vlf_24ep.py
│   ├── GRIT_VisualGrounding/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_400k.py
│   │       ├── ape_deta_r50_vlf_400k.py
│   │       └── ape_deta_r50_vlf_lsj224_256x50k.py
│   ├── LVISCOCOCOCOSTUFF_O365_OID_VG/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_lsj1024_cp_50ep.py
│   │       ├── ape_deta_vitl_eva02_lsj1024_cp_180k.py
│   │       ├── ape_deta_vitl_eva02_lsj1024_cp_720k.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_180k.py
│   │       └── ape_deta_vitl_eva02_vlf_lsj1024_cp_720k.py
│   ├── LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_lsj1024_cp_180k.py
│   │       ├── ape_deta_vitl_eva02_lsj1024_cp_720k.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_180k.py
│   │       └── ape_deta_vitl_eva02_vlf_lsj1024_cp_720k.py
│   ├── LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py
│   │       └── ape_deta_vitl_eva02_vlf_lsj1024_cp_2160k.py
│   ├── LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/
│   │   └── ape_deta/
│   │       ├── ape_deta_vite_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py
│   │       ├── ape_deta_vite_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl_fsdp.py
│   │       ├── ape_deta_vite_eva02_clip_vlf_lsj1024_cp_32x2_540k_mdl_fsdp.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_08x8x270k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_1080k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl_llama2.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4x270k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4x270k_mdl.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4x270k_mdl_llama2.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4x337k_mdl.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_32x2x270k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_48x2x270k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_64x1x270k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1536_cp_08x8x270k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1536_cp_32x2x270k.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1536_cp_64x270k.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py
│   │       ├── ape_deta_vitt_eva02_vlf_lsj1024_cp_16x4_1080k.py
│   │       ├── ape_deta_vitt_eva02_vlf_lsj1024_cp_16x4_1080k_mdl.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024_cp_64x1_270k_mdl.py
│   ├── LVISCOCOCOCOSTUFF_PanopticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_lsj1024_cp_50ep.py
│   │       └── ape_deta_vitl_eva02_lsj1024_cp_24ep.py
│   ├── LVISCOCOCOCOSTUFF_REFCOCO/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_lsj1024_50ep.py
│   │       ├── ape_deta_r50_lsj1024_cp_50ep.py
│   │       ├── ape_deta_r50_vlf_lsj1024_cp_50ep.py
│   │       ├── ape_deta_r50_vlf_lsj1024_cp_bert_50ep.py
│   │       ├── ape_deta_vitl_eva02_lsj1024_24ep.py
│   │       └── ape_deta_vitl_eva02_vlf_lsj1024_50ep.py
│   ├── LVISCOCO_COCOSTUFF_O365_OID_VG_REFCOCO/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_180k.py
│   │       └── ape_deta_vitl_eva02_vlf_lsj1024_cp_720k.py
│   ├── LVISCOCO_COCOSTUFF_PanopticSegmentation/
│   │   └── ape_deta/
│   │       └── ape_deta_r50_lsj1024_cp_50ep.py
│   ├── LVIS_Detection/
│   │   └── deformable_deta/
│   │       ├── deformable_deta_r50_lsj1024_24ep.py
│   │       ├── deformable_deta_vitb_lsj1024_24ep.py
│   │       ├── deformable_deta_vitg_eva_lsj1024_24ep.py
│   │       ├── deformable_deta_vitg_eva_lsj1024_cp_24ep.py
│   │       ├── deformable_deta_vitl_eva02_lsj1024_cp_24ep.py
│   │       ├── deformable_deta_vitl_eva_lsj1024_cp_24ep.py
│   │       └── deformable_deta_vitl_lsj1024_24ep.py
│   ├── LVIS_InstanceSegmentation/
│   │   ├── ape_deta/
│   │   │   ├── ape_deta_r50_24ep.py
│   │   │   ├── ape_deta_r50_vlf_24ep.py
│   │   │   ├── ape_deta_vite_eva02_clip_lsj1024_cp_24ep_fsdp.py
│   │   │   ├── ape_deta_vitl_eva02_clip_lsj1024_cp_24ep.py
│   │   │   ├── ape_deta_vitl_eva02_clip_lsj1536_cp_64x90k.py
│   │   │   ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_24ep.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1024_cp_24ep.py
│   │   │   ├── ape_deta_vitl_eva02_lsj1536_cp_64x90k.py
│   │   │   ├── ape_deta_vitl_eva02_vlf_lsj1024_cp_24ep.py
│   │   │   ├── ape_deta_vitt_eva02_lsj1024_cp_24ep.py
│   │   │   └── ape_deta_vitt_eva02_vlf_lsj1024_cp_24ep.py
│   │   └── deformable_deta/
│   │       ├── deformable_deta_segm_vitl_eva02_4scale_lsj1024_cp_24ep.py
│   │       └── deformable_deta_segm_vitl_eva02_lsj1024_cp_24ep.py
│   ├── LVIS_SA1B_InstanceSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_50ep.py
│   │       ├── ape_deta_r50_50ep_eval_odinw13.py
│   │       ├── ape_deta_r50_50ep_eval_odinw35.py
│   │       ├── ape_deta_r50_50ep_eval_seginw.py
│   │       ├── ape_deta_r50_50ep_iouloss_lp.py
│   │       └── ape_deta_r50_50ep_mp.py
│   ├── ODinW_Detection/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_13.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_35.py
│   │       ├── ape_deta_vitl_eva02_lsj1024_13.py
│   │       ├── ape_deta_vitl_eva02_lsj1024_35.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_13.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_35.py
│   │       ├── ape_deta_vitt_eva02_vlf_lsj1024_13.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024_35.py
│   ├── PascalContext459_SemanticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── PascalContext59_SemanticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── PascalVOC20_SemanticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── PascalVOCParts_PanopticSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_12ep.py
│   │       └── ape_deta_r50_vlf_12ep.py
│   ├── PhraseCut_VisualGrounding/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_12ep.py
│   │       ├── ape_deta_r50_vlf_12ep.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       └── ape_deta_vitl_eva02_vlf_lsj1024.py
│   ├── REFCOCO_VisualGrounding/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_12ep.py
│   │       ├── ape_deta_r50_bert_vlf_12ep.py
│   │       ├── ape_deta_r50_vlf_12ep.py
│   │       ├── ape_deta_vitl_eva02_clip_lsj1024_12ep.py
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024_12ep.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024_12ep.py
│   │       └── ape_deta_vitl_lsj1024_12ep.py
│   ├── Roboflow_Detection/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── SegInW_InstanceSegmentation/
│   │   └── ape_deta/
│   │       ├── ape_deta_vitl_eva02_clip_vlf_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_lsj1024.py
│   │       ├── ape_deta_vitl_eva02_vlf_lsj1024.py
│   │       └── ape_deta_vitt_eva02_vlf_lsj1024.py
│   ├── VisualGenome_VisualGrounding/
│   │   └── ape_deta/
│   │       ├── ape_deta_r50_12ep.py
│   │       ├── ape_deta_r50_12ep_eval_odinw13.py
│   │       ├── ape_deta_r50_12ep_eval_odinw35.py
│   │       ├── ape_deta_r50_vlf_12ep.py
│   │       ├── ape_deta_r50_vlf_12ep_eval_odinw13.py
│   │       └── ape_deta_r50_vlf_12ep_eval_odinw35.py
│   └── common/
│       ├── backbone/
│       │   ├── vite_eva02_clip_1024.py
│       │   ├── vite_eva02_clip_1536.py
│       │   ├── vitg_eva01.py
│       │   ├── vitg_eva01_1536.py
│       │   ├── vitg_eva01_clip_1024.py
│       │   ├── vitg_eva01_clip_1536.py
│       │   ├── vitl_eva02.py
│       │   ├── vitl_eva02_1536.py
│       │   ├── vitl_eva02_clip.py
│       │   ├── vitl_eva02_clip_1536.py
│       │   └── vitt_eva02.py
│       └── data/
│           ├── ade20k_panoptic.py
│           ├── ade20k_panoptic_lsj1024.py
│           ├── ade20k_semantic.py
│           ├── ade20k_semantic_lsj1024.py
│           ├── ade20kfull_semantic_lsj1024.py
│           ├── bdd10k_panoptic_lsj1024.py
│           ├── bdd10k_semantic_lsj1024.py
│           ├── cityscapes_panoptic_lsj1024.py
│           ├── cityscapes_semantic_lsj1024.py
│           ├── coco_instance.py
│           ├── coco_instance_lsj1024.py
│           ├── coco_instance_lsj1024_cp.py
│           ├── coco_instance_lsj1536_cp.py
│           ├── coco_panoptic.py
│           ├── coco_panoptic_lsj1024.py
│           ├── coco_panoptic_separated.py
│           ├── coco_refcoco_instance.py
│           ├── coco_refcoco_instance_lsj1024.py
│           ├── coco_sa1b_instance.py
│           ├── coco_sa1b_panoptic.py
│           ├── coco_semantic.py
│           ├── coco_semantic_lsj1024.py
│           ├── constants.py
│           ├── d3_instance_lsj1024.py
│           ├── flickr30k_instance.py
│           ├── flickr30k_instance_lsj1024.py
│           ├── gqa_region_instance.py
│           ├── grit_instance.py
│           ├── grit_instance_lsj224.py
│           ├── grit_sa1b_instance.py
│           ├── lvis_instance_lsj1024_cp.py
│           ├── lvis_instance_lsj1536_cp.py
│           ├── lvis_sa1b_instance.py
│           ├── lviscoco_cocostuff_o365_oid_vg_refcoco_panoptic_lsj1024_cp.py
│           ├── lviscoco_cocostuff_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_refcoco_panoptic_lsj1024.py
│           ├── lviscocococostuff_o365_oid_refcoco_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vg_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vg_refcoco_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_refcoco_group_by_image_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_refcoco_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_gqa_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_gqa_panoptic_lsj1536_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_gqa_phrasecut_flickr30k_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_gqa_phrasecut_flickr30k_panoptic_lsj1024_cp_mdl.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_gqa_phrasecut_flickr30k_panoptic_lsj1536_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_gqa_phrasecut_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_gqa_phrasecut_panoptic_lsj1536_cp.py
│           ├── lviscocococostuff_o365_oid_vgr_sa1b_refcoco_group_by_image_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_refcoco_group_by_image_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_refcoco_panoptic_lsj1024.py
│           ├── lviscocococostuff_refcoco_panoptic_lsj1024_cp.py
│           ├── lviscocococostuff_sa1b_panoptic.py
│           ├── o365_instance_lsj1024.py
│           ├── odinw13_instance.py
│           ├── odinw13_instance_lsj1024.py
│           ├── odinw13_instance_lsj1536.py
│           ├── odinw35_instance.py
│           ├── odinw35_instance_lsj1024.py
│           ├── odinw35_instance_lsj1536.py
│           ├── odinwvoc_instance_lsj1024.py
│           ├── pascalcontext459_semantic_lsj1024.py
│           ├── pascalcontext59_semantic_lsj1024.py
│           ├── pascalvoc20_semantic_lsj1024.py
│           ├── pascalvocpart_panoptic.py
│           ├── phrasecut_instance.py
│           ├── phrasecut_instance_lsj1024.py
│           ├── refcoco_group_by_image_instance.py
│           ├── refcoco_group_by_image_instance_lsj1024.py
│           ├── refcoco_instance.py
│           ├── refcoco_instance_lsj1024.py
│           ├── roboflow100_instance_lsj1024.py
│           ├── seginw_instance.py
│           ├── seginw_instance_lsj1024.py
│           ├── seginw_instance_lsj1536.py
│           └── vgregion_instance.py
├── datasets/
│   ├── README.md
│   ├── prepare_ade20k_full_sem_seg.py
│   ├── prepare_coco_semantic_annos_from_panoptic_annos.py
│   ├── prepare_pascal_context.py
│   └── prepare_voc_sem_seg.py
├── demo/
│   ├── .gitattributes
│   ├── README.md
│   ├── app.py
│   ├── demo_lazy.py
│   ├── pre-requirements.txt
│   ├── predictor_lazy.py
│   └── requirements.txt
├── requirements.txt
├── scripts/
│   ├── eval_APE-L_A.sh
│   ├── eval_APE-L_B.sh
│   ├── eval_APE-L_C.sh
│   ├── eval_APE-L_D.sh
│   ├── eval_APE-Ti.sh
│   ├── eval_flops.sh
│   └── eval_time.sh
├── setup.py
└── tools/
    ├── analyze_model.py
    ├── eva_interpolate_patch_14to16.py
    ├── train_net.py
    ├── train_net_fsdp.py
    └── visualize_json_results.py
Download .txt
SYMBOL INDEX (1314 symbols across 129 files)

FILE: ape/checkpoint/detection_checkpoint.py
  class DetectionCheckpointer (line 16) | class DetectionCheckpointer(DetectionCheckpointer_d2):
    method _convert_ndarray_to_tensor (line 22) | def _convert_ndarray_to_tensor(self, state_dict: Dict[str, Any]) -> None:
  class FSDPDetectionCheckpointer (line 50) | class FSDPDetectionCheckpointer(DetectionCheckpointer):
    method save (line 56) | def save(self, name: str, **kwargs: Any) -> None:

FILE: ape/data/build.py
  function _test_loader_from_config (line 44) | def _test_loader_from_config(cfg, dataset_name, mapper=None):
  function build_detection_test_loader (line 74) | def build_detection_test_loader(

FILE: ape/data/build_copypaste.py
  function get_detection_dataset_dicts_copypaste (line 38) | def get_detection_dataset_dicts_copypaste(
  function _train_loader_from_config (line 103) | def _train_loader_from_config(cfg, mapper=None, *, dataset=None, sampler...
  function build_detection_train_loader_copypaste (line 179) | def build_detection_train_loader_copypaste(

FILE: ape/data/build_multi_dataset.py
  function print_instances_class_histogram (line 49) | def print_instances_class_histogram(dataset_dicts, class_names):
  function DatasetCatalog_get (line 106) | def DatasetCatalog_get(dataset_name, reduce_memory, reduce_memory_size):
  function get_detection_dataset_dicts_multi_dataset (line 181) | def get_detection_dataset_dicts_multi_dataset(
  function build_batch_data_loader_multi_dataset (line 279) | def build_batch_data_loader_multi_dataset(
  function _train_loader_from_config (line 356) | def _train_loader_from_config(cfg, mapper=None, *, dataset=None, sampler...
  function build_detection_train_loader_multi_dataset (line 450) | def build_detection_train_loader_multi_dataset(
  class MultiDatasetSampler (line 525) | class MultiDatasetSampler(Sampler):
    method __init__ (line 526) | def __init__(self, cfg, dataset_dicts, sizes, seed: Optional[int] = No...
    method __iter__ (line 565) | def __iter__(self):
    method _infinite_indices (line 569) | def _infinite_indices(self):
    method _get_class_balance_factor_per_dataset (line 578) | def _get_class_balance_factor_per_dataset(self, dataset_dicts, l=1.0):
  class MultiDatasetAspectRatioGroupedDataset (line 703) | class MultiDatasetAspectRatioGroupedDataset(torch.utils.data.IterableDat...
    method __init__ (line 716) | def __init__(self, dataset, batch_size, num_datasets):
    method __iter__ (line 729) | def __iter__(self):

FILE: ape/data/build_multi_dataset_copypaste.py
  function print_instances_class_histogram (line 50) | def print_instances_class_histogram(dataset_dicts, class_names):
  function DatasetCatalog_get (line 107) | def DatasetCatalog_get(dataset_name, reduce_memory, reduce_memory_size):
  function get_detection_dataset_dicts_multi_dataset_copypaste (line 182) | def get_detection_dataset_dicts_multi_dataset_copypaste(
  function build_batch_data_loader_multi_dataset (line 284) | def build_batch_data_loader_multi_dataset(
  function _train_loader_from_config (line 361) | def _train_loader_from_config(cfg, mapper=None, *, dataset=None, sampler...
  function build_detection_train_loader_multi_dataset_copypaste (line 498) | def build_detection_train_loader_multi_dataset_copypaste(
  class MultiDatasetSampler (line 589) | class MultiDatasetSampler(Sampler):
    method __init__ (line 590) | def __init__(self, cfg, dataset_dicts, sizes, seed: Optional[int] = No...
    method __iter__ (line 629) | def __iter__(self):
    method _infinite_indices (line 633) | def _infinite_indices(self):
    method _get_class_balance_factor_per_dataset (line 642) | def _get_class_balance_factor_per_dataset(self, dataset_dicts, l=1.0):
  class MultiDatasetAspectRatioGroupedDataset (line 767) | class MultiDatasetAspectRatioGroupedDataset(torch.utils.data.IterableDat...
    method __init__ (line 780) | def __init__(self, dataset, batch_size, num_datasets):
    method __iter__ (line 793) | def __iter__(self):

FILE: ape/data/common_copypaste.py
  class MapDataset_coppaste (line 14) | class MapDataset_coppaste(data.Dataset):
    method __init__ (line 19) | def __init__(self, dataset, map_func, dataset_bg, sampler_bg):
    method __new__ (line 41) | def __new__(cls, dataset, map_func, dataset_bg, sampler_bg):
    method __getnewargs__ (line 49) | def __getnewargs__(self):
    method __len__ (line 52) | def __len__(self):
    method __getitem__ (line 55) | def __getitem__(self, idx):

FILE: ape/data/dataset_mapper.py
  class DatasetMapper_ape (line 17) | class DatasetMapper_ape(DatasetMapper_d2):
    method __init__ (line 34) | def __init__(self, cfg, is_train: bool = True):

FILE: ape/data/dataset_mapper_copypaste.py
  class DatasetMapper_copypaste (line 30) | class DatasetMapper_copypaste(DatasetMapper_d2):
    method __init__ (line 48) | def __init__(
    method from_config (line 126) | def from_config(cls, cfg, is_train: bool = True):
    method _transform_annotations (line 177) | def _transform_annotations(self, dataset_dict, transforms, image_shape):
    method __call__ (line 233) | def __call__(self, dataset_dict, dataset_dict_bg):
    method visualize_training (line 382) | def visualize_training(self, dataset_dict, prefix="", suffix=""):

FILE: ape/data/dataset_mapper_detr_instance.py
  class DatasetMapper_detr_instance (line 23) | class DatasetMapper_detr_instance:
    method __init__ (line 41) | def __init__(
    method from_config (line 105) | def from_config(cls, cfg, is_train: bool = True):
    method _transform_annotations (line 108) | def _transform_annotations(self, dataset_dict, transforms, image_shape):
    method __call__ (line 156) | def __call__(self, dataset_dict):

FILE: ape/data/dataset_mapper_detr_instance_exp.py
  class DatasetMapper_detr_instance_exp (line 22) | class DatasetMapper_detr_instance_exp:
    method __init__ (line 40) | def __init__(
    method from_config (line 99) | def from_config(cls, cfg, is_train: bool = True):
    method _transform_annotations (line 102) | def _transform_annotations(self, dataset_dict, transforms, image_shape):
    method __call__ (line 131) | def __call__(self, dataset_dict):

FILE: ape/data/dataset_mapper_detr_panoptic.py
  class DatasetMapper_detr_panoptic (line 24) | class DatasetMapper_detr_panoptic:
    method __init__ (line 42) | def __init__(
    method from_config (line 107) | def from_config(cls, cfg, is_train: bool = True):
    method _transform_annotations (line 110) | def _transform_annotations(self, dataset_dict, transforms, image_shape):
    method __call__ (line 139) | def __call__(self, dataset_dict):

FILE: ape/data/dataset_mapper_detr_panoptic_copypaste.py
  class DatasetMapper_detr_panoptic_copypaste (line 29) | class DatasetMapper_detr_panoptic_copypaste:
    method __init__ (line 47) | def __init__(
    method from_config (line 127) | def from_config(cls, cfg, is_train: bool = True):
    method _transform_annotations (line 130) | def _transform_annotations(self, dataset_dict, transforms, image_shape):
    method __call__ (line 186) | def __call__(self, dataset_dict, dataset_dict_bg):
    method visualize_training (line 555) | def visualize_training(self, dataset_dict, prefix="", suffix=""):

FILE: ape/data/dataset_mapper_detr_semantic.py
  class DatasetMapper_detr_semantic (line 24) | class DatasetMapper_detr_semantic:
    method __init__ (line 42) | def __init__(
    method from_config (line 97) | def from_config(cls, cfg, is_train: bool = True):
    method _transform_annotations (line 100) | def _transform_annotations(self, dataset_dict, transforms, image_shape):
    method __call__ (line 129) | def __call__(self, dataset_dict):

FILE: ape/data/datasets/coco.py
  function custom_load_coco_json (line 23) | def custom_load_coco_json(json_file, image_root, dataset_name=None, extr...
  function custom_load_sem_seg (line 238) | def custom_load_sem_seg(gt_root, image_root, gt_ext="png", image_ext="jp...
  function custom_load_sem_seg_list (line 314) | def custom_load_sem_seg_list(gt_root, image_root, gt_ext="png", image_ex...
  function custom_register_coco_instances (line 338) | def custom_register_coco_instances(name, metadata, json_file, image_root):
  function custom_register_coco_semseg (line 368) | def custom_register_coco_semseg(name, metadata, sem_seg_root, image_root):

FILE: ape/data/datasets/d_cube.py
  function register_d3_instances (line 26) | def register_d3_instances(name, metadata, json_file, image_root, anno_ro...
  function load_d3_json (line 42) | def load_d3_json(json_file, image_root, anno_root, dataset_name=None, ex...
  function get_d3_instances_meta (line 223) | def get_d3_instances_meta(dataset_name):
  function register_all_D3 (line 259) | def register_all_D3(root):

FILE: ape/data/datasets/flickr30k.py
  function _get_builtin_metadata (line 9) | def _get_builtin_metadata(dataset_name):
  function _get_flickr30k_metadata (line 15) | def _get_flickr30k_metadata(categories):
  function register_all_flickr30k (line 52) | def register_all_flickr30k(root):

FILE: ape/data/datasets/gqa.py
  function _get_builtin_metadata (line 9) | def _get_builtin_metadata(dataset_name):
  function _get_gqa_metadata (line 15) | def _get_gqa_metadata(categories):
  function register_all_gqa (line 44) | def register_all_gqa(root):

FILE: ape/data/datasets/grit.py
  function _get_builtin_metadata (line 10) | def _get_builtin_metadata(dataset_name):
  function register_all_GRIT (line 47) | def register_all_GRIT(root):

FILE: ape/data/datasets/lvis_coco.py
  function custom_register_lvis_instances (line 26) | def custom_register_lvis_instances(name, metadata, json_file, image_root):
  function custom_load_lvis_json (line 42) | def custom_load_lvis_json(json_file, image_root, dataset_name=None, extr...
  function get_lvis_instances_meta (line 210) | def get_lvis_instances_meta(dataset_name):
  function _get_lvis_instances_meta_v0_5 (line 231) | def _get_lvis_instances_meta_v0_5():
  function _get_lvis_instances_meta_v1 (line 244) | def _get_lvis_instances_meta_v1():
  function register_all_lvis_coco (line 270) | def register_all_lvis_coco(root):

FILE: ape/data/datasets/lvis_coco_panoptic.py
  function register_lvis_panoptic_separated (line 16) | def register_lvis_panoptic_separated(
  function merge_to_panoptic (line 83) | def merge_to_panoptic(detection_dicts, sem_seg_dicts):
  function _get_builtin_metadata (line 108) | def _get_builtin_metadata(dataset_name):
  function _get_lvis_panoptic_separated_meta (line 115) | def _get_lvis_panoptic_separated_meta():
  function register_all_lvis_coco_panoptic (line 168) | def register_all_lvis_coco_panoptic(root):

FILE: ape/data/datasets/objects365.py
  function _get_builtin_metadata (line 742) | def _get_builtin_metadata(key):
  function register_all_objects365 (line 786) | def register_all_objects365(root):

FILE: ape/data/datasets/odinw_instance.py
  function load_coco_json (line 19) | def load_coco_json(json_file, image_root, dataset_name=None, extra_annot...
  function register_coco_instances (line 219) | def register_coco_instances(name, metadata, json_file, image_root):
  function _get_builtin_metadata (line 798) | def _get_builtin_metadata(name):
  function register_all_odinw (line 810) | def register_all_odinw(root):

FILE: ape/data/datasets/oid.py
  function register_oid_instances (line 12) | def register_oid_instances(name, metadata, json_file, image_root):
  function _get_builtin_metadata (line 1454) | def _get_builtin_metadata(cats, class_image_count=None):
  function register_all_oid (line 1560) | def register_all_oid(root):

FILE: ape/data/datasets/pascal_voc_external.py
  function _get_ctx59_meta (line 820) | def _get_ctx59_meta():
  function register_all_ctx59 (line 838) | def register_all_ctx59(root):
  function _get_pascal21_meta (line 862) | def _get_pascal21_meta():
  function register_all_pascal21 (line 880) | def register_all_pascal21(root):
  function _get_ctx459_meta (line 904) | def _get_ctx459_meta():
  function register_all_ctx459 (line 922) | def register_all_ctx459(root):
  function _get_parts_meta (line 946) | def _get_parts_meta():
  function _get_parts_only_meta (line 964) | def _get_parts_only_meta():
  function register_all_pascal_parts_only (line 982) | def register_all_pascal_parts_only(root):
  function register_all_pascal_parts (line 1136) | def register_all_pascal_parts(root):
  function _get_builtin_metadata (line 1178) | def _get_builtin_metadata(dataset_name):
  function _get_pascalvocpart_metadata (line 1184) | def _get_pascalvocpart_metadata(categories):
  function register_all_pascalvocpart (line 1196) | def register_all_pascalvocpart(root):

FILE: ape/data/datasets/phrasecut.py
  function _get_builtin_metadata (line 9) | def _get_builtin_metadata(dataset_name):
  function _get_phrasecut_metadata (line 15) | def _get_phrasecut_metadata(categories):
  function register_all_phrasecut (line 44) | def register_all_phrasecut(root):

FILE: ape/data/datasets/refcoco.py
  function _get_refcoco_meta (line 30) | def _get_refcoco_meta():
  function load_refcoco_json (line 45) | def load_refcoco_json(json_file, image_root, dataset_name=None, extra_an...
  function register_refcoco (line 254) | def register_refcoco(name, metadata, json_file, image_root):
  function register_all_refcoco (line 323) | def register_all_refcoco(root):

FILE: ape/data/datasets/register_bdd100k_panoseg.py
  function load_bdd_panoptic_json (line 113) | def load_bdd_panoptic_json(json_file, image_dir, gt_dir, meta):
  function register_bdd_panoptic (line 167) | def register_bdd_panoptic(
  function get_metadata (line 214) | def get_metadata():
  function register_all_bdd_panoptic (line 257) | def register_all_bdd_panoptic(root):

FILE: ape/data/datasets/register_bdd100k_semseg.py
  function load_bdd_instances (line 42) | def load_bdd_instances(
  function register_bdd_context (line 72) | def register_bdd_context(name, dirname, split, class_names=BDD_SEM):
  function register_all_bdd_semseg (line 85) | def register_all_bdd_semseg(root):

FILE: ape/data/datasets/register_pascal_context.py
  function _get_voc_meta (line 532) | def _get_voc_meta(cat_list):
  function register_pascal_context_59 (line 539) | def register_pascal_context_59(root):
  function register_pascal_context_459 (line 561) | def register_pascal_context_459(root):

FILE: ape/data/datasets/register_voc_seg.py
  function _get_voc_meta (line 31) | def _get_voc_meta(cat_list):
  function register_pascalvoc (line 38) | def register_pascalvoc(root):

FILE: ape/data/datasets/sa1b.py
  function _get_builtin_metadata (line 10) | def _get_builtin_metadata(key):
  function register_all_sa1b (line 31) | def register_all_sa1b(root):

FILE: ape/data/datasets/seginw_instance.py
  function get_metadata (line 61) | def get_metadata(name):
  function load_seginw_json (line 69) | def load_seginw_json(name, image_root, annot_json, metadata):
  function register_seginw (line 111) | def register_seginw(name, metadata, image_root, annot_json):
  function register_all_seginw (line 126) | def register_all_seginw(root):

FILE: ape/data/datasets/visualgenome.py
  function _get_builtin_metadata (line 16) | def _get_builtin_metadata(dataset_name):
  function _get_visualgenome_metadata (line 50) | def _get_visualgenome_metadata(categories):
  function register_all_visualgenome (line 206) | def register_all_visualgenome(root):

FILE: ape/data/detection_utils.py
  function load_fed_loss_cls_weights (line 29) | def load_fed_loss_cls_weights(class_freq_path: str, freq_weight_power=1.0):
  function get_fed_loss_cls_weights (line 41) | def get_fed_loss_cls_weights(dataset_names: Union[str, List[str]], freq_...
  function get_fed_loss_cls_weights_v2 (line 75) | def get_fed_loss_cls_weights_v2(dataset_names: Union[str, List[str]], fr...
  function build_augmentation (line 128) | def build_augmentation(cfg, is_train):
  function build_augmentation_lsj (line 174) | def build_augmentation_lsj(cfg, is_train):
  function build_augmentation_aa (line 202) | def build_augmentation_aa(cfg, is_train):

FILE: ape/data/mapper_utils.py
  function clean_string (line 32) | def clean_string(phrase):
  function transform_phrases (line 57) | def transform_phrases(phrases, transforms):
  function transform_expressions (line 70) | def transform_expressions(expressions, transforms):
  function has_ordinal_num (line 83) | def has_ordinal_num(phrases):
  function mask_to_polygons_2 (line 111) | def mask_to_polygons_2(mask):
  function mask_to_polygons (line 132) | def mask_to_polygons(mask):
  function close_contour (line 152) | def close_contour(contour):
  function binary_mask_to_polygon (line 159) | def binary_mask_to_polygon(binary_mask, tolerance=0):
  function instances_to_annotations (line 185) | def instances_to_annotations(instances, img_id, bbox_mode, instance_mask...
  function copypaste (line 232) | def copypaste(dataset_dict, dataset_dict_bg, image_format, instance_mask...
  function maybe_load_annotation_from_file (line 383) | def maybe_load_annotation_from_file(record, meta=None, extra_annotation_...

FILE: ape/data/samplers/distributed_sampler_multi_dataset.py
  class MultiDatasetTrainingSampler (line 17) | class MultiDatasetTrainingSampler(Sampler):
    method __init__ (line 18) | def __init__(self, repeat_factors, *, shuffle=True, seed=None):
    method get_repeat_factors (line 32) | def get_repeat_factors(
    method get_class_balance_factor_per_dataset (line 85) | def get_class_balance_factor_per_dataset(dataset_dicts, l=1.0):
    method _get_epoch_indices (line 99) | def _get_epoch_indices(self, generator):
    method __iter__ (line 122) | def __iter__(self):
    method _infinite_indices (line 126) | def _infinite_indices(self):
  class InferenceSampler (line 140) | class InferenceSampler(Sampler):
    method __init__ (line 148) | def __init__(self, size: int):
    method _get_local_indices (line 160) | def _get_local_indices(total_size, world_size, rank):
    method __iter__ (line 172) | def __iter__(self):
    method __len__ (line 175) | def __len__(self):

FILE: ape/data/transforms/augmentation_aa.py
  class AutoAugment (line 5) | class AutoAugment(T.Augmentation):
    method __init__ (line 6) | def __init__(self, cfg):
    method __call__ (line 29) | def __call__(self, aug_input) -> Transform:
    method __repr__ (line 37) | def __repr__(self):

FILE: ape/data/transforms/augmentation_lsj.py
  class LargeScaleJitter (line 5) | class LargeScaleJitter(T.Augmentation):
    method __init__ (line 6) | def __init__(self, cfg):
    method __call__ (line 32) | def __call__(self, aug_input) -> Transform:
    method __repr__ (line 36) | def __repr__(self):

FILE: ape/engine/defaults.py
  function create_fsdp_model (line 37) | def create_fsdp_model(model, *, fp16_compression=False, **kwargs):
  class DefaultPredictor (line 159) | class DefaultPredictor:
    method __init__ (line 187) | def __init__(self, cfg):
    method __call__ (line 203) | def __call__(self, original_image, text_prompt=None, mask_prompt=None):

FILE: ape/engine/train_loop.py
  class SimpleTrainer (line 21) | class SimpleTrainer(TrainerBase):
    method __init__ (line 40) | def __init__(
    method run_step (line 83) | def run_step(self):
    method _data_loader_iter (line 160) | def _data_loader_iter(self):
    method reset_data_loader (line 166) | def reset_data_loader(self, data_loader_builder):
    method _write_metrics (line 176) | def _write_metrics(
    method write_metrics (line 194) | def write_metrics(
    method state_dict (line 244) | def state_dict(self):
    method load_state_dict (line 249) | def load_state_dict(self, state_dict):
    method after_train (line 253) | def after_train(self):
    method _write_metrics_common (line 257) | def _write_metrics_common(
    method write_metrics_common (line 274) | def write_metrics_common(
  class AMPTrainer (line 297) | class AMPTrainer(SimpleTrainer):
    method __init__ (line 303) | def __init__(
    method run_step (line 339) | def run_step(self):
    method state_dict (line 408) | def state_dict(self):
    method load_state_dict (line 413) | def load_state_dict(self, state_dict):

FILE: ape/evaluation/d3_evaluation.py
  class D3Evaluator (line 34) | class D3Evaluator(DatasetEvaluator):
    method __init__ (line 47) | def __init__(
    method reset (line 158) | def reset(self):
    method process (line 161) | def process(self, inputs, outputs):
    method evaluate (line 195) | def evaluate(self, img_ids=None):
    method _tasks_from_predictions (line 229) | def _tasks_from_predictions(self, predictions):
    method _eval_predictions (line 241) | def _eval_predictions(self, predictions, img_ids=None):
    method _eval_box_proposals (line 303) | def _eval_box_proposals(self, predictions):
    method _derive_coco_results (line 342) | def _derive_coco_results(self, coco_eval, iou_type, class_names=None):
  function instances_to_coco_json (line 441) | def instances_to_coco_json(instances, img_id):
  function _evaluate_box_proposals (line 505) | def _evaluate_box_proposals(dataset_predictions, coco_api, thresholds=No...
  function _evaluate_predictions_on_coco (line 616) | def _evaluate_predictions_on_coco(
  class COCOevalMaxDets (line 683) | class COCOevalMaxDets(COCOeval):
    method summarize (line 689) | def summarize(self):
    method __str__ (line 770) | def __str__(self):

FILE: ape/evaluation/evaluator.py
  function inference_on_dataset (line 17) | def inference_on_dataset(

FILE: ape/evaluation/instance_evaluation.py
  class InstanceSegEvaluator (line 35) | class InstanceSegEvaluator(COCOEvaluator):
    method _eval_predictions (line 48) | def _eval_predictions(self, predictions, img_ids=None):

FILE: ape/evaluation/lvis_evaluation.py
  class LVISEvaluator (line 24) | class LVISEvaluator(DatasetEvaluator):
    method __init__ (line 30) | def __init__(
    method reset (line 79) | def reset(self):
    method process (line 82) | def process(self, inputs, outputs):
    method evaluate (line 101) | def evaluate(self):
    method _tasks_from_predictions (line 130) | def _tasks_from_predictions(self, predictions):
    method _eval_predictions (line 136) | def _eval_predictions(self, predictions):
    method _eval_box_proposals (line 186) | def _eval_box_proposals(self, predictions):
    method _derive_lvis_results (line 225) | def _derive_lvis_results(self, lvis_eval, iou_type, class_names=None):
  function _evaluate_box_proposals (line 293) | def _evaluate_box_proposals(dataset_predictions, lvis_api, thresholds=No...
  function _evaluate_predictions_on_lvis (line 402) | def _evaluate_predictions_on_lvis(

FILE: ape/evaluation/multi_dataset_evaluator.py
  function get_unified_evaluator (line 24) | def get_unified_evaluator(evaluator_type, dataset_name, cfg, distributed...
  function map_back_unified_id (line 43) | def map_back_unified_id(results, map_back, reverse_id_mapping=None):
  function map_back_unified_id_novel_classes (line 54) | def map_back_unified_id_novel_classes(results, map_back, reverse_id_mapp...
  class UnifiedCOCOEvaluator (line 68) | class UnifiedCOCOEvaluator(COCOEvaluator):
    method _eval_predictions (line 69) | def _eval_predictions(self, tasks, predictions):
  class UnifiedCityscapesEvaluator (line 132) | class UnifiedCityscapesEvaluator(COCOEvaluator):
    method __init__ (line 133) | def __init__(self, unified_label_file, dataset_name, cfg, distributed,...
    method process (line 160) | def process(self, inputs, outputs):
    method _eval_predictions (line 183) | def _eval_predictions(self, tasks, predictions):
    method write_as_cityscapes (line 264) | def write_as_cityscapes(
  class UnifiedOIDEvaluator (line 324) | class UnifiedOIDEvaluator(OIDEvaluator):
    method __init__ (line 325) | def __init__(self, unified_label_file, dataset_name, cfg, distributed,...
    method evaluate (line 335) | def evaluate(self):

FILE: ape/evaluation/oideval.py
  function compute_average_precision (line 31) | def compute_average_precision(precision, recall):
  class OIDEval (line 79) | class OIDEval:
    method __init__ (line 80) | def __init__(
    method _to_mask (line 170) | def _to_mask(self, anns, lvis):
    method _prepare (line 175) | def _prepare(self):
    method _prepare_freq_group (line 217) | def _prepare_freq_group(self):
    method evaluate (line 228) | def evaluate(self):
    method _get_gt_dt (line 259) | def _get_gt_dt(self, img_id, cat_id):
    method compute_iou (line 272) | def compute_iou(self, img_id, cat_id):
    method evaluate_img_google (line 299) | def evaluate_img_google(self, img_id, cat_id, area_rng):
    method accumulate (line 396) | def accumulate(self):
    method _summarize (line 495) | def _summarize(self, summary_type, iou_thr=None, area_rng="all", freq_...
    method summarize (line 522) | def summarize(self):
    method run (line 546) | def run(self):
    method print_results (line 552) | def print_results(self):
    method get_results (line 584) | def get_results(self):
  class Params (line 590) | class Params:
    method __init__ (line 591) | def __init__(self, iou_type):
  class OIDEvaluator (line 623) | class OIDEvaluator(DatasetEvaluator):
    method __init__ (line 624) | def __init__(
    method reset (line 673) | def reset(self):
    method process (line 676) | def process(self, inputs, outputs):
    method evaluate (line 695) | def evaluate(self):
    method _tasks_from_predictions (line 724) | def _tasks_from_predictions(self, predictions):
    method _eval_predictions (line 731) | def _eval_predictions(self, predictions):
    method _derive_oid_results (line 780) | def _derive_oid_results(self, oid_eval, iou_type, class_names=None):
  function _evaluate_predictions_on_oid (line 847) | def _evaluate_predictions_on_oid(

FILE: ape/evaluation/refcoco_evaluation.py
  class RefCOCOEvaluator (line 31) | class RefCOCOEvaluator(DatasetEvaluator):
    method __init__ (line 44) | def __init__(
    method reset (line 144) | def reset(self):
    method process (line 147) | def process(self, inputs, outputs):
    method evaluate (line 167) | def evaluate(self, img_ids=None):
    method _tasks_from_predictions (line 200) | def _tasks_from_predictions(self, predictions):
    method _eval_predictions (line 212) | def _eval_predictions(self, predictions, img_ids=None):
    method _eval_box_proposals (line 270) | def _eval_box_proposals(self, predictions):
    method _derive_coco_results (line 309) | def _derive_coco_results(self, coco_eval, iou_type, class_names=None):
    method _derive_refcoco_results (line 377) | def _derive_refcoco_results(self, coco_eval, iou_type):
  function instances_to_coco_json (line 425) | def instances_to_coco_json(instances, img_id):
  function _evaluate_box_proposals (line 489) | def _evaluate_box_proposals(dataset_predictions, coco_api, thresholds=No...
  function _evaluate_predictions_on_coco (line 600) | def _evaluate_predictions_on_coco(
  class COCOevalMaxDets (line 665) | class COCOevalMaxDets(COCOeval):
    method summarize (line 671) | def summarize(self):
    method __str__ (line 752) | def __str__(self):

FILE: ape/evaluation/refcocoeval.py
  function compute_bbox_iou (line 17) | def compute_bbox_iou(boxes1: torch.Tensor, boxes2: torch.Tensor):
  function compute_mask_iou (line 34) | def compute_mask_iou(outputs: torch.Tensor, labels: torch.Tensor, EPS=1e...
  class RefCOCOeval (line 42) | class RefCOCOeval:
    method __init__ (line 92) | def __init__(self, cocoGt=None, cocoDt=None, iouType="segm"):
    method _prepare (line 121) | def _prepare(self):
    method evaluate (line 160) | def evaluate(self):
    method computeIoU (line 201) | def computeIoU(self, imgId, catId):
    method evaluateImg (line 252) | def evaluateImg(self, imgId, catId, aRng, maxDet):
    method accumulate (line 336) | def accumulate(self, p=None):
    method summarize (line 444) | def summarize(self):
    method __str__ (line 524) | def __str__(self):
  class Params (line 528) | class Params:
    method setDetParams (line 533) | def setDetParams(self):
    method setKpParams (line 549) | def setKpParams(self):
    method __init__ (line 584) | def __init__(self, iouType="segm"):

FILE: ape/layers/csrc/MsDeformAttn/ms_deform_attn.h
  function namespace (line 19) | namespace ape {

FILE: ape/layers/csrc/MsDeformAttn/ms_deform_attn_cpu.cpp
  type ape (line 15) | namespace ape {
    function ms_deform_attn_cpu_forward (line 17) | at::Tensor
    function ms_deform_attn_cpu_backward (line 29) | std::vector<at::Tensor>

FILE: ape/layers/csrc/MsDeformAttn/ms_deform_attn_cpu.h
  function namespace (line 14) | namespace ape {

FILE: ape/layers/csrc/MsDeformAttn/ms_deform_attn_cuda.h
  function namespace (line 14) | namespace ape {

FILE: ape/layers/csrc/vision.cpp
  type ape (line 6) | namespace ape {
    function get_cuda_version (line 12) | std::string get_cuda_version() {
    function has_cuda (line 37) | bool has_cuda() {
    function get_compiler_version (line 47) | std::string get_compiler_version() {
    function PYBIND11_MODULE (line 73) | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    function TORCH_LIBRARY (line 76) | TORCH_LIBRARY(ape, m) {

FILE: ape/layers/fuse_helper.py
  class BiMultiHeadAttention (line 8) | class BiMultiHeadAttention(nn.Module):
    method __init__ (line 9) | def __init__(
    method _shape (line 50) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
    method _reset_parameters (line 53) | def _reset_parameters(self):
    method forward (line 67) | def forward(self, v, l, attention_mask_v=None, attention_mask_l=None):
    method extra_repr (line 168) | def extra_repr(self):
  class BiAttentionBlock (line 178) | class BiAttentionBlock(nn.Module):
    method __init__ (line 179) | def __init__(
    method forward (line 221) | def forward(self, v, l, attention_mask_v=None, attention_mask_l=None):

FILE: ape/layers/multi_scale_deform_attn.py
  function _is_power_of_2 (line 26) | def _is_power_of_2(n):
  class MultiScaleDeformableAttnFunction (line 32) | class MultiScaleDeformableAttnFunction(Function):
    method forward (line 34) | def forward(
    method backward (line 63) | def backward(ctx, grad_output):
  function multi_scale_deformable_attn_pytorch (line 84) | def multi_scale_deformable_attn_pytorch(
  class MultiScaleDeformableAttention (line 127) | class MultiScaleDeformableAttention(nn.Module):
    method __init__ (line 145) | def __init__(
    method init_weights (line 190) | def init_weights(self):
    method forward (line 215) | def forward(
  function create_dummy_class (line 361) | def create_dummy_class(klass, dependency, message=""):
  function create_dummy_func (line 390) | def create_dummy_func(func, dependency, message=""):

FILE: ape/layers/vision_language_align.py
  class VisionLanguageAlign (line 8) | class VisionLanguageAlign(nn.Module):
    method __init__ (line 9) | def __init__(
    method forward (line 27) | def forward(self, x, embedding):
  class StillClassifier (line 55) | class StillClassifier(nn.Module):
    method __init__ (line 56) | def __init__(self, hidden_dim):
    method forward (line 60) | def forward(self, x, lang_feat=None):

FILE: ape/layers/vision_language_fusion.py
  class VisionLanguageFusion (line 7) | class VisionLanguageFusion(torch.nn.Module):
    method __init__ (line 12) | def __init__(
    method forward (line 46) | def forward(self, v, l, attention_mask_v=None, attention_mask_l=None):
    method extra_repr (line 52) | def extra_repr(self):

FILE: ape/layers/zero_shot_fc.py
  class ZeroShotFC (line 12) | class ZeroShotFC(nn.Module):
    method __init__ (line 13) | def __init__(
    method forward (line 96) | def forward(self, x, classifier=None):
    method set_predictor (line 134) | def set_predictor(self, param_or_path):
    method extra_repr (line 153) | def extra_repr(self):

FILE: ape/model_zoo/model_zoo.py
  class _ModelZooUrls (line 13) | class _ModelZooUrls(object):
    method query (line 100) | def query(config_path: str) -> Optional[str]:
  function get_checkpoint_url (line 112) | def get_checkpoint_url(config_path):
  function get_config_file (line 129) | def get_config_file(config_path):
  function get_config (line 148) | def get_config(config_path, trained: bool = False):
  function get (line 181) | def get(config_path, trained: bool = False, device: Optional[str] = None):

FILE: ape/modeling/ape_deta/ape_deta.py
  class SomeThing (line 20) | class SomeThing(nn.Module):
    method __init__ (line 21) | def __init__(
    method forward (line 35) | def forward(self, batched_inputs, do_postprocess=True):
    method set_eval_dataset (line 39) | def set_eval_dataset(self, dataset_name):

FILE: ape/modeling/ape_deta/assigner.py
  function nonzero_tuple (line 10) | def nonzero_tuple(x):
  class Matcher (line 23) | class Matcher(object):
    method __init__ (line 39) | def __init__(
    method __call__ (line 76) | def __call__(self, match_quality_matrix):
    method set_low_quality_matches_ (line 115) | def set_low_quality_matches_(self, match_labels, match_quality_matrix):
  function subsample_labels (line 132) | def subsample_labels(
  function sample_topk_per_gt (line 177) | def sample_topk_per_gt(pr_inds, gt_inds, iou, k):
  class Stage2Assigner (line 189) | class Stage2Assigner(nn.Module):
    method __init__ (line 190) | def __init__(self, num_queries, num_classes, max_k=4):
    method _sample_proposals (line 200) | def _sample_proposals(
    method forward (line 235) | def forward(self, outputs, targets, return_cost_matrix=False):
    method postprocess_indices (line 273) | def postprocess_indices(self, pr_inds, gt_inds, iou):
    method __repr__ (line 276) | def __repr__(self, _repr_indent=8):
  class Stage1Assigner (line 287) | class Stage1Assigner(nn.Module):
    method __init__ (line 288) | def __init__(self, t_low=0.3, t_high=0.7, max_k=4):
    method _subsample_labels (line 299) | def _subsample_labels(self, label):
    method forward (line 316) | def forward(self, outputs, targets, return_cost_matrix=False):
    method postprocess_indices (line 353) | def postprocess_indices(self, pr_inds, gt_inds, iou):
    method __repr__ (line 356) | def __repr__(self, _repr_indent=8):

FILE: ape/modeling/ape_deta/deformable_criterion.py
  function sigmoid_ce_loss (line 23) | def sigmoid_ce_loss(
  function calculate_uncertainty (line 43) | def calculate_uncertainty(logits):
  class DeformableCriterion (line 60) | class DeformableCriterion(SetCriterion):
    method __init__ (line 65) | def __init__(
    method get_fed_loss_classes (line 159) | def get_fed_loss_classes(self, gt_classes, num_fed_loss_classes, num_c...
    method loss_labels (line 187) | def loss_labels(self, outputs, targets, indices, num_boxes):
    method loss_anchor_ious (line 278) | def loss_anchor_ious(self, outputs, targets, indices, num_boxes):
    method loss_pred_ious (line 293) | def loss_pred_ious(self, outputs, targets, indices, num_boxes):
    method loss_boxes (line 315) | def loss_boxes(self, outputs, targets, indices, num_boxes):
    method loss_boxes_panoptic (line 340) | def loss_boxes_panoptic(self, outputs, targets, indices, num_boxes):
    method loss_masks (line 375) | def loss_masks(self, outputs, targets, indices, num_boxes):
    method loss_masks_maskdino (line 424) | def loss_masks_maskdino(self, outputs, targets, indices, num_boxes):
    method get_loss (line 492) | def get_loss(self, loss, outputs, targets, indices, num_boxes, **kwargs):
    method forward (line 505) | def forward(self, outputs, targets):
    method __repr__ (line 591) | def __repr__(self):

FILE: ape/modeling/ape_deta/deformable_detr.py
  class DeformableDETR (line 22) | class DeformableDETR(nn.Module):
    method __init__ (line 52) | def __init__(
    method device (line 299) | def device(self):
    method _move_to_current_device (line 302) | def _move_to_current_device(self, x):
    method forward (line 305) | def forward(self, batched_inputs, do_postprocess=True):
    method _set_aux_loss (line 405) | def _set_aux_loss(self, outputs_class, outputs_coord):
    method inference (line 411) | def inference(self, box_cls, box_pred, image_sizes):
    method prepare_targets (line 487) | def prepare_targets(self, targets):
    method preprocess_image (line 498) | def preprocess_image(self, batched_inputs):
    method _postprocess (line 510) | def _postprocess(instances, batched_inputs: List[Dict[str, torch.Tenso...
    method set_eval_dataset (line 524) | def set_eval_dataset(self, dataset_name):
  class NMSPostProcess (line 552) | class NMSPostProcess(nn.Module):
    method forward (line 556) | def forward(self, outputs, target_sizes, select_box_nums_for_evaluation):

FILE: ape/modeling/ape_deta/deformable_detr_segm.py
  class DeformableDETRSegm (line 32) | class DeformableDETRSegm(DeformableDETR):
    method __init__ (line 62) | def __init__(
    method forward (line 165) | def forward(self, batched_inputs, do_postprocess=True):
    method maskdino_mask_features (line 670) | def maskdino_mask_features(self, encode_feats, multi_level_feats, mult...
    method _set_aux_loss (line 695) | def _set_aux_loss(self, outputs_class, outputs_coord, outputs_mask):
    method inference (line 701) | def inference(self, box_cls, box_pred, image_sizes, use_sigmoid=True):
    method prepare_targets (line 780) | def prepare_targets(self, targets):
    method preprocess_image (line 814) | def preprocess_image(self, batched_inputs):
    method _postprocess_instance (line 826) | def _postprocess_instance(
    method _postprocess_semantic (line 843) | def _postprocess_semantic(
    method _postprocess_panoptic (line 874) | def _postprocess_panoptic(
    method visualize_training (line 954) | def visualize_training(
    method visualize_inference_panoptic (line 1105) | def visualize_inference_panoptic(self, batched_inputs, results, datase...
    method visualize_training_enc_output (line 1165) | def visualize_training_enc_output(self, batched_inputs, output, images...
    method visualize_training_enc_output_nonms (line 1222) | def visualize_training_enc_output_nonms(
    method visualize_training_init_reference (line 1298) | def visualize_training_init_reference(
    method visualize_training_enc_output_pos (line 1362) | def visualize_training_enc_output_pos(
    method visualize_training_init_reference_pos (line 1439) | def visualize_training_init_reference_pos(
    method set_model_language (line 1510) | def set_model_language(self, model_language):
  class NMSPostProcess (line 1514) | class NMSPostProcess(nn.Module):
    method forward (line 1518) | def forward(self, outputs, target_sizes, select_box_nums_for_evaluation):
  function is_thing_stuff_overlap (line 1573) | def is_thing_stuff_overlap(metadata):
  function get_text_list (line 1587) | def get_text_list(metadata, dataset_entity):
  function get_stuff_score (line 1609) | def get_stuff_score(box_cls, metadata, dataset_entity):

FILE: ape/modeling/ape_deta/deformable_detr_segm_vl.py
  class DeformableDETRSegmVL (line 33) | class DeformableDETRSegmVL(DeformableDETR):
    method __init__ (line 63) | def __init__(
    method forward (line 166) | def forward(self, batched_inputs, do_postprocess=True):
    method maskdino_mask_features (line 728) | def maskdino_mask_features(self, encode_feats, multi_level_feats, mult...
    method _set_aux_loss (line 753) | def _set_aux_loss(self, outputs_class, outputs_coord, outputs_mask):
    method inference (line 759) | def inference(self, box_cls, box_pred, image_sizes, use_sigmoid=True):
    method prepare_targets (line 812) | def prepare_targets(self, targets):
    method preprocess_image (line 846) | def preprocess_image(self, batched_inputs):
    method _postprocess_instance (line 858) | def _postprocess_instance(
    method _postprocess_semantic (line 875) | def _postprocess_semantic(
    method _postprocess_panoptic (line 921) | def _postprocess_panoptic(
    method visualize_training (line 1001) | def visualize_training(
    method visualize_training_enc_output (line 1154) | def visualize_training_enc_output(self, batched_inputs, output, images...
    method set_model_language (line 1211) | def set_model_language(self, model_language):
  function is_thing_stuff_overlap (line 1215) | def is_thing_stuff_overlap(metadata):
  function get_text_list (line 1229) | def get_text_list(metadata, dataset_entity):
  function get_stuff_score (line 1251) | def get_stuff_score(box_cls, metadata, dataset_entity):

FILE: ape/modeling/ape_deta/deformable_transformer.py
  class DeformableDetrTransformerEncoder (line 19) | class DeformableDetrTransformerEncoder(TransformerLayerSequence):
    method __init__ (line 20) | def __init__(
    method forward (line 65) | def forward(
  class DeformableDetrTransformerDecoder (line 109) | class DeformableDetrTransformerDecoder(TransformerLayerSequence):
    method __init__ (line 110) | def __init__(
    method forward (line 159) | def forward(
  class DeformableDetrTransformer (line 238) | class DeformableDetrTransformer(nn.Module):
    method __init__ (line 250) | def __init__(
    method init_weights (line 289) | def init_weights(self):
    method gen_encoder_output_proposals (line 301) | def gen_encoder_output_proposals(self, memory, memory_padding_mask, sp...
    method get_reference_points (line 344) | def get_reference_points(spatial_shapes, valid_ratios, device):
    method get_valid_ratio (line 374) | def get_valid_ratio(self, mask):
    method get_proposal_pos_embed (line 384) | def get_proposal_pos_embed(self, proposals, num_pos_feats=128, tempera...
    method forward (line 394) | def forward(

FILE: ape/modeling/ape_deta/deformable_transformer_vl.py
  class DeformableDetrTransformerEncoderVL (line 20) | class DeformableDetrTransformerEncoderVL(TransformerLayerSequence):
    method __init__ (line 21) | def __init__(
    method forward (line 69) | def forward(
  class DeformableDetrTransformerDecoderVL (line 124) | class DeformableDetrTransformerDecoderVL(TransformerLayerSequence):
    method __init__ (line 125) | def __init__(
    method forward (line 177) | def forward(
  class DeformableDetrTransformerVL (line 258) | class DeformableDetrTransformerVL(nn.Module):
    method __init__ (line 270) | def __init__(
    method init_weights (line 309) | def init_weights(self):
    method gen_encoder_output_proposals (line 321) | def gen_encoder_output_proposals(
    method get_reference_points (line 372) | def get_reference_points(spatial_shapes, valid_ratios, device):
    method get_valid_ratio (line 402) | def get_valid_ratio(self, mask):
    method get_proposal_pos_embed (line 412) | def get_proposal_pos_embed(self, proposals, num_pos_feats=128, tempera...
    method forward (line 422) | def forward(

FILE: ape/modeling/ape_deta/fast_rcnn.py
  function fast_rcnn_inference (line 40) | def fast_rcnn_inference(
  function fast_rcnn_inference_single_image (line 97) | def fast_rcnn_inference_single_image(

FILE: ape/modeling/ape_deta/misc.py
  class SmoothedValue (line 26) | class SmoothedValue(object):
    method __init__ (line 31) | def __init__(self, window_size=20, fmt=None):
    method update (line 39) | def update(self, value, n=1):
    method synchronize_between_processes (line 44) | def synchronize_between_processes(self):
    method median (line 58) | def median(self):
    method avg (line 63) | def avg(self):
    method global_avg (line 68) | def global_avg(self):
    method max (line 72) | def max(self):
    method value (line 76) | def value(self):
    method __str__ (line 79) | def __str__(self):
  function all_gather (line 89) | def all_gather(data):
  function reduce_dict (line 127) | def reduce_dict(input_dict, average=True):
  class MetricLogger (line 153) | class MetricLogger(object):
    method __init__ (line 154) | def __init__(self, delimiter="\t"):
    method update (line 158) | def update(self, **kwargs):
    method __getattr__ (line 165) | def __getattr__(self, attr):
    method __str__ (line 172) | def __str__(self):
    method synchronize_between_processes (line 178) | def synchronize_between_processes(self):
    method add_meter (line 182) | def add_meter(self, name, meter):
    method log_every (line 185) | def log_every(self, iterable, print_freq, header=None):
  function get_sha (line 259) | def get_sha():
  function collate_fn (line 280) | def collate_fn(batch):
  function _max_by_axis (line 286) | def _max_by_axis(the_list):
  class NestedTensor (line 294) | class NestedTensor(object):
    method __init__ (line 295) | def __init__(self, tensors, mask: Optional[Tensor]):
    method to (line 299) | def to(self, device):
    method decompose (line 309) | def decompose(self):
    method __repr__ (line 312) | def __repr__(self):
  function nested_tensor_from_tensor_list (line 316) | def nested_tensor_from_tensor_list(tensor_list: List[Tensor]):
  function _onnx_nested_tensor_from_tensor_list (line 337) | def _onnx_nested_tensor_from_tensor_list(tensor_list: List[Tensor]) -> N...
  function setup_for_distributed (line 363) | def setup_for_distributed(is_master):
  function is_dist_avail_and_initialized (line 379) | def is_dist_avail_and_initialized():
  function get_world_size (line 387) | def get_world_size():
  function get_rank (line 393) | def get_rank():
  function is_main_process (line 399) | def is_main_process():
  function save_on_master (line 403) | def save_on_master(*args, **kwargs):
  function init_distributed_mode (line 408) | def init_distributed_mode(args):
  function accuracy (line 437) | def accuracy(output, target, topk=(1,)):
  function interpolate (line 455) | def interpolate(input, size=None, scale_factor=None, mode="nearest", ali...

FILE: ape/modeling/ape_deta/segmentation.py
  class DETRsegm (line 20) | class DETRsegm(nn.Module):
    method __init__ (line 21) | def __init__(self, detr, freeze_detr=False):
    method forward (line 33) | def forward(self, samples):
  class MaskHeadSmallConv (line 66) | class MaskHeadSmallConv(nn.Module):
    method __init__ (line 72) | def __init__(self, dim, fpn_dims, context_dim):
    method forward (line 106) | def forward(self, x, bbox_mask, fpns):
  class MHAttentionMap (line 147) | class MHAttentionMap(nn.Module):
    method __init__ (line 150) | def __init__(self, query_dim, hidden_dim, num_heads, dropout=0, bias=T...
    method forward (line 165) | def forward(self, q, k, mask=None):
  function dice_loss (line 181) | def dice_loss(inputs, targets, num_boxes):
  function sigmoid_focal_loss (line 199) | def sigmoid_focal_loss(inputs, targets, num_boxes, alpha: float = 0.25, ...
  class PostProcessSegm (line 227) | class PostProcessSegm(nn.Module):
    method __init__ (line 228) | def __init__(self, threshold=0.5):
    method forward (line 233) | def forward(self, results, outputs, orig_target_sizes, max_target_sizes):
  class PostProcessPanoptic (line 254) | class PostProcessPanoptic(nn.Module):
    method __init__ (line 258) | def __init__(self, is_thing_map, threshold=0.85):
    method forward (line 269) | def forward(self, outputs, processed_sizes, target_sizes=None):

FILE: ape/modeling/backbone/utils_eva.py
  function window_partition (line 18) | def window_partition(x, window_size):
  function window_unpartition (line 42) | def window_unpartition(windows, window_size, pad_hw, hw):
  function get_rel_pos (line 65) | def get_rel_pos(q_size, k_size, rel_pos, interp_type):
  function add_decomposed_rel_pos (line 132) | def add_decomposed_rel_pos(attn, q, rel_pos_h, rel_pos_w, q_size, k_size...
  function get_abs_pos (line 164) | def get_abs_pos(abs_pos, has_cls_token, hw):
  class PatchEmbed (line 196) | class PatchEmbed(nn.Module):
    method __init__ (line 201) | def __init__(
    method forward (line 218) | def forward(self, x):

FILE: ape/modeling/backbone/utils_eva02.py
  function window_partition (line 19) | def window_partition(x, window_size):
  function window_unpartition (line 43) | def window_unpartition(windows, window_size, pad_hw, hw):
  function get_rel_pos (line 66) | def get_rel_pos(q_size, k_size, rel_pos):
  function add_decomposed_rel_pos (line 126) | def add_decomposed_rel_pos(attn, q, rel_pos_h, rel_pos_w, q_size, k_size):
  function get_abs_pos (line 158) | def get_abs_pos(abs_pos, has_cls_token, hw):
  class PatchEmbed (line 190) | class PatchEmbed(nn.Module):
    method __init__ (line 195) | def __init__(
    method forward (line 212) | def forward(self, x):
  function broadcat (line 230) | def broadcat(tensors, dim = -1):
  function rotate_half (line 248) | def rotate_half(x):
  class VisionRotaryEmbedding (line 256) | class VisionRotaryEmbedding(nn.Module):
    method __init__ (line 257) | def __init__(
    method forward (line 296) | def forward(self, t, start_index = 0):
  class VisionRotaryEmbeddingFast (line 307) | class VisionRotaryEmbeddingFast(nn.Module):
    method __init__ (line 308) | def __init__(
    method forward (line 346) | def forward(self, t): return  t * self.freqs_cos + rotate_half(t) * se...

FILE: ape/modeling/backbone/vit.py
  function get_vit_lr_decay_rate (line 8) | def get_vit_lr_decay_rate(name, lr_decay_rate=1.0, num_layers=12):

FILE: ape/modeling/backbone/vit_eva.py
  class LayerNormWithForceFP32 (line 37) | class LayerNormWithForceFP32(nn.Module):
    method __init__ (line 43) | def __init__(self, normalized_shape: _shape_t, eps: float = 1e-5, elem...
    method reset_parameters (line 58) | def reset_parameters(self) -> None:
    method forward (line 63) | def forward(self, input: Tensor) -> Tensor:
    method extra_repr (line 67) | def extra_repr(self) -> Tensor:
  class Attention (line 72) | class Attention(nn.Module):
    method __init__ (line 75) | def __init__(
    method forward (line 121) | def forward(self, x):
  class ResBottleneckBlock (line 149) | class ResBottleneckBlock(CNNBlockBase):
    method __init__ (line 155) | def __init__(
    method forward (line 201) | def forward(self, x):
  class Block (line 210) | class Block(nn.Module):
    method __init__ (line 213) | def __init__(
    method forward (line 285) | def forward(self, x):
  class ViT (line 311) | class ViT(Backbone):
    method __init__ (line 318) | def __init__(
    method _freeze_stages (line 434) | def _freeze_stages(self):
    method _init_weights (line 452) | def _init_weights(self, m):
    method forward (line 465) | def forward(self, x):
  class SimpleFeaturePyramid (line 479) | class SimpleFeaturePyramid(Backbone):
    method __init__ (line 485) | def __init__(
    method padding_constraints (line 587) | def padding_constraints(self):
    method forward (line 593) | def forward(self, x):
  function get_vit_lr_decay_rate (line 622) | def get_vit_lr_decay_rate(name, lr_decay_rate=1.0, num_layers=12):

FILE: ape/modeling/backbone/vit_eva02.py
  class xops_SwiGLU (line 43) | class xops_SwiGLU(nn.Module):
    method __init__ (line 49) | def __init__(
    method forward (line 84) | def forward(self, x: torch.Tensor) -> torch.Tensor:
    method _ordered_params (line 109) | def _ordered_params(
    method _packed_ordered_params (line 151) | def _packed_ordered_params(
  class SwiGLU (line 179) | class SwiGLU(nn.Module):
    method __init__ (line 180) | def __init__(self, in_features, hidden_features=None, out_features=Non...
    method forward (line 196) | def forward(self, x):
  class Attention (line 206) | class Attention(nn.Module):
    method __init__ (line 207) | def __init__(
    method forward (line 245) | def forward(self, x):
  class ResBottleneckBlock (line 294) | class ResBottleneckBlock(CNNBlockBase):
    method __init__ (line 300) | def __init__(
    method forward (line 346) | def forward(self, x):
  class Block (line 355) | class Block(nn.Module):
    method __init__ (line 358) | def __init__(
    method forward (line 437) | def forward(self, x):
  class ViT (line 461) | class ViT(Backbone):
    method __init__ (line 468) | def __init__(
    method _freeze_stages (line 595) | def _freeze_stages(self):
    method _init_weights (line 614) | def _init_weights(self, m):
    method forward (line 623) | def forward(self, x):
  class SimpleFeaturePyramid (line 637) | class SimpleFeaturePyramid(Backbone):
    method __init__ (line 643) | def __init__(
    method padding_constraints (line 745) | def padding_constraints(self):
    method forward (line 751) | def forward(self, x):
  function get_vit_lr_decay_rate (line 780) | def get_vit_lr_decay_rate(name, lr_decay_rate=1.0, num_layers=12):

FILE: ape/modeling/backbone/vit_eva_clip.py
  class LayerNorm (line 29) | class LayerNorm(nn.LayerNorm):
    method forward (line 32) | def forward(self, x: torch.Tensor):
  class DropPath (line 53) | class DropPath(nn.Module):
    method __init__ (line 56) | def __init__(self, drop_prob=None):
    method forward (line 60) | def forward(self, x):
    method extra_repr (line 63) | def extra_repr(self) -> str:
  class Mlp (line 67) | class Mlp(nn.Module):
    method __init__ (line 68) | def __init__(
    method forward (line 89) | def forward(self, x):
  class SwiGLU (line 101) | class SwiGLU(nn.Module):
    method __init__ (line 102) | def __init__(
    method forward (line 125) | def forward(self, x):
  class Attention (line 135) | class Attention(nn.Module):
    method __init__ (line 136) | def __init__(
    method forward (line 218) | def forward(self, x, rel_pos_bias=None, attn_mask=None):
  class ResBottleneckBlock (line 322) | class ResBottleneckBlock(CNNBlockBase):
    method __init__ (line 328) | def __init__(
    method forward (line 374) | def forward(self, x):
  class Block (line 383) | class Block(nn.Module):
    method __init__ (line 386) | def __init__(
    method forward (line 484) | def forward(self, x, rel_pos_bias=None, attn_mask=None):
  class ViT (line 570) | class ViT(Backbone):
    method __init__ (line 577) | def __init__(
    method _freeze_stages (line 715) | def _freeze_stages(self):
    method _init_weights (line 734) | def _init_weights(self, m):
    method forward (line 743) | def forward(self, x):
  class SimpleFeaturePyramid (line 757) | class SimpleFeaturePyramid(Backbone):
    method __init__ (line 763) | def __init__(
    method padding_constraints (line 865) | def padding_constraints(self):
    method forward (line 871) | def forward(self, x):
  function get_vit_lr_decay_rate (line 925) | def get_vit_lr_decay_rate(name, lr_decay_rate=1.0, num_layers=12):

FILE: ape/modeling/deta/assigner.py
  function nonzero_tuple (line 9) | def nonzero_tuple(x):
  class Matcher (line 22) | class Matcher(object):
    method __init__ (line 38) | def __init__(
    method __call__ (line 75) | def __call__(self, match_quality_matrix):
    method set_low_quality_matches_ (line 114) | def set_low_quality_matches_(self, match_labels, match_quality_matrix):
  function subsample_labels (line 131) | def subsample_labels(
  function sample_topk_per_gt (line 176) | def sample_topk_per_gt(pr_inds, gt_inds, iou, k):
  class Stage2Assigner (line 188) | class Stage2Assigner(nn.Module):
    method __init__ (line 189) | def __init__(self, num_queries, num_classes, max_k=4):
    method _sample_proposals (line 199) | def _sample_proposals(
    method forward (line 234) | def forward(self, outputs, targets, return_cost_matrix=False):
    method postprocess_indices (line 271) | def postprocess_indices(self, pr_inds, gt_inds, iou):
    method __repr__ (line 274) | def __repr__(self, _repr_indent=8):
  class Stage1Assigner (line 285) | class Stage1Assigner(nn.Module):
    method __init__ (line 286) | def __init__(self, t_low=0.3, t_high=0.7, max_k=4):
    method _subsample_labels (line 297) | def _subsample_labels(self, label):
    method forward (line 314) | def forward(self, outputs, targets):
    method postprocess_indices (line 347) | def postprocess_indices(self, pr_inds, gt_inds, iou):
    method __repr__ (line 350) | def __repr__(self, _repr_indent=8):

FILE: ape/modeling/deta/deformable_criterion.py
  function sigmoid_ce_loss (line 23) | def sigmoid_ce_loss(
  function calculate_uncertainty (line 43) | def calculate_uncertainty(logits):
  class DeformableCriterion (line 60) | class DeformableCriterion(SetCriterion):
    method __init__ (line 65) | def __init__(
    method get_fed_loss_classes (line 155) | def get_fed_loss_classes(self, gt_classes, num_fed_loss_classes, num_c...
    method loss_labels (line 183) | def loss_labels(self, outputs, targets, indices, num_boxes):
    method loss_boxes (line 267) | def loss_boxes(self, outputs, targets, indices, num_boxes):
    method loss_boxes_panoptic (line 292) | def loss_boxes_panoptic(self, outputs, targets, indices, num_boxes):
    method loss_masks (line 327) | def loss_masks(self, outputs, targets, indices, num_boxes):
    method loss_masks_maskdino (line 366) | def loss_masks_maskdino(self, outputs, targets, indices, num_boxes):
    method get_loss (line 432) | def get_loss(self, loss, outputs, targets, indices, num_boxes, **kwargs):
    method forward (line 443) | def forward(self, outputs, targets):
    method __repr__ (line 515) | def __repr__(self):

FILE: ape/modeling/deta/deformable_detr.py
  class DeformableDETR (line 18) | class DeformableDETR(nn.Module):
    method __init__ (line 48) | def __init__(
    method device (line 176) | def device(self):
    method _move_to_current_device (line 179) | def _move_to_current_device(self, x):
    method forward (line 182) | def forward(self, batched_inputs, do_postprocess=True):
    method _set_aux_loss (line 279) | def _set_aux_loss(self, outputs_class, outputs_coord):
    method inference (line 285) | def inference(self, box_cls, box_pred, image_sizes):
    method prepare_targets (line 361) | def prepare_targets(self, targets):
    method preprocess_image (line 372) | def preprocess_image(self, batched_inputs):
    method _postprocess (line 384) | def _postprocess(instances, batched_inputs: List[Dict[str, torch.Tenso...
  class NMSPostProcess (line 399) | class NMSPostProcess(nn.Module):
    method forward (line 403) | def forward(self, outputs, target_sizes, select_box_nums_for_evaluation):

FILE: ape/modeling/deta/deformable_detr_segm.py
  class DeformableDETRSegm (line 30) | class DeformableDETRSegm(DeformableDETR):
    method __init__ (line 60) | def __init__(
    method forward (line 139) | def forward(self, batched_inputs, do_postprocess=True):
    method maskdino_mask_features (line 411) | def maskdino_mask_features(self, encode_feats, multi_level_feats, mult...
    method _set_aux_loss (line 436) | def _set_aux_loss(self, outputs_class, outputs_coord, outputs_mask):
    method inference (line 442) | def inference(self, box_cls, box_pred, image_sizes):
    method prepare_targets (line 511) | def prepare_targets(self, targets):
    method preprocess_image (line 545) | def preprocess_image(self, batched_inputs):
    method _postprocess_instance (line 557) | def _postprocess_instance(
    method _postprocess_semantic (line 574) | def _postprocess_semantic(
    method _postprocess_panoptic (line 602) | def _postprocess_panoptic(
    method visualize_training (line 686) | def visualize_training(self, batched_inputs, output, images):
    method visualize_inference_panoptic (line 809) | def visualize_inference_panoptic(self, batched_inputs, results):
  class NMSPostProcess (line 869) | class NMSPostProcess(nn.Module):
    method forward (line 873) | def forward(self, outputs, target_sizes, select_box_nums_for_evaluation):
  function is_thing_stuff_overlap (line 929) | def is_thing_stuff_overlap(metadata):

FILE: ape/modeling/deta/deformable_transformer.py
  class DeformableDetrTransformerEncoder (line 18) | class DeformableDetrTransformerEncoder(TransformerLayerSequence):
    method __init__ (line 19) | def __init__(
    method forward (line 59) | def forward(
  class DeformableDetrTransformerDecoder (line 89) | class DeformableDetrTransformerDecoder(TransformerLayerSequence):
    method __init__ (line 90) | def __init__(
    method forward (line 134) | def forward(
  class DeformableDetrTransformer (line 197) | class DeformableDetrTransformer(nn.Module):
    method __init__ (line 209) | def __init__(
    method init_weights (line 242) | def init_weights(self):
    method gen_encoder_output_proposals (line 254) | def gen_encoder_output_proposals(self, memory, memory_padding_mask, sp...
    method get_reference_points (line 297) | def get_reference_points(spatial_shapes, valid_ratios, device):
    method get_valid_ratio (line 327) | def get_valid_ratio(self, mask):
    method get_proposal_pos_embed (line 337) | def get_proposal_pos_embed(self, proposals, num_pos_feats=128, tempera...
    method forward (line 347) | def forward(

FILE: ape/modeling/deta/misc.py
  class SmoothedValue (line 26) | class SmoothedValue(object):
    method __init__ (line 31) | def __init__(self, window_size=20, fmt=None):
    method update (line 39) | def update(self, value, n=1):
    method synchronize_between_processes (line 44) | def synchronize_between_processes(self):
    method median (line 58) | def median(self):
    method avg (line 63) | def avg(self):
    method global_avg (line 68) | def global_avg(self):
    method max (line 72) | def max(self):
    method value (line 76) | def value(self):
    method __str__ (line 79) | def __str__(self):
  function all_gather (line 89) | def all_gather(data):
  function reduce_dict (line 127) | def reduce_dict(input_dict, average=True):
  class MetricLogger (line 153) | class MetricLogger(object):
    method __init__ (line 154) | def __init__(self, delimiter="\t"):
    method update (line 158) | def update(self, **kwargs):
    method __getattr__ (line 165) | def __getattr__(self, attr):
    method __str__ (line 172) | def __str__(self):
    method synchronize_between_processes (line 178) | def synchronize_between_processes(self):
    method add_meter (line 182) | def add_meter(self, name, meter):
    method log_every (line 185) | def log_every(self, iterable, print_freq, header=None):
  function get_sha (line 259) | def get_sha():
  function collate_fn (line 280) | def collate_fn(batch):
  function _max_by_axis (line 286) | def _max_by_axis(the_list):
  class NestedTensor (line 294) | class NestedTensor(object):
    method __init__ (line 295) | def __init__(self, tensors, mask: Optional[Tensor]):
    method to (line 299) | def to(self, device):
    method decompose (line 309) | def decompose(self):
    method __repr__ (line 312) | def __repr__(self):
  function nested_tensor_from_tensor_list (line 316) | def nested_tensor_from_tensor_list(tensor_list: List[Tensor]):
  function _onnx_nested_tensor_from_tensor_list (line 337) | def _onnx_nested_tensor_from_tensor_list(tensor_list: List[Tensor]) -> N...
  function setup_for_distributed (line 363) | def setup_for_distributed(is_master):
  function is_dist_avail_and_initialized (line 379) | def is_dist_avail_and_initialized():
  function get_world_size (line 387) | def get_world_size():
  function get_rank (line 393) | def get_rank():
  function is_main_process (line 399) | def is_main_process():
  function save_on_master (line 403) | def save_on_master(*args, **kwargs):
  function init_distributed_mode (line 408) | def init_distributed_mode(args):
  function accuracy (line 437) | def accuracy(output, target, topk=(1,)):
  function interpolate (line 455) | def interpolate(input, size=None, scale_factor=None, mode="nearest", ali...

FILE: ape/modeling/deta/segmentation.py
  class DETRsegm (line 20) | class DETRsegm(nn.Module):
    method __init__ (line 21) | def __init__(self, detr, freeze_detr=False):
    method forward (line 33) | def forward(self, samples):
  class MaskHeadSmallConv (line 66) | class MaskHeadSmallConv(nn.Module):
    method __init__ (line 72) | def __init__(self, dim, fpn_dims, context_dim):
    method forward (line 106) | def forward(self, x, bbox_mask, fpns):
  class MHAttentionMap (line 147) | class MHAttentionMap(nn.Module):
    method __init__ (line 150) | def __init__(self, query_dim, hidden_dim, num_heads, dropout=0, bias=T...
    method forward (line 165) | def forward(self, q, k, mask=None):
  function dice_loss (line 181) | def dice_loss(inputs, targets, num_boxes):
  function sigmoid_focal_loss (line 199) | def sigmoid_focal_loss(inputs, targets, num_boxes, alpha: float = 0.25, ...
  class PostProcessSegm (line 227) | class PostProcessSegm(nn.Module):
    method __init__ (line 228) | def __init__(self, threshold=0.5):
    method forward (line 233) | def forward(self, results, outputs, orig_target_sizes, max_target_sizes):
  class PostProcessPanoptic (line 254) | class PostProcessPanoptic(nn.Module):
    method __init__ (line 258) | def __init__(self, is_thing_map, threshold=0.85):
    method forward (line 269) | def forward(self, outputs, processed_sizes, target_sizes=None):

FILE: ape/modeling/text/bert_wrapper.py
  class Bert (line 16) | class Bert(nn.Module):
    method __init__ (line 17) | def __init__(
    method device (line 48) | def device(self):
    method forward_text (line 53) | def forward_text(self, text_list, cache=False):

FILE: ape/modeling/text/clip_wrapper.py
  class LayerNorm (line 15) | class LayerNorm(nn.LayerNorm):
    method forward (line 18) | def forward(self, x: torch.Tensor):
  class QuickGELU (line 24) | class QuickGELU(nn.Module):
    method forward (line 25) | def forward(self, x: torch.Tensor):
  class ResidualAttentionBlock (line 29) | class ResidualAttentionBlock(nn.Module):
    method __init__ (line 30) | def __init__(self, d_model: int, n_head: int, attn_mask: torch.Tensor ...
    method attention (line 47) | def attention(self, x: torch.Tensor):
    method forward (line 55) | def forward(self, x: torch.Tensor):
  class Transformer (line 61) | class Transformer(nn.Module):
    method __init__ (line 62) | def __init__(self, width: int, layers: int, heads: int, attn_mask: tor...
    method forward (line 70) | def forward(self, x: torch.Tensor):
  class CLIPTEXT (line 74) | class CLIPTEXT(nn.Module):
    method __init__ (line 75) | def __init__(
    method initialize_parameters (line 107) | def initialize_parameters(self):
    method build_attention_mask (line 123) | def build_attention_mask(self):
    method device (line 130) | def device(self):
    method dtype (line 134) | def dtype(self):
    method tokenize (line 137) | def tokenize(self, texts: Union[str, List[str]], context_length: int =...
    method encode_text (line 155) | def encode_text(self, text):
    method forward (line 165) | def forward(self, captions):
  function build_clip_text_encoder (line 174) | def build_clip_text_encoder(model_path, pretrain=True):
  function get_clip_embeddings (line 215) | def get_clip_embeddings(text_model, vocabulary, prompt="a "):

FILE: ape/modeling/text/clip_wrapper_eva01.py
  class EVA01CLIP (line 10) | class EVA01CLIP(nn.Module):
    method __init__ (line 11) | def __init__(
    method device (line 41) | def device(self):
    method infer_image (line 44) | def infer_image(self, features):
    method encode_text (line 51) | def encode_text(self, text_list, cache=False):
    method forward_text (line 85) | def forward_text(self, text_list, cache=False):
    method custom_encode_text (line 129) | def custom_encode_text(self, text, m):

FILE: ape/modeling/text/clip_wrapper_eva02.py
  class EVA02CLIP (line 8) | class EVA02CLIP(nn.Module):
    method __init__ (line 9) | def __init__(
    method device (line 44) | def device(self):
    method infer_image (line 47) | def infer_image(self, features):
    method encode_text (line 54) | def encode_text(self, text_list, cache=False):
    method forward_text (line 88) | def forward_text(self, text_list, cache=False):
    method custom_encode_text (line 132) | def custom_encode_text(self, text, m, normalize: bool = False):

FILE: ape/modeling/text/clip_wrapper_open.py
  function build_openclip_text_encoder (line 11) | def build_openclip_text_encoder(open_clip_name, open_clip_model):
  function get_openclip_embeddings (line 31) | def get_openclip_embeddings(model, tokenizer, vocabulary, prompt="a "):

FILE: ape/modeling/text/eva01_clip/clip.py
  function _download (line 43) | def _download(url: str, root: str):
  function _convert_image_to_rgb (line 75) | def _convert_image_to_rgb(image):
  function _transform (line 79) | def _transform(n_px):
  function available_models (line 89) | def available_models() -> List[str]:
  function load (line 94) | def load(name: str, device: Union[str, torch.device] = "cuda" if torch.c...
  function tokenize (line 196) | def tokenize(texts: Union[str, List[str]], context_length: int = 77, tru...

FILE: ape/modeling/text/eva01_clip/eva_clip.py
  function _natural_key (line 23) | def _natural_key(string_):
  function _rescan_model_configs (line 27) | def _rescan_model_configs():
  function list_models (line 50) | def list_models():
  function add_model_config (line 55) | def add_model_config(path):
  function get_model_config (line 62) | def get_model_config(model_name):
  function load_state_dict (line 68) | def load_state_dict(checkpoint_path: str, map_location: str='cpu', model...
  function load_checkpoint (line 81) | def load_checkpoint(model, checkpoint_path, model_key="model|module|stat...
  function create_model (line 87) | def create_model(
  function _convert_to_rgb (line 123) | def _convert_to_rgb(image):
  function image_transform (line 126) | def image_transform(
  function build_eva_model_and_transforms (line 156) | def build_eva_model_and_transforms(

FILE: ape/modeling/text/eva01_clip/eva_model.py
  class LayerNorm (line 19) | class LayerNorm(nn.LayerNorm):
    method forward (line 22) | def forward(self, x: torch.Tensor):
  class LayerNorm (line 35) | class LayerNorm(nn.LayerNorm):
    method forward (line 38) | def forward(self, x: torch.Tensor):
  class QuickGELU (line 44) | class QuickGELU(nn.Module):
    method forward (line 46) | def forward(self, x: torch.Tensor):
  class Attention (line 50) | class Attention(nn.Module):
    method __init__ (line 51) | def __init__(
    method forward (line 90) | def forward(self, x, attn_mask: Optional[torch.Tensor] = None):
  class ResidualAttentionBlock (line 126) | class ResidualAttentionBlock(nn.Module):
    method __init__ (line 127) | def __init__(
    method attention (line 160) | def attention(self, x: torch.Tensor, attn_mask: Optional[torch.Tensor]...
    method cross_attention (line 168) | def cross_attention(self, x: torch.Tensor, context: torch.Tensor, attn...
    method forward (line 172) | def forward(self, x: torch.Tensor, attn_mask: Optional[torch.Tensor] =...
  class Transformer (line 177) | class Transformer(nn.Module):
    method __init__ (line 178) | def __init__(self, width: int, layers: int, heads: int,  mlp_ratio: fl...
    method forward (line 188) | def forward(self, x: torch.Tensor, attn_mask: Optional[torch.Tensor] =...
  class TextTransformer (line 193) | class TextTransformer(nn.Module):
    method __init__ (line 194) | def __init__(
    method init_parameters (line 223) | def init_parameters(self):
    method build_attention_mask (line 240) | def build_attention_mask(self):
    method forward_features (line 248) | def forward_features(self, text: torch.Tensor):
    method forward (line 262) | def forward(self, x: torch.Tensor):
  class CLIPVisionCfg (line 269) | class CLIPVisionCfg:
  class CLIPTextCfg (line 283) | class CLIPTextCfg:
  class EVA_CLIP (line 292) | class EVA_CLIP(nn.Module):
    method __init__ (line 293) | def __init__(
    method encode_image (line 337) | def encode_image(self, image):
    method encode_text (line 340) | def encode_text(self, text):
    method forward (line 343) | def forward(self, image, text):
  function convert_weights_to_fp16 (line 357) | def convert_weights_to_fp16(model: nn.Module):

FILE: ape/modeling/text/eva01_clip/model.py
  class Bottleneck (line 10) | class Bottleneck(nn.Module):
    method __init__ (line 13) | def __init__(self, inplanes, planes, stride=1):
    method forward (line 40) | def forward(self, x: torch.Tensor):
  class AttentionPool2d (line 56) | class AttentionPool2d(nn.Module):
    method __init__ (line 57) | def __init__(self, spacial_dim: int, embed_dim: int, num_heads: int, o...
    method forward (line 66) | def forward(self, x, return_all_tokens=False):
  class ModifiedResNet (line 95) | class ModifiedResNet(nn.Module):
    method __init__ (line 103) | def __init__(self, layers, output_dim, heads, input_resolution=224, wi...
    method _make_layer (line 128) | def _make_layer(self, planes, blocks, stride=1):
    method forward (line 137) | def forward(self, x, return_side_out=False, return_all_tokens=False):
  class LayerNorm (line 166) | class LayerNorm(nn.LayerNorm):
    method forward (line 169) | def forward(self, x: torch.Tensor):
  class QuickGELU (line 175) | class QuickGELU(nn.Module):
    method forward (line 176) | def forward(self, x: torch.Tensor):
  class ResidualAttentionBlock (line 180) | class ResidualAttentionBlock(nn.Module):
    method __init__ (line 181) | def __init__(self, d_model: int, n_head: int, attn_mask: torch.Tensor ...
    method attention (line 194) | def attention(self, x: torch.Tensor):
    method forward (line 199) | def forward(self, x: torch.Tensor):
  class Transformer (line 205) | class Transformer(nn.Module):
    method __init__ (line 206) | def __init__(self, width: int, layers: int, heads: int, attn_mask: tor...
    method forward (line 212) | def forward(self, x: torch.Tensor):
  class VisionTransformer (line 216) | class VisionTransformer(nn.Module):
    method __init__ (line 217) | def __init__(self, input_resolution: int, patch_size: int, width: int,...
    method interpolate_pos_encoding (line 236) | def interpolate_pos_encoding(self, x, w, h):
    method forward (line 253) | def forward(self, x: torch.Tensor):
  class CLIP (line 277) | class CLIP(nn.Module):
    method __init__ (line 278) | def __init__(self,
    method initialize_parameters (line 333) | def initialize_parameters(self):
    method build_attention_mask (line 362) | def build_attention_mask(self):
    method dtype (line 371) | def dtype(self):
    method encode_image (line 374) | def encode_image(self, image):
    method encode_text (line 377) | def encode_text(self, text):
    method forward (line 392) | def forward(self, image, text):
  function convert_weights (line 409) | def convert_weights(model: nn.Module):
  function build_model (line 434) | def build_model(state_dict: dict):

FILE: ape/modeling/text/eva01_clip/simple_tokenizer.py
  function default_bpe (line 11) | def default_bpe():
  function bytes_to_unicode (line 16) | def bytes_to_unicode():
  function get_pairs (line 38) | def get_pairs(word):
  function basic_clean (line 50) | def basic_clean(text):
  function whitespace_clean (line 56) | def whitespace_clean(text):
  class SimpleTokenizer (line 62) | class SimpleTokenizer(object):
    method __init__ (line 63) | def __init__(self, bpe_path: str = default_bpe()):
    method bpe (line 80) | def bpe(self, token):
    method encode (line 121) | def encode(self, text):
    method decode (line 129) | def decode(self, tokens):

FILE: ape/modeling/text/eva01_clip/vit_model.py
  function _cfg (line 26) | def _cfg(url='', **kwargs):
  class DropPath (line 36) | class DropPath(nn.Module):
    method __init__ (line 39) | def __init__(self, drop_prob=None):
    method forward (line 43) | def forward(self, x):
    method extra_repr (line 46) | def extra_repr(self) -> str:
  class Mlp (line 50) | class Mlp(nn.Module):
    method __init__ (line 51) | def __init__(self, in_features, hidden_features=None, out_features=Non...
    method forward (line 60) | def forward(self, x):
  class Attention (line 70) | class Attention(nn.Module):
    method __init__ (line 71) | def __init__(
    method forward (line 128) | def forward(self, x, rel_pos_bias=None):
  class Block (line 173) | class Block(nn.Module):
    method __init__ (line 175) | def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_sc...
    method forward (line 197) | def forward(self, x, rel_pos_bias=None):
  class PatchEmbed (line 207) | class PatchEmbed(nn.Module):
    method __init__ (line 210) | def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=...
    method forward (line 222) | def forward(self, x, **kwargs):
  class RelativePositionBias (line 231) | class RelativePositionBias(nn.Module):
    method __init__ (line 233) | def __init__(self, window_size, num_heads):
    method forward (line 262) | def forward(self):
  class VisionTransformer (line 270) | class VisionTransformer(nn.Module):
    method __init__ (line 273) | def __init__(self, img_size=224, patch_size=16, in_chans=3, num_classe...
    method fix_init_weight (line 326) | def fix_init_weight(self):
    method _init_weights (line 334) | def _init_weights(self, m):
    method get_classifier (line 343) | def get_classifier(self):
    method reset_classifier (line 346) | def reset_classifier(self, num_classes, global_pool=''):
    method forward_features (line 350) | def forward_features(self, x):
    method forward (line 368) | def forward(self, x):
    method get_intermediate_layers (line 373) | def get_intermediate_layers(self, x):

FILE: ape/modeling/text/eva02_clip/eva_vit_model.py
  class DropPath (line 33) | class DropPath(nn.Module):
    method __init__ (line 36) | def __init__(self, drop_prob=None):
    method forward (line 40) | def forward(self, x):
    method extra_repr (line 43) | def extra_repr(self) -> str:
  class Mlp (line 47) | class Mlp(nn.Module):
    method __init__ (line 48) | def __init__(
    method forward (line 70) | def forward(self, x):
  class SwiGLU (line 81) | class SwiGLU(nn.Module):
    method __init__ (line 82) | def __init__(self, in_features, hidden_features=None, out_features=Non...
    method forward (line 97) | def forward(self, x):
  class Attention (line 106) | class Attention(nn.Module):
    method __init__ (line 107) | def __init__(
    method forward (line 173) | def forward(self, x, rel_pos_bias=None, attn_mask=None):
  class Block (line 246) | class Block(nn.Module):
    method __init__ (line 248) | def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_sc...
    method forward (line 287) | def forward(self, x, rel_pos_bias=None, attn_mask=None):
  class PatchEmbed (line 305) | class PatchEmbed(nn.Module):
    method __init__ (line 308) | def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=...
    method forward (line 320) | def forward(self, x, **kwargs):
  class RelativePositionBias (line 329) | class RelativePositionBias(nn.Module):
    method __init__ (line 331) | def __init__(self, window_size, num_heads):
    method forward (line 358) | def forward(self):
  class EVAVisionTransformer (line 366) | class EVAVisionTransformer(nn.Module):
    method __init__ (line 369) | def __init__(self, img_size=224, patch_size=16, in_chans=3, num_classe...
    method fix_init_weight (line 443) | def fix_init_weight(self):
    method get_cast_dtype (line 454) | def get_cast_dtype(self) -> torch.dtype:
    method _init_weights (line 457) | def _init_weights(self, m):
    method get_num_layers (line 466) | def get_num_layers(self):
    method lock (line 469) | def lock(self, unlocked_groups=0, freeze_bn_stats=False):
    method set_grad_checkpointing (line 475) | def set_grad_checkpointing(self, enable=True):
    method no_weight_decay (line 479) | def no_weight_decay(self):
    method get_classifier (line 482) | def get_classifier(self):
    method reset_classifier (line 485) | def reset_classifier(self, num_classes, global_pool=''):
    method forward_features (line 489) | def forward_features(self, x, return_all_features=False):
    method forward (line 526) | def forward(self, x, return_all_features=False):

FILE: ape/modeling/text/eva02_clip/factory.py
  function _natural_key (line 25) | def _natural_key(string_):
  function _rescan_model_configs (line 29) | def _rescan_model_configs():
  function list_models (line 53) | def list_models():
  function add_model_config (line 58) | def add_model_config(path):
  function get_model_config (line 66) | def get_model_config(model_name):
  function get_tokenizer (line 73) | def get_tokenizer(model_name):
  function load_state_dict (line 80) | def load_state_dict(checkpoint_path: str, map_location: str='cpu', model...
  function load_checkpoint (line 110) | def load_checkpoint(model, checkpoint_path, model_key="model|module|stat...
  function load_clip_visual_state_dict (line 131) | def load_clip_visual_state_dict(checkpoint_path: str, map_location: str=...
  function load_clip_text_state_dict (line 144) | def load_clip_text_state_dict(checkpoint_path: str, map_location: str='c...
  function get_pretrained_tag (line 152) | def get_pretrained_tag(pretrained_model):
  function load_pretrained_checkpoint (line 163) | def load_pretrained_checkpoint(
  function create_model (line 211) | def create_model(
  function create_model_and_transforms (line 358) | def create_model_and_transforms(
  function create_model_from_pretrained (line 412) | def create_model_from_pretrained(

FILE: ape/modeling/text/eva02_clip/hf_model.py
  class BaseModelOutput (line 21) | class BaseModelOutput:
  class PretrainedConfig (line 25) | class PretrainedConfig:
  function _camel2snake (line 31) | def _camel2snake(s):
  function register_pooler (line 37) | def register_pooler(cls):
  class MeanPooler (line 44) | class MeanPooler(nn.Module):
    method forward (line 46) | def forward(self, x:BaseModelOutput, attention_mask:TensorType):
  class MaxPooler (line 51) | class MaxPooler(nn.Module):
    method forward (line 53) | def forward(self, x:BaseModelOutput, attention_mask:TensorType):
  class ClsPooler (line 58) | class ClsPooler(nn.Module):
    method __init__ (line 60) | def __init__(self, use_pooler_output=True):
    method forward (line 65) | def forward(self, x:BaseModelOutput, attention_mask:TensorType):
  class HFTextEncoder (line 75) | class HFTextEncoder(nn.Module):
    method __init__ (line 77) | def __init__(
    method mask (line 152) | def mask(self, input_ids, vocab_size, device, targets=None, masked_ind...
    method forward_mlm (line 177) | def forward_mlm(self, input_ids, image_embeds, mlm_probability=0.25):
    method forward (line 213) | def forward(self, x:TensorType) -> TensorType:
    method lock (line 220) | def lock(self, unlocked_layers:int=0, freeze_layer_norm:bool=True):
    method set_grad_checkpointing (line 239) | def set_grad_checkpointing(self, enable=True):
    method get_num_layers (line 242) | def get_num_layers(self):
    method init_parameters (line 247) | def init_parameters(self):

FILE: ape/modeling/text/eva02_clip/loss.py
  function gather_features (line 21) | def gather_features(
  class ClipLoss (line 70) | class ClipLoss(nn.Module):
    method __init__ (line 72) | def __init__(
    method forward (line 95) | def forward(self, image_features, text_features, logit_scale=1.):

FILE: ape/modeling/text/eva02_clip/model.py
  class CLIPVisionCfg (line 38) | class CLIPVisionCfg:
  class CLIPTextCfg (line 67) | class CLIPTextCfg:
  function get_cast_dtype (line 84) | def get_cast_dtype(precision: str):
  function _build_vision_tower (line 93) | def _build_vision_tower(
  function _build_text_tower (line 174) | def _build_text_tower(
  class CLIP (line 211) | class CLIP(nn.Module):
    method __init__ (line 212) | def __init__(
    method lock_image_tower (line 234) | def lock_image_tower(self, unlocked_groups=0, freeze_bn_stats=False):
    method set_grad_checkpointing (line 239) | def set_grad_checkpointing(self, enable=True):
    method no_weight_decay (line 244) | def no_weight_decay(self):
    method encode_image (line 247) | def encode_image(self, image, normalize: bool = False):
    method encode_text (line 251) | def encode_text(self, text, normalize: bool = False):
    method forward (line 265) | def forward(self, image, text):
  class CustomCLIP (line 271) | class CustomCLIP(nn.Module):
    method __init__ (line 272) | def __init__(
    method lock_image_tower (line 286) | def lock_image_tower(self, unlocked_groups=0, freeze_bn_stats=False):
    method lock_text_tower (line 290) | def lock_text_tower(self, unlocked_layers:int=0, freeze_layer_norm:boo...
    method set_grad_checkpointing (line 294) | def set_grad_checkpointing(self, enable=True):
    method no_weight_decay (line 299) | def no_weight_decay(self):
    method encode_image (line 302) | def encode_image(self, image, normalize: bool = False):
    method encode_text (line 306) | def encode_text(self, text, normalize: bool = False):
    method forward (line 310) | def forward(self, image, text):
  function convert_weights_to_lp (line 316) | def convert_weights_to_lp(model: nn.Module, dtype=torch.float16):
  function convert_to_custom_text_state_dict (line 348) | def convert_to_custom_text_state_dict(state_dict: dict):
  function build_model_from_openai_state_dict (line 367) | def build_model_from_openai_state_dict(
  function trace_model (line 427) | def trace_model(model, batch_size=256, device=torch.device('cpu')):

FILE: ape/modeling/text/eva02_clip/modified_resnet.py
  class Bottleneck (line 10) | class Bottleneck(nn.Module):
    method __init__ (line 13) | def __init__(self, inplanes, planes, stride=1):
    method forward (line 42) | def forward(self, x: torch.Tensor):
  class AttentionPool2d (line 58) | class AttentionPool2d(nn.Module):
    method __init__ (line 59) | def __init__(self, spacial_dim: int, embed_dim: int, num_heads: int, o...
    method forward (line 68) | def forward(self, x):
  class ModifiedResNet (line 95) | class ModifiedResNet(nn.Module):
    method __init__ (line 103) | def __init__(self, layers, output_dim, heads, image_size=224, width=64):
    method _make_layer (line 132) | def _make_layer(self, planes, blocks, stride=1):
    method init_parameters (line 141) | def init_parameters(self):
    method lock (line 154) | def lock(self, unlocked_groups=0, freeze_bn_stats=False):
    method set_grad_checkpointing (line 162) | def set_grad_checkpointing(self, enable=True):
    method stem (line 166) | def stem(self, x):
    method forward (line 173) | def forward(self, x):

FILE: ape/modeling/text/eva02_clip/openai.py
  function list_openai_models (line 18) | def list_openai_models() -> List[str]:
  function load_openai_model (line 23) | def load_openai_model(

FILE: ape/modeling/text/eva02_clip/pretrained.py
  function _pcfg (line 18) | def _pcfg(url='', hf_hub='', filename='', mean=None, std=None):
  function _clean_tag (line 191) | def _clean_tag(tag: str):
  function list_pretrained (line 196) | def list_pretrained(as_str: bool = False):
  function list_pretrained_models_by_tag (line 203) | def list_pretrained_models_by_tag(tag: str):
  function list_pretrained_tags_by_model (line 213) | def list_pretrained_tags_by_model(model: str):
  function is_pretrained_cfg (line 221) | def is_pretrained_cfg(model: str, tag: str):
  function get_pretrained_cfg (line 227) | def get_pretrained_cfg(model: str, tag: str):
  function get_pretrained_url (line 234) | def get_pretrained_url(model: str, tag: str):
  function download_pretrained_from_url (line 239) | def download_pretrained_from_url(
  function has_hf_hub (line 285) | def has_hf_hub(necessary=False):
  function download_pretrained_from_hf (line 293) | def download_pretrained_from_hf(
  function download_pretrained (line 304) | def download_pretrained(

FILE: ape/modeling/text/eva02_clip/rope.py
  function broadcat (line 7) | def broadcat(tensors, dim = -1):
  function rotate_half (line 23) | def rotate_half(x):
  class VisionRotaryEmbedding (line 30) | class VisionRotaryEmbedding(nn.Module):
    method __init__ (line 31) | def __init__(
    method forward (line 70) | def forward(self, t, start_index = 0):
  class VisionRotaryEmbeddingFast (line 79) | class VisionRotaryEmbeddingFast(nn.Module):
    method __init__ (line 80) | def __init__(
    method forward (line 121) | def forward(self, t, patch_indices_keep=None):

FILE: ape/modeling/text/eva02_clip/timm_model.py
  class TimmModel (line 28) | class TimmModel(nn.Module):
    method __init__ (line 33) | def __init__(
    method lock (line 80) | def lock(self, unlocked_groups=0, freeze_bn_stats=False):
    method set_grad_checkpointing (line 113) | def set_grad_checkpointing(self, enable=True):
    method forward (line 119) | def forward(self, x):

FILE: ape/modeling/text/eva02_clip/tokenizer.py
  function default_bpe (line 21) | def default_bpe():
  function bytes_to_unicode (line 26) | def bytes_to_unicode():
  function get_pairs (line 48) | def get_pairs(word):
  function basic_clean (line 60) | def basic_clean(text):
  function whitespace_clean (line 66) | def whitespace_clean(text):
  class SimpleTokenizer (line 72) | class SimpleTokenizer(object):
    method __init__ (line 73) | def __init__(self, bpe_path: str = default_bpe(), special_tokens=None):
    method bpe (line 98) | def bpe(self, token):
    method encode (line 139) | def encode(self, text):
    method decode (line 147) | def decode(self, tokens):
  function tokenize (line 156) | def tokenize(texts: Union[str, List[str]], context_length: int = 77) -> ...
  class HFTokenizer (line 188) | class HFTokenizer:
    method __init__ (line 190) | def __init__(self, tokenizer_name:str):
    method __call__ (line 194) | def __call__(self, texts:Union[str, List[str]], context_length:int=77)...

FILE: ape/modeling/text/eva02_clip/transform.py
  class ResizeMaxSize (line 13) | class ResizeMaxSize(nn.Module):
    method __init__ (line 15) | def __init__(self, max_size, interpolation=InterpolationMode.BICUBIC, ...
    method forward (line 24) | def forward(self, img):
  function _convert_to_rgb (line 39) | def _convert_to_rgb(image):
  function image_transform (line 60) | def image_transform(

FILE: ape/modeling/text/eva02_clip/transformer.py
  class LayerNormFp32 (line 36) | class LayerNormFp32(nn.LayerNorm):
    method __init__ (line 38) | def __init__(self, *args, **kwargs):
    method forward (line 41) | def forward(self, x: torch.Tensor):
  class LayerNorm (line 52) | class LayerNorm(nn.LayerNorm):
    method forward (line 55) | def forward(self, x: torch.Tensor):
  class QuickGELU (line 60) | class QuickGELU(nn.Module):
    method forward (line 62) | def forward(self, x: torch.Tensor):
  class LayerScale (line 66) | class LayerScale(nn.Module):
    method __init__ (line 67) | def __init__(self, dim, init_values=1e-5, inplace=False):
    method forward (line 72) | def forward(self, x):
  class PatchDropout (line 75) | class PatchDropout(nn.Module):
    method __init__ (line 80) | def __init__(self, prob, exclude_first_token=True):
    method forward (line 87) | def forward(self, x):
  function _in_projection_packed (line 119) | def _in_projection_packed(
  class Attention (line 150) | class Attention(nn.Module):
    method __init__ (line 151) | def __init__(
    method forward (line 195) | def forward(self, x, attn_mask: Optional[torch.Tensor] = None):
  class CustomAttention (line 243) | class CustomAttention(nn.Module):
    method __init__ (line 244) | def __init__(
    method forward (line 286) | def forward(self, query: torch.Tensor, key: torch.Tensor, value: torch...
  class CustomResidualAttentionBlock (line 339) | class CustomResidualAttentionBlock(nn.Module):
    method __init__ (line 340) | def __init__(
    method forward (line 384) | def forward(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, a...
  class CustomTransformer (line 389) | class CustomTransformer(nn.Module):
    method __init__ (line 390) | def __init__(
    method get_cast_dtype (line 429) | def get_cast_dtype(self) -> torch.dtype:
    method forward (line 432) | def forward(self, q: torch.Tensor, k: torch.Tensor = None, v: torch.Te...
  class ResidualAttentionBlock (line 443) | class ResidualAttentionBlock(nn.Module):
    method __init__ (line 444) | def __init__(
    method attention (line 474) | def attention(self, x: torch.Tensor, attn_mask: Optional[torch.Tensor]...
    method forward (line 480) | def forward(self, x: torch.Tensor, attn_mask: Optional[torch.Tensor] =...
  class Transformer (line 485) | class Transformer(nn.Module):
    method __init__ (line 486) | def __init__(
    method get_cast_dtype (line 508) | def get_cast_dtype(self) -> torch.dtype:
    method forward (line 511) | def forward(self, x: torch.Tensor, attn_mask: Optional[torch.Tensor] =...
  class VisionTransformer (line 520) | class VisionTransformer(nn.Module):
    method __init__ (line 521) | def __init__(
    method lock (line 567) | def lock(self, unlocked_groups=0, freeze_bn_stats=False):
    method get_num_layers (line 600) | def get_num_layers(self):
    method set_grad_checkpointing (line 604) | def set_grad_checkpointing(self, enable=True):
    method no_weight_decay (line 608) | def no_weight_decay(self):
    method forward (line 611) | def forward(self, x: torch.Tensor, return_all_features: bool=False):
  class TextTransformer (line 642) | class TextTransformer(nn.Module):
    method __init__ (line 643) | def __init__(
    method init_parameters (line 686) | def init_parameters(self):
    method set_grad_checkpointing (line 703) | def set_grad_checkpointing(self, enable=True):
    method no_weight_decay (line 707) | def no_weight_decay(self):
    method get_num_layers (line 711) | def get_num_layers(self):
    method build_attention_mask (line 714) | def build_attention_mask(self):
    method forward (line 722) | def forward(self, text, return_all_features: bool=False):

FILE: ape/modeling/text/eva02_clip/utils.py
  function resize_clip_pos_embed (line 13) | def resize_clip_pos_embed(state_dict, model, interpolation: str = 'bicub...
  function resize_visual_pos_embed (line 46) | def resize_visual_pos_embed(state_dict, model, interpolation: str = 'bic...
  function resize_evaclip_pos_embed (line 78) | def resize_evaclip_pos_embed(state_dict, model, interpolation: str = 'bi...
  function resize_eva_pos_embed (line 109) | def resize_eva_pos_embed(state_dict, model, interpolation: str = 'bicubi...
  function resize_rel_pos_embed (line 140) | def resize_rel_pos_embed(state_dict, model, interpolation: str = 'bicubi...
  function freeze_batch_norm_2d (line 237) | def freeze_batch_norm_2d(module, module_match={}, name=''):
  function _ntuple (line 277) | def _ntuple(n):
  function is_logging (line 292) | def is_logging(args):
  class AllGather (line 304) | class AllGather(torch.autograd.Function):
    method forward (line 311) | def forward(ctx, tensor, rank, world_size):
    method backward (line 319) | def backward(ctx, grad_output):

FILE: ape/modeling/text/llama2_wrapper.py
  class Llama2 (line 28) | class Llama2(nn.Module):
    method __init__ (line 29) | def __init__(
    method forward_text (line 107) | def forward_text(self, text_list, cache=False):
    method device (line 153) | def device(self):

FILE: ape/modeling/text/t5_wrapper.py
  class T5_warpper (line 26) | class T5_warpper(nn.Module):
    method __init__ (line 27) | def __init__(
    method forward_text (line 70) | def forward_text(self, text_list, cache=False):
    method device (line 102) | def device(self):

FILE: ape/modeling/text/text_encoder.py
  class TextModel (line 12) | class TextModel(nn.Module):
    method __init__ (line 13) | def __init__(
    method forward_text (line 33) | def forward_text(self, text, prompt="a "):

FILE: ape/modeling/text/utils.py
  function clean_name (line 4) | def clean_name(name):
  function reduce_language_feature (line 11) | def reduce_language_feature(features, mask, reduce_type="average"):

FILE: ape/utils/box_ops.py
  function box_cxcywh_to_xyxy (line 18) | def box_cxcywh_to_xyxy(x):
  function box_xyxy_to_cxcywh (line 24) | def box_xyxy_to_cxcywh(x):
  function box_iou (line 31) | def box_iou(boxes1, boxes2):
  function generalized_box_iou (line 47) | def generalized_box_iou(boxes1, boxes2):
  function masks_to_boxes (line 71) | def masks_to_boxes(masks):

FILE: ape/utils/misc.py
  function _check_size_scale_factor (line 38) | def _check_size_scale_factor(dim, size, scale_factor):
  function _output_size (line 50) | def _output_size(dim, input, size, scale_factor):
  class SmoothedValue (line 70) | class SmoothedValue(object):
    method __init__ (line 75) | def __init__(self, window_size=20, fmt=None):
    method update (line 83) | def update(self, value, n=1):
    method synchronize_between_processes (line 88) | def synchronize_between_processes(self):
    method median (line 102) | def median(self):
    method avg (line 107) | def avg(self):
    method global_avg (line 112) | def global_avg(self):
    method max (line 116) | def max(self):
    method value (line 120) | def value(self):
    method __str__ (line 123) | def __str__(self):
  function all_gather (line 133) | def all_gather(data):
  function reduce_dict (line 176) | def reduce_dict(input_dict, average=True):
  class MetricLogger (line 203) | class MetricLogger(object):
    method __init__ (line 204) | def __init__(self, delimiter="\t"):
    method update (line 208) | def update(self, **kwargs):
    method __getattr__ (line 215) | def __getattr__(self, attr):
    method __str__ (line 222) | def __str__(self):
    method synchronize_between_processes (line 228) | def synchronize_between_processes(self):
    method add_meter (line 232) | def add_meter(self, name, meter):
    method log_every (line 235) | def log_every(self, iterable, print_freq, header=None):
  function get_sha (line 309) | def get_sha():
  function collate_fn (line 330) | def collate_fn(batch):
  function _max_by_axis (line 336) | def _max_by_axis(the_list):
  function nested_tensor_from_tensor_list (line 345) | def nested_tensor_from_tensor_list(tensor_list: List[Tensor]):
  class NestedTensor (line 365) | class NestedTensor(object):
    method __init__ (line 366) | def __init__(self, tensors, mask: Optional[Tensor]):
    method to (line 370) | def to(self, device, non_blocking=False):
    method record_stream (line 381) | def record_stream(self, *args, **kwargs):
    method decompose (line 386) | def decompose(self):
    method __repr__ (line 389) | def __repr__(self):
  function setup_for_distributed (line 393) | def setup_for_distributed(is_master):
  function is_dist_avail_and_initialized (line 409) | def is_dist_avail_and_initialized():
  function get_world_size (line 417) | def get_world_size():
  function get_rank (line 423) | def get_rank():
  function get_local_size (line 429) | def get_local_size():
  function get_local_rank (line 435) | def get_local_rank():
  function is_main_process (line 441) | def is_main_process():
  function save_on_master (line 445) | def save_on_master(*args, **kwargs):
  function init_distributed_mode (line 450) | def init_distributed_mode(args):
  function accuracy (line 494) | def accuracy(output, target, topk=(1,)):
  function interpolate (line 512) | def interpolate(input, size=None, scale_factor=None, mode="nearest", ali...
  function get_total_grad_norm (line 532) | def get_total_grad_norm(parameters, norm_type=2):
  function inverse_sigmoid (line 543) | def inverse_sigmoid(x, eps=1e-5):

FILE: ape/utils/plot_utils.py
  function plot_logs (line 22) | def plot_logs(
  function plot_precision_recall (line 88) | def plot_precision_recall(files, naming_scheme="iter"):

FILE: configs/common/data/roboflow100_instance_lsj1024.py
  function _get_builtin_metadata (line 31) | def _get_builtin_metadata(name):

FILE: datasets/prepare_ade20k_full_sem_seg.py
  function loadAde20K (line 932) | def loadAde20K(file):

FILE: datasets/prepare_coco_semantic_annos_from_panoptic_annos.py
  function _process_panoptic_to_semantic (line 18) | def _process_panoptic_to_semantic(input_panoptic, output_semantic, segme...
  function separate_coco_semantic_from_panoptic (line 29) | def separate_coco_semantic_from_panoptic(panoptic_json, panoptic_root, s...

FILE: datasets/prepare_pascal_context.py
  function convert_pc59 (line 13) | def convert_pc59(mask_path, new_mask_path, pc59_dict):
  function convert_pc459 (line 25) | def convert_pc459(mask_path, new_mask_path):

FILE: datasets/prepare_voc_sem_seg.py
  function convert_to_trainID (line 39) | def convert_to_trainID(

FILE: demo/app.py
  function setup_model (line 338) | def setup_model(name):
  function run_on_image_A (line 357) | def run_on_image_A(input_image_path, input_text, score_threshold, output...
  function run_on_image_C (line 374) | def run_on_image_C(input_image_path, input_text, score_threshold, output...
  function run_on_image_D (line 391) | def run_on_image_D(input_image_path, input_text, score_threshold, output...
  function run_on_image_comparison (line 408) | def run_on_image_comparison(input_image_path, input_text, score_threshol...
  function run_on_image (line 431) | def run_on_image(
  function load_APE_A (line 528) | def load_APE_A():
  function load_APE_B (line 575) | def load_APE_B():
  function load_APE_C (line 623) | def load_APE_C():
  function load_APE_D (line 671) | def load_APE_D():
  function APE_A_tab (line 716) | def APE_A_tab():
  function APE_C_tab (line 766) | def APE_C_tab():
  function APE_D_tab (line 816) | def APE_D_tab():
  function comparison_tab (line 865) | def comparison_tab():
  function is_port_in_use (line 917) | def is_port_in_use(port: int) -> bool:
  function add_head_info (line 924) | def add_head_info(max_available_memory):
  function add_tail_info (line 946) | def add_tail_info():

FILE: demo/demo_lazy.py
  function setup_cfg (line 29) | def setup_cfg(args):
  function get_parser (line 60) | def get_parser():
  function test_opencv_video_format (line 104) | def test_opencv_video_format(codec, file_ext):

FILE: demo/predictor_lazy.py
  function filter_instances (line 20) | def filter_instances(instances, metadata):
  function cuda_grabcut (line 40) | def cuda_grabcut(img, masks, iter=5, gamma=50, iou_threshold=0.75):
  function opencv_grabcut (line 87) | def opencv_grabcut(img, masks, iter=5):
  class VisualizationDemo (line 128) | class VisualizationDemo(object):
    method __init__ (line 129) | def __init__(self, cfg, instance_mode=ColorMode.IMAGE, parallel=False,...
    method run_on_image (line 181) | def run_on_image(
    method _frame_from_video (line 266) | def _frame_from_video(self, video):
    method run_on_video (line 274) | def run_on_video(self, video):
  class AsyncPredictor (line 341) | class AsyncPredictor:
    class _StopToken (line 348) | class _StopToken:
    class _PredictWorker (line 351) | class _PredictWorker(mp.Process):
      method __init__ (line 352) | def __init__(self, cfg, task_queue, result_queue):
      method run (line 358) | def run(self):
    method __init__ (line 369) | def __init__(self, cfg, num_gpus: int = 1):
    method put (line 396) | def put(self, image):
    method get (line 400) | def get(self):
    method __len__ (line 416) | def __len__(self):
    method __call__ (line 419) | def __call__(self, image):
    method shutdown (line 423) | def shutdown(self):
    method default_buffer_size (line 428) | def default_buffer_size(self):

FILE: setup.py
  function get_version (line 18) | def get_version():
  function get_extensions (line 41) | def get_extensions():
  function get_model_zoo_configs (line 111) | def get_model_zoo_configs() -> List[str]:

FILE: tools/analyze_model.py
  function setup (line 25) | def setup(args):
  function do_flop (line 42) | def do_flop(cfg):
  function do_activation (line 73) | def do_activation(cfg):
  function do_parameter (line 102) | def do_parameter(cfg):
  function do_structure (line 110) | def do_structure(cfg):

FILE: tools/eva_interpolate_patch_14to16.py
  function interpolate_pos_embed (line 19) | def interpolate_pos_embed(checkpoint_model, new_size=16, image_size=224):

FILE: tools/train_net.py
  class Trainer (line 52) | class Trainer(SimpleTrainer):
    method __init__ (line 57) | def __init__(
    method run_step (line 112) | def run_step(self):
    method run_step_accumulate (line 199) | def run_step_accumulate(self):
    method run_step_accumulate_iter_loop (line 292) | def run_step_accumulate_iter_loop(self):
    method clip_grads (line 385) | def clip_grads(self, params):
    method state_dict (line 393) | def state_dict(self):
    method load_state_dict (line 399) | def load_state_dict(self, state_dict):
    method _data_loader_iter (line 405) | def _data_loader_iter(self):
  function do_test (line 423) | def do_test(cfg, model, eval_only=False):
  function do_train (line 514) | def do_train(args, cfg):
  function main (line 610) | def main(args):

FILE: tools/train_net_fsdp.py
  class Trainer (line 56) | class Trainer(SimpleTrainer):
    method __init__ (line 61) | def __init__(
    method run_step (line 122) | def run_step(self):
    method run_step_accumulate (line 211) | def run_step_accumulate(self):
    method run_step_accumulate_iter_loop (line 304) | def run_step_accumulate_iter_loop(self):
    method clip_grads (line 397) | def clip_grads(self, params):
    method state_dict (line 406) | def state_dict(self):
    method load_state_dict (line 412) | def load_state_dict(self, state_dict):
    method _data_loader_iter (line 418) | def _data_loader_iter(self):
  function do_test (line 436) | def do_test(cfg, model, eval_only=False):
  function do_train (line 545) | def do_train(args, cfg):
  function main (line 648) | def main(args):

FILE: tools/visualize_json_results.py
  function create_instances (line 18) | def create_instances(predictions, image_size):
  function dataset_id_map (line 63) | def dataset_id_map(ds_id):
  function dataset_id_map (line 68) | def dataset_id_map(ds_id):
Condensed preview — 501 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (6,683K chars).
[
  {
    "path": ".gitignore",
    "chars": 549,
    "preview": "# output dir\noutput\ninstant_test_output\ninference_test_output\n\n\n*.png\n*.json\n*.diff\n*.jpg\n!/projects/DensePose/doc/image"
  },
  {
    "path": "LICENSE",
    "chars": 11357,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "README.md",
    "chars": 14118,
    "preview": "# APE: Aligning and Prompting Everything All at Once for Universal Visual Perception\n\n\n<!-- \n<a href='https://github.com"
  },
  {
    "path": "ape/__init__.py",
    "chars": 163,
    "preview": "from .data import *\n\n# This line will be programatically read/write by setup.py.\n# Leave them at the bottom of this file"
  },
  {
    "path": "ape/checkpoint/__init__.py",
    "chars": 179,
    "preview": "# -*- coding: utf-8 -*-\n\n\nfrom .detection_checkpoint import DetectionCheckpointer\nfrom .detection_checkpoint import FSDP"
  },
  {
    "path": "ape/checkpoint/detection_checkpoint.py",
    "chars": 3258,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport logging\nimport os\nimport pickle\nfrom typing import IO, Any, Di"
  },
  {
    "path": "ape/data/__init__.py",
    "chars": 970,
    "preview": "from . import datasets\nfrom .build_copypaste import (\n    build_detection_train_loader_copypaste,\n    get_detection_data"
  },
  {
    "path": "ape/data/build.py",
    "chars": 4891,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport itertools\nimport logging\nimport numpy as np\nimport operator\nim"
  },
  {
    "path": "ape/data/build_copypaste.py",
    "chars": 10531,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport itertools\nimport logging\n\nimport torch.utils.data as torchdata"
  },
  {
    "path": "ape/data/build_multi_dataset.py",
    "chars": 29538,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport itertools\nimport logging\nimport operator\nimport time\nfrom coll"
  },
  {
    "path": "ape/data/build_multi_dataset_copypaste.py",
    "chars": 32496,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport itertools\nimport logging\nimport operator\nimport time\nfrom coll"
  },
  {
    "path": "ape/data/common_copypaste.py",
    "chars": 2966,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport logging\nimport random\n\nimport numpy as np\nimport torch.utils.d"
  },
  {
    "path": "ape/data/dataset_mapper.py",
    "chars": 1501,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport logging\n\nfrom detectron2.data import detection_utils as utils\n"
  },
  {
    "path": "ape/data/dataset_mapper_copypaste.py",
    "chars": 21252,
    "preview": "import copy\nimport logging\nimport os\nimport random\nfrom typing import List, Optional, Union\n\nimport cv2\nimport numpy as "
  },
  {
    "path": "ape/data/dataset_mapper_detr_instance.py",
    "chars": 12379,
    "preview": "import copy\nimport logging\nfrom typing import List, Optional, Union\n\nimport numpy as np\nimport torch\n\nfrom detectron2.co"
  },
  {
    "path": "ape/data/dataset_mapper_detr_instance_exp.py",
    "chars": 10082,
    "preview": "import copy\nimport logging\nfrom typing import List, Optional, Union\n\nimport numpy as np\nimport torch\n\nfrom detectron2.co"
  },
  {
    "path": "ape/data/dataset_mapper_detr_panoptic.py",
    "chars": 19478,
    "preview": "import copy\nimport logging\nimport re\nfrom typing import List, Optional, Union\n\nimport numpy as np\nimport torch\n\nfrom det"
  },
  {
    "path": "ape/data/dataset_mapper_detr_panoptic_copypaste.py",
    "chars": 28387,
    "preview": "import copy\nimport logging\nimport os\nimport random\nfrom typing import List, Optional, Union\n\nimport cv2\nimport numpy as "
  },
  {
    "path": "ape/data/dataset_mapper_detr_semantic.py",
    "chars": 10565,
    "preview": "import copy\nimport logging\nfrom typing import List, Optional, Union\n\nimport cv2\nimport numpy as np\nimport torch\n\nfrom de"
  },
  {
    "path": "ape/data/datasets/__init__.py",
    "chars": 836,
    "preview": "from . import d_cube as _d_cube\nfrom . import flickr30k as _flickr30k\nfrom . import gqa as _gqa\nfrom . import grit as _g"
  },
  {
    "path": "ape/data/datasets/coco.py",
    "chars": 16464,
    "preview": "import contextlib\nimport io\nimport logging\nimport os\n\nimport pycocotools.mask as mask_util\n\nfrom detectron2.data import "
  },
  {
    "path": "ape/data/datasets/d_cube.py",
    "chars": 11117,
    "preview": "import logging\nimport os\n\nimport pycocotools.mask as mask_util\n\nfrom detectron2.data import DatasetCatalog, MetadataCata"
  },
  {
    "path": "ape/data/datasets/flickr30k.py",
    "chars": 2147,
    "preview": "import logging\nimport os\n\nfrom .coco import custom_register_coco_instances\n\nlogger = logging.getLogger(__name__)\n\n\ndef _"
  },
  {
    "path": "ape/data/datasets/gqa.py",
    "chars": 1740,
    "preview": "import logging\nimport os\n\nfrom .coco import custom_register_coco_instances\n\nlogger = logging.getLogger(__name__)\n\n\ndef _"
  },
  {
    "path": "ape/data/datasets/grit.py",
    "chars": 2556,
    "preview": "import os\n\nfrom .coco import custom_register_coco_instances\n\nGRIT_CATEGORIES = [\n    {\"id\": 0, \"name\": \"object\"},\n]\n\n\nde"
  },
  {
    "path": "ape/data/datasets/inst_categories.py",
    "chars": 31434,
    "preview": "categories = {\n    \"coco\": [\n        {\"color\": [220, 20, 60], \"isthing\": 1, \"id\": 1, \"name\": \"person\"},\n        {\"color\""
  },
  {
    "path": "ape/data/datasets/lvis_coco.py",
    "chars": 11806,
    "preview": "import logging\nimport os\n\nimport pycocotools.mask as mask_util\n\nfrom detectron2.data import DatasetCatalog, MetadataCata"
  },
  {
    "path": "ape/data/datasets/lvis_coco_panoptic.py",
    "chars": 7556,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport copy\nimport json\nimport os\n\nfrom detectron2.data import Datase"
  },
  {
    "path": "ape/data/datasets/lvis_v1_coco_category_image_count.py",
    "chars": 39437,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\n# Autogen with\n# with open(\"lvis_v1_train.json\", \"r\") as f:\n#     a ="
  },
  {
    "path": "ape/data/datasets/objects365.py",
    "chars": 29077,
    "preview": "import os\n\nfrom detectron2.data.datasets.register_coco import register_coco_instances\n\nOBJECTS365_CATEGORIES_FIXNAME = ["
  },
  {
    "path": "ape/data/datasets/odinw_categories.py",
    "chars": 21644,
    "preview": "ODINW_CATEGORIES = {\n    \"AerialMaritimeDrone\": [\n        {\"id\": 1, \"name\": \"boat\", \"supercategory\": \"movable-objects\"},"
  },
  {
    "path": "ape/data/datasets/odinw_instance.py",
    "chars": 38303,
    "preview": "import contextlib\nimport io\nimport logging\nimport os\n\nimport pycocotools.mask as mask_util\n\nfrom detectron2.data import "
  },
  {
    "path": "ape/data/datasets/odinw_prompts.py",
    "chars": 2611,
    "preview": "ODINW_PROMPTS = {\n    \"AerialMaritimeDrone\": lambda name: \"a ship\" if name == \"boat\" else name,\n    \"AmericanSignLanguag"
  },
  {
    "path": "ape/data/datasets/oid.py",
    "chars": 96253,
    "preview": "import contextlib\nimport io\nimport logging\nimport os\n\nfrom detectron2.data import DatasetCatalog, MetadataCatalog\n\nfrom "
  },
  {
    "path": "ape/data/datasets/openimages_v6_category_image_count.py",
    "chars": 62966,
    "preview": "OPENIMAGES_v6_CATEGORY_IMAGE_COUNT = [{\"id\": 1, \"name\": \"Tortoise\", \"freebase_id\": \"/m/011k07\", \"image_count\": 1151, \"in"
  },
  {
    "path": "ape/data/datasets/pascal_voc_external.py",
    "chars": 68503,
    "preview": "import os\n\nfrom detectron2.data import DatasetCatalog, MetadataCatalog\nfrom detectron2.data.datasets import load_sem_seg"
  },
  {
    "path": "ape/data/datasets/phrasecut.py",
    "chars": 1817,
    "preview": "import logging\nimport os\n\nfrom .coco import custom_register_coco_instances\n\nlogger = logging.getLogger(__name__)\n\n\ndef _"
  },
  {
    "path": "ape/data/datasets/refcoco.py",
    "chars": 14569,
    "preview": "import contextlib\nimport io\nimport logging\nimport os\n\nimport numpy as np\nimport pycocotools.mask as mask_util\nfrom PIL i"
  },
  {
    "path": "ape/data/datasets/register_bdd100k_panoseg.py",
    "chars": 12283,
    "preview": "# --------------------------------------------------------\n# X-Decoder -- Generalized Decoding for Pixel, Image, and Lan"
  },
  {
    "path": "ape/data/datasets/register_bdd100k_semseg.py",
    "chars": 2775,
    "preview": "# --------------------------------------------------------\n# X-Decoder -- Generalized Decoding for Pixel, Image, and Lan"
  },
  {
    "path": "ape/data/datasets/register_pascal_context.py",
    "chars": 9975,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport os\n\nfrom detectron2.data import DatasetCatalog, MetadataCatalo"
  },
  {
    "path": "ape/data/datasets/register_voc_seg.py",
    "chars": 1527,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport os\n\nfrom detectron2.data import DatasetCatalog, MetadataCatalo"
  },
  {
    "path": "ape/data/datasets/sa1b.py",
    "chars": 1550,
    "preview": "import os\n\nfrom detectron2.data.datasets.register_coco import register_coco_instances\n\nSA1B_CATEGORIES = [\n    {\"id\": 1,"
  },
  {
    "path": "ape/data/datasets/seginw_categories.py",
    "chars": 2608,
    "preview": "SEGINW_CATEGORIES = {\n    \"seginw_Helmet-Head\": [\"Helmet\"],\n    \"seginw_Line-Contour\": [\"line-structure\"],\n    \"seginw_E"
  },
  {
    "path": "ape/data/datasets/seginw_instance.py",
    "chars": 3929,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport collections\nimport json\nimport os\n\nfrom detectron2.data import"
  },
  {
    "path": "ape/data/datasets/visualgenome.py",
    "chars": 7601,
    "preview": "import logging\nimport os\n\nfrom .coco import custom_register_coco_instances\nfrom .visualgenome_categories import (\n    VI"
  },
  {
    "path": "ape/data/datasets/visualgenome_categories.py",
    "chars": 3029181,
    "preview": "# fmt: off\n\nVISUALGENOME_150_CATEGORIES = [{\"id\": 1, \"name\": \"airplane\"}, {\"id\": 2, \"name\": \"animal\"}, {\"id\": 3, \"name\":"
  },
  {
    "path": "ape/data/detection_utils.py",
    "chars": 7994,
    "preview": "# -*- coding: utf-8 -*-\n# Copyright (c) Facebook, Inc. and its affiliates.\n\n\"\"\"\nCommon data processing utilities that ar"
  },
  {
    "path": "ape/data/mapper_utils.py",
    "chars": 17579,
    "preview": "# -*- coding: utf-8 -*-\nimport copy\nimport json\nimport logging\nimport os\nimport random\nimport re\n\nimport cv2\nimport nump"
  },
  {
    "path": "ape/data/samplers/__init__.py",
    "chars": 167,
    "preview": "from .distributed_sampler_multi_dataset import MultiDatasetTrainingSampler, InferenceSampler\n\n__all__ = [\n    \"MultiData"
  },
  {
    "path": "ape/data/samplers/distributed_sampler_multi_dataset.py",
    "chars": 6728,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport itertools\nimport logging\nimport math\nfrom collections import d"
  },
  {
    "path": "ape/data/transforms/__init__.py",
    "chars": 180,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nfrom .augmentation_aa import *\nfrom .augmentation_lsj import *\n\n__all"
  },
  {
    "path": "ape/data/transforms/augmentation_aa.py",
    "chars": 1300,
    "preview": "from detectron2.data import transforms as T\nfrom fvcore.transforms.transform import Transform, TransformList\n\n\nclass Aut"
  },
  {
    "path": "ape/data/transforms/augmentation_lsj.py",
    "chars": 1211,
    "preview": "from detectron2.data import transforms as T\nfrom fvcore.transforms.transform import Transform, TransformList\n\n\nclass Lar"
  },
  {
    "path": "ape/engine/__init__.py",
    "chars": 116,
    "preview": "from .defaults import *\nfrom .train_loop import *\n\n__all__ = [k for k in globals().keys() if not k.startswith(\"_\")]\n"
  },
  {
    "path": "ape/engine/defaults.py",
    "chars": 8308,
    "preview": "# -*- coding: utf-8 -*-\n# Copyright (c) Facebook, Inc. and its affiliates.\n\n\"\"\"\nThis file contains components with some "
  },
  {
    "path": "ape/engine/train_loop.py",
    "chars": 15672,
    "preview": "# -*- coding: utf-8 -*-\n# Copyright (c) Facebook, Inc. and its affiliates.\nimport concurrent.futures\nimport logging\nimpo"
  },
  {
    "path": "ape/evaluation/__init__.py",
    "chars": 329,
    "preview": "from .d3_evaluation import D3Evaluator\nfrom .evaluator import inference_on_dataset\nfrom .instance_evaluation import Inst"
  },
  {
    "path": "ape/evaluation/d3_evaluation.py",
    "chars": 31840,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport contextlib\nimport copy\nimport io\nimport itertools\nimport json\n"
  },
  {
    "path": "ape/evaluation/evaluator.py",
    "chars": 7881,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport datetime\nimport logging\nimport time\nfrom collections import ab"
  },
  {
    "path": "ape/evaluation/instance_evaluation.py",
    "chars": 4712,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport contextlib\nimport copy\nimport io\nimport itertools\nimport json\n"
  },
  {
    "path": "ape/evaluation/lvis_evaluation.py",
    "chars": 18057,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport copy\nimport itertools\nimport json\nimport logging\nimport os\nimp"
  },
  {
    "path": "ape/evaluation/multi_dataset_evaluator.py",
    "chars": 16725,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n# Modified by Xingyi Zhou\nimport copy\nimport glob"
  },
  {
    "path": "ape/evaluation/oideval.py",
    "chars": 35764,
    "preview": "# Part of the code is from https://github.com/tensorflow/models/blob/master/research/object_detection/metrics/oid_challe"
  },
  {
    "path": "ape/evaluation/refcoco_evaluation.py",
    "chars": 31594,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport contextlib\nimport copy\nimport io\nimport itertools\nimport json\n"
  },
  {
    "path": "ape/evaluation/refcocoeval.py",
    "chars": 24932,
    "preview": "__author__ = \"tsungyi\"\n\nimport copy\nimport datetime\nimport time\nfrom collections import defaultdict\n\nimport numpy as np\n"
  },
  {
    "path": "ape/layers/__init__.py",
    "chars": 347,
    "preview": "from .fuse_helper import BiAttentionBlock, BiMultiHeadAttention\nfrom .multi_scale_deform_attn import (\n    MultiScaleDef"
  },
  {
    "path": "ape/layers/csrc/MsDeformAttn/ms_deform_attn.h",
    "chars": 1872,
    "preview": "/*!\n**************************************************************************************************\n* Deformable DETR"
  },
  {
    "path": "ape/layers/csrc/MsDeformAttn/ms_deform_attn_cpu.cpp",
    "chars": 1256,
    "preview": "/*!\n**************************************************************************************************\n* Deformable DETR"
  },
  {
    "path": "ape/layers/csrc/MsDeformAttn/ms_deform_attn_cpu.h",
    "chars": 1174,
    "preview": "/*!\n**************************************************************************************************\n* Deformable DETR"
  },
  {
    "path": "ape/layers/csrc/MsDeformAttn/ms_deform_attn_cuda.cu",
    "chars": 7373,
    "preview": "/*!\n**************************************************************************************************\n* Deformable DETR"
  },
  {
    "path": "ape/layers/csrc/MsDeformAttn/ms_deform_attn_cuda.h",
    "chars": 1175,
    "preview": "/*!\n**************************************************************************************************\n* Deformable DETR"
  },
  {
    "path": "ape/layers/csrc/MsDeformAttn/ms_deform_im2col_cuda.cuh",
    "chars": 54694,
    "preview": "/*!\n**************************************************************************\n* Deformable DETR\n* Copyright (c) 2020 Se"
  },
  {
    "path": "ape/layers/csrc/cuda_version.cu",
    "chars": 120,
    "preview": "#include <cuda_runtime_api.h>\n\nnamespace ape {\nint get_cudart_version() {\n  return CUDART_VERSION;\n}\n} // namespace ape\n"
  },
  {
    "path": "ape/layers/csrc/vision.cpp",
    "chars": 1729,
    "preview": "// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\n#include <torch/extension.h>\n#include \"MsDeform"
  },
  {
    "path": "ape/layers/fuse_helper.py",
    "chars": 9958,
    "preview": "import torch\r\nimport torch.nn as nn\r\nimport torch.nn.functional as F\r\n\r\nfrom timm.models.layers import DropPath\r\n\r\n\r\ncla"
  },
  {
    "path": "ape/layers/multi_scale_deform_attn.py",
    "chars": 16125,
    "preview": "# coding=utf-8\n# ------------------------------------------------------------------------------------------------\n# Defo"
  },
  {
    "path": "ape/layers/vision_language_align.py",
    "chars": 2438,
    "preview": "import math\n\nimport torch\nimport torch.nn.functional as F\nfrom torch import nn\n\n\nclass VisionLanguageAlign(nn.Module):\n "
  },
  {
    "path": "ape/layers/vision_language_fusion.py",
    "chars": 1619,
    "preview": "import torch\nimport torch.utils.checkpoint as checkpoint\n\nfrom .fuse_helper import BiAttentionBlock\n\n\nclass VisionLangua"
  },
  {
    "path": "ape/layers/zero_shot_fc.py",
    "chars": 5603,
    "preview": "import logging\nimport math\n\nimport numpy as np\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\n\nl"
  },
  {
    "path": "ape/model_zoo/__init__.py",
    "chars": 449,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\n\"\"\"\nModel Zoo API for Detectron2: a collection of functions to create"
  },
  {
    "path": "ape/model_zoo/model_zoo.py",
    "chars": 11257,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates.\nimport os\nfrom typing import Optional\n\nimport pkg_resources\nimport to"
  },
  {
    "path": "ape/modeling/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "ape/modeling/ape_deta/__init__.py",
    "chars": 597,
    "preview": "from .ape_deta import SomeThing\nfrom .assigner import Stage1Assigner, Stage2Assigner\nfrom .deformable_criterion import D"
  },
  {
    "path": "ape/modeling/ape_deta/ape_deta.py",
    "chars": 1321,
    "preview": "import copy\nimport math\nfrom typing import Dict, List, Optional, Tuple\n\nimport torch\nimport torch.nn as nn\nimport torch."
  },
  {
    "path": "ape/modeling/ape_deta/assigner.py",
    "chars": 15358,
    "preview": "from typing import List\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom detrex.layers import b"
  },
  {
    "path": "ape/modeling/ape_deta/deformable_criterion.py",
    "chars": 24612,
    "preview": "import copy\nimport logging\nfrom typing import Callable, List, Optional\n\nimport torch\nimport torch.nn.functional as F\n\nfr"
  },
  {
    "path": "ape/modeling/ape_deta/deformable_detr.py",
    "chars": 24482,
    "preview": "import copy\nimport logging\nimport math\nfrom typing import Dict, List, Optional, Tuple\n\nimport torch\nimport torch.nn as n"
  },
  {
    "path": "ape/modeling/ape_deta/deformable_detr_segm.py",
    "chars": 66316,
    "preview": "import copy\nimport math\nimport os\nimport time\nfrom typing import Dict, List, Optional, Tuple\n\nimport cv2\nimport numpy as"
  },
  {
    "path": "ape/modeling/ape_deta/deformable_detr_segm_vl.py",
    "chars": 52267,
    "preview": "import copy\nimport random\nimport math\nimport os\nimport time\nfrom typing import Dict, List, Optional, Tuple\n\nimport cv2\ni"
  },
  {
    "path": "ape/modeling/ape_deta/deformable_transformer.py",
    "chars": 25716,
    "preview": "import math\n\nimport torch\nimport torch.nn as nn\nimport torch.utils.checkpoint as checkpoint\n\nfrom ape.layers import Mult"
  },
  {
    "path": "ape/modeling/ape_deta/deformable_transformer_vl.py",
    "chars": 27641,
    "preview": "import copy\nimport math\n\nimport torch\nimport torch.nn as nn\nimport torch.utils.checkpoint as checkpoint\n\nfrom ape.layers"
  },
  {
    "path": "ape/modeling/ape_deta/fast_rcnn.py",
    "chars": 7313,
    "preview": "import warnings\nfrom typing import List, Tuple\n\nimport torch\n\nfrom detectron2.layers import batched_nms\nfrom detectron2."
  },
  {
    "path": "ape/modeling/ape_deta/misc.py",
    "chars": 14576,
    "preview": "\"\"\"\nMisc functions, including distributed helpers.\n\nMostly copy-paste from torchvision references.\n\"\"\"\nimport datetime\ni"
  },
  {
    "path": "ape/modeling/ape_deta/segmentation.py",
    "chars": 15013,
    "preview": "\"\"\"\nThis file provides the definition of the convolutional heads used to predict masks, as well as the losses\n\"\"\"\nimport"
  },
  {
    "path": "ape/modeling/backbone/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "ape/modeling/backbone/utils_eva.py",
    "chars": 7906,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nimport math\nimport numpy as np\nfrom scipy import "
  },
  {
    "path": "ape/modeling/backbone/utils_eva02.py",
    "chars": 12019,
    "preview": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nimport math\nimport numpy as np\nfrom scipy import "
  },
  {
    "path": "ape/modeling/backbone/vit.py",
    "chars": 1144,
    "preview": "import logging\n\nlogger = logging.getLogger(__name__)\n\n\n__all__ = [\"get_vit_lr_decay_rate\"]\n\ndef get_vit_lr_decay_rate(na"
  },
  {
    "path": "ape/modeling/backbone/vit_eva.py",
    "chars": 24942,
    "preview": "import logging\nimport math\nimport fvcore.nn.weight_init as weight_init\nimport torch\nimport torch.nn as nn\nimport torch.n"
  },
  {
    "path": "ape/modeling/backbone/vit_eva02.py",
    "chars": 29253,
    "preview": "import logging\r\nimport math\r\nfrom functools import partial\r\nfrom typing import Dict, Optional, Sequence, Tuple, Union\r\n\r"
  },
  {
    "path": "ape/modeling/backbone/vit_eva_clip.py",
    "chars": 35648,
    "preview": "import logging\nimport math\nfrom functools import partial\n\nimport fvcore.nn.weight_init as weight_init\nimport torch\nimpor"
  },
  {
    "path": "ape/modeling/deta/__init__.py",
    "chars": 351,
    "preview": "from .assigner import Stage1Assigner, Stage2Assigner\nfrom .deformable_criterion import DeformableCriterion\nfrom .deforma"
  },
  {
    "path": "ape/modeling/deta/assigner.py",
    "chars": 15125,
    "preview": "from typing import List\n\nimport torch\nimport torch.nn as nn\n\nfrom ape.utils.box_ops import box_cxcywh_to_xyxy, box_iou, "
  },
  {
    "path": "ape/modeling/deta/deformable_criterion.py",
    "chars": 21732,
    "preview": "import copy\nimport logging\nfrom typing import Callable, List, Optional\n\nimport torch\nimport torch.nn.functional as F\n\nfr"
  },
  {
    "path": "ape/modeling/deta/deformable_detr.py",
    "chars": 18278,
    "preview": "import copy\nimport math\nfrom typing import Dict, List, Optional, Tuple\n\nimport torch\nimport torch.nn as nn\nimport torch."
  },
  {
    "path": "ape/modeling/deta/deformable_detr_segm.py",
    "chars": 37273,
    "preview": "import copy\nimport math\nimport os\nfrom typing import Dict, List, Optional, Tuple\n\nimport cv2\nimport numpy as np\nimport t"
  },
  {
    "path": "ape/modeling/deta/deformable_transformer.py",
    "chars": 20711,
    "preview": "import math\n\nimport torch\nimport torch.nn as nn\n\nfrom ape.layers import MultiScaleDeformableAttention\nfrom detrex.layers"
  },
  {
    "path": "ape/modeling/deta/misc.py",
    "chars": 14576,
    "preview": "\"\"\"\nMisc functions, including distributed helpers.\n\nMostly copy-paste from torchvision references.\n\"\"\"\nimport datetime\ni"
  },
  {
    "path": "ape/modeling/deta/segmentation.py",
    "chars": 15013,
    "preview": "\"\"\"\nThis file provides the definition of the convolutional heads used to predict masks, as well as the losses\n\"\"\"\nimport"
  },
  {
    "path": "ape/modeling/text/__init__.py",
    "chars": 376,
    "preview": "from .bert_wrapper import Bert\nfrom .clip_wrapper import build_clip_text_encoder, get_clip_embeddings\nfrom .clip_wrapper"
  },
  {
    "path": "ape/modeling/text/bert_wrapper.py",
    "chars": 3183,
    "preview": "import torch\nfrom torch import nn\nfrom torch.cuda.amp import autocast\n\nfrom transformers import (\n    AutoConfig,\n    Au"
  },
  {
    "path": "ape/modeling/text/clip_wrapper.py",
    "chars": 7523,
    "preview": "import logging\nfrom collections import OrderedDict\nfrom typing import List, Union\n\nimport torch\nfrom torch import nn\n\nfr"
  },
  {
    "path": "ape/modeling/text/clip_wrapper_eva01.py",
    "chars": 4856,
    "preview": "import torch\nimport torch.nn as nn\nfrom torch.cuda.amp import autocast\n\nfrom clip import tokenize\n\nfrom .eva01_clip impo"
  },
  {
    "path": "ape/modeling/text/clip_wrapper_eva02.py",
    "chars": 5149,
    "preview": "import torch\nimport torch.nn as nn\nfrom torch.cuda.amp import autocast\n\nfrom .eva02_clip import create_model_and_transfo"
  },
  {
    "path": "ape/modeling/text/clip_wrapper_open.py",
    "chars": 1345,
    "preview": "import logging\nfrom collections import OrderedDict\nfrom typing import List, Union\n\nimport torch\nfrom torch import nn\n\nfr"
  },
  {
    "path": "ape/modeling/text/eva01_clip/README.md",
    "chars": 3490,
    "preview": "# Contrastive Language-Image Pre-Training with EVA (EVA-CLIP)\n\n**Table of Contents**\n\n- [Contrastive Language-Image Pre-"
  },
  {
    "path": "ape/modeling/text/eva01_clip/__init__.py",
    "chars": 186,
    "preview": "# from .clip import *\n# from .eva_clip import *\n# from .model import *\n# from .simple_tokenizer import *\n# from .vit_mod"
  },
  {
    "path": "ape/modeling/text/eva01_clip/clip.py",
    "chars": 8994,
    "preview": "import hashlib\nimport os\nimport urllib\nimport warnings\nfrom typing import Any, Union, List\nfrom pkg_resources import pac"
  },
  {
    "path": "ape/modeling/text/eva01_clip/eva_clip.py",
    "chars": 5792,
    "preview": "import json\nimport logging\nimport os\nimport pathlib\nimport re\nfrom copy import deepcopy\nfrom pathlib import Path\n# from "
  },
  {
    "path": "ape/modeling/text/eva01_clip/eva_model.py",
    "chars": 13627,
    "preview": "\"\"\" CLIP Model\n\nAdapted from https://github.com/mlfoundations/open_clip\n\n\"\"\"\nimport math\nfrom dataclasses import datacla"
  },
  {
    "path": "ape/modeling/text/eva01_clip/model.py",
    "chars": 18939,
    "preview": "from collections import OrderedDict\nfrom typing import Tuple, Union\n\nimport numpy as np\nimport torch\nimport math\nimport "
  },
  {
    "path": "ape/modeling/text/eva01_clip/simple_tokenizer.py",
    "chars": 4628,
    "preview": "import gzip\nimport html\nimport os\nfrom functools import lru_cache\n\nimport ftfy\nimport regex as re\n\n\n@lru_cache()\ndef def"
  },
  {
    "path": "ape/modeling/text/eva01_clip/vit_model.py",
    "chars": 16728,
    "preview": "# --------------------------------------------------------\n# BEIT: BERT Pre-Training of Image Transformers (https://arxi"
  },
  {
    "path": "ape/modeling/text/eva02_clip/__init__.py",
    "chars": 880,
    "preview": "# from .constants import OPENAI_DATASET_MEAN, OPENAI_DATASET_STD\n# from .factory import create_model, create_model_and_t"
  },
  {
    "path": "ape/modeling/text/eva02_clip/constants.py",
    "chars": 116,
    "preview": "OPENAI_DATASET_MEAN = (0.48145466, 0.4578275, 0.40821073)\nOPENAI_DATASET_STD = (0.26862954, 0.26130258, 0.27577711)\n"
  },
  {
    "path": "ape/modeling/text/eva02_clip/eva_vit_model.py",
    "chars": 22392,
    "preview": "# --------------------------------------------------------\n# Adapted from  https://github.com/microsoft/unilm/tree/maste"
  },
  {
    "path": "ape/modeling/text/eva02_clip/factory.py",
    "chars": 18883,
    "preview": "import json\nimport logging\nimport os\nimport pathlib\nimport re\nfrom copy import deepcopy\nfrom pathlib import Path\nfrom ty"
  },
  {
    "path": "ape/modeling/text/eva02_clip/hf_configs.py",
    "chars": 1928,
    "preview": "# HF architecture dict:\narch_dict = {\n  # https://huggingface.co/docs/transformers/model_doc/roberta#roberta\n  \"roberta\""
  },
  {
    "path": "ape/modeling/text/eva02_clip/hf_model.py",
    "chars": 10571,
    "preview": "\"\"\" huggingface model adapter\n\nWraps HuggingFace transformers (https://github.com/huggingface/transformers) models for u"
  },
  {
    "path": "ape/modeling/text/eva02_clip/loss.py",
    "chars": 5862,
    "preview": "import math\nimport torch\nimport torch.nn as nn\nfrom torch.nn import functional as F\n\ntry:\n    import torch.distributed.n"
  },
  {
    "path": "ape/modeling/text/eva02_clip/model.py",
    "chars": 16745,
    "preview": "\"\"\" CLIP Model\n\nAdapted from https://github.com/openai/CLIP. Originally MIT License, Copyright (c) 2021 OpenAI.\n\"\"\"\nimpo"
  },
  {
    "path": "ape/modeling/text/eva02_clip/modified_resnet.py",
    "chars": 7026,
    "preview": "from collections import OrderedDict\n\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\n\nfrom .utils"
  },
  {
    "path": "ape/modeling/text/eva02_clip/openai.py",
    "chars": 5446,
    "preview": "\"\"\" OpenAI pretrained model functions\n\nAdapted from https://github.com/openai/CLIP. Originally MIT License, Copyright (c"
  },
  {
    "path": "ape/modeling/text/eva02_clip/pretrained.py",
    "chars": 11791,
    "preview": "import hashlib\nimport os\nimport urllib\nimport warnings\nfrom functools import partial\nfrom typing import Dict, Union\n\nfro"
  },
  {
    "path": "ape/modeling/text/eva02_clip/rope.py",
    "chars": 5368,
    "preview": "from math import pi\nimport torch\nfrom torch import nn\nfrom einops import rearrange, repeat\nimport logging\n\ndef broadcat("
  },
  {
    "path": "ape/modeling/text/eva02_clip/timm_model.py",
    "chars": 4910,
    "preview": "\"\"\" timm model adapter\n\nWraps timm (https://github.com/rwightman/pytorch-image-models) models for use as a vision tower "
  },
  {
    "path": "ape/modeling/text/eva02_clip/tokenizer.py",
    "chars": 7121,
    "preview": "\"\"\" CLIP tokenizer\n\nCopied from https://github.com/openai/CLIP. Originally MIT License, Copyright (c) 2021 OpenAI.\n\"\"\"\ni"
  },
  {
    "path": "ape/modeling/text/eva02_clip/transform.py",
    "chars": 3382,
    "preview": "from typing import Optional, Sequence, Tuple\n\nimport torch\nimport torch.nn as nn\nimport torchvision.transforms.functiona"
  },
  {
    "path": "ape/modeling/text/eva02_clip/transformer.py",
    "chars": 26732,
    "preview": "import os\nimport logging\nfrom collections import OrderedDict\nimport math\nfrom typing import Callable, Optional, Sequence"
  },
  {
    "path": "ape/modeling/text/eva02_clip/utils.py",
    "chars": 14654,
    "preview": "from itertools import repeat\nimport collections.abc\nimport logging\nimport math\nimport numpy as np\n\nimport torch\nfrom tor"
  },
  {
    "path": "ape/modeling/text/llama2_wrapper.py",
    "chars": 5368,
    "preview": "import copy\nimport logging\nimport math\nimport time\nfrom typing import Dict, List, Optional, Tuple\n\nimport torch\nimport t"
  },
  {
    "path": "ape/modeling/text/t5_wrapper.py",
    "chars": 3401,
    "preview": "import copy\nimport math\nimport time\nfrom typing import Dict, List, Optional, Tuple\n\nimport torch\nimport torch.nn as nn\ni"
  },
  {
    "path": "ape/modeling/text/text_encoder.py",
    "chars": 1098,
    "preview": "import logging\nfrom collections import OrderedDict\nfrom typing import List, Union\n\nimport torch\nfrom torch import nn\n\nfr"
  },
  {
    "path": "ape/modeling/text/utils.py",
    "chars": 1147,
    "preview": "import torch\n\n\ndef clean_name(name):\n    name = re.sub(r\"\\(.*\\)\", \"\", name)\n    name = re.sub(r\"_\", \" \", name)\n    name "
  },
  {
    "path": "ape/utils/__init__.py",
    "chars": 506,
    "preview": "# ------------------------------------------------------------------------\n# Deformable DETR\n# Copyright (c) 2020 SenseT"
  },
  {
    "path": "ape/utils/box_ops.py",
    "chars": 2976,
    "preview": "# ------------------------------------------------------------------------\n# Deformable DETR\n# Copyright (c) 2020 SenseT"
  },
  {
    "path": "ape/utils/misc.py",
    "chars": 18057,
    "preview": "# ------------------------------------------------------------------------\n# Deformable DETR\n# Copyright (c) 2020 SenseT"
  },
  {
    "path": "ape/utils/plot_utils.py",
    "chars": 4814,
    "preview": "# ------------------------------------------------------------------------\n# Deformable DETR\n# Copyright (c) 2020 SenseT"
  },
  {
    "path": "configs/ADE20kFull_SemanticSegmentation/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024.py",
    "chars": 3151,
    "preview": "import torch.nn as nn\n\nfrom detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detrex.m"
  },
  {
    "path": "configs/ADE20kFull_SemanticSegmentation/ape_deta/ape_deta_vitl_eva02_lsj1024.py",
    "chars": 1095,
    "preview": "from detectron2.data import MetadataCatalog\n\nfrom ...COCO_InstanceSegmentation.ape_deta.ape_deta_vitl_eva02_lsj1024_cp_1"
  },
  {
    "path": "configs/ADE20kFull_SemanticSegmentation/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024.py",
    "chars": 1418,
    "preview": "from detectron2.config import LazyCall as L\nfrom ape.layers import VisionLanguageFusion\nfrom ape.modeling.ape_deta impor"
  },
  {
    "path": "configs/ADE20kFull_SemanticSegmentation/ape_deta/ape_deta_vitt_eva02_vlf_lsj1024.py",
    "chars": 3149,
    "preview": "import torch.nn as nn\n\nfrom detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detrex.m"
  },
  {
    "path": "configs/ADE20k_PanopticSegmentation/ape_deta/ape_deta_r50_160k.py",
    "chars": 1319,
    "preview": "from detectron2.config import LazyCall as L\nfrom detectron2.solver import WarmupParamScheduler\nfrom fvcore.common.param_"
  },
  {
    "path": "configs/ADE20k_PanopticSegmentation/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024.py",
    "chars": 3151,
    "preview": "import torch.nn as nn\n\nfrom detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detrex.m"
  },
  {
    "path": "configs/ADE20k_PanopticSegmentation/ape_deta/ape_deta_vitl_eva02_lsj1024.py",
    "chars": 682,
    "preview": "from ...COCO_InstanceSegmentation.ape_deta.ape_deta_vitl_eva02_lsj1024_cp_12ep import (\n    lr_multiplier,\n    model,\n  "
  },
  {
    "path": "configs/ADE20k_PanopticSegmentation/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024.py",
    "chars": 1418,
    "preview": "from detectron2.config import LazyCall as L\nfrom ape.layers import VisionLanguageFusion\nfrom ape.modeling.ape_deta impor"
  },
  {
    "path": "configs/ADE20k_PanopticSegmentation/ape_deta/ape_deta_vitt_eva02_vlf_lsj1024.py",
    "chars": 3149,
    "preview": "import torch.nn as nn\n\nfrom detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detrex.m"
  },
  {
    "path": "configs/ADE20k_SemanticSegmentation/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024.py",
    "chars": 3151,
    "preview": "import torch.nn as nn\n\nfrom detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detrex.m"
  },
  {
    "path": "configs/ADE20k_SemanticSegmentation/ape_deta/ape_deta_vitl_eva02_lsj1024.py",
    "chars": 684,
    "preview": "from ...COCO_InstanceSegmentation.ape_deta.ape_deta_vitl_eva02_lsj1024_cp_12ep import (\n    lr_multiplier,\n    model,\n  "
  },
  {
    "path": "configs/ADE20k_SemanticSegmentation/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024.py",
    "chars": 1418,
    "preview": "from detectron2.config import LazyCall as L\nfrom ape.layers import VisionLanguageFusion\nfrom ape.modeling.ape_deta impor"
  },
  {
    "path": "configs/ADE20k_SemanticSegmentation/ape_deta/ape_deta_vitt_eva02_vlf_lsj1024.py",
    "chars": 3149,
    "preview": "import torch.nn as nn\n\nfrom detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detrex.m"
  },
  {
    "path": "configs/ADE20k_SemanticSegmentation/deformable_deta/deformable_deta_segm_r50_160k.py",
    "chars": 1282,
    "preview": "from detectron2.config import LazyCall as L\nfrom detectron2.solver import WarmupParamScheduler\nfrom fvcore.common.param_"
  },
  {
    "path": "configs/BDD10k_PanopticSegmentation/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024.py",
    "chars": 3330,
    "preview": "import torch.nn as nn\n\nfrom detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detrex.m"
  },
  {
    "path": "configs/BDD10k_PanopticSegmentation/ape_deta/ape_deta_vitl_eva02_lsj1024.py",
    "chars": 683,
    "preview": "from ...COCO_InstanceSegmentation.ape_deta.ape_deta_vitl_eva02_lsj1024_cp_12ep import (\n    lr_multiplier,\n    model,\n  "
  },
  {
    "path": "configs/BDD10k_PanopticSegmentation/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024.py",
    "chars": 1418,
    "preview": "from detectron2.config import LazyCall as L\nfrom ape.layers import VisionLanguageFusion\nfrom ape.modeling.ape_deta impor"
  },
  {
    "path": "configs/BDD10k_PanopticSegmentation/ape_deta/ape_deta_vitt_eva02_vlf_lsj1024.py",
    "chars": 3328,
    "preview": "import torch.nn as nn\n\nfrom detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detrex.m"
  },
  {
    "path": "configs/BDD10k_SemanticSegmentation/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024.py",
    "chars": 3151,
    "preview": "import torch.nn as nn\n\nfrom detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detrex.m"
  },
  {
    "path": "configs/BDD10k_SemanticSegmentation/ape_deta/ape_deta_vitl_eva02_lsj1024.py",
    "chars": 684,
    "preview": "from ...COCO_InstanceSegmentation.ape_deta.ape_deta_vitl_eva02_lsj1024_cp_12ep import (\n    lr_multiplier,\n    model,\n  "
  },
  {
    "path": "configs/BDD10k_SemanticSegmentation/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024.py",
    "chars": 1418,
    "preview": "from detectron2.config import LazyCall as L\nfrom ape.layers import VisionLanguageFusion\nfrom ape.modeling.ape_deta impor"
  },
  {
    "path": "configs/BDD10k_SemanticSegmentation/ape_deta/ape_deta_vitt_eva02_vlf_lsj1024.py",
    "chars": 3149,
    "preview": "import torch.nn as nn\n\nfrom detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detrex.m"
  },
  {
    "path": "configs/COCO_Detection/deformable_deta/deformable_deta_r50_12ep.py",
    "chars": 1270,
    "preview": "from detrex.config import get_config\n\nfrom .models.deformable_deta_r50 import model\n\ndataloader = get_config(\"common/dat"
  },
  {
    "path": "configs/COCO_Detection/deformable_deta/deformable_deta_r50_24ep.py",
    "chars": 1219,
    "preview": "from detrex.config import get_config\n\nfrom .models.deformable_deta_r50 import model\n\ndataloader = get_config(\"common/dat"
  },
  {
    "path": "configs/COCO_Detection/deformable_deta/deformable_deta_vitb_clip_openai_lsj1024_cp_12ep.py",
    "chars": 618,
    "preview": "\n\nfrom ...common.data.coco_lsj1024_cp import dataloader\nfrom ...common.data.constants import constants\nfrom .deformable_"
  },
  {
    "path": "configs/COCO_Detection/deformable_deta/deformable_deta_vitb_lsj1024_12ep.py",
    "chars": 2353,
    "preview": "from functools import partial\n\nimport torch.nn as nn\n\nfrom detectron2.config import LazyCall as L\nfrom detectron2.modeli"
  },
  {
    "path": "configs/COCO_Detection/deformable_deta/deformable_deta_vitg_eva_lsj1024_12ep.py",
    "chars": 1943,
    "preview": "from functools import partial\n\nfrom ape.modeling.backbone.vit_eva import SimpleFeaturePyramid, ViT, get_vit_lr_decay_rat"
  },
  {
    "path": "configs/COCO_Detection/deformable_deta/deformable_deta_vitg_eva_lsj1024_cp_12ep.py",
    "chars": 448,
    "preview": "from ....configs.common.data.coco_lsj1024_cp import dataloader\nfrom .deformable_deta_vitg_eva_lsj1024_12ep import lr_mul"
  },
  {
    "path": "configs/COCO_Detection/deformable_deta/deformable_deta_vitl_eva02_lsj1024_cp_12ep.py",
    "chars": 3100,
    "preview": "from functools import partial\n\nimport torch.nn as nn\n\nfrom detectron2.config import LazyCall as L\nfrom detectron2.data.c"
  },
  {
    "path": "configs/COCO_Detection/deformable_deta/deformable_deta_vitl_eva_lsj1024_cp_12ep.py",
    "chars": 870,
    "preview": "from detectron2.modeling.backbone.vit import get_vit_lr_decay_rate\n\nfrom ...common.data.coco_lsj1024_cp import dataloade"
  },
  {
    "path": "configs/COCO_Detection/deformable_deta/deformable_deta_vitl_lsj1024_12ep.py",
    "chars": 1239,
    "preview": "from detectron2.modeling.backbone.vit import get_vit_lr_decay_rate\n\nfrom .deformable_deta_vitb_lsj1024_12ep import datal"
  },
  {
    "path": "configs/COCO_Detection/deformable_deta/models/deformable_deta_r50.py",
    "chars": 3746,
    "preview": "import torch.nn as nn\n\nfrom detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detectro"
  },
  {
    "path": "configs/COCO_Detection/deformable_detr/deformable_detr_r50_50ep.py",
    "chars": 1052,
    "preview": "from detrex.config import get_config\n\nfrom .models.deformable_detr_r50 import model\n\ndataloader = get_config(\"common/dat"
  },
  {
    "path": "configs/COCO_Detection/deformable_detr/deformable_detr_r50_two_stage_50ep.py",
    "chars": 290,
    "preview": "from .deformable_detr_r50_50ep import dataloader, lr_multiplier, model, optimizer, train\n\nmodel.with_box_refine = True\nm"
  },
  {
    "path": "configs/COCO_Detection/deformable_detr/deformable_detr_r50_with_box_refinement_50ep.py",
    "chars": 270,
    "preview": "from .deformable_detr_r50_50ep import dataloader, lr_multiplier, model, optimizer, train\n\nmodel.with_box_refine = True\n\n"
  },
  {
    "path": "configs/COCO_Detection/deformable_detr/improved_deformable_detr_r50_12ep.py",
    "chars": 1261,
    "preview": "from detrex.config import get_config\n\nfrom .models.improved_deformable_detr_r50 import model\n\ndataloader = get_config(\"c"
  },
  {
    "path": "configs/COCO_Detection/deformable_detr/improved_deformable_detr_r50_50ep.py",
    "chars": 1210,
    "preview": "from detrex.config import get_config\n\nfrom .models.improved_deformable_detr_r50 import model\n\ndataloader = get_config(\"c"
  },
  {
    "path": "configs/COCO_Detection/deformable_detr/improved_deformable_detr_r50_two_stage_12ep.py",
    "chars": 308,
    "preview": "from .improved_deformable_detr_r50_12ep import dataloader, lr_multiplier, model, optimizer, train\n\nmodel.with_box_refine"
  },
  {
    "path": "configs/COCO_Detection/deformable_detr/improved_deformable_detr_r50_two_stage_50ep.py",
    "chars": 308,
    "preview": "from .improved_deformable_detr_r50_50ep import dataloader, lr_multiplier, model, optimizer, train\n\nmodel.with_box_refine"
  },
  {
    "path": "configs/COCO_Detection/deformable_detr/models/deformable_detr_r50.py",
    "chars": 3399,
    "preview": "import torch.nn as nn\n\nfrom detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detectro"
  },
  {
    "path": "configs/COCO_Detection/deformable_detr/models/improved_deformable_detr_r50.py",
    "chars": 3399,
    "preview": "import torch.nn as nn\n\nfrom detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detectro"
  },
  {
    "path": "configs/COCO_InstanceSegmentation/ape_deta/ape_deta_r50_12ep.py",
    "chars": 1824,
    "preview": "from detectron2.config import LazyCall as L\nfrom detrex.config import get_config\nfrom ape.modeling.text import EVA01CLIP"
  },
  {
    "path": "configs/COCO_InstanceSegmentation/ape_deta/ape_deta_r50_vlf_12ep.py",
    "chars": 1407,
    "preview": "from detectron2.config import LazyCall as L\nfrom ape.layers import VisionLanguageFusion\nfrom ape.modeling.ape_deta impor"
  },
  {
    "path": "configs/COCO_InstanceSegmentation/ape_deta/ape_deta_vite_eva02_clip_lsj1024_cp_12ep_fsdp.py",
    "chars": 3576,
    "preview": "from detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\n\nfrom detectron2.model_zoo import ge"
  },
  {
    "path": "configs/COCO_InstanceSegmentation/ape_deta/ape_deta_vite_eva02_clip_lsj1024_cp_32x90k_fsdp.py",
    "chars": 3577,
    "preview": "from detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\n\nfrom detectron2.model_zoo import ge"
  },
  {
    "path": "configs/COCO_InstanceSegmentation/ape_deta/ape_deta_vitg_eva01_clip_lsj1536_cp_128x45k.py",
    "chars": 2710,
    "preview": "from detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detrex.config import get_config"
  },
  {
    "path": "configs/COCO_InstanceSegmentation/ape_deta/ape_deta_vitg_eva01_clip_lsj1536_cp_64x90k.py",
    "chars": 2709,
    "preview": "from detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detrex.config import get_config"
  },
  {
    "path": "configs/COCO_InstanceSegmentation/ape_deta/ape_deta_vitg_eva01_lsj1536_cp_64x90k.py",
    "chars": 2627,
    "preview": "from detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\nfrom detrex.config import get_config"
  },
  {
    "path": "configs/COCO_InstanceSegmentation/ape_deta/ape_deta_vitl_eva02_clip_lsj1024_cp_12ep.py",
    "chars": 3077,
    "preview": "from detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\n\nfrom detectron2.model_zoo import ge"
  },
  {
    "path": "configs/COCO_InstanceSegmentation/ape_deta/ape_deta_vitl_eva02_clip_lsj1024_cp_12ep_fsdp.py",
    "chars": 3571,
    "preview": "from detectron2.config import LazyCall as L\nfrom detectron2.layers import ShapeSpec\n\nfrom detectron2.model_zoo import ge"
  },
  {
    "path": "configs/COCO_InstanceSegmentation/ape_deta/ape_deta_vitl_eva02_clip_lsj1536_cp_128x45k.py",
    "chars": 356,
    "preview": "from .ape_deta_vitl_eva02_clip_lsj1536_cp_64x90k import (\n    dataloader,\n    lr_multiplier,\n    model,\n    optimizer,\n "
  }
]

// ... and 301 more files (download for full content)

About this extraction

This page contains the full source code of the shenyunhang/APE GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 501 files (5.7 MB), approximately 1.5M tokens, and a symbol index with 1314 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — a free GitHub repository-to-text converter for AI. Built by Nikandr Surkov.

Copied to clipboard!