Full Code of orhir/PoseAnything for AI

Note: this is a preview only (607K chars total); the file contents below are truncated.
Repository: orhir/PoseAnything
Branch: main
Commit: 7f1253f8c6a9
Files: 95
Total size: 575.5 KB

Directory structure:
gitextract_qsqas83s/

├── .gitignore
├── LICENSE
├── README.md
├── app.py
├── configs/
│   ├── 1shot-swin/
│   │   ├── base_split1_config.py
│   │   ├── base_split2_config.py
│   │   ├── base_split3_config.py
│   │   ├── base_split4_config.py
│   │   ├── base_split5_config.py
│   │   ├── graph_split1_config.py
│   │   ├── graph_split2_config.py
│   │   ├── graph_split3_config.py
│   │   ├── graph_split4_config.py
│   │   └── graph_split5_config.py
│   ├── 1shots/
│   │   ├── base_split1_config.py
│   │   ├── base_split2_config.py
│   │   ├── base_split3_config.py
│   │   ├── base_split4_config.py
│   │   ├── base_split5_config.py
│   │   ├── graph_split1_config.py
│   │   ├── graph_split2_config.py
│   │   ├── graph_split3_config.py
│   │   ├── graph_split4_config.py
│   │   └── graph_split5_config.py
│   ├── 5shot-swin/
│   │   ├── base_split1_config.py
│   │   ├── base_split2_config.py
│   │   ├── base_split3_config.py
│   │   ├── base_split4_config.py
│   │   ├── base_split5_config.py
│   │   ├── graph_split1_config.py
│   │   ├── graph_split2_config.py
│   │   ├── graph_split3_config.py
│   │   ├── graph_split4_config.py
│   │   └── graph_split5_config.py
│   ├── 5shots/
│   │   ├── base_split1_config.py
│   │   ├── base_split2_config.py
│   │   ├── base_split3_config.py
│   │   ├── base_split4_config.py
│   │   ├── base_split5_config.py
│   │   ├── graph_split1_config.py
│   │   ├── graph_split2_config.py
│   │   ├── graph_split3_config.py
│   │   ├── graph_split4_config.py
│   │   └── graph_split5_config.py
│   └── demo_b.py
├── demo.py
├── docker/
│   └── Dockerfile
├── models/
│   ├── VERSION
│   ├── __init__.py
│   ├── apis/
│   │   ├── __init__.py
│   │   └── train.py
│   ├── core/
│   │   ├── __init__.py
│   │   └── custom_hooks/
│   │       └── shuffle_hooks.py
│   ├── datasets/
│   │   ├── __init__.py
│   │   ├── builder.py
│   │   ├── datasets/
│   │   │   ├── __init__.py
│   │   │   └── mp100/
│   │   │       ├── __init__.py
│   │   │       ├── fewshot_base_dataset.py
│   │   │       ├── fewshot_dataset.py
│   │   │       ├── test_base_dataset.py
│   │   │       ├── test_dataset.py
│   │   │       ├── transformer_base_dataset.py
│   │   │       └── transformer_dataset.py
│   │   └── pipelines/
│   │       ├── __init__.py
│   │       ├── post_transforms.py
│   │       └── top_down_transform.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── backbones/
│   │   │   ├── __init__.py
│   │   │   ├── simmim.py
│   │   │   ├── swin_mlp.py
│   │   │   ├── swin_transformer.py
│   │   │   ├── swin_transformer_moe.py
│   │   │   ├── swin_transformer_v2.py
│   │   │   └── swin_utils.py
│   │   ├── detectors/
│   │   │   ├── __init__.py
│   │   │   └── pam.py
│   │   ├── keypoint_heads/
│   │   │   ├── __init__.py
│   │   │   └── head.py
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── builder.py
│   │       ├── encoder_decoder.py
│   │       ├── positional_encoding.py
│   │       └── transformer.py
│   └── version.py
├── requirements.txt
├── setup.cfg
├── setup.py
├── test.py
├── tools/
│   ├── dist_test.sh
│   ├── dist_train.sh
│   ├── fix_carfusion.py
│   ├── slurm_test.sh
│   ├── slurm_train.sh
│   └── visualization.py
└── train.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
.eggs/*
.vscode/*
work_dirs/*
work_dir/*
pretrained/*
ckpt/*
runai_dataset/*
*/__pycache__
*.pyc
data/*
data
output/*
.idea/*
pose_anything.egg-info/*

================================================
FILE: LICENSE
================================================
Copyright (c) 2022 SenseTime. All Rights Reserved.

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright 2020 MMClassification Authors.

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: README.md
================================================
:new: *Please check out [EdgeCape](https://github.com/orhir/EdgeCape), our more recent effort in the same line of work.*
<br /> <br />

# A Graph-Based Approach for Category-Agnostic Pose Estimation [ECCV 2024]
<a href="https://orhir.github.io/pose-anything/"><img src="https://img.shields.io/static/v1?label=Project&message=Website&color=blue"></a>
<a href="https://arxiv.org/abs/2311.17891"><img src="https://img.shields.io/badge/arXiv-2311.17891-b31b1b.svg"></a>
<a href="https://www.apache.org/licenses/LICENSE-2.0.txt"><img src="https://img.shields.io/badge/License-Apache-yellow"></a>
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/orhir/PoseAnything)
[![Open in OpenXLab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/orhir/Pose-Anything)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pose-anything-a-graph-based-approach-for/2d-pose-estimation-on-mp-100)](https://paperswithcode.com/sota/2d-pose-estimation-on-mp-100?p=pose-anything-a-graph-based-approach-for)

By [Or Hirschorn](https://scholar.google.co.il/citations?user=GgFuT_QAAAAJ&hl=iw&oi=ao) and [Shai Avidan](https://scholar.google.co.il/citations?hl=iw&user=hpItE1QAAAAJ)

This repo is the official implementation of "[A Graph-Based Approach for Category-Agnostic Pose Estimation](https://arxiv.org/pdf/2311.17891.pdf)".

<p align="center">
<img src="Pose_Anything_Teaser.png" width="384">
</p>

## 🔔 News
- **`11 July 2024`** Our paper will be presented at **ECCV 2024**.
- **`10 July 2024`** Uploaded new annotations, fixing a small bug in the DeepFashion skeletons.
- **`2 February 2024`** Uploaded new weights - smaller models with stronger performance.
- **`20 December 2023`** Demo is online on [Huggingface](https://huggingface.co/spaces/orhir/PoseAnything) and [OpenXLab](https://openxlab.org.cn/apps/detail/orhir/Pose-Anything).
- **`7 December 2023`** Official code release.

## Introduction

We present a novel approach to CAPE that leverages the inherent geometrical relations between keypoints through a newly designed Graph Transformer Decoder. By capturing and incorporating this crucial structural information, our method enhances the accuracy of keypoint localization, marking a significant departure from conventional CAPE techniques that treat keypoints as isolated entities.
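The core idea can be illustrated with a small sketch. This is a conceptual toy, not the repo's implementation: it shows how attention between keypoint tokens can be restricted to skeleton neighbors with an adjacency mask, which is the structural prior a graph transformer decoder exploits. All names and shapes here are illustrative.

```python
import numpy as np

# Conceptual sketch only -- NOT the repo's implementation. Attention
# between keypoint tokens is masked by the skeleton adjacency, so each
# joint aggregates features only from itself and its connected joints.
def graph_attention(tokens, adjacency):
    # tokens: (K, D) keypoint features; adjacency: (K, K) 0/1 mask
    # (self-edges included so every row attends to at least itself)
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[-1])
    scores = np.where(adjacency > 0, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

# Toy chain skeleton over 5 keypoints: each joint is linked to itself
# and its immediate neighbors.
K, D = 5, 8
adj = np.eye(K) + np.eye(K, k=1) + np.eye(K, k=-1)
out = graph_attention(np.random.randn(K, D), adj)
print(out.shape)  # (5, 8)
```

Masked entries receive `-inf` before the softmax, so their weights are exactly zero and non-neighboring keypoints cannot exchange information in this layer.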

## Citation
If you find this useful, please cite this work as follows:
```bibtex
@misc{hirschorn2023pose,
      title={A Graph-Based Approach for Category-Agnostic Pose Estimation}, 
      author={Or Hirschorn and Shai Avidan},
      year={2024},
      eprint={2311.17891},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2311.17891}, 
}
```

## Getting Started

### Docker [Recommended]
We provide a Docker image for easy use.
You can pull it from Docker Hub; it contains all the required libraries and packages:

```
docker pull orhir/pose_anything
docker run --name pose_anything -v {DATA_DIR}:/workspace/PoseAnything/PoseAnything/data/mp100 -it orhir/pose_anything /bin/bash
```
### Conda Environment
We train and evaluate our model on Python 3.8 and PyTorch 2.0.1 with CUDA 12.1.

Please first install PyTorch and torchvision following the official PyTorch documentation.
Then, follow [MMPose](https://mmpose.readthedocs.io/en/latest/installation.html) to install the following packages:
```
mmcv-full=1.6.2
mmpose=0.29.0
```
Having installed these packages, run:
```
python setup.py develop
```

## Demo on Custom Images
<i>TRY IT NOW ON:</i> <a href="https://huggingface.co/spaces/orhir/PoseAnything">HuggingFace</a> / <a href="https://openxlab.org.cn/apps/detail/orhir/Pose-Anything">OpenXLab</a>


We provide demo code for testing our model on custom images.

### Gradio Demo
First, install Gradio:
```
pip install gradio==3.44.0
```
Then, download the [pretrained model](https://drive.google.com/file/d/1RT1Q8AMEa1kj6k9ZqrtWIKyuR4Jn4Pqc/view?usp=drive_link) and run:
```
python app.py --checkpoint [path_to_pretrained_ckpt]
```
### Terminal Demo
Download
the [pretrained model](https://drive.google.com/file/d/1RT1Q8AMEa1kj6k9ZqrtWIKyuR4Jn4Pqc/view?usp=drive_link)
and run:

```
python demo.py --support [path_to_support_image] --query [path_to_query_image] --config configs/demo_b.py --checkpoint [path_to_pretrained_ckpt]
```
***Note:*** The demo code supports any config with a suitable checkpoint file. More pre-trained models can be found in the evaluation section.


## Updated MP-100 Dataset
Please follow the [official guide](https://github.com/luminxu/Pose-for-Everything/blob/main/mp100/README.md) to prepare the MP-100 dataset for training and evaluation, and organize the data structure properly.

**We provide an updated annotation file, which includes skeleton definitions, in the following [link](https://drive.google.com/drive/folders/1uRyGB-P5Tc_6TmAZ6RnOi0SWjGq9b28T?usp=sharing).**

**Please note:**

The current version of the MP-100 dataset includes some discrepancies and filename errors:
1. The DeepFashion dataset mentioned is actually the DeepFashion2 dataset. The link in the official repo is wrong; use this [repo](https://github.com/switchablenorms/DeepFashion2/tree/master) instead.
2. We provide a script that fixes the CarFusion filename errors, which can be run by:
```
python tools/fix_carfusion.py [path_to_CarFusion_dataset] [path_to_mp100_annotation]
```

## Training

### Backbone Options
To use the pre-trained Swin Transformer from our paper, we provide the weights, taken from this [repo](https://github.com/microsoft/Swin-Transformer/blob/main/MODELHUB.md), in the following [link](https://drive.google.com/drive/folders/1-q4mSxlNAUwDlevc3Hm5Ij0l_2OGkrcg?usp=sharing).
These should be placed in the `./pretrained` folder.

We also support DINO and ResNet backbones. To use one of them, simply change the `pretrained` field in the config file to `dinov2`, `dino`, or `resnet`; the pretrained weights will then be loaded automatically from the official repo.
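As an illustration, the backbone switch is a one-field edit in the config. The fragment below is a hypothetical sketch: only the `pretrained` field is the documented switch, and the surrounding keys are illustrative, not copied from the repo's configs.

```python
# Hypothetical config fragment (mmpose-style configs are plain Python
# dicts). Only `pretrained` is the documented backbone switch; the
# `type` value here is illustrative.
model = dict(
    type='PoseAnything',   # illustrative model type
    pretrained='dinov2',   # or 'dino', 'resnet', or a Swin checkpoint path
)
print(model['pretrained'])
```

Everything else in the config (heads, pipelines, dataset settings) stays unchanged when swapping backbones this way.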

### Training
To train the model, run:
```
python train.py --config [path_to_config_file]  --work-dir [path_to_work_dir]
```

## Evaluation and Pretrained Models
You can download the pretrained checkpoints from the following [link](https://drive.google.com/drive/folders/1RmrqzE3g0qYRD5xn54-aXEzrIkdYXpEW?usp=sharing).

Here we provide the evaluation results of our pretrained models on the MP-100 dataset, along with the config files and checkpoints:

### 1-Shot Models
| Setting |                                                                       split 1                                                                       |                                                                       split 2                                                                       |                                                                       split 3                                                                       |                                                                       split 4                                                                       |                                                                       split 5                                                                       |
|:-------:|:---------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------:|
|  Tiny   |                                                                        91.19                                                                        |                                                                        87.81                                                                        |                                                                        85.68                                                                        |                                                                        85.87                                                                        |                                                                        85.61                                                                        |
|         |   [link](https://drive.google.com/file/d/1GubmkVkqybs-eD4hiRkgBzkUVGE_rIFX/view?usp=drive_link) / [config](configs/1shots/graph_split1_config.py)   |   [link](https://drive.google.com/file/d/1EEekDF3xV_wJOVk7sCQWUA8ygUKzEm2l/view?usp=drive_link) / [config](configs/1shots/graph_split2_config.py)   |   [link](https://drive.google.com/file/d/1FuwpNBdPI3mfSovta2fDGKoqJynEXPZQ/view?usp=drive_link) / [config](configs/1shots/graph_split3_config.py)   |   [link](https://drive.google.com/file/d/1_SSqSANuZlbC0utzIfzvZihAW9clefcR/view?usp=drive_link) / [config](configs/1shots/graph_split4_config.py)   |   [link](https://drive.google.com/file/d/1nUHr07W5F55u-FKQEPFq_CECgWZOKKLF/view?usp=drive_link) / [config](configs/1shots/graph_split5_config.py)   |
|  Small  |                                                                        94.73                                                                        |                                                                        89.79                                                                        |                                                                        90.69                                                                        |                                                                        88.09                                                                        |                                                                        90.11                                                                        |
|         | [link](https://drive.google.com/file/d/1RT1Q8AMEa1kj6k9ZqrtWIKyuR4Jn4Pqc/view?usp=drive_link) / [config](configs/1shot-swin/graph_split1_config.py) | [link](https://drive.google.com/file/d/1BT5b8MlnkflcdhTFiBROIQR3HccLsPQd/view?usp=drive_link) / [config](configs/1shot-swin/graph_split2_config.py) | [link](https://drive.google.com/file/d/1Z64cw_1CSDGObabSAWKnMK0BA_bqDHxn/view?usp=drive_link) / [config](configs/1shot-swin/graph_split3_config.py) | [link](https://drive.google.com/file/d/1vf82S8LAjIzpuBcbEoDCa26cR8DqNriy/view?usp=drive_link) / [config](configs/1shot-swin/graph_split4_config.py) | [link](https://drive.google.com/file/d/14FNx0JNbkS2CvXQMiuMU_kMZKFGO2rDV/view?usp=drive_link) / [config](configs/1shot-swin/graph_split5_config.py) |

### 5-Shot Models
| Setting |                                                                       split 1                                                                       |                                                                       split 2                                                                       |                                                                       split 3                                                                       |                                                                       split 4                                                                       |                                                                       split 5                                                                       |
|:-------:|:---------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------:|
|  Tiny   |                                                                        94.24                                                                        |                                                                        91.32                                                                        |                                                                        90.15                                                                        |                                                                        90.37                                                                        |                                                                        89.73                                                                        |
|         |   [link](https://drive.google.com/file/d/1PeMuwv5YwiF3UCE5oN01Qchu5K3BaQ9L/view?usp=drive_link) / [config](configs/5shots/graph_split1_config.py)   |   [link](https://drive.google.com/file/d/1enIapPU1D8lZOET7q_qEjnhC1HFy3jWK/view?usp=drive_link) / [config](configs/5shots/graph_split2_config.py)   |   [link](https://drive.google.com/file/d/1MTeZ9Ba-ucLuqX0KBoLbBD5PaEct7VUp/view?usp=drive_link) / [config](configs/5shots/graph_split3_config.py)   |   [link](https://drive.google.com/file/d/1U2N7DI2F0v7NTnPCEEAgx-WKeBZNAFoa/view?usp=drive_link) / [config](configs/5shots/graph_split4_config.py)   |   [link](https://drive.google.com/file/d/1wapJDgtBWtmz61JNY7ktsFyvckRKiR2C/view?usp=drive_link) / [config](configs/5shots/graph_split5_config.py)   |
|  Small  |                                                                        96.67                                                                        |                                                                        91.48                                                                        |                                                                        92.62                                                                        |                                                                        90.95                                                                        |                                                                        92.41                                                                        |
|         | [link](https://drive.google.com/file/d/1p5rnA0MhmndSKEbyXMk49QXvNE03QV2p/view?usp=drive_link) / [config](configs/5shot-swin/graph_split1_config.py) | [link](https://drive.google.com/file/d/1Q3KNyUW_Gp3JytYxUPhkvXFiDYF6Hv8w/view?usp=drive_link) / [config](configs/5shot-swin/graph_split2_config.py) | [link](https://drive.google.com/file/d/1gWgTk720fSdAf_ze1FkfXTW0t7k-69dV/view?usp=drive_link) / [config](configs/5shot-swin/graph_split3_config.py) | [link](https://drive.google.com/file/d/1LuaRQ8a6AUPrkr7l5j2W6Fe_QbgASkwY/view?usp=drive_link) / [config](configs/5shot-swin/graph_split4_config.py) | [link](https://drive.google.com/file/d/1z--MAOPCwMG_GQXru9h2EStbnIvtHv1L/view?usp=drive_link) / [config](configs/5shot-swin/graph_split5_config.py) |

### Evaluation
Evaluation on a single GPU takes approximately 30 minutes.

To evaluate the pretrained model, run:
```
python test.py [path_to_config_file] [path_to_pretrained_ckpt]
```
## Acknowledgement

Our code is based on code from:
 - [MMPose](https://github.com/open-mmlab/mmpose)
 - [CapeFormer](https://github.com/flyinglynx/CapeFormer)


## License
This project is released under the Apache 2.0 license.


================================================
FILE: app.py
================================================
import argparse
import random

import gradio as gr
import matplotlib
import numpy as np
import torch
from PIL import ImageDraw, Image
from matplotlib import pyplot as plt
from mmcv import Config
from mmcv.runner import load_checkpoint
from mmpose.core import wrap_fp16_model
from mmpose.models import build_posenet
from torchvision import transforms

from demo import Resize_Pad
from models import *

# Copyright (c) OpenMMLab. All rights reserved.
# os.system('python -m pip install timm')
# os.system('python -m pip install Openmim')
# os.system('python -m mim install mmengine')
# os.system('python -m mim install "mmcv-full==1.6.2"')
# os.system('python -m mim install "mmpose==0.29.0"')
# os.system('python -m mim install "gradio==3.44.0"')
# os.system('python setup.py develop')

matplotlib.use('agg')
checkpoint_path = ''


def plot_results(support_img, query_img, support_kp, support_w, query_kp,
                 query_w, skeleton,
                 initial_proposals, prediction, radius=6):
    # Only the query image is drawn; the support arguments are kept for
    # signature compatibility with callers.
    h, w, c = support_img.shape
    prediction = prediction[-1].cpu().numpy() * h
    query_img = (query_img - np.min(query_img)) / (
            np.max(query_img) - np.min(query_img))
    keypoint = prediction
    kp_weight = query_w
    fig, axes = plt.subplots()
    plt.imshow(query_img)
    # Draw the predicted keypoints; weight selects visible points and color.
    for k in range(keypoint.shape[0]):
        if kp_weight[k] > 0:
            kp = keypoint[k, :2]
            c = (1, 0, 0, 0.75) if kp_weight[k] == 1 else (0, 0, 1, 0.6)
            patch = plt.Circle(kp, radius, color=c)
            axes.add_patch(patch)
            axes.text(kp[0], kp[1], k)
            plt.draw()
    # Draw the limbs connecting keypoints that are both visible.
    for l, limb in enumerate(skeleton):
        kp = keypoint[:, :2]
        if l > len(COLORS) - 1:
            c = [x / 255 for x in random.sample(range(0, 255), 3)]
        else:
            c = [x / 255 for x in COLORS[l]]
        if kp_weight[limb[0]] > 0 and kp_weight[limb[1]] > 0:
            patch = plt.Line2D([kp[limb[0], 0], kp[limb[1], 0]],
                               [kp[limb[0], 1], kp[limb[1], 1]],
                               linewidth=6, color=c, alpha=0.6)
            axes.add_artist(patch)
    plt.axis('off')  # hide the axes
    plt.subplots_adjust(0, 0, 1, 1, 0, 0)
    return plt


COLORS = [
    [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0],
    [85, 255, 0], [0, 255, 0], [0, 255, 85], [0, 255, 170], [0, 255, 255],
    [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], [170, 0, 255],
    [255, 0, 255], [255, 0, 170], [255, 0, 85], [255, 0, 0]
]


def process(query_img, state,
            cfg_path='configs/demo_b.py'):
    cfg = Config.fromfile(cfg_path)
    width, height, _ = state['original_support_image'].shape
    kp_src_np = np.array(state['kp_src']).copy().astype(np.float32)
    kp_src_np[:, 0] = kp_src_np[:, 0] / (
            width // 4) * cfg.model.encoder_config.img_size
    kp_src_np[:, 1] = kp_src_np[:, 1] / (
            height // 4) * cfg.model.encoder_config.img_size
    kp_src_np = np.flip(kp_src_np, 1).copy()
    kp_src_tensor = torch.tensor(kp_src_np).float()
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
        Resize_Pad(cfg.model.encoder_config.img_size,
                   cfg.model.encoder_config.img_size)])

    if len(state['skeleton']) == 0:
        state['skeleton'] = [(0, 0)]

    support_img = preprocess(state['original_support_image']).flip(0)[None]
    np_query = np.array(query_img)[:, :, ::-1].copy()
    q_img = preprocess(np_query).flip(0)[None]
    # Create heatmap from keypoints
    genHeatMap = TopDownGenerateTargetFewShot()
    data_cfg = cfg.data_cfg
    data_cfg['image_size'] = np.array([cfg.model.encoder_config.img_size,
                                       cfg.model.encoder_config.img_size])
    data_cfg['joint_weights'] = None
    data_cfg['use_different_joint_weights'] = False
    kp_src_3d = torch.cat(
        (kp_src_tensor, torch.zeros(kp_src_tensor.shape[0], 1)), dim=-1)
    kp_src_3d_weight = torch.cat(
        (torch.ones_like(kp_src_tensor),
         torch.zeros(kp_src_tensor.shape[0], 1)), dim=-1)
    target_s, target_weight_s = genHeatMap._msra_generate_target(data_cfg,
                                                                 kp_src_3d,
                                                                 kp_src_3d_weight,
                                                                 sigma=1)
    target_s = torch.tensor(target_s).float()[None]
    target_weight_s = torch.ones_like(
        torch.tensor(target_weight_s).float()[None])

    data = {
        'img_s': [support_img],
        'img_q': q_img,
        'target_s': [target_s],
        'target_weight_s': [target_weight_s],
        'target_q': None,
        'target_weight_q': None,
        'return_loss': False,
        'img_metas': [{'sample_skeleton': [state['skeleton']],
                       'query_skeleton': state['skeleton'],
                       'sample_joints_3d': [kp_src_3d],
                       'query_joints_3d': kp_src_3d,
                       'sample_center': [kp_src_tensor.mean(dim=0)],
                       'query_center': kp_src_tensor.mean(dim=0),
                       'sample_scale': [
                           kp_src_tensor.max(dim=0)[0] -
                           kp_src_tensor.min(dim=0)[0]],
                       'query_scale': kp_src_tensor.max(dim=0)[0] -
                                      kp_src_tensor.min(dim=0)[0],
                       'sample_rotation': [0],
                       'query_rotation': 0,
                       'sample_bbox_score': [1],
                       'query_bbox_score': 1,
                       'query_image_file': '',
                       'sample_image_file': [''],
                       }]
    }
    # Load model
    model = build_posenet(cfg.model)
    fp16_cfg = cfg.get('fp16', None)
    if fp16_cfg is not None:
        wrap_fp16_model(model)
    load_checkpoint(model, checkpoint_path, map_location='cpu')
    model.eval()
    with torch.no_grad():
        outputs = model(**data)
    # visualize results
    vis_s_weight = target_weight_s[0]
    vis_q_weight = target_weight_s[0]
    vis_s_image = support_img[0].detach().cpu().numpy().transpose(1, 2, 0)
    vis_q_image = q_img[0].detach().cpu().numpy().transpose(1, 2, 0)
    support_kp = kp_src_3d
    out = plot_results(vis_s_image,
                       vis_q_image,
                       support_kp,
                       vis_s_weight,
                       None,
                       vis_q_weight,
                       state['skeleton'],
                       None,
                       torch.tensor(outputs['points']).squeeze(0),
                       )
    return out, state


def update_examples(support_img, posed_support, query_img, state, r=0.015, width=0.02):
    state['color_idx'] = 0
    state['original_support_image'] = np.array(support_img)[:, :, ::-1].copy()
    support_img, posed_support, _ = set_query(support_img, state, example=True)
    w, h = support_img.size
    draw_pose = ImageDraw.Draw(support_img)
    draw_limb = ImageDraw.Draw(posed_support)
    r = int(r * w)
    width = int(width * w)
    for pixel in state['kp_src']:
        leftUpPoint = (pixel[1] - r, pixel[0] - r)
        rightDownPoint = (pixel[1] + r, pixel[0] + r)
        twoPointList = [leftUpPoint, rightDownPoint]
        draw_pose.ellipse(twoPointList, fill=(255, 0, 0, 255))
        draw_limb.ellipse(twoPointList, fill=(255, 0, 0, 255))
    for limb in state['skeleton']:
        point_a = state['kp_src'][limb[0]][::-1]
        point_b = state['kp_src'][limb[1]][::-1]
        if state['color_idx'] < len(COLORS):
            c = COLORS[state['color_idx']]
            state['color_idx'] += 1
        else:
            c = random.choices(range(256), k=3)
        draw_limb.line([point_a, point_b], fill=tuple(c), width=width)
    return support_img, posed_support, query_img, state


def get_select_coords(kp_support,
                      limb_support,
                      state,
                      evt: gr.SelectData,
                      r=0.015):
    pixels_in_queue = set()
    pixels_in_queue.add((evt.index[1], evt.index[0]))
    while len(pixels_in_queue) > 0:
        pixel = pixels_in_queue.pop()
        if pixel[0] is not None and pixel[1] is not None and pixel not in \
                state['kp_src']:
            state['kp_src'].append(pixel)
        else:
            continue
        if limb_support is None:
            canvas_limb = kp_support
        else:
            canvas_limb = limb_support
        canvas_kp = kp_support
        w, h = canvas_kp.size
        draw_pose = ImageDraw.Draw(canvas_kp)
        draw_limb = ImageDraw.Draw(canvas_limb)
        r = int(r * w)
        leftUpPoint = (pixel[1] - r, pixel[0] - r)
        rightDownPoint = (pixel[1] + r, pixel[0] + r)
        twoPointList = [leftUpPoint, rightDownPoint]
        draw_pose.ellipse(twoPointList, fill=(255, 0, 0, 255))
        draw_limb.ellipse(twoPointList, fill=(255, 0, 0, 255))
    return canvas_kp, canvas_limb, state


def get_limbs(kp_support,
              state,
              evt: gr.SelectData,
              r=0.02, width=0.02):
    curr_pixel = (evt.index[1], evt.index[0])
    pixels_in_queue = set()
    pixels_in_queue.add((evt.index[1], evt.index[0]))
    canvas_kp = kp_support
    w, h = canvas_kp.size
    r = int(r * w)
    width = int(width * w)
    while len(pixels_in_queue) > 0 and curr_pixel != state['prev_clicked']:
        pixel = pixels_in_queue.pop()
        state['prev_clicked'] = pixel
        closest_point = min(state['kp_src'],
                            key=lambda p: (p[0] - pixel[0]) ** 2 +
                                          (p[1] - pixel[1]) ** 2)
        closest_point_index = state['kp_src'].index(closest_point)
        draw_limb = ImageDraw.Draw(canvas_kp)
        if state['color_idx'] < len(COLORS):
            c = COLORS[state['color_idx']]
        else:
            c = random.choices(range(256), k=3)
        leftUpPoint = (closest_point[1] - r, closest_point[0] - r)
        rightDownPoint = (closest_point[1] + r, closest_point[0] + r)
        twoPointList = [leftUpPoint, rightDownPoint]
        draw_limb.ellipse(twoPointList, fill=tuple(c))
        if state['count'] == 0:
            state['prev_pt'] = closest_point[1], closest_point[0]
            state['prev_pt_idx'] = closest_point_index
            state['count'] = state['count'] + 1
        else:
            if state['prev_pt_idx'] != closest_point_index:
                # Create Line and add Limb
                draw_limb.line(
                    [state['prev_pt'], (closest_point[1], closest_point[0])],
                    fill=tuple(c),
                    width=width)
                state['skeleton'].append(
                    (state['prev_pt_idx'], closest_point_index))
                state['color_idx'] = state['color_idx'] + 1
            else:
                draw_limb.ellipse(twoPointList, fill=(255, 0, 0, 255))
            state['count'] = 0
    return canvas_kp, state


def set_query(support_img, state, example=False):
    if not example:
        state['skeleton'].clear()
        state['kp_src'].clear()
    state['original_support_image'] = np.array(support_img)[:, :, ::-1].copy()
    width, height = support_img.size
    support_img = support_img.resize((width // 4, width // 4),
                                     Image.Resampling.LANCZOS)
    return support_img, support_img, state


with gr.Blocks() as demo:
    state = gr.State({
        'kp_src': [],
        'skeleton': [],
        'count': 0,
        'color_idx': 0,
        'prev_pt': None,
        'prev_pt_idx': None,
        'prev_clicked': None,
        'original_support_image': None,
    })

    gr.Markdown('''
    # Pose Anything Demo
    We present a novel approach to category agnostic pose estimation that 
    leverages the inherent geometrical relations between keypoints through a 
    newly designed Graph Transformer Decoder. By capturing and incorporating 
    this crucial structural information, our method enhances the accuracy of 
    keypoint localization, marking a significant departure from conventional 
    CAPE techniques that treat keypoints as isolated entities.
    ### [Paper](https://arxiv.org/abs/2311.17891) | [Official Repo](https://github.com/orhir/PoseAnything) 
    ## Instructions
    1. Upload an image of the object you want to pose to the **left** panel.
    2. Click on the **left** image to mark keypoints.
    3. Click on the keypoints in the **right** image to mark limbs.
    4. Upload an image of the object you want to pose as the query image
    (**bottom**).
    5. Click **Evaluate** to pose the query image.
    ''')
    with gr.Row():
        support_img = gr.Image(label="Support Image",
                               type="pil",
                               info='Click to mark keypoints').style(
            height=400, width=400)
        posed_support = gr.Image(label="Posed Support Image",
                                 type="pil",
                                 interactive=False).style(height=400,
                                                          width=400)
    with gr.Row():
        query_img = gr.Image(label="Query Image",
                             type="pil").style(height=400, width=400)
    with gr.Row():
        eval_btn = gr.Button(value="Evaluate")
    with gr.Row():
        output_img = gr.Plot(label="Output Image", height=400, width=400)
    with gr.Row():
        gr.Markdown("## Examples")
    with gr.Row():
        gr.Examples(
            examples=[
                ['examples/dog2.png',
                 'examples/dog2.png',
                 'examples/dog1.png',
                 {'kp_src': [(50, 58), (51, 78), (66, 57), (118, 79),
                             (154, 79), (217, 74), (218, 103), (156, 104),
                             (152, 151), (215, 162), (213, 191),
                             (152, 174), (108, 171)],
                  'skeleton': [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5),
                               (3, 7), (7, 6), (3, 12), (12, 8), (8, 9),
                               (12, 11), (11, 10)], 'count': 0,
                  'color_idx': 0, 'prev_pt': (174, 152),
                  'prev_pt_idx': 11, 'prev_clicked': (207, 186),
                  'original_support_image': None,
                  }
                 ],
                ['examples/sofa1.jpg',
                 'examples/sofa1.jpg',
                 'examples/sofa2.jpg',
                 {
                     'kp_src': [(82, 28), (65, 30), (52, 26), (65, 50),
                                (84, 52), (53, 54), (43, 52), (45, 71),
                                (81, 69), (77, 39), (57, 43), (58, 64),
                                (46, 42), (49, 65)],
                     'skeleton': [(0, 1), (3, 1), (3, 4), (10, 9), (11, 8),
                                  (1, 10), (10, 11), (11, 3), (1, 2), (7, 6),
                                  (5, 13), (5, 3), (13, 11), (12, 10), (12, 2),
                                  (6, 10), (7, 11)], 'count': 0,
                     'color_idx': 23, 'prev_pt': (71, 45), 'prev_pt_idx': 7,
                     'prev_clicked': (56, 63),
                     'original_support_image': None,
                 }],
                ['examples/person1.jpeg',
                 'examples/person1.jpeg',
                 'examples/person2.jpeg',
                 {
                     'kp_src': [(121, 95), (122, 160), (154, 130), (184, 106),
                                (181, 153)],
                     'skeleton': [(0, 1), (1, 2), (0, 2), (2, 3), (2, 4),
                                  (4, 3)], 'count': 0, 'color_idx': 6,
                     'prev_pt': (153, 181), 'prev_pt_idx': 4,
                     'prev_clicked': (181, 108),
                     'original_support_image': None,
                 }]
            ],
            inputs=[support_img, posed_support, query_img, state],
            outputs=[support_img, posed_support, query_img, state],
            fn=update_examples,
            run_on_click=True,
        )

    support_img.select(get_select_coords,
                       [support_img, posed_support, state],
                       [support_img, posed_support, state])
    support_img.upload(set_query,
                       inputs=[support_img, state],
                       outputs=[support_img, posed_support, state])
    posed_support.select(get_limbs,
                         [posed_support, state],
                         [posed_support, state])
    eval_btn.click(fn=process,
                   inputs=[query_img, state],
                   outputs=[output_img, state])

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Pose Anything Demo')
    parser.add_argument('--checkpoint',
                        help='checkpoint path',
                        default='https://github.com/orhir/PoseAnything'
                                '/releases/download/1.0.0/demo_b.pth')
    args = parser.parse_args()
    checkpoint_path = args.checkpoint
    demo.launch()


================================================
FILE: configs/1shot-swin/base_split1_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/1shot-swin/base_split2_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/1shot-swin/base_split3_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.20, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)
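These configs are plain Python modules in the mmcv/mmpose style: the framework executes the file and collects its top-level variables into a `Config` object, which is why later entries like `data_cfg` can reference earlier ones like `channel_cfg`. A minimal sketch of that loading mechanism (the real `mmcv.Config.fromfile` additionally handles `_base_` inheritance and dict merging; `cfg_text` below is an abbreviated, hypothetical excerpt):

```python
# Minimal sketch of how mmcv-style config files are loaded: the file is
# ordinary Python, exec'd into a namespace whose top-level variables
# become the config entries.
cfg_text = """
channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    max_kpt_num=100)

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'])
"""

namespace = {}
exec(cfg_text, namespace)
# Drop interpreter internals such as __builtins__.
cfg = {k: v for k, v in namespace.items() if not k.startswith('__')}

# Later entries can reference earlier ones, as data_cfg does here.
print(cfg['data_cfg']['num_output_channels'])  # -> 1
print(cfg['data_cfg']['image_size'])           # -> [256, 256]
```

This is also why the configs can use f-strings like `f'{data_root}/annotations/...'`: they are evaluated at load time like any other Python expression.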


================================================
FILE: configs/1shot-swin/base_split4_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split4_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split4_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split4_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.20, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/1shot-swin/base_split5_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split5_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split5_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split5_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.20, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/1shot-swin/graph_split1_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            graph_decoder='pre',
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.20, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)
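The `graph_*` configs in this directory appear to differ from their `base_*` counterparts only in the transformer block, where `graph_decoder='pre'` is added to enable the graph-decoder variant of the model. A quick way to verify that kind of single-key delta between two config dicts (the dicts below are abbreviated, hypothetical excerpts of the two `transformer` blocks, not the full configs):

```python
# Abbreviated 'transformer' blocks from base_split1_config.py and
# graph_split1_config.py; the full configs carry more keys.
base_transformer = dict(
    type='EncoderDecoder',
    d_model=256,
    nhead=8,
    num_encoder_layers=3,
    num_decoder_layers=3,
)
graph_transformer = dict(
    type='EncoderDecoder',
    d_model=256,
    nhead=8,
    num_encoder_layers=3,
    num_decoder_layers=3,
    graph_decoder='pre',  # the only addition in the graph configs
)

def config_delta(a, b):
    """Return keys added, removed, or changed going from config a to b."""
    added = {k: b[k] for k in b.keys() - a.keys()}
    removed = {k: a[k] for k in a.keys() - b.keys()}
    changed = {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}
    return added, removed, changed

added, removed, changed = config_delta(base_transformer, graph_transformer)
print(added)    # -> {'graph_decoder': 'pre'}
print(removed)  # -> {}
print(changed)  # -> {}
```

The same helper works on any pair of these split configs, e.g. to confirm that split1 through split5 differ only in their `ann_file` paths.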


================================================
FILE: configs/1shot-swin/graph_split2_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            graph_decoder='pre',
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.20, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/1shot-swin/graph_split3_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            graph_decoder='pre',
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.20, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/1shot-swin/graph_split4_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            graph_decoder='pre',
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split4_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split4_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split4_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/1shot-swin/graph_split5_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            graph_decoder='pre',
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split5_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split5_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split5_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)
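The `lr_config` in these files (`warmup='linear'`, `warmup_iters=1000`, `warmup_ratio=0.001`) ramps the learning rate from `lr * warmup_ratio` up to the base `lr=1e-5` over the first 1000 iterations. A sketch of the linear-warmup formula, following mmcv's `LrUpdaterHook` convention (treat the exact formula as an assumption if your mmcv version differs):

```python
# Linear LR warmup as configured above: at iter 0 the LR is
# base_lr * warmup_ratio, rising linearly to base_lr at warmup_iters.
def warmup_lr(base_lr, cur_iter, warmup_iters=1000, warmup_ratio=0.001):
    k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
    return base_lr * (1 - k)

# Start of training: heavily damped LR.
assert abs(warmup_lr(1e-5, 0) - 1e-5 * 0.001) < 1e-15
# End of warmup: full base LR.
assert abs(warmup_lr(1e-5, 1000) - 1e-5) < 1e-15
```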


================================================
FILE: configs/1shots/base_split1_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_tiny_patch4_window16_256.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.2,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),

    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=16,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)
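Within each split, the `graph_*` configs differ from the `base_*` configs mainly by the extra `graph_decoder='pre'` key in the transformer dict (the swin variants also swap the backbone depth/checkpoint). A small, hypothetical diff helper makes such config deltas easy to spot when comparing files; the two dicts below are trimmed illustrations, not the full configs:

```python
# Keys present in the graph transformer config but not the base one.
def dict_key_diff(a, b):
    return {k: a[k] for k in a.keys() - b.keys()}

# Trimmed illustrations of the two transformer dicts.
graph_tf = dict(type='EncoderDecoder', d_model=256, graph_decoder='pre')
base_tf = dict(type='EncoderDecoder', d_model=256)

assert dict_key_diff(graph_tf, base_tf) == {'graph_decoder': 'pre'}
assert dict_key_diff(base_tf, graph_tf) == {}
```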


================================================
FILE: configs/1shots/base_split2_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_tiny_patch4_window16_256.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.2,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),

    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=16,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)
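The test datasets above evaluate PCK at the thresholds in `pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25]`: a keypoint counts as correct when its prediction-to-ground-truth distance is below `threshold * normalization`. The sketch below uses bbox size as the normalizer, which is one common convention; the repo's own evaluation code may normalize differently:

```python
import numpy as np

# PCK (Percentage of Correct Keypoints): fraction of keypoints whose
# prediction lies within thr * norm_size of the ground truth.
def pck(pred, gt, norm_size, thr):
    dists = np.linalg.norm(pred - gt, axis=-1)
    return float((dists < thr * norm_size).mean())

# Toy example: errors of 2px and 30px with a 100px normalizer.
pred = np.array([[10.0, 10.0], [50.0, 50.0]])
gt = np.array([[12.0, 10.0], [80.0, 50.0]])
assert pck(pred, gt, 100.0, 0.05) == 0.5  # only the 2px error passes
assert pck(pred, gt, 100.0, 0.25) == 0.5  # 30px still exceeds 25px
```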


================================================
FILE: configs/1shots/base_split3_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_tiny_patch4_window16_256.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.2,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),

    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=16,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/1shots/base_split4_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_tiny_patch4_window16_256.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.2,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),

    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=16,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split4_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split4_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split4_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/1shots/base_split5_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_tiny_patch4_window16_256.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.2,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=16,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split5_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split5_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split5_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/1shots/graph_split1_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_tiny_patch4_window16_256.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.2,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            graph_decoder='pre',
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=16,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/1shots/graph_split2_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_tiny_patch4_window16_256.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.2,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            graph_decoder='pre',
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=16,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/1shots/graph_split3_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_tiny_patch4_window16_256.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.2,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            graph_decoder='pre',
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=16,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/1shots/graph_split4_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_tiny_patch4_window16_256.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.2,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            graph_decoder='pre',
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=16,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split4_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split4_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split4_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/1shots/graph_split5_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_tiny_patch4_window16_256.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.2,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            graph_decoder='pre',
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=16,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split5_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split5_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split5_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=1,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/5shot-swin/base_split1_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.20, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)
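In the pipelines above, `TopDownGenerateTargetFewShot` with `sigma=1` renders each keypoint as a Gaussian peak on the 64x64 heatmap, i.e. at a stride of 4 relative to the 256x256 input defined in `data_cfg`. A minimal NumPy sketch of that rendering (the helper name and the untruncated kernel are illustrative assumptions; mmpose's implementation restricts the Gaussian to a small window for speed):

```python
import numpy as np

def gaussian_target(heatmap_size, keypoint, sigma=1.0):
    """Render one keypoint as a Gaussian peak on a heatmap.

    heatmap_size: (width, height), e.g. (64, 64) as in data_cfg.
    keypoint: (x, y) in heatmap coordinates.
    """
    w, h = heatmap_size
    xs = np.arange(w, dtype=np.float32)          # shape (w,)
    ys = np.arange(h, dtype=np.float32)[:, None]  # shape (h, 1) -> broadcasts to (h, w)
    x0, y0 = keypoint
    return np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * sigma ** 2))

# A 256x256 input with a 64x64 heatmap means a stride-4 mapping:
# an image-space keypoint at (100, 40) lands at (25, 10) on the heatmap.
hm = gaussian_target((64, 64), (100 / 4, 40 / 4), sigma=1.0)
peak = np.unravel_index(np.argmax(hm), hm.shape)  # (row, col) = (10, 25)
```

With `sigma=1` the peak is sharp: the value two pixels away from the keypoint is already below 0.14, which is why the heatmap loss (`heatmap_loss_weight=2.0`) concentrates gradient around the annotated location.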


================================================
FILE: configs/5shot-swin/base_split2_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.20, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/5shot-swin/base_split3_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.20, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/5shot-swin/base_split4_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split4_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split4_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split4_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.20, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/5shot-swin/base_split5_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split5_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split5_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split5_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.20, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)


================================================
FILE: configs/5shot-swin/graph_split1_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            graph_decoder='pre',
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs', 'category_id',
            'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split1_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.20, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)
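Each `graph_*` config in this directory is identical to its `base_*` counterpart except for one added key in the transformer settings: `graph_decoder='pre'`. A small recursive dict diff makes that easy to verify; the two dicts below are trimmed stand-ins for the real `transformer` settings, kept short for illustration:

```python
def dict_diff(a, b, prefix=''):
    """Recursively collect keys whose values differ between two dicts."""
    diffs = []
    for key in sorted(set(a) | set(b)):
        path = f'{prefix}{key}'
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            diffs += dict_diff(va, vb, prefix=path + '.')  # recurse into nested settings
        elif va != vb:
            diffs.append((path, va, vb))
    return diffs

# Trimmed stand-ins for the base vs. graph transformer settings.
base_transformer = dict(type='EncoderDecoder', d_model=256, nhead=8,
                        num_decoder_layers=3)
graph_transformer = dict(type='EncoderDecoder', d_model=256, nhead=8,
                         num_decoder_layers=3, graph_decoder='pre')

print(dict_diff(base_transformer, graph_transformer))
# → [('graph_decoder', None, 'pre')]
```

Running such a diff over two loaded config modules is a quick way to confirm that a pair of split configs differs only where intended before launching a long training run.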


================================================
FILE: configs/5shot-swin/graph_split2_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            graph_decoder='pre',
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split2_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)
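The keypoint head above uses `SinePositionalEncoding` with `num_feats=128` and `normalize=True`, so each spatial location receives 2 × 128 = 256 channels, matching the transformer's `d_model=256`. A pure-Python sketch of the DETR-style encoding this config name refers to (function name and the loop layout are illustrative, not the repository's implementation):

```python
import math


def sine_position_embedding(h, w, num_feats=128,
                            temperature=10000, eps=1e-6,
                            scale=2 * math.pi):
    """Sketch of a DETR-style sine positional encoding.

    Returns an h x w grid where each cell holds 2*num_feats values:
    num_feats derived from the (normalized) y coordinate followed by
    num_feats from the x coordinate, alternating sin/cos per channel.
    """
    embed = []
    for y in range(1, h + 1):
        row = []
        for x in range(1, w + 1):
            yn = y / (h + eps) * scale  # normalize=True rescales to [0, 2*pi]
            xn = x / (w + eps) * scale
            feats = []
            for axis in (yn, xn):
                for i in range(num_feats):
                    t = temperature ** (2 * (i // 2) / num_feats)
                    feats.append(math.sin(axis / t) if i % 2 == 0
                                 else math.cos(axis / t))
            row.append(feats)
        embed.append(row)
    return embed
```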


================================================
FILE: configs/5shot-swin/graph_split3_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=1,
    dataset_joints=1,
    dataset_channel=[
        [
            0,
        ],
    ],
    inference_channel=[
        0,
    ],
    max_kpt_num=100)

# model settings
model = dict(
    type='PoseAnythingModel',
    pretrained='pretrained/swinv2_small_1k_500k.pth',
    encoder_config=dict(
        type='SwinTransformerV2',
        embed_dim=96,
        depths=[2, 2, 18, 2],
        num_heads=[3, 6, 12, 24],
        window_size=16,
        drop_path_rate=0.3,
        img_size=256,
        upsample="bilinear"
    ),
    keypoint_head=dict(
        type='PoseHead',
        in_channels=768,
        transformer=dict(
            type='EncoderDecoder',
            d_model=256,
            nhead=8,
            num_encoder_layers=3,
            num_decoder_layers=3,
            graph_decoder='pre',
            dim_feedforward=768,
            dropout=0.1,
            similarity_proj_dim=256,
            dynamic_proj_dim=128,
            activation="relu",
            normalize_before=False,
            return_intermediate_dec=True),
        share_kpt_branch=False,
        num_decoder_layer=3,
        with_heatmap_loss=True,
        heatmap_loss_weight=2.0,
        support_order_dropout=-1,
        positional_encoding=dict(
            type='SinePositionalEncoding', num_feats=128, normalize=True)),
    # training and testing settings
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=False,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[256, 256],
    heatmap_size=[64, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'])

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=15,
        scale_factor=0.15),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

valid_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffineFewShot'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTargetFewShot', sigma=1),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs', 'category_id', 'skeleton',
        ]),
]

test_pipeline = valid_pipeline

data_root = 'data/mp100'
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_train.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        pipeline=train_pipeline),
    val=dict(
        type='TransformerPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_val.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=100,
        pipeline=valid_pipeline),
    test=dict(
        type='TestPoseDataset',
        ann_file=f'{data_root}/annotations/mp100_split3_test.json',
        img_prefix=f'{data_root}/images/',
        # img_prefix=f'{data_root}',
        data_cfg=data_cfg,
        valid_class_ids=None,
        max_kpt_num=channel_cfg['max_kpt_num'],
        num_shots=5,
        num_queries=15,
        num_episodes=200,
        pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25],
        pipeline=test_pipeline),
)
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer', vis_backends=vis_backends, name='visualizer')

shuffle_cfg = dict(interval=1)
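The test datasets above evaluate PCK at `pck_threshold_list=[0.05, 0.10, 0.15, 0.2, 0.25]`: a predicted keypoint counts as correct when its distance to the ground truth is within `threshold * normalizing_length`. A minimal sketch of that metric (the function signature and the choice of bounding-box size as the normalizer are assumptions for illustration):

```python
def pck(pred, gt, visible, bbox_size,
        thresholds=(0.05, 0.10, 0.15, 0.20, 0.25)):
    """Sketch of PCK over one sample.

    pred, gt: lists of (x, y) keypoints; visible: per-keypoint flags.
    A keypoint is correct at threshold t if its Euclidean distance to
    the ground truth is <= t * bbox_size. Invisible keypoints are skipped.
    """
    results = {}
    for thr in thresholds:
        correct = total = 0
        for (px, py), (gx, gy), v in zip(pred, gt, visible):
            if not v:
                continue
            total += 1
            dist = ((px - gx) ** 2 + (py - gy) ** 2) ** 0.5
            correct += dist <= thr * bbox_size
        results[thr] = correct / total if total else 0.0
    return results
```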


================================================
FILE: configs/5shot-swin/graph_split4_config.py
================================================
log_level = 'INFO'
load_from = None
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=20)
evaluation = dict(
    interval=25,
    metric=['PCK', 'NME', 'AUC', 'EPE'],
    key_indicator='PCK',
    gpu_collect=True,
    res_folder='')
optimizer = dict(
    type='Adam',
    lr=1e-5,
)

optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[160, 180])
total_epochs = 200
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

channe
SYMBOL INDEX (362 symbols across 30 files)

FILE: app.py
  function plot_results (line 32) | def plot_results(support_img, query_img, support_kp, support_w, query_kp,
  function process (line 76) | def process(query_img, state,
  function update_examples (line 174) | def update_examples(support_img, posed_support, query_img, state, r=0.01...
  function get_select_coords (line 201) | def get_select_coords(kp_support,
  function get_limbs (line 232) | def get_limbs(kp_support,
  function set_query (line 279) | def set_query(support_img, state, example=False):

FILE: demo.py
  class Resize_Pad (line 26) | class Resize_Pad:
    method __init__ (line 27) | def __init__(self, w=256, h=256):
    method __call__ (line 31) | def __call__(self, image):
  function transform_keypoints_to_pad_and_resize (line 51) | def transform_keypoints_to_pad_and_resize(keypoints, image_size):
  function parse_args (line 70) | def parse_args():
  function merge_configs (line 95) | def merge_configs(cfg1, cfg2):
  function main (line 106) | def main():

FILE: models/apis/train.py
  function train_model (line 13) | def train_model(model,

FILE: models/core/custom_hooks/shuffle_hooks.py
  class ShufflePairedSamplesHook (line 6) | class ShufflePairedSamplesHook(Hook):
    method __init__ (line 11) | def __init__(self,
    method after_train_epoch (line 22) | def after_train_epoch(self, runner):

FILE: models/datasets/builder.py
  function _concat_cfg (line 7) | def _concat_cfg(cfg):
  function _check_vaild (line 26) | def _check_vaild(cfg):
  function build_dataset (line 34) | def build_dataset(cfg, default_args=None):

FILE: models/datasets/datasets/mp100/fewshot_base_dataset.py
  class FewShotBaseDataset (line 14) | class FewShotBaseDataset(Dataset, metaclass=ABCMeta):
    method __init__ (line 16) | def __init__(self,
    method _get_db (line 48) | def _get_db(self):
    method _select_kpt (line 53) | def _select_kpt(self, obj, kpt_id):
    method evaluate (line 58) | def evaluate(self, cfg, preds, output_dir, *args, **kwargs):
    method _write_keypoint_results (line 63) | def _write_keypoint_results(keypoints, res_file):
    method _report_metric (line 69) | def _report_metric(self,
    method _merge_obj (line 130) | def _merge_obj(self, Xs_list, Xq, idx):
    method __len__ (line 159) | def __len__(self):
    method __getitem__ (line 163) | def __getitem__(self, idx):
    method _sort_and_unique_bboxes (line 216) | def _sort_and_unique_bboxes(self, kpts, key='bbox_id'):

FILE: models/datasets/datasets/mp100/fewshot_dataset.py
  class FewShotKeypointDataset (line 13) | class FewShotKeypointDataset(FewShotBaseDataset):
    method __init__ (line 15) | def __init__(self,
    method random_paired_samples (line 70) | def random_paired_samples(self):
    method make_paired_samples (line 85) | def make_paired_samples(self):
    method _select_kpt (line 100) | def _select_kpt(self, obj, kpt_id):
    method _get_mapping_id_name (line 108) | def _get_mapping_id_name(imgs):
    method _get_db (line 128) | def _get_db(self):
    method _load_coco_keypoint_annotation_kernel (line 141) | def _load_coco_keypoint_annotation_kernel(self, img_id):
    method _xywh2cs (line 219) | def _xywh2cs(self, x, y, w, h):
    method evaluate (line 249) | def evaluate(self, outputs, res_folder, metric='PCK', **kwargs):

FILE: models/datasets/datasets/mp100/test_base_dataset.py
  class TestBaseDataset (line 15) | class TestBaseDataset(Dataset, metaclass=ABCMeta):
    method __init__ (line 17) | def __init__(self,
    method _get_db (line 51) | def _get_db(self):
    method _select_kpt (line 56) | def _select_kpt(self, obj, kpt_id):
    method evaluate (line 61) | def evaluate(self, cfg, preds, output_dir, *args, **kwargs):
    method _write_keypoint_results (line 66) | def _write_keypoint_results(keypoints, res_file):
    method _report_metric (line 72) | def _report_metric(self,
    method _merge_obj (line 161) | def _merge_obj(self, Xs_list, Xq, idx):
    method __len__ (line 190) | def __len__(self):
    method __getitem__ (line 194) | def __getitem__(self, idx):
    method _sort_and_unique_bboxes (line 222) | def _sort_and_unique_bboxes(self, kpts, key='bbox_id'):

FILE: models/datasets/datasets/mp100/test_dataset.py
  class TestPoseDataset (line 13) | class TestPoseDataset(TestBaseDataset):
    method __init__ (line 15) | def __init__(self,
    method random_paired_samples (line 73) | def random_paired_samples(self):
    method make_paired_samples (line 88) | def make_paired_samples(self):
    method _select_kpt (line 103) | def _select_kpt(self, obj, kpt_id):
    method _get_mapping_id_name (line 111) | def _get_mapping_id_name(imgs):
    method _get_db (line 131) | def _get_db(self):
    method _load_coco_keypoint_annotation_kernel (line 144) | def _load_coco_keypoint_annotation_kernel(self, img_id):
    method _xywh2cs (line 226) | def _xywh2cs(self, x, y, w, h):
    method evaluate (line 256) | def evaluate(self, outputs, res_folder, metric='PCK', **kwargs):

FILE: models/datasets/datasets/mp100/transformer_base_dataset.py
  class TransformerBaseDataset (line 14) | class TransformerBaseDataset(Dataset, metaclass=ABCMeta):
    method __init__ (line 16) | def __init__(self,
    method _get_db (line 48) | def _get_db(self):
    method _select_kpt (line 53) | def _select_kpt(self, obj, kpt_id):
    method evaluate (line 58) | def evaluate(self, cfg, preds, output_dir, *args, **kwargs):
    method _write_keypoint_results (line 63) | def _write_keypoint_results(keypoints, res_file):
    method _report_metric (line 69) | def _report_metric(self,
    method _merge_obj (line 130) | def _merge_obj(self, Xs_list, Xq, idx):
    method __len__ (line 159) | def __len__(self):
    method __getitem__ (line 163) | def __getitem__(self, idx):
    method _sort_and_unique_bboxes (line 190) | def _sort_and_unique_bboxes(self, kpts, key='bbox_id'):

FILE: models/datasets/datasets/mp100/transformer_dataset.py
  class TransformerPoseDataset (line 13) | class TransformerPoseDataset(TransformerBaseDataset):
    method __init__ (line 15) | def __init__(self,
    method random_paired_samples (line 72) | def random_paired_samples(self):
    method make_paired_samples (line 87) | def make_paired_samples(self):
    method _select_kpt (line 102) | def _select_kpt(self, obj, kpt_id):
    method _get_mapping_id_name (line 110) | def _get_mapping_id_name(imgs):
    method _get_db (line 130) | def _get_db(self):
    method _load_coco_keypoint_annotation_kernel (line 144) | def _load_coco_keypoint_annotation_kernel(self, img_id):
    method _xywh2cs (line 226) | def _xywh2cs(self, x, y, w, h):
    method evaluate (line 256) | def evaluate(self, outputs, res_folder, metric='PCK', **kwargs):

FILE: models/datasets/pipelines/post_transforms.py
  function get_affine_transform (line 10) | def get_affine_transform(center,
  function affine_transform (line 67) | def affine_transform(pt, trans_mat):
  function _get_3rd_point (line 83) | def _get_3rd_point(a, b):
  function rotate_point (line 105) | def rotate_point(pt, angle_rad):

FILE: models/datasets/pipelines/top_down_transform.py
  class TopDownAffineFewShot (line 12) | class TopDownAffineFewShot:
    method __init__ (line 25) | def __init__(self, use_udp=False):
    method __call__ (line 28) | def __call__(self, results):
  class TopDownGenerateTargetFewShot (line 65) | class TopDownGenerateTargetFewShot:
    method __init__ (line 93) | def __init__(self,
    method _msra_generate_target (line 107) | def _msra_generate_target(self, cfg, joints_3d, joints_3d_visible, sig...
    method _udp_generate_target (line 195) | def _udp_generate_target(self, cfg, joints_3d, joints_3d_visible, factor,
    method __call__ (line 316) | def __call__(self, results):

FILE: models/models/backbones/simmim.py
  function norm_targets (line 17) | def norm_targets(targets, patch_size):
  class SwinTransformerForSimMIM (line 40) | class SwinTransformerForSimMIM(SwinTransformer):
    method __init__ (line 41) | def __init__(self, **kwargs):
    method forward (line 49) | def forward(self, x, mask):
    method no_weight_decay (line 74) | def no_weight_decay(self):
  class SwinTransformerV2ForSimMIM (line 78) | class SwinTransformerV2ForSimMIM(SwinTransformerV2):
    method __init__ (line 79) | def __init__(self, **kwargs):
    method forward (line 87) | def forward(self, x, mask):
    method no_weight_decay (line 112) | def no_weight_decay(self):
  class SimMIM (line 116) | class SimMIM(nn.Module):
    method __init__ (line 117) | def __init__(self, config, encoder, encoder_stride, in_chans, patch_si...
    method forward (line 133) | def forward(self, x, mask):
    method no_weight_decay (line 149) | def no_weight_decay(self):
    method no_weight_decay_keywords (line 155) | def no_weight_decay_keywords(self):
  function build_simmim (line 161) | def build_simmim(config):

FILE: models/models/backbones/swin_mlp.py
  class Mlp (line 15) | class Mlp(nn.Module):
    method __init__ (line 16) | def __init__(self, in_features, hidden_features=None, out_features=Non...
    method forward (line 25) | def forward(self, x):
  function window_partition (line 34) | def window_partition(x, window_size):
  function window_reverse (line 49) | def window_reverse(windows, window_size, H, W):
  class SwinMLPBlock (line 66) | class SwinMLPBlock(nn.Module):
    method __init__ (line 82) | def __init__(self, dim, input_resolution, num_heads, window_size=7, sh...
    method forward (line 113) | def forward(self, x):
    method extra_repr (line 162) | def extra_repr(self) -> str:
    method flops (line 166) | def flops(self):
  class PatchMerging (line 185) | class PatchMerging(nn.Module):
    method __init__ (line 194) | def __init__(self, input_resolution, dim, norm_layer=nn.LayerNorm):
    method forward (line 201) | def forward(self, x):
    method extra_repr (line 224) | def extra_repr(self) -> str:
    method flops (line 227) | def flops(self):
  class BasicLayer (line 234) | class BasicLayer(nn.Module):
    method __init__ (line 251) | def __init__(self, dim, input_resolution, depth, num_heads, window_size,
    method forward (line 278) | def forward(self, x):
    method extra_repr (line 288) | def extra_repr(self) -> str:
    method flops (line 291) | def flops(self):
  class PatchEmbed (line 300) | class PatchEmbed(nn.Module):
    method __init__ (line 311) | def __init__(self, img_size=224, patch_size=4, in_chans=3, embed_dim=9...
    method forward (line 330) | def forward(self, x):
    method flops (line 340) | def flops(self):
  class SwinMLP (line 348) | class SwinMLP(nn.Module):
    method __init__ (line 369) | def __init__(self, img_size=224, patch_size=4, in_chans=3, num_classes...
    method _init_weights (line 425) | def _init_weights(self, m):
    method no_weight_decay (line 435) | def no_weight_decay(self):
    method no_weight_decay_keywords (line 439) | def no_weight_decay_keywords(self):
    method forward_features (line 442) | def forward_features(self, x):
    method forward (line 456) | def forward(self, x):
    method flops (line 461) | def flops(self):

FILE: models/models/backbones/swin_transformer.py
  class Mlp (line 26) | class Mlp(nn.Module):
    method __init__ (line 27) | def __init__(self, in_features, hidden_features=None, out_features=Non...
    method forward (line 36) | def forward(self, x):
  function window_partition (line 45) | def window_partition(x, window_size):
  function window_reverse (line 60) | def window_reverse(windows, window_size, H, W):
  class WindowAttention (line 77) | class WindowAttention(nn.Module):
    method __init__ (line 91) | def __init__(self, dim, window_size, num_heads, qkv_bias=True, qk_scal...
    method forward (line 125) | def forward(self, x, mask=None):
    method extra_repr (line 158) | def extra_repr(self) -> str:
    method flops (line 161) | def flops(self, N):
  class SwinTransformerBlock (line 175) | class SwinTransformerBlock(nn.Module):
    method __init__ (line 195) | def __init__(self, dim, input_resolution, num_heads, window_size=7, sh...
    method forward (line 248) | def forward(self, x):
    method extra_repr (line 296) | def extra_repr(self) -> str:
    method flops (line 300) | def flops(self):
  class PatchMerging (line 315) | class PatchMerging(nn.Module):
    method __init__ (line 324) | def __init__(self, input_resolution, dim, norm_layer=nn.LayerNorm):
    method forward (line 331) | def forward(self, x):
    method extra_repr (line 354) | def extra_repr(self) -> str:
    method flops (line 357) | def flops(self):
  class BasicLayer (line 364) | class BasicLayer(nn.Module):
    method __init__ (line 385) | def __init__(self, dim, input_resolution, depth, num_heads, window_size,
    method forward (line 415) | def forward(self, x):
    method extra_repr (line 425) | def extra_repr(self) -> str:
    method flops (line 428) | def flops(self):
  class PatchEmbed (line 437) | class PatchEmbed(nn.Module):
    method __init__ (line 448) | def __init__(self, img_size=224, patch_size=4, in_chans=3, embed_dim=9...
    method forward (line 467) | def forward(self, x):
    method flops (line 477) | def flops(self):
  class SwinTransformer (line 485) | class SwinTransformer(nn.Module):
    method __init__ (line 512) | def __init__(self, img_size=224, patch_size=4, in_chans=3, num_classes...
    method _init_weights (line 571) | def _init_weights(self, m):
    method no_weight_decay (line 581) | def no_weight_decay(self):
    method no_weight_decay_keywords (line 585) | def no_weight_decay_keywords(self):
    method forward_features (line 588) | def forward_features(self, x):
    method forward (line 602) | def forward(self, x):
    method flops (line 607) | def flops(self):

FILE: models/models/backbones/swin_transformer_moe.py
  class Mlp (line 23) | class Mlp(nn.Module):
    method __init__ (line 24) | def __init__(self, in_features, hidden_features=None, out_features=Non...
    method forward (line 34) | def forward(self, x):
  class MoEMlp (line 43) | class MoEMlp(nn.Module):
    method __init__ (line 44) | def __init__(self, in_features, hidden_features, num_local_experts, to...
    method forward (line 86) | def forward(self, x):
    method extra_repr (line 90) | def extra_repr(self) -> str:
    method _init_weights (line 96) | def _init_weights(self):
  function window_partition (line 104) | def window_partition(x, window_size):
  function window_reverse (line 119) | def window_reverse(windows, window_size, H, W):
  class WindowAttention (line 136) | class WindowAttention(nn.Module):
    method __init__ (line 151) | def __init__(self, dim, window_size, num_heads, qkv_bias=True, qk_scal...
    method forward (line 205) | def forward(self, x, mask=None):
    method extra_repr (line 239) | def extra_repr(self) -> str:
    method flops (line 243) | def flops(self, N):
  class SwinTransformerBlock (line 257) | class SwinTransformerBlock(nn.Module):
    method __init__ (line 292) | def __init__(self, dim, input_resolution, num_heads, window_size=7, sh...
    method forward (line 369) | def forward(self, x):
    method extra_repr (line 414) | def extra_repr(self) -> str:
    method flops (line 418) | def flops(self):
  class PatchMerging (line 436) | class PatchMerging(nn.Module):
    method __init__ (line 445) | def __init__(self, input_resolution, dim, norm_layer=nn.LayerNorm):
    method forward (line 452) | def forward(self, x):
    method extra_repr (line 475) | def extra_repr(self) -> str:
    method flops (line 478) | def flops(self):
  class BasicLayer (line 485) | class BasicLayer(nn.Module):
    method __init__ (line 521) | def __init__(self, dim, input_resolution, depth, num_heads, window_size,
    method forward (line 569) | def forward(self, x):
    method extra_repr (line 587) | def extra_repr(self) -> str:
    method flops (line 590) | def flops(self):
  class PatchEmbed (line 599) | class PatchEmbed(nn.Module):
    method __init__ (line 610) | def __init__(self, img_size=224, patch_size=4, in_chans=3, embed_dim=9...
    method forward (line 629) | def forward(self, x):
    method flops (line 639) | def flops(self):
  class SwinTransformerMoE (line 647) | class SwinTransformerMoE(nn.Module):
    method __init__ (line 690) | def __init__(self, img_size=224, patch_size=4, in_chans=3, num_classes...
    method _init_weights (line 774) | def _init_weights(self, m):
    method no_weight_decay (line 786) | def no_weight_decay(self):
    method no_weight_decay_keywords (line 790) | def no_weight_decay_keywords(self):
    method forward_features (line 794) | def forward_features(self, x):
    method forward (line 809) | def forward(self, x):
    method add_param_to_skip_allreduce (line 814) | def add_param_to_skip_allreduce(self, param_name):
    method flops (line 817) | def flops(self):

FILE: models/models/backbones/swin_transformer_v2.py
  class Mlp (line 17) | class Mlp(nn.Module):
    method __init__ (line 18) | def __init__(self, in_features, hidden_features=None, out_features=Non...
    method forward (line 27) | def forward(self, x):
  function window_partition (line 36) | def window_partition(x, window_size):
  function window_reverse (line 51) | def window_reverse(windows, window_size, H, W):
  class WindowAttention (line 68) | class WindowAttention(nn.Module):
    method __init__ (line 82) | def __init__(self, dim, window_size, num_heads, qkv_bias=True, attn_dr...
    method forward (line 141) | def forward(self, x, mask=None):
    method extra_repr (line 182) | def extra_repr(self) -> str:
    method flops (line 186) | def flops(self, N):
  class SwinTransformerBlock (line 200) | class SwinTransformerBlock(nn.Module):
    method __init__ (line 219) | def __init__(self, dim, input_resolution, num_heads, window_size=7, sh...
    method forward (line 271) | def forward(self, x):
    method extra_repr (line 309) | def extra_repr(self) -> str:
    method flops (line 313) | def flops(self):
  class PatchMerging (line 328) | class PatchMerging(nn.Module):
    method __init__ (line 337) | def __init__(self, input_resolution, dim, norm_layer=nn.LayerNorm):
    method forward (line 344) | def forward(self, x):
    method extra_repr (line 367) | def extra_repr(self) -> str:
    method flops (line 370) | def flops(self):
  class BasicLayer (line 377) | class BasicLayer(nn.Module):
    method __init__ (line 397) | def __init__(self, dim, input_resolution, depth, num_heads, window_size,
    method forward (line 427) | def forward(self, x):
    method extra_repr (line 437) | def extra_repr(self) -> str:
    method flops (line 440) | def flops(self):
    method _init_respostnorm (line 448) | def _init_respostnorm(self):
  class PatchEmbed (line 456) | class PatchEmbed(nn.Module):
    method __init__ (line 467) | def __init__(self, img_size=224, patch_size=4, in_chans=3, embed_dim=9...
    method forward (line 486) | def forward(self, x):
    method flops (line 496) | def flops(self):
  class SwinTransformerV2 (line 505) | class SwinTransformerV2(nn.Module):
    method __init__ (line 531) | def __init__(self, img_size=224, patch_size=4, in_chans=3, num_classes...
    method _init_weights (line 620) | def _init_weights(self, m):
    method no_weight_decay (line 630) | def no_weight_decay(self):
    method no_weight_decay_keywords (line 634) | def no_weight_decay_keywords(self):
    method forward_features (line 637) | def forward_features(self, x):
    method forward (line 668) | def forward(self, x):
    method flops (line 673) | def flops(self):

FILE: models/models/backbones/swin_utils.py
  function load_pretrained (line 14) | def load_pretrained(config, model, logger):
  function remap_pretrained_keys_swin (line 33) | def remap_pretrained_keys_swin(model, checkpoint_model, logger):

FILE: models/models/detectors/pam.py
  class PoseAnythingModel (line 11) | class PoseAnythingModel(BasePose):
    method __init__ (line 21) | def __init__(self,
    method init_backbone (line 36) | def init_backbone(self, pretrained, encoder_config):
    method with_keypoint (line 61) | def with_keypoint(self):
    method init_weights (line 65) | def init_weights(self, pretrained=None):
    method forward (line 71) | def forward(self,
    method forward_dummy (line 92) | def forward_dummy(self, img_s, target_s, target_weight_s, img_q, targe...
    method forward_train (line 97) | def forward_train(self,
    method forward_test (line 133) | def forward_test(self,
    method predict (line 162) | def predict(self,
    method extract_features (line 183) | def extract_features(self, img_s, img_q):
    method parse_keypoints_from_img_meta (line 203) | def parse_keypoints_from_img_meta(self, img_meta, device, keyword='que...
    method show_result (line 233) | def show_result(self,

FILE: models/models/keypoint_heads/head.py
  function inverse_sigmoid (line 17) | def inverse_sigmoid(x, eps=1e-3):
  class TokenDecodeMLP (line 24) | class TokenDecodeMLP(nn.Module):
    method __init__ (line 29) | def __init__(self,
    method forward (line 46) | def forward(self, x):
  class PoseHead (line 51) | class PoseHead(nn.Module):
    method __init__ (line 57) | def __init__(self,
    method init_weights (line 115) | def init_weights(self):
    method forward (line 132) | def forward(self, x, feature_s, target_s, mask_s, skeleton):
    method get_loss (line 202) | def get_loss(self, output, initial_proposals, similarity_map, target, ...
    method get_max_coords (line 244) | def get_max_coords(self, heatmap, heatmap_size=64):
    method heatmap_loss (line 252) | def heatmap_loss(self, similarity_map, target_heatmap, target_weight,
    method get_accuracy (line 277) | def get_accuracy(self, output, target, target_weight, target_sizes, he...
    method decode (line 305) | def decode(self, img_metas, output, img_size, **kwargs):

FILE: models/models/utils/builder.py
  function build_backbone (line 10) | def build_backbone(cfg, default_args=None):
  function build_transformer (line 15) | def build_transformer(cfg, default_args=None):
  function build_linear_layer (line 23) | def build_linear_layer(cfg, *args, **kwargs):

FILE: models/models/utils/encoder_decoder.py
  function inverse_sigmoid (line 13) | def inverse_sigmoid(x, eps=1e-3):
  class MLP (line 20) | class MLP(nn.Module):
    method __init__ (line 23) | def __init__(self, input_dim, hidden_dim, output_dim, num_layers):
    method forward (line 30) | def forward(self, x):
  class ProposalGenerator (line 36) | class ProposalGenerator(nn.Module):
    method __init__ (line 38) | def __init__(self, hidden_dim, proj_dim, dynamic_proj_dim):
    method forward (line 48) | def forward(self, query_feat, support_feat, spatial_shape):
  class EncoderDecoder (line 114) | class EncoderDecoder(nn.Module):
    method __init__ (line 116) | def __init__(self,
    method init_weights (line 153) | def init_weights(self):
    method forward (line 159) | def forward(self, src, mask, support_embed, pos_embed, support_order_e...
  class GraphTransformerDecoder (line 199) | class GraphTransformerDecoder(nn.Module):
    method __init__ (line 201) | def __init__(self,
    method forward (line 218) | def forward(self,
    method update (line 302) | def update(self, query_coordinates, delta_unsig):
  class GraphTransformerDecoderLayer (line 309) | class GraphTransformerDecoderLayer(nn.Module):
    method __init__ (line 311) | def __init__(self,
    method with_pos_embed (line 351) | def with_pos_embed(self, tensor, pos: Optional[Tensor]):
    method forward (line 354) | def forward(self,
  class TransformerEncoder (line 409) | class TransformerEncoder(nn.Module):
    method __init__ (line 411) | def __init__(self, encoder_layer, num_layers, norm=None):
    method forward (line 417) | def forward(self,
  class TransformerEncoderLayer (line 457) | class TransformerEncoderLayer(nn.Module):
    method __init__ (line 459) | def __init__(self,
    method with_pos_embed (line 481) | def with_pos_embed(self, tensor, pos: Optional[Tensor]):
    method forward (line 484) | def forward(self,
  function adj_from_skeleton (line 507) | def adj_from_skeleton(num_pts, skeleton, mask, device='cuda'):
  class GCNLayer (line 524) | class GCNLayer(nn.Module):
    method __init__ (line 525) | def __init__(self,
    method forward (line 539) | def forward(self, x, adj):
  function _get_clones (line 558) | def _get_clones(module, N):
  function _get_activation_fn (line 562) | def _get_activation_fn(activation):
  function clones (line 573) | def clones(module, N):

FILE: models/models/utils/positional_encoding.py
  class SinePositionalEncoding (line 13) | class SinePositionalEncoding(BaseModule):
    method __init__ (line 38) | def __init__(self,
    method forward (line 58) | def forward(self, mask):
    method forward_coordinates (line 97) | def forward_coordinates(self, coord):
    method __repr__ (line 125) | def __repr__(self):
  class LearnedPositionalEncoding (line 137) | class LearnedPositionalEncoding(BaseModule):
    method __init__ (line 151) | def __init__(self,
    method forward (line 163) | def forward(self, mask):
    method __repr__ (line 187) | def __repr__(self):

FILE: models/models/utils/transformer.py
  class Transformer (line 14) | class Transformer(BaseModule):
    method __init__ (line 32) | def __init__(self, encoder=None, decoder=None, init_cfg=None):
    method init_weights (line 38) | def init_weights(self):
    method forward (line 45) | def forward(self, x, mask, query_embed, pos_embed, mask_query):
  class DetrTransformerDecoderLayer (line 106) | class DetrTransformerDecoderLayer(BaseTransformerLayer):
    method __init__ (line 127) | def __init__(self,
  class DetrTransformerEncoder (line 151) | class DetrTransformerEncoder(TransformerLayerSequence):
    method __init__ (line 158) | def __init__(self, *args, post_norm_cfg=dict(type='LN'), **kwargs):
    method forward (line 169) | def forward(self, *args, **kwargs):
  class DetrTransformerDecoder (line 181) | class DetrTransformerDecoder(TransformerLayerSequence):
    method __init__ (line 189) | def __init__(self,
    method forward (line 203) | def forward(self, query, *args, **kwargs):
  class DynamicConv (line 231) | class DynamicConv(BaseModule):
    method __init__ (line 256) | def __init__(self,
    method forward (line 290) | def forward(self, param_feature, input_feature):

FILE: setup.py
  function readme (line 7) | def readme():
  function get_git_hash (line 16) | def get_git_hash():
  function get_hash (line 42) | def get_hash():
  function write_version_py (line 57) | def write_version_py():
  function get_version (line 76) | def get_version():
  function get_requirements (line 82) | def get_requirements(filename='requirements.txt'):

FILE: test.py
  function parse_args (line 23) | def parse_args():
  function merge_configs (line 67) | def merge_configs(cfg1, cfg2):
  function main (line 78) | def main():

FILE: tools/fix_carfuxion.py
  function search_match (line 9) | def search_match(bbox, num_keypoints, segmentation):

FILE: tools/visualization.py
  function plot_results (line 16) | def plot_results(support_img, query_img, support_kp, support_w, query_kp...
  function str_is_int (line 62) | def str_is_int(s):

FILE: train.py
  function parse_args (line 22) | def parse_args():
  function main (line 80) | def main():
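The symbol index above uses one line per symbol, in the form `kind name (line N) | signature`, nested under `FILE:` headers. A minimal sketch of parsing it back into a per-file structure, assuming that inferred format (the regex and field names are illustrative, not a documented GitExtract schema):

```python
import re

# Matches lines like "  class PoseAnythingModel (line 11) | class PoseAnythingModel(BasePose):"
SYMBOL_RE = re.compile(
    r"^\s*(class|method|function)\s+(\S+)\s+\(line (\d+)\)\s+\|\s+(.*)$"
)
FILE_RE = re.compile(r"^FILE:\s+(.*)$")


def parse_symbol_index(lines):
    """Group symbol entries by file, keeping kind, name, source line, signature."""
    index, current = {}, None
    for line in lines:
        m = FILE_RE.match(line)
        if m:
            current = m.group(1)
            index[current] = []
            continue
        m = SYMBOL_RE.match(line)
        if m and current is not None:
            kind, name, lineno, sig = m.groups()
            index[current].append(
                {"kind": kind, "name": name, "line": int(lineno), "sig": sig}
            )
    return index


sample = [
    "FILE: models/models/detectors/pam.py",
    "  class PoseAnythingModel (line 11) | class PoseAnythingModel(BasePose):",
    "    method __init__ (line 21) | def __init__(self,",
]
idx = parse_symbol_index(sample)
print(idx["models/models/detectors/pam.py"][0]["name"])  # PoseAnythingModel
```

Such an index lets an AI tool jump from a symbol name to its file and source line without scanning the full 575.5 KB of code.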
Condensed preview — 95 files, each showing path, character count, and a content snippet (613K chars of full structured content).
[
  {
    "path": ".gitignore",
    "chars": 150,
    "preview": ".eggs/*\n.vscode/*\nwork_dirs/*\nwork_dir/*\npretrained/*\nckpt/*\nrunai_dataset/*\n*/__pycache__\n*.pyc\ndata/*\ndata\noutput/*\n.i"
  },
  {
    "path": "LICENSE",
    "chars": 11407,
    "preview": "Copyright (c) 2022 SenseTime. All Rights Reserved.\n\n                                 Apache License\n                    "
  },
  {
    "path": "README.md",
    "chars": 16221,
    "preview": ":new: *Please check out [EdgeCape](https://github.com/orhir/EdgeCape), our more recent effort in the same line of work.*"
  },
  {
    "path": "app.py",
    "chars": 17375,
    "preview": "import argparse\nimport random\n\nimport gradio as gr\nimport matplotlib\nimport numpy as np\nimport torch\nfrom PIL import Ima"
  },
  {
    "path": "configs/1shot-swin/base_split1_config.py",
    "chars": 5320,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shot-swin/base_split2_config.py",
    "chars": 5320,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shot-swin/base_split3_config.py",
    "chars": 5320,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shot-swin/base_split4_config.py",
    "chars": 5320,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shot-swin/base_split5_config.py",
    "chars": 5320,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shot-swin/graph_split1_config.py",
    "chars": 5353,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shot-swin/graph_split2_config.py",
    "chars": 5353,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shot-swin/graph_split3_config.py",
    "chars": 5353,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shot-swin/graph_split4_config.py",
    "chars": 5353,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shot-swin/graph_split5_config.py",
    "chars": 5353,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shots/base_split1_config.py",
    "chars": 5323,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shots/base_split2_config.py",
    "chars": 5323,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shots/base_split3_config.py",
    "chars": 5323,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shots/base_split4_config.py",
    "chars": 5323,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shots/base_split5_config.py",
    "chars": 5323,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shots/graph_split1_config.py",
    "chars": 5364,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shots/graph_split2_config.py",
    "chars": 5364,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shots/graph_split3_config.py",
    "chars": 5364,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shots/graph_split4_config.py",
    "chars": 5364,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/1shots/graph_split5_config.py",
    "chars": 5364,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shot-swin/base_split1_config.py",
    "chars": 5320,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shot-swin/base_split2_config.py",
    "chars": 5320,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shot-swin/base_split3_config.py",
    "chars": 5320,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shot-swin/base_split4_config.py",
    "chars": 5320,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shot-swin/base_split5_config.py",
    "chars": 5320,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shot-swin/graph_split1_config.py",
    "chars": 5353,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shot-swin/graph_split2_config.py",
    "chars": 5353,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shot-swin/graph_split3_config.py",
    "chars": 5353,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shot-swin/graph_split4_config.py",
    "chars": 5353,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shot-swin/graph_split5_config.py",
    "chars": 5353,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shots/base_split1_config.py",
    "chars": 5330,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shots/base_split2_config.py",
    "chars": 5330,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shots/base_split3_config.py",
    "chars": 5330,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shots/base_split4_config.py",
    "chars": 5330,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shots/base_split5_config.py",
    "chars": 5330,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shots/graph_split1_config.py",
    "chars": 5363,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shots/graph_split2_config.py",
    "chars": 5363,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shots/graph_split3_config.py",
    "chars": 5363,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shots/graph_split4_config.py",
    "chars": 5363,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/5shots/graph_split5_config.py",
    "chars": 5363,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "configs/demo_b.py",
    "chars": 5330,
    "preview": "log_level = 'INFO'\nload_from = None\nresume_from = None\ndist_params = dict(backend='nccl')\nworkflow = [('train', 1)]\nchec"
  },
  {
    "path": "demo.py",
    "chars": 10286,
    "preview": "import argparse\nimport copy\nimport os\nimport pickle\nimport random\nimport cv2\nimport numpy as np\nimport torch\nfrom mmcv i"
  },
  {
    "path": "docker/Dockerfile",
    "chars": 1594,
    "preview": "ARG PYTORCH=\"2.0.1\"\nARG CUDA=\"11.7\"\nARG CUDNN=\"8\"\n\nFROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel\n\nENV "
  },
  {
    "path": "models/VERSION",
    "chars": 6,
    "preview": "0.2.0\n"
  },
  {
    "path": "models/__init__.py",
    "chars": 90,
    "preview": "from .core import *  # noqa\nfrom .datasets import *  # noqa\nfrom .models import *  # noqa\n"
  },
  {
    "path": "models/apis/__init__.py",
    "chars": 64,
    "preview": "from .train import train_model\n\n__all__ = [\n    'train_model'\n]\n"
  },
  {
    "path": "models/apis/train.py",
    "chars": 4854,
    "preview": "import os\n\nimport torch\nfrom models.core.custom_hooks.shuffle_hooks import ShufflePairedSamplesHook\nfrom mmcv.parallel i"
  },
  {
    "path": "models/core/__init__.py",
    "chars": 1,
    "preview": "\n"
  },
  {
    "path": "models/core/custom_hooks/shuffle_hooks.py",
    "chars": 1134,
    "preview": "from mmcv.runner import Hook\nfrom mmpose.utils import get_root_logger\nfrom torch.utils.data import DataLoader\n\n\nclass Sh"
  },
  {
    "path": "models/datasets/__init__.py",
    "chars": 96,
    "preview": "from .builder import *  # noqa\nfrom .datasets import *  # noqa\nfrom .pipelines import *  # noqa\n"
  },
  {
    "path": "models/datasets/builder.py",
    "chars": 1918,
    "preview": "from mmcv.utils import build_from_cfg\nfrom mmpose.datasets.builder import DATASETS\nfrom mmpose.datasets.dataset_wrappers"
  },
  {
    "path": "models/datasets/datasets/__init__.py",
    "chars": 255,
    "preview": "from .mp100 import (FewShotKeypointDataset, FewShotBaseDataset,\n                    TransformerBaseDataset, TransformerP"
  },
  {
    "path": "models/datasets/datasets/mp100/__init__.py",
    "chars": 475,
    "preview": "from .fewshot_base_dataset import FewShotBaseDataset\nfrom .fewshot_dataset import FewShotKeypointDataset\nfrom .test_base"
  },
  {
    "path": "models/datasets/datasets/mp100/fewshot_base_dataset.py",
    "chars": 8012,
    "preview": "import copy\nfrom abc import ABCMeta, abstractmethod\n\nimport json_tricks as json\nimport numpy as np\nfrom mmcv.parallel im"
  },
  {
    "path": "models/datasets/datasets/mp100/fewshot_dataset.py",
    "chars": 10843,
    "preview": "import os\nimport random\nfrom collections import OrderedDict\n\nimport numpy as np\nfrom mmpose.datasets import DATASETS\nfro"
  },
  {
    "path": "models/datasets/datasets/mp100/test_base_dataset.py",
    "chars": 8760,
    "preview": "import copy\nfrom abc import ABCMeta, abstractmethod\n\nimport json_tricks as json\nimport numpy as np\nfrom mmcv.parallel im"
  },
  {
    "path": "models/datasets/datasets/mp100/test_dataset.py",
    "chars": 11186,
    "preview": "import os\nimport random\nfrom collections import OrderedDict\n\nimport numpy as np\nfrom mmpose.datasets import DATASETS\nfro"
  },
  {
    "path": "models/datasets/datasets/mp100/transformer_base_dataset.py",
    "chars": 7045,
    "preview": "import copy\nfrom abc import ABCMeta, abstractmethod\n\nimport json_tricks as json\nimport numpy as np\nfrom mmcv.parallel im"
  },
  {
    "path": "models/datasets/datasets/mp100/transformer_dataset.py",
    "chars": 11220,
    "preview": "import os\nimport random\nfrom collections import OrderedDict\n\nimport numpy as np\nfrom mmpose.datasets import DATASETS\nfro"
  },
  {
    "path": "models/datasets/pipelines/__init__.py",
    "chars": 192,
    "preview": "from .top_down_transform import (TopDownAffineFewShot,\n                                 TopDownGenerateTargetFewShot)\n\n_"
  },
  {
    "path": "models/datasets/pipelines/post_transforms.py",
    "chars": 3665,
    "preview": "# ------------------------------------------------------------------------------\n# Adapted from https://github.com/leoxi"
  },
  {
    "path": "models/datasets/pipelines/top_down_transform.py",
    "chars": 16208,
    "preview": "import cv2\nimport numpy as np\nfrom mmpose.core.post_processing import (get_warp_matrix,\n                                "
  },
  {
    "path": "models/models/__init__.py",
    "chars": 104,
    "preview": "from .backbones import *  # noqa\nfrom .detectors import *  # noqa\nfrom .keypoint_heads import *  # noqa\n"
  },
  {
    "path": "models/models/backbones/__init__.py",
    "chars": 51,
    "preview": "from .swin_transformer_v2 import SwinTransformerV2\n"
  },
  {
    "path": "models/models/backbones/simmim.py",
    "chars": 7187,
    "preview": "# --------------------------------------------------------\n# SimMIM\n# Copyright (c) 2021 Microsoft\n# Licensed under The "
  },
  {
    "path": "models/models/backbones/swin_mlp.py",
    "chars": 18508,
    "preview": "# --------------------------------------------------------\n# Swin Transformer\n# Copyright (c) 2021 Microsoft\n# Licensed "
  },
  {
    "path": "models/models/backbones/swin_transformer.py",
    "chars": 26070,
    "preview": "# --------------------------------------------------------\n# Swin Transformer\n# Copyright (c) 2021 Microsoft\n# Licensed "
  },
  {
    "path": "models/models/backbones/swin_transformer_moe.py",
    "chars": 38203,
    "preview": "# --------------------------------------------------------\n# Swin Transformer MoE\n# Copyright (c) 2022 Microsoft\n# Licen"
  },
  {
    "path": "models/models/backbones/swin_transformer_v2.py",
    "chars": 29665,
    "preview": "# --------------------------------------------------------\n# Swin Transformer V2\n# Copyright (c) 2022 Microsoft\n# Licens"
  },
  {
    "path": "models/models/backbones/swin_utils.py",
    "chars": 4675,
    "preview": "# --------------------------------------------------------\n# SimMIM\n# Copyright (c) 2021 Microsoft\n# Licensed under The "
  },
  {
    "path": "models/models/detectors/__init__.py",
    "chars": 68,
    "preview": "from .pam import PoseAnythingModel\n\n__all__ = ['PoseAnythingModel']\n"
  },
  {
    "path": "models/models/detectors/pam.py",
    "chars": 15874,
    "preview": "import numpy as np\nimport torch\nfrom mmpose.models import builder\nfrom mmpose.models.builder import POSENETS\nfrom mmpose"
  },
  {
    "path": "models/models/keypoint_heads/__init__.py",
    "chars": 51,
    "preview": "from .head import PoseHead\n\n__all__ = ['PoseHead']\n"
  },
  {
    "path": "models/models/keypoint_heads/head.py",
    "chars": 15991,
    "preview": "from copy import deepcopy\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom mm"
  },
  {
    "path": "models/models/utils/__init__.py",
    "chars": 676,
    "preview": "# Copyright (c) OpenMMLab. All rights reserved.\nfrom .builder import build_linear_layer, build_transformer, build_backbo"
  },
  {
    "path": "models/models/utils/builder.py",
    "chars": 1697,
    "preview": "# Copyright (c) OpenMMLab. All rights reserved.\nimport torch.nn as nn\nfrom mmcv.utils import Registry, build_from_cfg\n\nT"
  },
  {
    "path": "models/models/utils/encoder_decoder.py",
    "chars": 22940,
    "preview": "import copy\nfrom typing import Optional\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom mmcv.cn"
  },
  {
    "path": "models/models/utils/positional_encoding.py",
    "chars": 7991,
    "preview": "# Copyright (c) OpenMMLab. All rights reserved.\nimport math\n\nimport torch\nimport torch.nn as nn\nfrom mmcv.cnn.bricks.tra"
  },
  {
    "path": "models/models/utils/transformer.py",
    "chars": 14126,
    "preview": "import torch\nimport torch.nn as nn\nfrom models.models.utils.builder import TRANSFORMER\nfrom mmcv.cnn import (build_activ"
  },
  {
    "path": "models/version.py",
    "chars": 137,
    "preview": "# GENERATED VERSION FILE\n# TIME: Tue Dec 19 17:01:21 2023\n__version__ = '0.2.0+f65cb07'\nshort_version = '0.2.0'\nversion_"
  },
  {
    "path": "requirements.txt",
    "chars": 63,
    "preview": "json_tricks\nnumpy\nopencv-python\npillow==6.2.2\nxtcocotools\nscipy"
  },
  {
    "path": "setup.cfg",
    "chars": 475,
    "preview": "[bdist_wheel]\nuniversal=1\n\n[aliases]\ntest=pytest\n\n[tool:pytest]\naddopts=tests/\n\n[yapf]\nbased_on_style = pep8\nblank_line_"
  },
  {
    "path": "setup.py",
    "chars": 3172,
    "preview": "import os\nimport subprocess\nimport time\nfrom setuptools import find_packages, setup\n\n\ndef readme():\n    with open('READM"
  },
  {
    "path": "test.py",
    "chars": 5456,
    "preview": "import argparse\nimport os\nimport os.path as osp\nimport random\nimport uuid\n\nimport mmcv\nimport numpy as np\nimport torch\nf"
  },
  {
    "path": "tools/dist_test.sh",
    "chars": 320,
    "preview": "#!/usr/bin/env bash\n# Copyright (c) OpenMMLab. All rights reserved.\n\nCONFIG=$1\nCHECKPOINT=$2\nGPUS=$3\nPORT=${PORT:-29000}"
  },
  {
    "path": "tools/dist_train.sh",
    "chars": 332,
    "preview": "#!/usr/bin/env bash\n# Copyright (c) OpenMMLab. All rights reserved.\n\nCONFIG=$1\nGPUS=$2\nOUTPUT_DIR=$3\nPORT=${PORT:-29000}"
  },
  {
    "path": "tools/fix_carfuxion.py",
    "chars": 3071,
    "preview": "import json\nimport os\nimport shutil\nimport sys\nimport numpy as np\nfrom xtcocotools.coco import COCO\n\n\ndef search_match(b"
  },
  {
    "path": "tools/slurm_test.sh",
    "chars": 560,
    "preview": "#!/usr/bin/env bash\n\nset -x\n\nPARTITION=$1\nJOB_NAME=$2\nCONFIG=$3\nCHECKPOINT=$4\nGPUS=${GPUS:-8}\nGPUS_PER_NODE=${GPUS_PER_N"
  },
  {
    "path": "tools/slurm_train.sh",
    "chars": 568,
    "preview": "#!/usr/bin/env bash\n\nset -x\n\nPARTITION=$1\nJOB_NAME=$2\nCONFIG=$3\nWORK_DIR=$4\nGPUS=${GPUS:-8}\nGPUS_PER_NODE=${GPUS_PER_NOD"
  },
  {
    "path": "tools/visualization.py",
    "chars": 2673,
    "preview": "import os\nimport random\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\n"
  },
  {
    "path": "train.py",
    "chars": 6614,
    "preview": "import argparse\nimport copy\nimport os\nimport os.path as osp\nimport time\n\nimport mmcv\nimport torch\nfrom mmcv import Confi"
  }
]
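The preview array above is plain JSON, one object per file with `path`, `chars`, and `preview` fields. A small sketch of consuming it programmatically; the field names come from the JSON shown, while the function name and summary logic are illustrative:

```python
import json

def summarize_preview(raw_json):
    """Return (file_count, total_chars, largest_path) for a preview array."""
    entries = json.loads(raw_json)
    total = sum(e["chars"] for e in entries)
    largest = max(entries, key=lambda e: e["chars"])["path"]
    return len(entries), total, largest

# Two entries copied from the preview above, with truncated snippets.
raw = json.dumps([
    {"path": "train.py", "chars": 6614, "preview": "import argparse..."},
    {"path": "models/models/backbones/swin_transformer_moe.py",
     "chars": 38203, "preview": "# Swin Transformer MoE..."},
])
count, total, largest = summarize_preview(raw)
print(count, total, largest)
```

This is useful, for example, to decide which files fit in a model's context window before pasting the full extraction.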

About this extraction

This page contains the full source code of the orhir/PoseAnything GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction covers 95 files (575.5 KB, approximately 149.6k tokens) and includes a symbol index with 362 extracted functions, classes, methods, constants, and types. It can be used with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
