Repository: LMMMEng/OverLoCK
Branch: main
Commit: 2c8ab3b29e3a
Files: 64
Total size: 334.9 KB
Directory structure:
gitextract__a0ogxjt/
├── LICENSE.md
├── README.md
├── detection/
│   ├── configs/
│   │   ├── _base_/
│   │   │   ├── datasets/
│   │   │   │   ├── coco_detection.py
│   │   │   │   └── coco_instance.py
│   │   │   ├── default_runtime.py
│   │   │   ├── models/
│   │   │   │   ├── cascade_mask_rcnn_r50_fpn.py
│   │   │   │   ├── cascade_mask_rcnn_r50_fpn_crowdhuman.py
│   │   │   │   ├── cascade_rcnn_r50_fpn.py
│   │   │   │   ├── fast_rcnn_r50_fpn.py
│   │   │   │   ├── faster_rcnn_r50_caffe_c4.py
│   │   │   │   ├── faster_rcnn_r50_caffe_dc5.py
│   │   │   │   ├── faster_rcnn_r50_fpn.py
│   │   │   │   ├── mask_rcnn_convnext_fpn.py
│   │   │   │   ├── mask_rcnn_r50_caffe_c4.py
│   │   │   │   ├── mask_rcnn_r50_fpn.py
│   │   │   │   ├── retinanet_r50_fpn.py
│   │   │   │   ├── rpn_r50_caffe_c4.py
│   │   │   │   ├── rpn_r50_fpn.py
│   │   │   │   └── ssd300.py
│   │   │   └── schedules/
│   │   │       ├── schedule_1x.py
│   │   │       └── schedule_3x.py
│   │   └── maskrcnn_overlock/
│   │       ├── mask_rcnn_overlock_b_in1k_fpn_1x_coco.py
│   │       ├── mask_rcnn_overlock_b_in1k_fpn_3x_coco.py
│   │       ├── mask_rcnn_overlock_s_in1k_fpn_1x_coco.py
│   │       ├── mask_rcnn_overlock_s_in1k_fpn_3x_coco.py
│   │       ├── mask_rcnn_overlock_t_in1k_fpn_1x_coco.py
│   │       └── mask_rcnn_overlock_t_in1k_fpn_3x_coco.py
│   ├── models/
│   │   ├── __init__.py
│   │   └── overlock.py
│   ├── readme.md
│   ├── scripts/
│   │   ├── dist_test.sh
│   │   └── dist_train.sh
│   ├── test.py
│   └── train.py
├── models/
│   ├── __init__.py
│   ├── contmix.py
│   └── overlock.py
├── scripts/
│   ├── train_b_model.sh
│   ├── train_s_model.sh
│   ├── train_t_model.sh
│   └── train_xt_model.sh
├── segmentation/
│   ├── configs/
│   │   ├── _base_/
│   │   │   ├── datasets/
│   │   │   │   └── ade20k.py
│   │   │   ├── default_runtime.py
│   │   │   ├── models/
│   │   │   │   ├── fpn_r50.py
│   │   │   │   ├── upernet_r50.py
│   │   │   │   └── upernet_transnext.py
│   │   │   └── schedules/
│   │   │       ├── schedule_160k.py
│   │   │       ├── schedule_20k.py
│   │   │       ├── schedule_40k.py
│   │   │       └── schedule_80k.py
│   │   └── overlock/
│   │       ├── upernet_overlock_base_ade20k_8xb2.py
│   │       ├── upernet_overlock_small_ade20k_8xb2.py
│   │       └── upernet_overlock_tiny_ade20k_8xb2.py
│   ├── mmseg_custom/
│   │   ├── __init__.py
│   │   └── align_resize.py
│   ├── models/
│   │   ├── __init__.py
│   │   └── overlock.py
│   ├── readme.md
│   ├── scripts/
│   │   ├── dist_test.sh
│   │   └── dist_train.sh
│   ├── test.py
│   └── train.py
├── train.py
└── validate.py
================================================
FILE CONTENTS
================================================
================================================
FILE: LICENSE.md
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: README.md
================================================
# [[CVPR 2025 Oral] OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels](https://arxiv.org/abs/2502.20087)
This is an official PyTorch implementation of "[**OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels**](https://arxiv.org/abs/2502.20087)".
# Introduction
Top-down attention plays a crucial role in the human vision system, wherein the brain initially obtains a rough overview of a scene to discover salient cues (i.e., overview first), followed by a more careful finer-grained examination (i.e., look closely next). However, modern ConvNets remain confined to a pyramid structure that successively downsamples the feature map for receptive field expansion, neglecting this crucial biomimetic principle. We present OverLoCK, the first pure ConvNet backbone architecture that explicitly incorporates a top-down attention mechanism. Unlike pyramid backbone networks, our design features a branched architecture with three synergistic sub-networks: 1) a Base-Net that encodes low/mid-level features; 2) a lightweight Overview-Net that generates dynamic top-down attention through coarse global context modeling (i.e., overview first); and 3) a robust Focus-Net that performs finer-grained perception guided by top-down attention (i.e., look closely next). To fully unleash the power of top-down attention, we further propose a novel context-mixing dynamic convolution (ContMix) that effectively models long-range dependencies while preserving inherent local inductive biases even when the input resolution increases, addressing critical limitations in existing convolutions. Our OverLoCK exhibits a notable performance improvement over existing methods.
<center>
<img src="images/img.jpg" width="70%" height="auto">
</center>
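The overview-first, look-closely-next flow described above can be sketched in a few lines of NumPy. This is only a toy illustration of the top-down idea, not the repo's implementation: the pooling factor, sigmoid gate, and nearest-neighbor upsampling are simplifications standing in for Overview-Net and Focus-Net.

```python
import numpy as np

def top_down_guidance(x, factor=4):
    """Toy 'overview-first, look-closely-next' flow (illustration only).

    A coarse branch (stand-in for Overview-Net) summarizes the scene,
    and its upsampled attention map modulates the fine-grained branch
    (stand-in for Focus-Net). H and W must be divisible by `factor`.
    """
    C, H, W = x.shape
    # overview first: heavy average-pooling gives a coarse scene summary
    coarse = x.reshape(C, H // factor, factor, W // factor, factor).mean(axis=(2, 4))
    attn = 1.0 / (1.0 + np.exp(-coarse))                          # sigmoid gate in (0, 1)
    attn_up = attn.repeat(factor, axis=1).repeat(factor, axis=2)  # nearest upsample
    # look closely next: fine features modulated by top-down attention
    return x * attn_up
```

Because the gate lies in (0, 1), the coarse branch can only re-weight (attenuate or pass) the fine branch's responses, which is the essence of top-down guidance.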
# News
- **Dec. 25, 2025**: **To improve inference speed and reduce memory consumption**, we provide **reparameterized versions of the OverLoCK models with pre-trained weights**. These variants achieve **identical performance to their original counterparts in ImageNet-1K evaluation**. However, if you further fine-tune these reparameterized models, they may yield slightly lower accuracy than the original versions. Please choose the model variant for fine-tuning based on your memory and accuracy requirements ([More Details](https://github.com/LMMMEng/OverLoCK/blob/81dd7b216e7aa66ff5a95b07021f299dc2d4d14b/models/overlock.py#L941C13-L941C14)).
- **May. 16, 2025**: A **plug-and-play implementation of the [ContMix Block](models/contmix.py)** is now available.
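The ContMix block linked above builds per-position dynamic kernels from affinities between local features and pooled global context. Below is a rough, self-contained NumPy toy of that idea; the pooling grid, random projection, and softmax normalization here are simplifications for illustration only. See [models/contmix.py](models/contmix.py) for the actual block.

```python
import numpy as np

def contmix_toy(x, k=3, pooled=4, seed=0):
    """Toy context-mixing dynamic conv (illustration, not the repo's ContMix).

    Each pixel gets its own k*k kernel, generated from its affinity with a
    small grid of globally pooled context tokens, so the kernel weights carry
    long-range context while the aggregation stays local. H, W must be
    divisible by `pooled`.
    """
    C, H, W = x.shape
    rng = np.random.default_rng(seed)
    # "overview": average-pool the map into a pooled x pooled grid of context tokens
    ctx = x.reshape(C, pooled, H // pooled, pooled, W // pooled).mean(axis=(2, 4))
    ctx = ctx.reshape(C, pooled * pooled)                     # (C, P)
    # affinity between every pixel and every context token
    affinity = x.reshape(C, H * W).T @ ctx                    # (HW, P)
    # project affinities to a k*k dynamic kernel per pixel (hypothetical projection)
    proj = rng.standard_normal((pooled * pooled, k * k)) * 0.01
    logits = affinity @ proj
    logits -= logits.max(axis=-1, keepdims=True)              # numerically stable softmax
    kernels = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
    # depthwise aggregation with the per-pixel kernels
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            patch = xp[:, i:i + k, j:j + k].reshape(C, k * k)
            out[:, i, j] = patch @ kernels[i * W + j]
    return out
```

Note how the kernel weights depend on global affinities while the receptive field of the aggregation itself stays k*k, which is how ContMix mixes context without giving up local inductive bias.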
# Image Classification
## 1. Requirements
We strongly recommend using the provided dependencies to ensure reproducibility:
```
# Environments:
cuda==12.1
python==3.10
# Dependencies:
pip install torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu121
pip install natten==0.17.1+torch230cu121 -f https://shi-labs.com/natten/wheels/
pip install timm==0.6.12
pip install mmengine==0.2.0
```
>💡 To accelerate training and inference, we utilize the efficient large-kernel convolution proposed in [RepLKNet](https://github.com/DingXiaoH/RepLKNet-pytorch#use-our-efficient-large-kernel-convolution-with-pytorch). Please follow this [**guideline**](https://github.com/VITA-Group/SLaK#installation) to install the ``depthwise_conv2d_implicit_gemm`` function.
>
>💡 If you encounter network issues during the installation of ``natten``, please download this [**package**](https://github.com/LMMMEng/OverLoCK/releases/download/v1/natten-0.17.1+torch230cu121-cp310-cp310-linux_x86_64.whl) and install it locally.
## 2. Data Preparation
Prepare [ImageNet](https://image-net.org/) with the following folder structure; you can extract the archives using this [script](https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4).
```
│imagenet/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
```
## 3. Main Results on ImageNet-1K with Pretrained Models
| Models | Input Size | FLOPs (G) | Params (M) | Top-1 (%) | Download |
|:-----------:|:----------:|:---------:|:----------:|:----------:|:----------:|
| OverLoCK-XT | 224x224 | 2.6 | 16 | 82.7 | [model](https://github.com/LMMMEng/OverLoCK/releases/download/v1/overlock_xt_in1k_224.pth) |
| OverLoCK-T | 224x224 | 5.5 | 33 | 84.2 | [model](https://github.com/LMMMEng/OverLoCK/releases/download/v1/overlock_t_in1k_224.pth) |
| OverLoCK-S | 224x224 | 9.7 | 56 | 84.8 | [model](https://github.com/LMMMEng/OverLoCK/releases/download/v1/overlock_s_in1k_224.pth) |
| OverLoCK-B | 224x224 | 16.7 | 95 | 85.1 | [model](https://github.com/LMMMEng/OverLoCK/releases/download/v1/overlock_b_in1k_224.pth) |
## 4. Train
To train ```OverLoCK``` models on ImageNet-1K with 8 GPUs (single node), run:
```
bash scripts/train_xt_model.sh # train OverLoCK-XT
bash scripts/train_t_model.sh # train OverLoCK-T
bash scripts/train_s_model.sh # train OverLoCK-S
bash scripts/train_b_model.sh # train OverLoCK-B
```
> 💡If you encounter NaN loss, please remove ``--native-amp`` to disable AMP training, then resume from the last checkpoint saved before the NaN loss occurred.
>
> 💡If your **GPU memory** is insufficient during training, you can enable gradient checkpointing by adding the following arguments: ``--grad-checkpoint --ckpt-stg 4 0 0 0``. If you're still experiencing memory issues, you can increase these values, but be aware that this may slow down training speed.
## 5. Validation
To evaluate ```OverLoCK``` on ImageNet-1K, run:
```
MODEL=overlock_xt # overlock_{xt, t, s, b}
python3 validate.py \
/path/to/imagenet \
--model $MODEL -b 128 \
--pretrained # or --checkpoint /path/to/checkpoint
```
>💡 To accelerate inference speed, OverLoCK utilizes [**Structural Re-parameterization**](https://github.com/AILab-CVC/UniRepLKNet/tree/main). Please refer to [**here**](https://github.com/LMMMEng/OverLoCK/blob/540bf6ed9cca99eab78fc8ab935b71f2a4aa2a2c/models/overlock.py#L945) for simple usage instructions.
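The re-parameterization note above rests on a standard trick: folding BatchNorm statistics into the preceding convolution, so the deploy-time model runs a single conv with no BN layer. Here is a generic sketch of that fusion, written for a 1x1 conv expressed as a matrix multiply; `fuse_conv_bn` is a hypothetical helper for illustration, not this repo's API.

```python
import numpy as np

def fuse_conv_bn(weight, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm statistics into the preceding 1x1 conv.

    Hypothetical helper (not the repo API): returns a weight/bias pair
    whose single matmul reproduces conv -> BN exactly.
    """
    scale = gamma / np.sqrt(var + eps)      # per-output-channel BN scale
    fused_w = weight * scale[:, None]       # (C_out, C_in)
    fused_b = beta - mean * scale
    return fused_w, fused_b

rng = np.random.default_rng(0)
w = rng.standard_normal((6, 4))                       # 1x1 conv as a matrix
gamma, beta = rng.standard_normal(6), rng.standard_normal(6)
mean, var = rng.standard_normal(6), rng.random(6) + 0.5
x = rng.standard_normal((10, 4))                      # 10 "pixels", 4 channels

# train-time path: conv followed by BatchNorm (inference statistics)
y_train = gamma * (x @ w.T - mean) / np.sqrt(var + 1e-5) + beta
# deploy-time path: one fused conv, numerically identical
fw, fb = fuse_conv_bn(w, gamma, beta, mean, var)
y_deploy = x @ fw.T + fb
```

After fusion the BN layer can be dropped entirely, which is why reparameterized models are faster and lighter at inference while matching the original outputs.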
# Citation
If you find this project useful for your research, please consider citing:
```
@inproceedings{lou2025overlock,
title={OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels},
author={Lou, Meng and Yu, Yizhou},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
pages={128--138},
year={2025}
}
```
# Dense Predictions
[Object Detection](detection)
[Semantic Segmentation](segmentation)
# Acknowledgment
Our implementation is mainly based on the following codebases. We sincerely thank the authors for their excellent work.
> [timm](https://github.com/rwightman/pytorch-image-models), [natten](https://github.com/SHI-Labs/NATTEN), [unireplknet](https://github.com/AILab-CVC/UniRepLKNet), [mmcv](https://github.com/open-mmlab/mmcv), [mmdet](https://github.com/open-mmlab/mmdetection), [mmseg](https://github.com/open-mmlab/mmsegmentation)
# Contact
If you have any questions, please feel free to [create issues❓](https://github.com/LMMMEng/OverLoCK/issues) or [contact me 📧](lmzmm.0921@gmail.com).
================================================
FILE: detection/configs/_base_/datasets/coco_detection.py
================================================
# dataset settings
dataset_type = 'CocoDataset'
data_root = '/grp01/cs_yzyu/dataset/coco2017/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=4,
train=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_train2017.json',
img_prefix=data_root + 'train2017/',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=test_pipeline))
evaluation = dict(interval=1, metric='bbox', classwise=True)
================================================
FILE: detection/configs/_base_/datasets/coco_instance.py
================================================
# dataset settings
dataset_type = 'CocoDataset'
data_root = '/grp01/cs_yzyu/dataset/coco2017/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=4,
train=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_train2017.json',
img_prefix=data_root + 'train2017/',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=test_pipeline))
evaluation = dict(metric=['bbox', 'segm'], classwise=True)
================================================
FILE: detection/configs/_base_/default_runtime.py
================================================
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
# yapf:enable
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
================================================
FILE: detection/configs/_base_/models/cascade_mask_rcnn_r50_fpn.py
================================================
# model settings
model = dict(
type='CascadeRCNN',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
roi_head=dict(
type='CascadeRoIHead',
num_stages=3,
stage_loss_weights=[1, 0.5, 0.25],
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=[
dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
loss_weight=1.0)),
dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.05, 0.05, 0.1, 0.1]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
loss_weight=1.0)),
dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.033, 0.033, 0.067, 0.067]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
],
mask_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
mask_head=dict(
type='FCNMaskHead',
num_convs=4,
in_channels=256,
conv_out_channels=256,
num_classes=80,
loss_mask=dict(
type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
# model training and testing settings
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=0,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=2000,
max_per_img=2000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=[
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
mask_size=28,
pos_weight=-1,
debug=False),
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.6,
neg_iou_thr=0.6,
min_pos_iou=0.6,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
mask_size=28,
pos_weight=-1,
debug=False),
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.7,
min_pos_iou=0.7,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
mask_size=28,
pos_weight=-1,
debug=False)
]),
test_cfg=dict(
rpn=dict(
nms_pre=1000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100,
mask_thr_binary=0.5)))
================================================
FILE: detection/configs/_base_/models/cascade_mask_rcnn_r50_fpn_crowdhuman.py
================================================
# model settings
model = dict(
type='CascadeRCNN',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
roi_head=dict(
type='CascadeRoIHead',
num_stages=3,
stage_loss_weights=[1, 0.5, 0.25],
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=[
dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
loss_weight=1.0)),
dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.05, 0.05, 0.1, 0.1]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
loss_weight=1.0)),
dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.033, 0.033, 0.067, 0.067]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
],),
# model training and testing settings
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=0,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=2000,
max_per_img=2000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=[
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
mask_size=28,
pos_weight=-1,
debug=False),
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.6,
neg_iou_thr=0.6,
min_pos_iou=0.6,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
mask_size=28,
pos_weight=-1,
debug=False),
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.7,
min_pos_iou=0.7,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
mask_size=28,
pos_weight=-1,
debug=False)
]),
test_cfg=dict(
rpn=dict(
nms_pre=1000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100,
mask_thr_binary=0.5)))
================================================
FILE: detection/configs/_base_/models/cascade_rcnn_r50_fpn.py
================================================
# model settings
model = dict(
type='CascadeRCNN',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
roi_head=dict(
type='CascadeRoIHead',
num_stages=3,
stage_loss_weights=[1, 0.5, 0.25],
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=[
dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
loss_weight=1.0)),
dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.05, 0.05, 0.1, 0.1]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
loss_weight=1.0)),
dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.033, 0.033, 0.067, 0.067]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
]),
# model training and testing settings
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=0,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=2000,
max_per_img=2000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=[
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False),
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.6,
neg_iou_thr=0.6,
min_pos_iou=0.6,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False),
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.7,
min_pos_iou=0.7,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False)
]),
test_cfg=dict(
rpn=dict(
nms_pre=1000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100)))
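Note how the three cascade stages above shrink `target_stds` from (0.1, 0.1, 0.2, 0.2) down to (0.033, 0.033, 0.067, 0.067). A minimal sketch of why this matters, assuming the standard Faster R-CNN delta decoding that `DeltaXYWHBBoxCoder` follows (the `decode` function below is illustrative only, not mmdet's implementation):

```python
import math

# Illustrative decode (assumption: standard delta-box semantics): predicted
# deltas are scaled by target_stds before being applied, so a stage with
# smaller stds makes smaller, finer box adjustments.
def decode(box, deltas, stds, means=(0., 0., 0., 0.)):
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = [d * s + m for d, s, m in zip(deltas, stds, means)]
    cx, cy = cx + dx * w, cy + dy * h
    w, h = w * math.exp(dw), h * math.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

raw = (1.0, 0.0, 0.0, 0.0)  # the same raw network output at every stage
stage1 = decode((0, 0, 10, 10), raw, (0.1, 0.1, 0.2, 0.2))
stage3 = decode((0, 0, 10, 10), raw, (0.033, 0.033, 0.067, 0.067))
# stage1 shifts the box center by 1.0 px, stage3 by only 0.33 px
```

So each later stage, paired with its stricter IoU thresholds in `train_cfg`, performs a progressively finer refinement of the boxes handed to it.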
================================================
FILE: detection/configs/_base_/models/fast_rcnn_r50_fpn.py
================================================
# model settings
model = dict(
type='FastRCNN',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
roi_head=dict(
type='StandardRoIHead',
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
# model training and testing settings
train_cfg=dict(
rcnn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False)),
test_cfg=dict(
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100)))
================================================
FILE: detection/configs/_base_/models/faster_rcnn_r50_caffe_c4.py
================================================
# model settings
norm_cfg = dict(type='BN', requires_grad=False)
model = dict(
type='FasterRCNN',
backbone=dict(
type='ResNet',
depth=50,
num_stages=3,
strides=(1, 2, 2),
dilations=(1, 1, 1),
out_indices=(2, ),
frozen_stages=1,
norm_cfg=norm_cfg,
norm_eval=True,
style='caffe',
init_cfg=dict(
type='Pretrained',
checkpoint='open-mmlab://detectron2/resnet50_caffe')),
rpn_head=dict(
type='RPNHead',
in_channels=1024,
feat_channels=1024,
anchor_generator=dict(
type='AnchorGenerator',
scales=[2, 4, 8, 16, 32],
ratios=[0.5, 1.0, 2.0],
strides=[16]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
roi_head=dict(
type='StandardRoIHead',
shared_head=dict(
type='ResLayer',
depth=50,
stage=3,
stride=2,
dilation=1,
style='caffe',
norm_cfg=norm_cfg,
norm_eval=True),
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
out_channels=1024,
featmap_strides=[16]),
bbox_head=dict(
type='BBoxHead',
with_avg_pool=True,
roi_feat_size=7,
in_channels=2048,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
# model training and testing settings
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=0,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=12000,
max_per_img=2000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False)),
test_cfg=dict(
rpn=dict(
nms_pre=6000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100)))
================================================
FILE: detection/configs/_base_/models/faster_rcnn_r50_caffe_dc5.py
================================================
# model settings
norm_cfg = dict(type='BN', requires_grad=False)
model = dict(
type='FasterRCNN',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
strides=(1, 2, 2, 1),
dilations=(1, 1, 1, 2),
out_indices=(3, ),
frozen_stages=1,
norm_cfg=norm_cfg,
norm_eval=True,
style='caffe',
init_cfg=dict(
type='Pretrained',
checkpoint='open-mmlab://detectron2/resnet50_caffe')),
rpn_head=dict(
type='RPNHead',
in_channels=2048,
feat_channels=2048,
anchor_generator=dict(
type='AnchorGenerator',
scales=[2, 4, 8, 16, 32],
ratios=[0.5, 1.0, 2.0],
strides=[16]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
roi_head=dict(
type='StandardRoIHead',
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=2048,
featmap_strides=[16]),
bbox_head=dict(
type='Shared2FCBBoxHead',
in_channels=2048,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
# model training and testing settings
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=0,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=12000,
max_per_img=2000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False)),
test_cfg=dict(
rpn=dict(
nms=dict(type='nms', iou_threshold=0.7),
nms_pre=6000,
max_per_img=1000,
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100)))
================================================
FILE: detection/configs/_base_/models/faster_rcnn_r50_fpn.py
================================================
# model settings
model = dict(
type='FasterRCNN',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
roi_head=dict(
type='StandardRoIHead',
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
# model training and testing settings
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=-1,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=2000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False)),
test_cfg=dict(
rpn=dict(
nms_pre=1000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100)
# soft-nms is also supported for rcnn testing
# e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05)
))
================================================
FILE: detection/configs/_base_/models/mask_rcnn_convnext_fpn.py
================================================
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
# model settings
model = dict(
type='MaskRCNN',
pretrained=None,
backbone=dict(
type='ConvNeXt',
in_chans=3,
depths=[3, 3, 9, 3],
dims=[96, 192, 384, 768],
drop_path_rate=0.2,
layer_scale_init_value=1e-6,
out_indices=[0, 1, 2, 3],
),
neck=dict(
type='FPN',
in_channels=[128, 256, 512, 1024],
out_channels=256,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
roi_head=dict(
type='StandardRoIHead',
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
mask_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
mask_head=dict(
type='FCNMaskHead',
num_convs=4,
in_channels=256,
conv_out_channels=256,
num_classes=80,
loss_mask=dict(
type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
# model training and testing settings
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=-1,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=2000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
mask_size=28,
pos_weight=-1,
debug=False)),
test_cfg=dict(
rpn=dict(
nms_pre=1000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100,
mask_thr_binary=0.5)))
================================================
FILE: detection/configs/_base_/models/mask_rcnn_r50_caffe_c4.py
================================================
# model settings
norm_cfg = dict(type='BN', requires_grad=False)
model = dict(
type='MaskRCNN',
backbone=dict(
type='ResNet',
depth=50,
num_stages=3,
strides=(1, 2, 2),
dilations=(1, 1, 1),
out_indices=(2, ),
frozen_stages=1,
norm_cfg=norm_cfg,
norm_eval=True,
style='caffe',
init_cfg=dict(
type='Pretrained',
checkpoint='open-mmlab://detectron2/resnet50_caffe')),
rpn_head=dict(
type='RPNHead',
in_channels=1024,
feat_channels=1024,
anchor_generator=dict(
type='AnchorGenerator',
scales=[2, 4, 8, 16, 32],
ratios=[0.5, 1.0, 2.0],
strides=[16]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
roi_head=dict(
type='StandardRoIHead',
shared_head=dict(
type='ResLayer',
depth=50,
stage=3,
stride=2,
dilation=1,
style='caffe',
norm_cfg=norm_cfg,
norm_eval=True),
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
out_channels=1024,
featmap_strides=[16]),
bbox_head=dict(
type='BBoxHead',
with_avg_pool=True,
roi_feat_size=7,
in_channels=2048,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
mask_roi_extractor=None,
mask_head=dict(
type='FCNMaskHead',
num_convs=0,
in_channels=2048,
conv_out_channels=256,
num_classes=80,
loss_mask=dict(
type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
# model training and testing settings
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=0,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=12000,
max_per_img=2000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
mask_size=14,
pos_weight=-1,
debug=False)),
test_cfg=dict(
rpn=dict(
nms_pre=6000,
nms=dict(type='nms', iou_threshold=0.7),
max_per_img=1000,
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100,
mask_thr_binary=0.5)))
================================================
FILE: detection/configs/_base_/models/mask_rcnn_r50_fpn.py
================================================
# model settings
model = dict(
type='MaskRCNN',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
roi_head=dict(
type='StandardRoIHead',
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
mask_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
mask_head=dict(
type='FCNMaskHead',
num_convs=4,
in_channels=256,
conv_out_channels=256,
num_classes=80,
loss_mask=dict(
type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
# model training and testing settings
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=-1,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=2000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
mask_size=28,
pos_weight=-1,
debug=False)),
test_cfg=dict(
rpn=dict(
nms_pre=1000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100,
mask_thr_binary=0.5)))
================================================
FILE: detection/configs/_base_/models/retinanet_r50_fpn.py
================================================
# model settings
model = dict(
type='RetinaNet',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
start_level=1,
add_extra_convs='on_input',
num_outs=5),
bbox_head=dict(
type='RetinaHead',
num_classes=80,
in_channels=256,
stacked_convs=4,
feat_channels=256,
anchor_generator=dict(
type='AnchorGenerator',
octave_base_scale=4,
scales_per_octave=3,
ratios=[0.5, 1.0, 2.0],
strides=[8, 16, 32, 64, 128]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
# model training and testing settings
train_cfg=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.4,
min_pos_iou=0,
ignore_iof_thr=-1),
allowed_border=-1,
pos_weight=-1,
debug=False),
test_cfg=dict(
nms_pre=1000,
min_bbox_size=0,
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100))
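The RetinaHead above places 9 anchors per location: 3 aspect ratios times 3 octave scales. A small sketch, assuming mmdet's usual octave convention where the scales are `octave_base_scale * 2**(i / scales_per_octave)`, of the base anchor sizes this yields at each FPN level:

```python
# Values taken from the anchor_generator config above
octave_base_scale = 4
scales_per_octave = 3
strides = [8, 16, 32, 64, 128]

# Anchor scales within one octave: 4 * 2**(0/3), 4 * 2**(1/3), 4 * 2**(2/3)
scales = [octave_base_scale * 2 ** (i / scales_per_octave)
          for i in range(scales_per_octave)]

# Base anchor side length at each level (before applying aspect ratios):
# e.g. the stride-8 level gets anchors of side ~32, ~40.3, ~50.8 px
sizes = {s: [round(s * sc, 1) for sc in scales] for s in strides}
```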
================================================
FILE: detection/configs/_base_/models/rpn_r50_caffe_c4.py
================================================
# model settings
model = dict(
type='RPN',
backbone=dict(
type='ResNet',
depth=50,
num_stages=3,
strides=(1, 2, 2),
dilations=(1, 1, 1),
out_indices=(2, ),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=False),
norm_eval=True,
style='caffe',
init_cfg=dict(
type='Pretrained',
checkpoint='open-mmlab://detectron2/resnet50_caffe')),
neck=None,
rpn_head=dict(
type='RPNHead',
in_channels=1024,
feat_channels=1024,
anchor_generator=dict(
type='AnchorGenerator',
scales=[2, 4, 8, 16, 32],
ratios=[0.5, 1.0, 2.0],
strides=[16]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
# model training and testing settings
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=0,
pos_weight=-1,
debug=False)),
test_cfg=dict(
rpn=dict(
nms_pre=12000,
max_per_img=2000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0)))
================================================
FILE: detection/configs/_base_/models/rpn_r50_fpn.py
================================================
# model settings
model = dict(
type='RPN',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
# model training and testing settings
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=0,
pos_weight=-1,
debug=False)),
test_cfg=dict(
rpn=dict(
nms_pre=2000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0)))
================================================
FILE: detection/configs/_base_/models/ssd300.py
================================================
# model settings
input_size = 300
model = dict(
type='SingleStageDetector',
backbone=dict(
type='SSDVGG',
depth=16,
with_last_pool=False,
ceil_mode=True,
out_indices=(3, 4),
out_feature_indices=(22, 34),
init_cfg=dict(
type='Pretrained', checkpoint='open-mmlab://vgg16_caffe')),
neck=dict(
type='SSDNeck',
in_channels=(512, 1024),
out_channels=(512, 1024, 512, 256, 256, 256),
level_strides=(2, 2, 1, 1),
level_paddings=(1, 1, 0, 0),
l2_norm_scale=20),
bbox_head=dict(
type='SSDHead',
in_channels=(512, 1024, 512, 256, 256, 256),
num_classes=80,
anchor_generator=dict(
type='SSDAnchorGenerator',
scale_major=False,
input_size=input_size,
basesize_ratio_range=(0.15, 0.9),
strides=[8, 16, 32, 64, 100, 300],
ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[.0, .0, .0, .0],
target_stds=[0.1, 0.1, 0.2, 0.2])),
# model training and testing settings
train_cfg=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.,
ignore_iof_thr=-1,
gt_max_assign_all=False),
smoothl1_beta=1.,
allowed_border=-1,
pos_weight=-1,
neg_pos_ratio=3,
debug=False),
test_cfg=dict(
nms_pre=1000,
nms=dict(type='nms', iou_threshold=0.45),
min_bbox_size=0,
score_thr=0.02,
max_per_img=200))
cudnn_benchmark = True
================================================
FILE: detection/configs/_base_/schedules/schedule_1x.py
================================================
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
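A minimal sketch of the learning rate this 1x schedule produces, assuming mmcv's `step` policy semantics (the LR is multiplied by 0.1 at each epoch listed in `step`, after the linear warmup finishes):

```python
# Values taken from the optimizer / lr_config / runner settings above
base_lr = 0.02
steps = [8, 11]
max_epochs = 12

def lr_at_epoch(epoch):
    # Ignores the 500-iter linear warmup, which only affects the start
    # of epoch 0; afterwards the LR follows a staircase decay.
    lr = base_lr
    for s in steps:
        if epoch >= s:
            lr *= 0.1
    return lr

schedule = [lr_at_epoch(e) for e in range(max_epochs)]
# epochs 0-7: 0.02, epochs 8-10: 0.002, epoch 11: 0.0002
```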
================================================
FILE: detection/configs/_base_/schedules/schedule_3x.py
================================================
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[27, 33])
runner = dict(type='EpochBasedRunner', max_epochs=36)
================================================
FILE: detection/configs/maskrcnn_overlock/mask_rcnn_overlock_b_in1k_fpn_1x_coco.py
================================================
_base_ = [
'../_base_/models/mask_rcnn_r50_fpn.py',
'../_base_/datasets/coco_instance.py',
'../_base_/schedules/schedule_1x.py',
'../_base_/default_runtime.py'
]
dims = [80, 160, 528, 720]
model = dict(
backbone=dict(
_delete_=True,
type='overlock_b',
pretrained=True,
drop_path_rate=0.6
),
neck=dict(
type='FPN',
in_channels=dims,
out_channels=256,
num_outs=5))
###########################################################################################################
# https://github.com/Sense-X/UniFormer/blob/main/object_detection/exp/mask_rcnn_1x_hybrid_small/config.py
# We follow UniFormer's optimizer and LR schedule settings
optimizer = dict(_delete_=True, type='AdamW', lr=0.0002, betas=(0.9, 0.999), weight_decay=0.05,
paramwise_cfg=dict(custom_keys={'absolute_pos_embed': dict(decay_mult=0.),
'relative_position_bias_table': dict(decay_mult=0.),
'norm': dict(decay_mult=0.)}))
evaluation = dict(save_best='auto')
checkpoint_config = dict(interval=1, max_keep_ckpts=1, save_last=True)
================================================
FILE: detection/configs/maskrcnn_overlock/mask_rcnn_overlock_b_in1k_fpn_3x_coco.py
================================================
_base_ = [
'../_base_/models/mask_rcnn_r50_fpn.py',
'../_base_/datasets/coco_instance.py',
'../_base_/schedules/schedule_3x.py',
'../_base_/default_runtime.py'
]
dims = [80, 160, 528, 720]
model = dict(
backbone=dict(
_delete_=True,
type='overlock_b',
pretrained=True,
drop_path_rate=0.6
),
neck=dict(
type='FPN',
in_channels=dims,
out_channels=256,
num_outs=5))
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
# augmentation strategy originates from DETR / Sparse RCNN
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='AutoAugment',
policies=[
[
dict(type='Resize',
img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
(608, 1333), (640, 1333), (672, 1333), (704, 1333),
(736, 1333), (768, 1333), (800, 1333)],
multiscale_mode='value',
keep_ratio=True)
],
[
dict(type='Resize',
img_scale=[(400, 1333), (500, 1333), (600, 1333)],
multiscale_mode='value',
keep_ratio=True),
dict(type='RandomCrop',
crop_type='absolute_range',
crop_size=(384, 600),
allow_negative_crop=True),
dict(type='Resize',
img_scale=[(480, 1333), (512, 1333), (544, 1333),
(576, 1333), (608, 1333), (640, 1333),
(672, 1333), (704, 1333), (736, 1333),
(768, 1333), (800, 1333)],
multiscale_mode='value',
override=True,
keep_ratio=True)
]
]),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
# We train with 8 GPUs at 2 samples per GPU, for a total batch size of 16
data = dict(samples_per_gpu=2, train=dict(pipeline=train_pipeline))
optimizer = dict(_delete_=True, type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05,
paramwise_cfg=dict(custom_keys={'absolute_pos_embed': dict(decay_mult=0.),
'relative_position_bias_table': dict(decay_mult=0.),
'norm': dict(decay_mult=0.)}))
evaluation = dict(save_best='auto')
checkpoint_config = dict(interval=1, max_keep_ckpts=1, save_last=True)
# AMP (faster, but may cause NaN loss); uncomment the next line to enable:
# fp16 = dict(loss_scale='dynamic')
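The config above pins `lr=0.0001` for a total batch size of 16 (8 GPUs × `samples_per_gpu=2`). If you train with a different GPU count, a common convention is linear learning-rate scaling; whether linear scaling is the right choice for AdamW in this repo is an assumption, so treat this as an illustrative sketch only:

```python
# Linear LR scaling sketch; base values mirror this config
# (lr=1e-4 at a total batch size of 16). Illustrative helper, not part of the repo.
BASE_LR = 1e-4
BASE_BATCH = 16  # 8 GPUs x 2 samples_per_gpu

def scaled_lr(num_gpus, samples_per_gpu):
    # Scale the base LR linearly with the effective batch size.
    total = num_gpus * samples_per_gpu
    return BASE_LR * total / BASE_BATCH

print(scaled_lr(4, 2))  # half the GPUs -> half the LR
```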
================================================
FILE: detection/configs/maskrcnn_overlock/mask_rcnn_overlock_s_in1k_fpn_1x_coco.py
================================================
_base_ = [
'../_base_/models/mask_rcnn_r50_fpn.py',
'../_base_/datasets/coco_instance.py',
'../_base_/schedules/schedule_1x.py',
'../_base_/default_runtime.py'
]
dims = [64, 128, 448, 640]
model = dict(
backbone=dict(
_delete_=True,
type='overlock_s',
pretrained=True,
drop_path_rate=0.4
),
neck=dict(
type='FPN',
in_channels=dims,
out_channels=256,
num_outs=5))
###########################################################################################################
# https://github.com/Sense-X/UniFormer/blob/main/object_detection/exp/mask_rcnn_1x_hybrid_small/config.py
# We follow UniFormer's optimizer and LR schedule
optimizer = dict(_delete_=True, type='AdamW', lr=0.0002, betas=(0.9, 0.999), weight_decay=0.05,
paramwise_cfg=dict(custom_keys={'absolute_pos_embed': dict(decay_mult=0.),
'relative_position_bias_table': dict(decay_mult=0.),
'norm': dict(decay_mult=0.)}))
evaluation = dict(save_best='auto')
checkpoint_config = dict(interval=1, max_keep_ckpts=1, save_last=True)
================================================
FILE: detection/configs/maskrcnn_overlock/mask_rcnn_overlock_s_in1k_fpn_3x_coco.py
================================================
_base_ = [
'../_base_/models/mask_rcnn_r50_fpn.py',
'../_base_/datasets/coco_instance.py',
'../_base_/schedules/schedule_3x.py',
'../_base_/default_runtime.py'
]
dims = [64, 128, 448, 640]
model = dict(
backbone=dict(
_delete_=True,
type='overlock_s',
pretrained=True,
drop_path_rate=0.4
),
neck=dict(
type='FPN',
in_channels=dims,
out_channels=256,
num_outs=5))
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
# Augmentation strategy adopted from DETR / Sparse R-CNN
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='AutoAugment',
policies=[
[
dict(type='Resize',
img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
(608, 1333), (640, 1333), (672, 1333), (704, 1333),
(736, 1333), (768, 1333), (800, 1333)],
multiscale_mode='value',
keep_ratio=True)
],
[
dict(type='Resize',
img_scale=[(400, 1333), (500, 1333), (600, 1333)],
multiscale_mode='value',
keep_ratio=True),
dict(type='RandomCrop',
crop_type='absolute_range',
crop_size=(384, 600),
allow_negative_crop=True),
dict(type='Resize',
img_scale=[(480, 1333), (512, 1333), (544, 1333),
(576, 1333), (608, 1333), (640, 1333),
(672, 1333), (704, 1333), (736, 1333),
(768, 1333), (800, 1333)],
multiscale_mode='value',
override=True,
keep_ratio=True)
]
]),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
# We train on 8 GPUs with 2 samples per GPU, so the total batch size is 16
data = dict(samples_per_gpu=2, train=dict(pipeline=train_pipeline))
optimizer = dict(_delete_=True, type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05,
paramwise_cfg=dict(custom_keys={'absolute_pos_embed': dict(decay_mult=0.),
'relative_position_bias_table': dict(decay_mult=0.),
'norm': dict(decay_mult=0.)}))
evaluation = dict(save_best='auto')
checkpoint_config = dict(interval=1, max_keep_ckpts=1, save_last=True)
# AMP (faster, but may cause NaN loss); uncomment the next line to enable:
# fp16 = dict(loss_scale='dynamic')
================================================
FILE: detection/configs/maskrcnn_overlock/mask_rcnn_overlock_t_in1k_fpn_1x_coco.py
================================================
_base_ = [
'../_base_/models/mask_rcnn_r50_fpn.py',
'../_base_/datasets/coco_instance.py',
'../_base_/schedules/schedule_1x.py',
'../_base_/default_runtime.py'
]
dims = [64, 128, 384, 640]
model = dict(
backbone=dict(
_delete_=True,
type='overlock_t',
pretrained=True,
drop_path_rate=0.2
),
neck=dict(
type='FPN',
in_channels=dims,
out_channels=256,
num_outs=5))
###########################################################################################################
# https://github.com/Sense-X/UniFormer/blob/main/object_detection/exp/mask_rcnn_1x_hybrid_small/config.py
# We follow UniFormer's optimizer and LR schedule
optimizer = dict(_delete_=True, type='AdamW', lr=0.0002, betas=(0.9, 0.999), weight_decay=0.05,
paramwise_cfg=dict(custom_keys={'absolute_pos_embed': dict(decay_mult=0.),
'relative_position_bias_table': dict(decay_mult=0.),
'norm': dict(decay_mult=0.)}))
evaluation = dict(save_best='auto')
checkpoint_config = dict(interval=1, max_keep_ckpts=1, save_last=True)
================================================
FILE: detection/configs/maskrcnn_overlock/mask_rcnn_overlock_t_in1k_fpn_3x_coco.py
================================================
_base_ = [
'../_base_/models/mask_rcnn_r50_fpn.py',
'../_base_/datasets/coco_instance.py',
'../_base_/schedules/schedule_3x.py',
'../_base_/default_runtime.py'
]
dims = [64, 128, 384, 640]
model = dict(
backbone=dict(
_delete_=True,
type='overlock_t',
pretrained=True,
drop_path_rate=0.2
),
neck=dict(
type='FPN',
in_channels=dims,
out_channels=256,
num_outs=5))
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
# Augmentation strategy adopted from DETR / Sparse R-CNN
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='AutoAugment',
policies=[
[
dict(type='Resize',
img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
(608, 1333), (640, 1333), (672, 1333), (704, 1333),
(736, 1333), (768, 1333), (800, 1333)],
multiscale_mode='value',
keep_ratio=True)
],
[
dict(type='Resize',
img_scale=[(400, 1333), (500, 1333), (600, 1333)],
multiscale_mode='value',
keep_ratio=True),
dict(type='RandomCrop',
crop_type='absolute_range',
crop_size=(384, 600),
allow_negative_crop=True),
dict(type='Resize',
img_scale=[(480, 1333), (512, 1333), (544, 1333),
(576, 1333), (608, 1333), (640, 1333),
(672, 1333), (704, 1333), (736, 1333),
(768, 1333), (800, 1333)],
multiscale_mode='value',
override=True,
keep_ratio=True)
]
]),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
# We train on 8 GPUs with 2 samples per GPU, so the total batch size is 16
data = dict(samples_per_gpu=2, train=dict(pipeline=train_pipeline))
optimizer = dict(_delete_=True, type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05,
paramwise_cfg=dict(custom_keys={'absolute_pos_embed': dict(decay_mult=0.),
'relative_position_bias_table': dict(decay_mult=0.),
'norm': dict(decay_mult=0.)}))
evaluation = dict(save_best='auto')
checkpoint_config = dict(interval=1, max_keep_ckpts=1, save_last=True)
# AMP (faster, but may cause NaN loss); uncomment the next line to enable:
# fp16 = dict(loss_scale='dynamic')
================================================
FILE: detection/models/__init__.py
================================================
from .overlock import *
================================================
FILE: detection/models/overlock.py
================================================
'''
This is the official implementation of the OverLoCK model proposed in the paper:
https://arxiv.org/abs/2502.20087
'''
import torch
import timm
import torch.distributed
import torch.nn.functional as F
from torch import nn
from einops import rearrange, einsum
from natten.functional import na2d_av
from torch.utils.checkpoint import checkpoint
from timm.models.layers import DropPath, to_2tuple
from timm.models.registry import register_model
from mmdet.models.builder import MODELS
from mmdet.utils import get_root_logger
try:
from mmcv.runner import load_checkpoint
except ImportError:
from mmengine.runner import load_checkpoint
def get_conv2d(in_channels,
out_channels,
kernel_size,
stride,
padding,
dilation,
groups,
bias,
attempt_use_lk_impl=True):
kernel_size = to_2tuple(kernel_size)
if padding is None:
padding = (kernel_size[0] // 2, kernel_size[1] // 2)
else:
padding = to_2tuple(padding)
need_large_impl = kernel_size[0] == kernel_size[1] and kernel_size[0] > 5 and padding == (kernel_size[0] // 2, kernel_size[1] // 2)
if attempt_use_lk_impl and need_large_impl:
print('---------------- trying to import iGEMM implementation for large-kernel conv')
try:
from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM
print('---------------- found iGEMM implementation ')
except ImportError:
DepthWiseConv2dImplicitGEMM = None
print('---------------- iGEMM not found; falling back to the standard nn.Conv2d. Follow https://github.com/AILab-CVC/UniRepLKNet to install it.')
if DepthWiseConv2dImplicitGEMM is not None and need_large_impl and in_channels == out_channels \
and out_channels == groups and stride == 1 and dilation == 1:
print(f'===== iGEMM Efficient Conv Impl, channels {in_channels}, kernel size {kernel_size} =====')
return DepthWiseConv2dImplicitGEMM(in_channels, kernel_size, bias=bias)
return nn.Conv2d(in_channels, out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
bias=bias)
def get_bn(dim, use_sync_bn=False):
if use_sync_bn:
return nn.SyncBatchNorm(dim)
else:
return nn.BatchNorm2d(dim)
def fuse_bn(conv, bn):
conv_bias = 0 if conv.bias is None else conv.bias
std = (bn.running_var + bn.eps).sqrt()
return conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1), bn.bias + (conv_bias - bn.running_mean) * bn.weight / std
def convert_dilated_to_nondilated(kernel, dilate_rate):
identity_kernel = torch.ones((1, 1, 1, 1)).to(kernel.device)
if kernel.size(1) == 1:
# This is a DW kernel
dilated = F.conv_transpose2d(kernel, identity_kernel, stride=dilate_rate)
return dilated
else:
# This is a dense or group-wise (but not DW) kernel
slices = []
for i in range(kernel.size(1)):
dilated = F.conv_transpose2d(kernel[:,i:i+1,:,:], identity_kernel, stride=dilate_rate)
slices.append(dilated)
return torch.cat(slices, dim=1)
def merge_dilated_into_large_kernel(large_kernel, dilated_kernel, dilated_r):
large_k = large_kernel.size(2)
dilated_k = dilated_kernel.size(2)
equivalent_kernel_size = dilated_r * (dilated_k - 1) + 1
equivalent_kernel = convert_dilated_to_nondilated(dilated_kernel, dilated_r)
rows_to_pad = large_k // 2 - equivalent_kernel_size // 2
merged_kernel = large_kernel + F.pad(equivalent_kernel, [rows_to_pad] * 4)
return merged_kernel
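`merge_dilated_into_large_kernel` relies on the identity that a depthwise kernel of size `k` with dilation `r` covers a dense footprint of size `r*(k-1)+1`, which is then zero-padded to the center of the large kernel. A dependency-free check of that size arithmetic (helper names are illustrative):

```python
def equivalent_kernel_size(k, r):
    # A k x k kernel with dilation r touches a (r*(k-1)+1)-wide footprint.
    return r * (k - 1) + 1

def pad_to_merge(large_k, k, r):
    # Zero-padding needed on each side to center the equivalent kernel
    # inside the large kernel (mirrors rows_to_pad above).
    return large_k // 2 - equivalent_kernel_size(k, r) // 2

print(equivalent_kernel_size(3, 7))  # a 3x3 kernel at dilation 7 spans 15x15
```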
def stem(in_chans=3, embed_dim=96):
return nn.Sequential(
nn.Conv2d(in_chans, embed_dim//2, kernel_size=3, stride=2, padding=1, bias=False),
nn.BatchNorm2d(embed_dim//2),
nn.GELU(),
nn.Conv2d(embed_dim//2, embed_dim//2, kernel_size=3, padding=1, bias=False),
nn.BatchNorm2d(embed_dim//2),
nn.GELU(),
nn.Conv2d(embed_dim//2, embed_dim, kernel_size=3, stride=2, padding=1, bias=False),
nn.BatchNorm2d(embed_dim),
nn.GELU(),
nn.Conv2d(embed_dim, embed_dim, kernel_size=3, padding=1, bias=False),
nn.BatchNorm2d(embed_dim)
)
def downsample(in_dim, out_dim):
return nn.Sequential(
nn.Conv2d(in_dim, out_dim, kernel_size=3, stride=2, padding=1, bias=False),
nn.BatchNorm2d(out_dim),
)
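The stem halves the resolution twice (two stride-2 convs), and each `downsample` halves it once more, giving the usual /4, /8, /16, /32 feature pyramid. A quick sketch of the resulting spatial sizes for the 3x3, stride-2, padding-1 convs used here:

```python
def conv_s2_out(h):
    # 3x3 conv, stride 2, padding 1: out = floor((h + 2 - 3) / 2) + 1
    return (h - 1) // 2 + 1

def pyramid_sizes(h):
    # stem: two stride-2 convs -> /4, then three downsample() stages.
    h = conv_s2_out(conv_s2_out(h))
    sizes = [h]
    for _ in range(3):
        h = conv_s2_out(h)
        sizes.append(h)
    return sizes

print(pyramid_sizes(224))  # [56, 28, 14, 7]
```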
class SEModule(nn.Module):
def __init__(self, dim, red=8, inner_act=nn.GELU, out_act=nn.Sigmoid):
super().__init__()
inner_dim = max(16, dim // red)
self.proj = nn.Sequential(
nn.AdaptiveAvgPool2d(1),
nn.Conv2d(dim, inner_dim, kernel_size=1),
inner_act(),
nn.Conv2d(inner_dim, dim, kernel_size=1),
out_act(),
)
def forward(self, x):
x = x * self.proj(x)
return x
class LayerScale(nn.Module):
def __init__(self, dim, init_value=1e-5):
super().__init__()
self.weight = nn.Parameter(torch.ones(dim, 1, 1, 1)*init_value,
requires_grad=True)
self.bias = nn.Parameter(torch.zeros(dim), requires_grad=True)
def forward(self, x):
x = F.conv2d(x, weight=self.weight, bias=self.bias, groups=x.shape[1])
return x
class LayerNorm2d(nn.LayerNorm):
def __init__(self, dim):
super().__init__(normalized_shape=dim, eps=1e-6)
def forward(self, x):
x = rearrange(x, 'b c h w -> b h w c')
x = super().forward(x)
x = rearrange(x, 'b h w c -> b c h w')
return x.contiguous()
class GRN(nn.Module):
""" GRN (Global Response Normalization) layer
Originally proposed in ConvNeXt V2 (https://arxiv.org/abs/2301.00808)
This implementation is more efficient than the original (https://github.com/facebookresearch/ConvNeXt-V2)
We assume the inputs to this layer are (N, C, H, W)
"""
def __init__(self, dim, use_bias=True):
super().__init__()
self.use_bias = use_bias
self.gamma = nn.Parameter(torch.zeros(1, dim, 1, 1))
if self.use_bias:
self.beta = nn.Parameter(torch.zeros(1, dim, 1, 1))
def forward(self, x):
Gx = torch.norm(x, p=2, dim=(-1, -2), keepdim=True)
Nx = Gx / (Gx.mean(dim=1, keepdim=True) + 1e-6)
if self.use_bias:
return (self.gamma * Nx + 1) * x + self.beta
else:
return (self.gamma * Nx + 1) * x
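The GRN math above can be spelled out without torch: take the per-channel L2 norm over the spatial dimensions, divide by its mean across channels, and use the result to modulate the input. A pure-Python sketch for a single sample (nested-list layout is an illustrative stand-in for a tensor):

```python
import math

def grn(x, gamma, beta, eps=1e-6):
    # x is a [C][H][W] nested list; gamma/beta are per-channel scalars
    # (both zero-initialized in the module above, making GRN start as identity).
    gx = [math.sqrt(sum(v * v for row in ch for v in row)) for ch in x]  # per-channel L2 norm
    mean_gx = sum(gx) / len(gx)
    nx = [g / (mean_gx + eps) for g in gx]                               # divisive normalization
    return [[[(gamma[c] * nx[c] + 1) * v + beta[c] for v in row]
             for row in ch] for c, ch in enumerate(x)]

x = [[[1.0, 2.0]], [[3.0, 4.0]]]       # C=2, H=1, W=2
print(grn(x, [0.0, 0.0], [0.0, 0.0]))  # gamma=beta=0 -> unchanged input
```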
class DilatedReparamBlock(nn.Module):
"""
Dilated Reparam Block proposed in UniRepLKNet (https://github.com/AILab-CVC/UniRepLKNet)
We assume the inputs to this block are (N, C, H, W)
"""
def __init__(self, channels, kernel_size, deploy, use_sync_bn=False, attempt_use_lk_impl=True):
super().__init__()
self.lk_origin = get_conv2d(channels, channels, kernel_size, stride=1,
padding=kernel_size//2, dilation=1, groups=channels, bias=deploy,
attempt_use_lk_impl=attempt_use_lk_impl)
self.attempt_use_lk_impl = attempt_use_lk_impl
# Default settings. We did not tune them carefully. Different settings may work better.
if kernel_size == 19:
self.kernel_sizes = [5, 7, 9, 9, 3, 3, 3]
self.dilates = [1, 1, 1, 2, 4, 5, 7]
elif kernel_size == 17:
self.kernel_sizes = [5, 7, 9, 3, 3, 3]
self.dilates = [1, 1, 2, 4, 5, 7]
elif kernel_size == 15:
self.kernel_sizes = [5, 7, 7, 3, 3, 3]
self.dilates = [1, 1, 2, 3, 5, 7]
elif kernel_size == 13:
self.kernel_sizes = [5, 7, 7, 3, 3, 3]
self.dilates = [1, 1, 2, 3, 4, 5]
elif kernel_size == 11:
self.kernel_sizes = [5, 7, 5, 3, 3, 3]
self.dilates = [1, 1, 2, 3, 4, 5]
elif kernel_size == 9:
self.kernel_sizes = [5, 7, 5, 3, 3]
self.dilates = [1, 1, 2, 3, 4]
elif kernel_size == 7:
self.kernel_sizes = [5, 3, 3, 3]
self.dilates = [1, 1, 2, 3]
elif kernel_size == 5:
self.kernel_sizes = [3, 3]
self.dilates = [1, 2]
else:
raise ValueError('Dilated Reparam Block requires kernel_size >= 5')
if not deploy:
self.origin_bn = get_bn(channels, use_sync_bn)
for k, r in zip(self.kernel_sizes, self.dilates):
self.__setattr__('dil_conv_k{}_{}'.format(k, r),
nn.Conv2d(in_channels=channels, out_channels=channels, kernel_size=k, stride=1,
padding=(r * (k - 1) + 1) // 2, dilation=r, groups=channels,
bias=False))
self.__setattr__('dil_bn_k{}_{}'.format(k, r), get_bn(channels, use_sync_bn=use_sync_bn))
def forward(self, x):
if not hasattr(self, 'origin_bn'): # deploy mode
return self.lk_origin(x)
out = self.origin_bn(self.lk_origin(x))
for k, r in zip(self.kernel_sizes, self.dilates):
conv = self.__getattr__('dil_conv_k{}_{}'.format(k, r))
bn = self.__getattr__('dil_bn_k{}_{}'.format(k, r))
out = out + bn(conv(x))
return out
def merge_dilated_branches(self):
if hasattr(self, 'origin_bn'):
origin_k, origin_b = fuse_bn(self.lk_origin, self.origin_bn)
for k, r in zip(self.kernel_sizes, self.dilates):
conv = self.__getattr__('dil_conv_k{}_{}'.format(k, r))
bn = self.__getattr__('dil_bn_k{}_{}'.format(k, r))
branch_k, branch_b = fuse_bn(conv, bn)
origin_k = merge_dilated_into_large_kernel(origin_k, branch_k, r)
origin_b += branch_b
merged_conv = get_conv2d(origin_k.size(0), origin_k.size(0), origin_k.size(2), stride=1,
padding=origin_k.size(2)//2, dilation=1, groups=origin_k.size(0), bias=True,
attempt_use_lk_impl=self.attempt_use_lk_impl)
merged_conv.weight.data = origin_k
merged_conv.bias.data = origin_b
self.lk_origin = merged_conv
self.__delattr__('origin_bn')
for k, r in zip(self.kernel_sizes, self.dilates):
self.__delattr__('dil_conv_k{}_{}'.format(k, r))
self.__delattr__('dil_bn_k{}_{}'.format(k, r))
class CTXDownsample(nn.Module):
def __init__(self, dim, h_dim):
super().__init__()
self.x_proj = nn.Sequential(
nn.Conv2d(dim, h_dim, kernel_size=3, stride=2, padding=1, bias=False),
nn.BatchNorm2d(h_dim)
)
self.h_proj = nn.Sequential(
nn.Conv2d(h_dim//4, h_dim//4, kernel_size=3, stride=2, padding=1, bias=False),
nn.BatchNorm2d(h_dim//4)
)
def forward(self, x, ctx):
x = self.x_proj(x)
ctx = self.h_proj(ctx)
return (x, ctx)
class ResDWConv(nn.Conv2d):
'''
Depthwise convolution with residual connection
'''
def __init__(self, dim, kernel_size=3):
super().__init__(dim, dim, kernel_size=kernel_size, padding=kernel_size//2, groups=dim)
def forward(self, x):
x = x + super().forward(x)
return x
class RepConvBlock(nn.Module):
def __init__(self,
dim=64,
kernel_size=7,
mlp_ratio=4,
ls_init_value=None,
res_scale=False,
drop_path=0,
norm_layer=LayerNorm2d,
use_gemm=False,
deploy=False,
use_checkpoint=False):
super().__init__()
self.res_scale = res_scale
self.use_checkpoint = use_checkpoint
mlp_dim = int(dim*mlp_ratio)
self.dwconv = ResDWConv(dim, kernel_size=3)
self.proj = nn.Sequential(
norm_layer(dim),
DilatedReparamBlock(dim, kernel_size=kernel_size, deploy=deploy, use_sync_bn=False, attempt_use_lk_impl=use_gemm),
nn.BatchNorm2d(dim),
SEModule(dim),
nn.Conv2d(dim, mlp_dim, kernel_size=1),
nn.GELU(),
ResDWConv(mlp_dim, kernel_size=3),
GRN(mlp_dim),
nn.Conv2d(mlp_dim, dim, kernel_size=1),
DropPath(drop_path) if drop_path > 0 else nn.Identity(),
)
self.ls = LayerScale(dim, init_value=ls_init_value) if ls_init_value is not None else nn.Identity()
def forward_features(self, x):
x = self.dwconv(x)
if self.res_scale:
x = self.ls(x) + self.proj(x)
else:
drop_path = self.proj[-1]
x = x + drop_path(self.ls(self.proj[:-1](x)))
return x
def forward(self, x):
if self.use_checkpoint and x.requires_grad:
x = checkpoint(self.forward_features, x, use_reentrant=False)
else:
x = self.forward_features(x)
return x
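`res_scale` switches between two residual formulations in `forward_features`: scaling the identity path (`ls(x) + f(x)`) versus scaling the residual path (`x + ls(f(x))`, with DropPath applied on top). A toy scalar sketch of the two update rules; `f` and `ls` are stand-ins for the block body and LayerScale, purely illustrative:

```python
def block_update(x, f, ls, res_scale):
    # Toy scalar version of RepConvBlock.forward_features' two branches
    # (DropPath omitted for clarity).
    if res_scale:
        return ls(x) + f(x)   # scale the identity path
    return x + ls(f(x))       # scale the residual path

f = lambda v: 2 * v           # stand-in block body
ls = lambda v: 0.5 * v        # stand-in LayerScale
print(block_update(4.0, f, ls, res_scale=True))   # 0.5*4 + 8  = 10.0
print(block_update(4.0, f, ls, res_scale=False))  # 4 + 0.5*8  = 8.0
```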
class DynamicConvBlock(nn.Module):
def __init__(self,
dim=64,
ctx_dim=32,
kernel_size=7,
smk_size=5,
num_heads=2,
mlp_ratio=4,
ls_init_value=None,
res_scale=False,
drop_path=0,
norm_layer=LayerNorm2d,
is_first=False,
is_last=False,
use_gemm=False,
deploy=False,
use_checkpoint=False,
**kwargs):
super().__init__()
ctx_dim = ctx_dim // 4
out_dim = dim + ctx_dim
mlp_dim = int(dim*mlp_ratio)
self.kernel_size = kernel_size
self.res_scale = res_scale
self.use_gemm = use_gemm
self.smk_size = smk_size
self.num_heads = num_heads * 2
head_dim = dim // self.num_heads
self.scale = head_dim ** -0.5
self.is_first = is_first
self.is_last = is_last
self.use_checkpoint = use_checkpoint
if not is_first:
self.x_scale = LayerScale(ctx_dim, init_value=1)
self.h_scale = LayerScale(ctx_dim, init_value=1)
self.dwconv1 = ResDWConv(out_dim, kernel_size=3)
self.norm1 = norm_layer(out_dim)
self.fusion = nn.Sequential(
nn.Conv2d(out_dim, out_dim, kernel_size=3, padding=1, groups=out_dim),
nn.BatchNorm2d(out_dim),
nn.GELU(),
nn.Conv2d(out_dim, dim, kernel_size=1),
GRN(dim),
)
self.weight_query = nn.Sequential(
nn.Conv2d(dim, dim//2, kernel_size=1, bias=False),
nn.BatchNorm2d(dim//2),
)
self.weight_key = nn.Sequential(
nn.AdaptiveAvgPool2d(7),
nn.Conv2d(ctx_dim, dim//2, kernel_size=1, bias=False),
nn.BatchNorm2d(dim//2),
)
self.weight_proj = nn.Conv2d(49, kernel_size**2 + smk_size**2, kernel_size=1)
self.dyconv_proj = nn.Sequential(
nn.Conv2d(dim, dim, kernel_size=1, bias=False),
nn.BatchNorm2d(dim),
)
self.lepe = nn.Sequential(
DilatedReparamBlock(dim, kernel_size=kernel_size, deploy=deploy, use_sync_bn=False, attempt_use_lk_impl=use_gemm),
nn.BatchNorm2d(dim),
)
self.se_layer = SEModule(dim)
self.gate = nn.Sequential(
nn.Conv2d(dim, dim, kernel_size=1, bias=False),
nn.BatchNorm2d(dim),
nn.SiLU(),
)
self.proj = nn.Sequential(
nn.BatchNorm2d(dim),
nn.Conv2d(dim, out_dim, kernel_size=1),
)
self.dwconv2 = ResDWConv(out_dim, kernel_size=3)
self.norm2 = norm_layer(out_dim)
self.mlp = nn.Sequential(
nn.Conv2d(out_dim, mlp_dim, kernel_size=1),
nn.GELU(),
ResDWConv(mlp_dim, kernel_size=3),
GRN(mlp_dim),
nn.Conv2d(mlp_dim, out_dim, kernel_size=1),
)
self.ls1 = LayerScale(out_dim, init_value=ls_init_value) if ls_init_value is not None else nn.Identity()
self.ls2 = LayerScale(out_dim, init_value=ls_init_value) if ls_init_value is not None else nn.Identity()
self.drop_path = DropPath(drop_path) if drop_path > 0 else nn.Identity()
self.get_rpb()
def get_rpb(self):
self.rpb_size1 = 2 * self.smk_size - 1
self.rpb1 = nn.Parameter(torch.empty(self.num_heads, self.rpb_size1, self.rpb_size1))
self.rpb_size2 = 2 * self.kernel_size - 1
self.rpb2 = nn.Parameter(torch.empty(self.num_heads, self.rpb_size2, self.rpb_size2))
nn.init.zeros_(self.rpb1)
nn.init.zeros_(self.rpb2)
@torch.no_grad()
def generate_idx(self, kernel_size):
rpb_size = 2 * kernel_size - 1
idx_h = torch.arange(0, kernel_size)
idx_w = torch.arange(0, kernel_size)
idx_k = ((idx_h.unsqueeze(-1) * rpb_size) + idx_w).view(-1)
return (idx_h, idx_w, idx_k)
def apply_rpb(self, attn, rpb, height, width, kernel_size, idx_h, idx_w, idx_k):
"""
RPB implementation directly borrowed from https://tinyurl.com/mrbub4t3
"""
num_repeat_h = torch.ones(kernel_size, dtype=torch.long)
num_repeat_w = torch.ones(kernel_size, dtype=torch.long)
num_repeat_h[kernel_size//2] = height - (kernel_size-1)
num_repeat_w[kernel_size//2] = width - (kernel_size-1)
bias_hw = (idx_h.repeat_interleave(num_repeat_h).unsqueeze(-1) * (2*kernel_size-1)) + idx_w.repeat_interleave(num_repeat_w)
bias_idx = bias_hw.unsqueeze(-1) + idx_k
bias_idx = bias_idx.reshape(-1, int(kernel_size**2))
bias_idx = torch.flip(bias_idx, [0])
rpb = torch.flatten(rpb, 1, 2)[:, bias_idx]
rpb = rpb.reshape(1, int(self.num_heads), int(height), int(width), int(kernel_size**2))
return attn + rpb
def _forward_inner(self, x, h_x, h_r):
input_resoltion = x.shape[2:]
B, C, H, W = x.shape
B, C_h, H_h, W_h = h_x.shape
if not self.is_first:
h_x = self.x_scale(h_x) + self.h_scale(h_r)
x_f = torch.cat([x, h_x], dim=1)
x_f = self.dwconv1(x_f)
identity = x_f
x_f = self.norm1(x_f)
x = self.fusion(x_f)
gate = self.gate(x)
lepe = self.lepe(x)
is_pad = False
if min(H, W) < self.kernel_size:
is_pad = True
if H < W:
size = (self.kernel_size, int(self.kernel_size / H * W))
else:
size = (int(self.kernel_size / W * H), self.kernel_size)
x = F.interpolate(x, size=size, mode='bilinear', align_corners=False)
x_f = F.interpolate(x_f, size=size, mode='bilinear', align_corners=False)
H, W = size
query, key = torch.split(x_f, split_size_or_sections=[C, C_h], dim=1)
query = self.weight_query(query) * self.scale
key = self.weight_key(key)
query = rearrange(query, 'b (g c) h w -> b g c (h w)', g=self.num_heads)
key = rearrange(key, 'b (g c) h w -> b g c (h w)', g=self.num_heads)
weight = einsum(query, key, 'b g c n, b g c l -> b g n l')
weight = rearrange(weight, 'b g n l -> b l g n').contiguous()
weight = self.weight_proj(weight)
weight = rearrange(weight, 'b l g (h w) -> b g h w l', h=H, w=W)
attn1, attn2 = torch.split(weight, split_size_or_sections=[self.smk_size**2, self.kernel_size**2], dim=-1)
rpb1_idx = self.generate_idx(self.smk_size)
rpb2_idx = self.generate_idx(self.kernel_size)
attn1 = self.apply_rpb(attn1, self.rpb1, H, W, self.smk_size, *rpb1_idx)
attn2 = self.apply_rpb(attn2, self.rpb2, H, W, self.kernel_size, *rpb2_idx)
attn1 = torch.softmax(attn1, dim=-1)
attn2 = torch.softmax(attn2, dim=-1)
value = rearrange(x, 'b (m g c) h w -> m b g h w c', m=2, g=self.num_heads)
x1 = na2d_av(attn1, value[0], kernel_size=self.smk_size)
x2 = na2d_av(attn2, value[1], kernel_size=self.kernel_size)
x = torch.cat([x1, x2], dim=1)
x = rearrange(x, 'b g h w c -> b (g c) h w', h=H, w=W)
if is_pad:
x = F.adaptive_avg_pool2d(x, input_resoltion)
x = self.dyconv_proj(x)
x = x + lepe
x = self.se_layer(x)
x = gate * x
x = self.proj(x)
if self.res_scale:
x = self.ls1(identity) + self.drop_path(x)
else:
x = identity + self.drop_path(self.ls1(x))
x = self.dwconv2(x)
if self.res_scale:
x = self.ls2(x) + self.drop_path(self.mlp(self.norm2(x)))
else:
x = x + self.drop_path(self.ls2(self.mlp(self.norm2(x))))
if self.is_last:
return (x, None)
else:
l_x, h_x = torch.split(x, split_size_or_sections=[C, C_h], dim=1)
return (l_x, h_x)
def forward(self, x, h_x, h_r):
if self.use_checkpoint and x.requires_grad:
x = checkpoint(self._forward_inner, x, h_x, h_r, use_reentrant=False)
else:
x = self._forward_inner(x, h_x, h_r)
return x
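The `generate_idx`/`apply_rpb` pair above packs a (2k-1)×(2k-1) relative-position-bias table into per-pixel k² attention biases. The core index math can be re-derived without torch; a minimal sketch (function name is illustrative, chosen to avoid shadowing the method above):

```python
def generate_rpb_idx(kernel_size):
    # Mirrors DynamicConvBlock.generate_idx: flat indices into a
    # (2k-1) x (2k-1) RPB table for each of the k*k relative offsets.
    rpb_size = 2 * kernel_size - 1
    idx_h = list(range(kernel_size))
    idx_w = list(range(kernel_size))
    idx_k = [h * rpb_size + w for h in idx_h for w in idx_w]
    return idx_h, idx_w, idx_k

print(generate_rpb_idx(3)[2])  # [0, 1, 2, 5, 6, 7, 10, 11, 12]
```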
class OverLoCK(nn.Module):
'''
An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
https://arxiv.org/abs/2502.20087
'''
def __init__(self,
depth=[2, 2, 2, 2],
sub_depth=[4, 2],
in_chans=3,
embed_dim=[96, 192, 384, 768],
kernel_size=[7, 7, 7, 7],
mlp_ratio=[4, 4, 4, 4],
sub_mlp_ratio=[4, 4],
sub_num_heads=[4, 8],
ls_init_value=[None, None, 1, 1],
res_scale=True,
smk_size=5,
deploy=False,
use_gemm=True,
use_ds=True,
drop_rate=0,
drop_path_rate=0,
norm_layer=LayerNorm2d,
projection=1024,
num_classes=1000,
use_checkpoint=[0, 0, 0, 0],
):
super().__init__()
fusion_dim = embed_dim[-1] + embed_dim[-1]//4
# self.num_classes = num_classes
self.num_features = self.embed_dim = embed_dim
self.patch_embed1 = stem(in_chans, embed_dim[0])
self.patch_embed2 = downsample(embed_dim[0], embed_dim[1])
self.patch_embed3 = downsample(embed_dim[1], embed_dim[2])
self.patch_embed4 = downsample(embed_dim[2], embed_dim[3])
self.high_level_proj = nn.Conv2d(embed_dim[-1], embed_dim[-1]//4, kernel_size=1)
self.patch_embedx = CTXDownsample(embed_dim[2], embed_dim[3])
dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depth) + sum(sub_depth))]
self.blocks1 = nn.ModuleList()
self.blocks2 = nn.ModuleList()
self.blocks3 = nn.ModuleList()
self.blocks4 = nn.ModuleList()
self.sub_blocks3 = nn.ModuleList()
self.sub_blocks4 = nn.ModuleList()
for i in range(depth[0]):
self.blocks1.append(
RepConvBlock(
dim=embed_dim[0],
kernel_size=kernel_size[0],
mlp_ratio=mlp_ratio[0],
ls_init_value=ls_init_value[0],
res_scale=res_scale,
drop_path=dpr[i],
norm_layer=norm_layer,
use_gemm=use_gemm,
deploy=deploy,
use_checkpoint=(i<use_checkpoint[0]),
)
)
for i in range(depth[1]):
self.blocks2.append(
RepConvBlock(
dim=embed_dim[1],
kernel_size=kernel_size[1],
mlp_ratio=mlp_ratio[1],
ls_init_value=ls_init_value[1],
res_scale=res_scale,
drop_path=dpr[i+depth[0]],
norm_layer=norm_layer,
use_gemm=use_gemm,
deploy=deploy,
use_checkpoint=(i<use_checkpoint[1]),
)
)
for i in range(depth[2]):
self.blocks3.append(
RepConvBlock(
dim=embed_dim[2],
kernel_size=kernel_size[2],
mlp_ratio=mlp_ratio[2],
ls_init_value=ls_init_value[2],
res_scale=res_scale,
drop_path=dpr[i+sum(depth[:2])],
norm_layer=norm_layer,
use_gemm=use_gemm,
deploy=deploy,
use_checkpoint=(i<use_checkpoint[2]),
)
)
for i in range(depth[3]):
self.blocks4.append(
RepConvBlock(
dim=embed_dim[3],
kernel_size=kernel_size[3],
mlp_ratio=mlp_ratio[3],
ls_init_value=ls_init_value[3],
res_scale=res_scale,
drop_path=dpr[i+sum(depth[:3])],
norm_layer=norm_layer,
use_gemm=use_gemm,
deploy=deploy,
use_checkpoint=(i<use_checkpoint[3]),
)
)
for i in range(sub_depth[0]):
self.sub_blocks3.append(
DynamicConvBlock(
dim=embed_dim[2],
ctx_dim=embed_dim[-1],
kernel_size=kernel_size[2],
num_heads=sub_num_heads[0],
pool_size=7,
mlp_ratio=sub_mlp_ratio[0],
ls_init_value=ls_init_value[2],
res_scale=res_scale,
drop_path=dpr[i+sum(depth)],
norm_layer=norm_layer,
smk_size=smk_size,
use_gemm=use_gemm,
deploy=deploy,
is_first=(i==0),
use_checkpoint=(i<use_checkpoint[2]),
)
)
for i in range(sub_depth[1]):
self.sub_blocks4.append(
DynamicConvBlock(
dim=embed_dim[3],
ctx_dim=embed_dim[-1],
kernel_size=kernel_size[-1],
num_heads=sub_num_heads[1],
pool_size=7,
mlp_ratio=sub_mlp_ratio[1],
ls_init_value=ls_init_value[3],
res_scale=res_scale,
drop_path=dpr[i+sum(depth)+sub_depth[0]],
norm_layer=norm_layer,
smk_size=smk_size,
is_first=False,
is_last=(i==sub_depth[1]-1),
use_gemm=use_gemm,
deploy=deploy,
use_checkpoint=(i<use_checkpoint[3]),
)
)
self.h_proj = nn.Sequential(
nn.Conv2d(embed_dim[-1], fusion_dim, kernel_size=1),
LayerScale(fusion_dim, init_value=1e-5),
)
# Aux Cls Head
if use_ds:
self.aux_head = nn.Sequential(
nn.BatchNorm2d(embed_dim[-1]),
nn.AdaptiveAvgPool2d(1),
nn.Conv2d(embed_dim[-1], num_classes, kernel_size=1) if num_classes > 0 else nn.Identity()
)
# Main Cls Head
self.head = nn.Sequential(
nn.Conv2d(fusion_dim, projection, kernel_size=1, bias=False),
nn.BatchNorm2d(projection),
nn.SiLU(),
nn.AdaptiveAvgPool2d(1),
nn.Conv2d(projection, num_classes, kernel_size=1) if num_classes > 0 else nn.Identity()
)
self.extra_norm = nn.ModuleList()
for idx in range(4):
dim = embed_dim[idx]
if idx >= 2:
dim = dim + embed_dim[-1]//4
self.extra_norm.append(norm_layer(dim))
self.extra_norm.append(norm_layer(embed_dim[-1]))
# Remove the classification heads: this detection backbone only exposes feature maps
# (aux_head exists only when use_ds is True)
if use_ds: del self.aux_head
del self.head
self.apply(self._init_weights)
def _init_weights(self, m):
if isinstance(m, (nn.Linear, nn.Conv2d, nn.Conv1d)):
nn.init.trunc_normal_(m.weight, std=.02)
if m.bias is not None:
nn.init.constant_(m.bias, 0)
elif isinstance(m, (nn.LayerNorm, nn.BatchNorm2d, nn.BatchNorm1d)):
nn.init.constant_(m.weight, 1.0)
nn.init.constant_(m.bias, 0)
def _convert_sync_batchnorm(self):
if torch.distributed.is_initialized():
self = nn.SyncBatchNorm.convert_sync_batchnorm(self)
def forward_pre_features(self, x):
outs = []
x = self.patch_embed1(x)
for blk in self.blocks1:
x = blk(x)
outs.append(self.extra_norm[0](x))
x = self.patch_embed2(x)
for blk in self.blocks2:
x = blk(x)
outs.append(self.extra_norm[1](x))
return outs
def forward_base_features(self, x):
x = self.patch_embed3(x)
for blk in self.blocks3:
x = blk(x)
ctx = self.patch_embed4(x)
for blk in self.blocks4:
ctx = blk(ctx)
return (x, ctx)
def forward_sub_features(self, x, ctx):
outs = []
ctx_cls = ctx
ctx_ori = self.high_level_proj(ctx)
ctx_up = F.interpolate(ctx_ori, size=x.shape[2:], mode='bilinear', align_corners=False)
for idx, blk in enumerate(self.sub_blocks3):
if idx == 0:
ctx = ctx_up
x, ctx = blk(x, ctx, ctx_up)
outs.append(self.extra_norm[2](torch.cat([x, ctx], dim=1)))
x, ctx = self.patch_embedx(x, ctx)
for idx, blk in enumerate(self.sub_blocks4):
x, ctx = blk(x, ctx, ctx_ori)
ctx = self.extra_norm[-1](ctx_cls)
x = self.extra_norm[3](x) + self.h_proj(ctx)
outs.append(x)
return outs
def forward_features(self, x):
x0, x1 = self.forward_pre_features(x)
x, ctx = self.forward_base_features(x1)
x2, x3 = self.forward_sub_features(x, ctx)
return (x0, x1, x2, x3)
def forward(self, x):
x = self.forward_features(x)
return x
@MODELS.register_module()
def overlock_xt(pretrained=False, pretrained_cfg=None, **kwargs):
model = OverLoCK(
depth=[2, 2, 3, 2],
sub_depth=[6, 2],
embed_dim=[56, 112, 256, 336],
kernel_size=[17, 15, 13, 7],
mlp_ratio=[4, 4, 4, 4],
sub_num_heads=[4, 6],
sub_mlp_ratio=[3, 3],
**kwargs
)
if pretrained:
pretrained = 'https://github.com/LMMMEng/OverLoCK/releases/download/v1/overlock_xt_in1k_224.pth'
logger = get_root_logger()
load_checkpoint(model, pretrained, logger=logger)
model._convert_sync_batchnorm()
return model
@MODELS.register_module()
def overlock_t(pretrained=False, pretrained_cfg=None, **kwargs):
model = OverLoCK(
depth=[4, 4, 6, 2],
sub_depth=[12, 2],
embed_dim=[64, 128, 256, 512],
kernel_size=[17, 15, 13, 7],
mlp_ratio=[4, 4, 4, 4],
sub_num_heads=[4, 8],
sub_mlp_ratio=[3, 3],
**kwargs
)
if pretrained:
pretrained = 'https://github.com/LMMMEng/OverLoCK/releases/download/v1/overlock_t_in1k_224.pth'
logger = get_root_logger()
load_checkpoint(model, pretrained, logger=logger)
model._convert_sync_batchnorm()
return model
@MODELS.register_module()
def overlock_s(pretrained=False, pretrained_cfg=None, **kwargs):
model = OverLoCK(
depth=[6, 6, 8, 3],
sub_depth=[16, 3],
embed_dim=[64, 128, 320, 512],
kernel_size=[17, 15, 13, 7],
mlp_ratio=[4, 4, 4, 4],
sub_num_heads=[8, 16],
sub_mlp_ratio=[3, 3],
**kwargs
)
if pretrained:
pretrained = 'https://github.com/LMMMEng/OverLoCK/releases/download/v1/overlock_s_in1k_224.pth'
logger = get_root_logger()
load_checkpoint(model, pretrained, logger=logger)
model._convert_sync_batchnorm()
return model
@MODELS.register_module()
def overlock_b(pretrained=False, pretrained_cfg=None, **kwargs):
model = OverLoCK(
depth=[8, 8, 10, 4],
sub_depth=[20, 4],
embed_dim=[80, 160, 384, 576],
kernel_size=[17, 15, 13, 7],
mlp_ratio=[4, 4, 4, 4],
sub_num_heads=[6, 9],
sub_mlp_ratio=[3, 3],
**kwargs
)
if pretrained:
pretrained = 'https://github.com/LMMMEng/OverLoCK/releases/download/v1/overlock_b_in1k_224.pth'
logger = get_root_logger()
load_checkpoint(model, pretrained, logger=logger)
model._convert_sync_batchnorm()
return model
================================================
FILE: detection/readme.md
================================================
# Applying OverLoCK to Object Detection and Instance Segmentation
## 1. Requirements
```
pip install mmcv-full==1.7.2 --no-cache-dir
pip install mmdet==2.28.2 --no-cache-dir
```
💡 To make mmcv 1.7.2 compatible with torch>=2.1.0, apply the following changes:
> 1️⃣ https://goo.su/XhU5vWr
> 2️⃣ https://goo.su/ogm4yO
## 2. Data Preparation
Prepare COCO 2017 according to the [guidelines](https://github.com/open-mmlab/mmdetection/blob/2.x/docs/en/1_exist_data_model.md).
## 3. Main Results on COCO using Mask R-CNN framework
| Backbone | Pretrain | Schedule | AP_b | AP_m | Config | Download |
|:-------------:|:-----------:|:--------:|--------|:-------:|:------:|:----------:|
| OverLoCK-T | [ImageNet-1K](https://github.com/LMMMEng/OverLoCK/releases/download/v1/overlock_t_in1k_224.pth)| 1x | 48.3 |43.3 |[config](configs/maskrcnn_overlock/mask_rcnn_overlock_t_in1k_fpn_1x_coco.py) |[model](https://github.com/LMMMEng/OverLoCK/releases/download/v1/maskrcnn1x_overlock_tiny_coco.pth) |
| | | 3x |49.6 |43.9 |[config](configs/maskrcnn_overlock/mask_rcnn_overlock_t_in1k_fpn_3x_coco.py) |[model](https://github.com/LMMMEng/OverLoCK/releases/download/v1/maskrcnn3x_overlock_tiny_coco.pth) |
| OverLoCK-S | [ImageNet-1K](https://github.com/LMMMEng/OverLoCK/releases/download/v1/overlock_s_in1k_224.pth)| 1x |49.4 |44.0 |[config](configs/maskrcnn_overlock/mask_rcnn_overlock_s_in1k_fpn_1x_coco.py) |[model](https://github.com/LMMMEng/OverLoCK/releases/download/v1/maskrcnn1x_overlock_small_coco.pth) |
| | | 3x |51.0 |45.0 |[config](configs/maskrcnn_overlock/mask_rcnn_overlock_s_in1k_fpn_3x_coco.py) |[model](https://github.com/LMMMEng/OverLoCK/releases/download/v1/maskrcnn3x_overlock_small_coco.pth) |
| OverLoCK-B | [ImageNet-1K](https://github.com/LMMMEng/OverLoCK/releases/download/v1/overlock_b_in1k_224.pth) | 1x |49.9 |44.4 |[config](configs/maskrcnn_overlock/mask_rcnn_overlock_b_in1k_fpn_1x_coco.py) |[model](https://github.com/LMMMEng/OverLoCK/releases/download/v1/maskrcnn1x_overlock_base_coco.pth) |
| | | 3x |51.4 |45.3 |[config](configs/maskrcnn_overlock/mask_rcnn_overlock_b_in1k_fpn_3x_coco.py) |[model](https://github.com/LMMMEng/OverLoCK/releases/download/v1/maskrcnn3x_overlock_base_coco.pth) |
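For quick single-image inference with one of the checkpoints above, the standard mmdet 2.x Python API can be used instead of the distributed scripts below. This is a minimal sketch: the checkpoint filename and `demo.jpg` are placeholders, and it assumes the environment from Section 1 plus a CUDA device.

```python
# Minimal single-GPU inference sketch (paths are placeholders).
from mmdet.apis import init_detector, inference_detector
import models  # registers the OverLoCK backbone with mmdet

config = 'configs/maskrcnn_overlock/mask_rcnn_overlock_t_in1k_fpn_1x_coco.py'
checkpoint = 'maskrcnn1x_overlock_tiny_coco.pth'  # downloaded from the table above
model = init_detector(config, checkpoint, device='cuda:0')
result = inference_detector(model, 'demo.jpg')  # (bbox_results, segm_results) for Mask R-CNN
```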
## 4. Train
To train the ``OverLoCK-T + Mask R-CNN 1x`` model on the COCO dataset with 8 GPUs (single node), run:
```
NUM_GPUS=8
CONFIG=configs/maskrcnn_overlock/mask_rcnn_overlock_t_in1k_fpn_1x_coco.py
bash scripts/dist_train.sh $CONFIG $NUM_GPUS
```
## 5. Validation
To evaluate the ``OverLoCK-T + Mask R-CNN 1x`` model on the COCO dataset, run:
```
NUM_GPUS=8
CKPT=path-to-checkpoint.pth
CONFIG=configs/maskrcnn_overlock/mask_rcnn_overlock_t_in1k_fpn_1x_coco.py
bash scripts/dist_test.sh $CONFIG $CKPT $NUM_GPUS --eval bbox segm
```
## Citation
If you find this project useful for your research, please consider citing:
```
@inproceedings{lou2025overlock,
title={OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels},
author={Lou, Meng and Yu, Yizhou},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
pages={128--138},
year={2025}
}
```
================================================
FILE: detection/scripts/dist_test.sh
================================================
#!/usr/bin/env bash
CONFIG=$1
CHECKPOINT=$2
GPUS=$3
PORT=$((RANDOM+10000))
PYTHONPATH="$(dirname "$0")/..":$PYTHONPATH \
torchrun --nproc_per_node="$GPUS" --master_port="$PORT" test.py "$CONFIG" "$CHECKPOINT" --launcher pytorch "${@:4}"
================================================
FILE: detection/scripts/dist_train.sh
================================================
#!/usr/bin/env bash
CONFIG=$1
GPUS=$2
PORT=$((RANDOM+10000))
PYTHONPATH="$(dirname "$0")/..":$PYTHONPATH \
torchrun --nproc_per_node="$GPUS" --master_port="$PORT" train.py "$CONFIG" --launcher pytorch "${@:3}"
================================================
FILE: detection/test.py
================================================
import argparse
import os
import os.path as osp
import time
import warnings
import mmcv
import torch
from mmcv import Config, DictAction
from mmcv.cnn import fuse_conv_bn
from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
from mmcv.runner import (get_dist_info, init_dist, load_checkpoint,
wrap_fp16_model)
from mmdet.apis import multi_gpu_test, single_gpu_test
from mmdet.datasets import (build_dataloader, build_dataset,
replace_ImageToTensor)
from mmdet.models import build_detector
import models
def parse_args():
parser = argparse.ArgumentParser(
description='MMDet test (and eval) a model')
parser.add_argument('config', help='test config file path')
parser.add_argument('checkpoint', help='checkpoint file')
parser.add_argument(
'--work-dir',
help='the directory to save the file containing evaluation metrics')
parser.add_argument('--out', help='output result file in pickle format')
parser.add_argument(
'--fuse-conv-bn',
action='store_true',
        help='Whether to fuse conv and bn; this will slightly increase '
        'the inference speed')
parser.add_argument('--gpu-ids',
type=int,
nargs='+',
help='ids of gpus to use '
'(only applicable to non-distributed testing)')
parser.add_argument(
'--format-only',
action='store_true',
        help='Format the output results without performing evaluation. It is '
        'useful when you want to format the result to a specific format and '
        'submit it to the test server')
parser.add_argument(
'--eval',
type=str,
nargs='+',
help='evaluation metrics, which depends on the dataset, e.g., "bbox",'
' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC')
parser.add_argument('--show', action='store_true', help='show results')
parser.add_argument('--show-dir',
help='directory where painted images will be saved')
parser.add_argument('--show-score-thr',
type=float,
default=0.3,
help='score threshold (default: 0.3)')
parser.add_argument('--gpu-collect',
action='store_true',
help='whether to use gpu to collect results.')
parser.add_argument(
'--tmpdir',
help='tmp directory used for collecting results from multiple '
'workers, available when gpu-collect is not specified')
parser.add_argument(
'--cfg-options',
nargs='+',
action=DictAction,
help='override some settings in the used config, the key-value pair '
'in xxx=yyy format will be merged into config file. If the value to '
        'be overwritten is a list, it should be like key="[a,b]" or key=a,b. '
'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
'Note that the quotation marks are necessary and that no white space '
'is allowed.')
parser.add_argument(
'--options',
nargs='+',
action=DictAction,
help='custom options for evaluation, the key-value pair in xxx=yyy '
        'format will be kwargs for dataset.evaluate() function (deprecated); '
'change to --eval-options instead.')
parser.add_argument(
'--eval-options',
nargs='+',
action=DictAction,
help='custom options for evaluation, the key-value pair in xxx=yyy '
'format will be kwargs for dataset.evaluate() function')
parser.add_argument('--launcher',
choices=['none', 'pytorch', 'slurm', 'mpi'],
default='none',
help='job launcher')
parser.add_argument('--local_rank', type=int, default=0)
args = parser.parse_args()
if 'LOCAL_RANK' not in os.environ:
os.environ['LOCAL_RANK'] = str(args.local_rank)
if args.options and args.eval_options:
raise ValueError(
'--options and --eval-options cannot be both '
'specified, --options is deprecated in favor of --eval-options')
if args.options:
warnings.warn('--options is deprecated in favor of --eval-options')
args.eval_options = args.options
return args
def main():
args = parse_args()
assert args.out or args.eval or args.format_only or args.show \
or args.show_dir, \
('Please specify at least one operation (save/eval/format/show the '
'results / save the results) with the argument "--out", "--eval"'
', "--format-only", "--show" or "--show-dir"')
if args.eval and args.format_only:
raise ValueError('--eval and --format_only cannot be both specified')
if args.out is not None and not args.out.endswith(('.pkl', '.pickle')):
raise ValueError('The output file must be a pkl file.')
cfg = Config.fromfile(args.config)
if args.cfg_options is not None:
cfg.merge_from_dict(args.cfg_options)
# set cudnn_benchmark
if cfg.get('cudnn_benchmark', False):
torch.backends.cudnn.benchmark = True
cfg.model.pretrained = None
if cfg.model.get('neck'):
if isinstance(cfg.model.neck, list):
for neck_cfg in cfg.model.neck:
if neck_cfg.get('rfp_backbone'):
if neck_cfg.rfp_backbone.get('pretrained'):
neck_cfg.rfp_backbone.pretrained = None
elif cfg.model.neck.get('rfp_backbone'):
if cfg.model.neck.rfp_backbone.get('pretrained'):
cfg.model.neck.rfp_backbone.pretrained = None
# in case the test dataset is concatenated
samples_per_gpu = 1
if isinstance(cfg.data.test, dict):
cfg.data.test.test_mode = True
samples_per_gpu = cfg.data.test.pop('samples_per_gpu', 1)
if samples_per_gpu > 1:
# Replace 'ImageToTensor' to 'DefaultFormatBundle'
cfg.data.test.pipeline = replace_ImageToTensor(
cfg.data.test.pipeline)
elif isinstance(cfg.data.test, list):
for ds_cfg in cfg.data.test:
ds_cfg.test_mode = True
samples_per_gpu = max(
[ds_cfg.pop('samples_per_gpu', 1) for ds_cfg in cfg.data.test])
if samples_per_gpu > 1:
for ds_cfg in cfg.data.test:
ds_cfg.pipeline = replace_ImageToTensor(ds_cfg.pipeline)
if args.gpu_ids is not None:
cfg.gpu_ids = args.gpu_ids
else:
cfg.gpu_ids = range(1)
# init distributed env first, since logger depends on the dist info.
if args.launcher == 'none':
distributed = False
if len(cfg.gpu_ids) > 1:
warnings.warn(
f'We treat {cfg.gpu_ids} as gpu-ids, and reset to '
                f'{cfg.gpu_ids[0:1]} as gpu-ids to avoid potential errors in '
                'non-distributed testing.')
cfg.gpu_ids = cfg.gpu_ids[0:1]
else:
distributed = True
init_dist(args.launcher, **cfg.dist_params)
rank, _ = get_dist_info()
    # create the work_dir only on rank 0 (it is allowed to be absent)
if args.work_dir is not None and rank == 0:
mmcv.mkdir_or_exist(osp.abspath(args.work_dir))
timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime())
json_file = osp.join(args.work_dir, f'eval_{timestamp}.json')
# build the dataloader
dataset = build_dataset(cfg.data.test)
data_loader = build_dataloader(dataset,
samples_per_gpu=samples_per_gpu,
workers_per_gpu=cfg.data.workers_per_gpu,
dist=distributed,
shuffle=False)
# build the model and load checkpoint
cfg.model.train_cfg = None
model = build_detector(cfg.model, test_cfg=cfg.get('test_cfg'))
fp16_cfg = cfg.get('fp16', None)
if fp16_cfg is not None:
wrap_fp16_model(model)
checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
if args.fuse_conv_bn:
model = fuse_conv_bn(model)
    # old versions did not save class info in checkpoints; this workaround is
    # for backward compatibility
if 'CLASSES' in checkpoint.get('meta', {}):
model.CLASSES = checkpoint['meta']['CLASSES']
else:
model.CLASSES = dataset.CLASSES
if not distributed:
model = MMDataParallel(model, device_ids=cfg.gpu_ids)
outputs = single_gpu_test(model, data_loader, args.show, args.show_dir,
args.show_score_thr)
else:
model = MMDistributedDataParallel(
model.cuda(),
device_ids=[torch.cuda.current_device()],
broadcast_buffers=False)
outputs = multi_gpu_test(model, data_loader, args.tmpdir,
args.gpu_collect)
rank, _ = get_dist_info()
if rank == 0:
if args.out:
print(f'\nwriting results to {args.out}')
mmcv.dump(outputs, args.out)
kwargs = {} if args.eval_options is None else args.eval_options
if args.format_only:
dataset.format_results(outputs, **kwargs)
if args.eval:
eval_kwargs = cfg.get('evaluation', {}).copy()
# hard-code way to remove EvalHook args
for key in [
'interval', 'tmpdir', 'start', 'gpu_collect', 'save_best',
'rule', 'dynamic_intervals'
]:
eval_kwargs.pop(key, None)
eval_kwargs.update(dict(metric=args.eval, **kwargs))
metric = dataset.evaluate(outputs, **eval_kwargs)
print(metric)
metric_dict = dict(config=args.config, metric=metric)
if args.work_dir is not None and rank == 0:
mmcv.dump(metric_dict, json_file)
if __name__ == '__main__':
main()
================================================
FILE: detection/train.py
================================================
import argparse
import copy
import os
import os.path as osp
import time
import warnings
import mmcv
import torch
import torch.distributed as dist
from mmcv import Config, DictAction
from mmcv.runner import get_dist_info, init_dist
from mmcv.utils import get_git_hash
from mmdet import __version__
from mmdet.apis import init_random_seed, set_random_seed, train_detector
from mmdet.datasets import build_dataset
from mmdet.models import build_detector
from mmdet.utils import (collect_env, get_device, get_root_logger,
replace_cfg_vals, setup_multi_processes,
update_data_root)
import models
def parse_args():
parser = argparse.ArgumentParser(description='Train a detector')
parser.add_argument('config', help='train config file path')
parser.add_argument('--work-dir', help='the dir to save logs and models')
parser.add_argument('--resume-from',
help='the checkpoint file to resume from')
parser.add_argument('--auto-resume',
action='store_true',
help='resume from the latest checkpoint automatically')
parser.add_argument(
'--no-validate',
action='store_true',
help='whether not to evaluate the checkpoint during training')
group_gpus = parser.add_mutually_exclusive_group()
group_gpus.add_argument(
'--gpus',
type=int,
help='(Deprecated, please use --gpu-id) number of gpus to use '
'(only applicable to non-distributed training)')
group_gpus.add_argument(
'--gpu-ids',
type=int,
nargs='+',
help='(Deprecated, please use --gpu-id) ids of gpus to use '
'(only applicable to non-distributed training)')
group_gpus.add_argument('--gpu-id',
type=int,
default=0,
help='id of gpu to use '
'(only applicable to non-distributed training)')
parser.add_argument('--seed', type=int, default=None, help='random seed')
parser.add_argument(
'--diff-seed',
action='store_true',
help='Whether or not set different seeds for different ranks')
parser.add_argument(
'--deterministic',
action='store_true',
help='whether to set deterministic options for CUDNN backend.')
parser.add_argument(
'--options',
nargs='+',
action=DictAction,
help='override some settings in the used config, the key-value pair '
        'in xxx=yyy format will be merged into config file (deprecated); '
'change to --cfg-options instead.')
parser.add_argument(
'--cfg-options',
nargs='+',
action=DictAction,
help='override some settings in the used config, the key-value pair '
'in xxx=yyy format will be merged into config file. If the value to '
        'be overwritten is a list, it should be like key="[a,b]" or key=a,b. '
'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
'Note that the quotation marks are necessary and that no white space '
'is allowed.')
parser.add_argument(
'--drop-path',
default=-1,
type=float,
help='drop-path-rate of the backbone network')
parser.add_argument(
'--freeze-bn',
action='store_true',
default=False,
help='freeze the BN layer of the backbone model during training')
parser.add_argument(
'--amp',
action='store_true',
default=False,
help='mixed precision training')
parser.add_argument('--launcher',
choices=['none', 'pytorch', 'slurm', 'mpi'],
default='none',
help='job launcher')
parser.add_argument('--local-rank', type=int, default=0)
parser.add_argument('--auto-scale-lr',
action='store_true',
help='enable automatically scaling LR.')
args = parser.parse_args()
if 'LOCAL_RANK' not in os.environ:
os.environ['LOCAL_RANK'] = str(args.local_rank)
if args.options and args.cfg_options:
raise ValueError(
'--options and --cfg-options cannot be both '
'specified, --options is deprecated in favor of --cfg-options')
if args.options:
warnings.warn('--options is deprecated in favor of --cfg-options')
args.cfg_options = args.options
return args
def main():
args = parse_args()
cfg = Config.fromfile(args.config)
# replace the ${key} with the value of cfg.key
cfg = replace_cfg_vals(cfg)
# update data root according to MMDET_DATASETS
update_data_root(cfg)
if args.cfg_options is not None:
cfg.merge_from_dict(args.cfg_options)
if args.auto_scale_lr:
if 'auto_scale_lr' in cfg and \
'enable' in cfg.auto_scale_lr and \
'base_batch_size' in cfg.auto_scale_lr:
cfg.auto_scale_lr.enable = True
else:
warnings.warn('Can not find "auto_scale_lr" or '
'"auto_scale_lr.enable" or '
'"auto_scale_lr.base_batch_size" in your'
' configuration file. Please update all the '
'configuration files to mmdet >= 2.24.1.')
# set multi-process settings
setup_multi_processes(cfg)
# set cudnn_benchmark
if cfg.get('cudnn_benchmark', False):
torch.backends.cudnn.benchmark = True
# work_dir is determined in this priority: CLI > segment in file > filename
if args.work_dir is not None:
# update configs according to CLI args if args.work_dir is not None
cfg.work_dir = args.work_dir
elif cfg.get('work_dir', None) is None:
# use config filename as default work_dir if cfg.work_dir is None
cfg.work_dir = osp.join('./work_dirs',
osp.splitext(osp.basename(args.config))[0])
if args.resume_from is not None:
cfg.resume_from = args.resume_from
cfg.auto_resume = args.auto_resume
if args.gpus is not None:
cfg.gpu_ids = range(1)
warnings.warn('`--gpus` is deprecated because we only support '
'single GPU mode in non-distributed training. '
'Use `gpus=1` now.')
if args.gpu_ids is not None:
cfg.gpu_ids = args.gpu_ids[0:1]
warnings.warn('`--gpu-ids` is deprecated, please use `--gpu-id`. '
'Because we only support single GPU mode in '
'non-distributed training. Use the first GPU '
'in `gpu_ids` now.')
if args.gpus is None and args.gpu_ids is None:
cfg.gpu_ids = [args.gpu_id]
# init distributed env first, since logger depends on the dist info.
if args.launcher == 'none':
distributed = False
else:
distributed = True
init_dist(args.launcher, **cfg.dist_params)
# re-set gpu_ids with distributed training mode
_, world_size = get_dist_info()
cfg.gpu_ids = range(world_size)
    if args.freeze_bn:
        try:
            cfg.model.backbone.freeze_bn = True
        except (AttributeError, KeyError):
            # NOTE: use warnings here; the root logger is only created later
            warnings.warn('freeze_bn is not defined in the config file')
    if args.drop_path >= 0:
        try:
            cfg.model.backbone.drop_path_rate = args.drop_path
        except (AttributeError, KeyError):
            warnings.warn('drop_path is not defined in the config file')
if args.amp:
loss_scale = 'dynamic'
if cfg.get('fp16', None) is None:
cfg.fp16 = dict(loss_scale=loss_scale)
# warnings.warn('fp16 is not defined in the config file')
else:
# cfg.fp16.enabled = True
# cfg.fp16.loss_scale = loss_scale
warnings.warn('fp16 has been defined in the config file')
# cfg.optimizer_config.type = 'Fp16OptimizerHook'
# cfg.optimizer_config.loss_scale = loss_scale
# create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
# dump config
cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config)))
# init the logger before other steps
timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime())
log_file = osp.join(cfg.work_dir, f'{timestamp}.log')
logger = get_root_logger(log_file=log_file, log_level=cfg.log_level)
# init the meta dict to record some important information such as
# environment info and seed, which will be logged
meta = dict()
# log env info
env_info_dict = collect_env()
env_info = '\n'.join([(f'{k}: {v}') for k, v in env_info_dict.items()])
dash_line = '-' * 60 + '\n'
logger.info('Environment info:\n' + dash_line + env_info + '\n' +
dash_line)
meta['env_info'] = env_info
meta['config'] = cfg.pretty_text
# log some basic info
logger.info(f'Distributed training: {distributed}')
logger.info(f'Config:\n{cfg.pretty_text}')
cfg.device = get_device()
# set random seeds
seed = init_random_seed(args.seed, device=cfg.device)
seed = seed + dist.get_rank() if args.diff_seed else seed
logger.info(f'Set random seed to {seed}, '
f'deterministic: {args.deterministic}')
set_random_seed(seed, deterministic=args.deterministic)
cfg.seed = seed
meta['seed'] = seed
meta['exp_name'] = osp.basename(args.config)
model = build_detector(cfg.model,
train_cfg=cfg.get('train_cfg'),
test_cfg=cfg.get('test_cfg'))
# model.init_weights()
logger.info(model)
datasets = [build_dataset(cfg.data.train)]
if len(cfg.workflow) == 2:
val_dataset = copy.deepcopy(cfg.data.val)
val_dataset.pipeline = cfg.data.train.pipeline
datasets.append(build_dataset(val_dataset))
if cfg.checkpoint_config is not None:
# save mmdet version, config file content and class names in
# checkpoints as meta data
cfg.checkpoint_config.meta = dict(mmdet_version=__version__ +
get_git_hash()[:7],
CLASSES=datasets[0].CLASSES)
# add an attribute for visualization convenience
model.CLASSES = datasets[0].CLASSES
# torch.backends.cuda.matmul.allow_tf32 = True
# torch.backends.cudnn.allow_tf32 = True
train_detector(model,
datasets,
cfg,
distributed=distributed,
validate=(not args.no_validate),
timestamp=timestamp,
meta=meta)
if __name__ == '__main__':
main()
================================================
FILE: models/__init__.py
================================================
from .overlock import overlock_xt, overlock_t, overlock_s, overlock_b
================================================
FILE: models/contmix.py
================================================
'''
This is a plug-and-play implementation of the ContMix block from the paper:
https://arxiv.org/abs/2502.20087
'''
import warnings
import torch
import torch.nn.functional as F
from torch import nn
from einops import rearrange, einsum
from timm.models.layers import DropPath, to_2tuple
from torch.utils.checkpoint import checkpoint
try:
from natten.functional import na2d_av
has_natten = True
except ImportError:
has_natten = False
warnings.warn("The efficiency may be reduced since 'natten' is not installed."
" It is recommended to install natten for better performance.")
def get_conv2d(in_channels,
out_channels,
kernel_size,
stride,
padding,
dilation,
groups,
bias,
attempt_use_lk_impl=True):
kernel_size = to_2tuple(kernel_size)
if padding is None:
padding = (kernel_size[0] // 2, kernel_size[1] // 2)
else:
padding = to_2tuple(padding)
need_large_impl = kernel_size[0] == kernel_size[1] and kernel_size[0] > 5 and padding == (kernel_size[0] // 2, kernel_size[1] // 2)
if attempt_use_lk_impl and need_large_impl:
print('---------------- trying to import iGEMM implementation for large-kernel conv')
try:
from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM
print('---------------- found iGEMM implementation ')
        except ImportError:
DepthWiseConv2dImplicitGEMM = None
            print('---------------- iGEMM not found; using the original conv. Follow https://github.com/AILab-CVC/UniRepLKNet to install it.')
if DepthWiseConv2dImplicitGEMM is not None and need_large_impl and in_channels == out_channels \
and out_channels == groups and stride == 1 and dilation == 1:
print(f'===== iGEMM Efficient Conv Impl, channels {in_channels}, kernel size {kernel_size} =====')
return DepthWiseConv2dImplicitGEMM(in_channels, kernel_size, bias=bias)
return nn.Conv2d(in_channels, out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
bias=bias)
def get_bn(dim, use_sync_bn=False):
if use_sync_bn:
return nn.SyncBatchNorm(dim)
else:
return nn.BatchNorm2d(dim)
def fuse_bn(conv, bn):
conv_bias = 0 if conv.bias is None else conv.bias
std = (bn.running_var + bn.eps).sqrt()
return conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1), bn.bias + (conv_bias - bn.running_mean) * bn.weight / std
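A quick numerical sanity check of the `fuse_bn` algebra above: folding eval-mode BN statistics into the preceding depthwise conv must reproduce the conv+BN output exactly (a sketch with randomized BN statistics).

```python
import torch
import torch.nn.functional as F
from torch import nn

# depthwise conv followed by BN, as in the reparameterizable branches
conv = nn.Conv2d(8, 8, kernel_size=3, padding=1, groups=8, bias=False)
bn = nn.BatchNorm2d(8).eval()
with torch.no_grad():  # give BN non-trivial statistics
    bn.weight.uniform_(0.5, 1.5)
    bn.bias.uniform_(-0.5, 0.5)
    bn.running_mean.uniform_(-0.5, 0.5)
    bn.running_var.uniform_(0.5, 1.5)

# same algebra as fuse_bn: scale the weights, shift the bias
std = (bn.running_var + bn.eps).sqrt()
fused_w = conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1)
fused_b = bn.bias + (0 - bn.running_mean) * bn.weight / std

x = torch.randn(2, 8, 16, 16)
ref = bn(conv(x))
out = F.conv2d(x, fused_w, fused_b, padding=1, groups=8)
assert torch.allclose(ref, out, atol=1e-5)
```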
def convert_dilated_to_nondilated(kernel, dilate_rate):
identity_kernel = torch.ones((1, 1, 1, 1)).to(kernel.device)
if kernel.size(1) == 1:
# This is a DW kernel
dilated = F.conv_transpose2d(kernel, identity_kernel, stride=dilate_rate)
return dilated
else:
# This is a dense or group-wise (but not DW) kernel
slices = []
for i in range(kernel.size(1)):
dilated = F.conv_transpose2d(kernel[:,i:i+1,:,:], identity_kernel, stride=dilate_rate)
slices.append(dilated)
return torch.cat(slices, dim=1)
def merge_dilated_into_large_kernel(large_kernel, dilated_kernel, dilated_r):
large_k = large_kernel.size(2)
dilated_k = dilated_kernel.size(2)
equivalent_kernel_size = dilated_r * (dilated_k - 1) + 1
equivalent_kernel = convert_dilated_to_nondilated(dilated_kernel, dilated_r)
rows_to_pad = large_k // 2 - equivalent_kernel_size // 2
merged_kernel = large_kernel + F.pad(equivalent_kernel, [rows_to_pad] * 4)
return merged_kernel
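The zero-insertion trick in `convert_dilated_to_nondilated` can be verified directly: convolving with a dilated kernel equals convolving with its zero-expanded, non-dilated equivalent (a sketch with a single depthwise channel).

```python
import torch
import torch.nn.functional as F

k = torch.randn(1, 1, 3, 3)        # 3x3 depthwise kernel
x = torch.randn(1, 1, 16, 16)
r = 2                              # dilation rate
identity = torch.ones(1, 1, 1, 1)

# conv_transpose2d with stride=r inserts zeros between kernel taps:
# the 3x3 kernel becomes an equivalent 5x5 kernel (r*(k-1)+1 = 5)
expanded = F.conv_transpose2d(k, identity, stride=r)
assert expanded.shape[-1] == r * (3 - 1) + 1

y_dilated = F.conv2d(x, k, padding=2, dilation=r)
y_expanded = F.conv2d(x, expanded, padding=2)
assert torch.allclose(y_dilated, y_expanded, atol=1e-6)
```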
class SEModule(nn.Module):
def __init__(self, dim, red=8, inner_act=nn.GELU, out_act=nn.Sigmoid):
super().__init__()
inner_dim = max(16, dim // red)
self.proj = nn.Sequential(
nn.AdaptiveAvgPool2d(1),
nn.Conv2d(dim, inner_dim, kernel_size=1),
inner_act(),
nn.Conv2d(inner_dim, dim, kernel_size=1),
out_act(),
)
def forward(self, x):
x = x * self.proj(x)
return x
class LayerScale(nn.Module):
def __init__(self, dim, init_value=1e-5):
super().__init__()
self.weight = nn.Parameter(torch.ones(dim, 1, 1, 1)*init_value,
requires_grad=True)
self.bias = nn.Parameter(torch.zeros(dim), requires_grad=True)
def forward(self, x):
x = F.conv2d(x, weight=self.weight, bias=self.bias, groups=x.shape[1])
return x
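`LayerScale` implements a per-channel scale and bias as a grouped 1x1 convolution; the two formulations agree, as this small sketch checks.

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 4, 8, 8)
weight = torch.randn(4, 1, 1, 1)   # per-channel scale, like LayerScale.weight
bias = torch.randn(4)              # per-channel bias, like LayerScale.bias

# grouped 1x1 conv with one weight per channel == channel-wise affine map
y_conv = F.conv2d(x, weight, bias, groups=x.shape[1])
y_mul = x * weight.view(1, 4, 1, 1) + bias.view(1, 4, 1, 1)
assert torch.allclose(y_conv, y_mul, atol=1e-6)
```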
class LayerNorm2d(nn.LayerNorm):
def __init__(self, dim):
super().__init__(normalized_shape=dim, eps=1e-6)
def forward(self, x):
x = rearrange(x, 'b c h w -> b h w c')
x = super().forward(x)
x = rearrange(x, 'b h w c -> b c h w')
return x.contiguous()
class GRN(nn.Module):
""" GRN (Global Response Normalization) layer
Originally proposed in ConvNeXt V2 (https://arxiv.org/abs/2301.00808)
This implementation is more efficient than the original (https://github.com/facebookresearch/ConvNeXt-V2)
We assume the inputs to this layer are (N, C, H, W)
"""
def __init__(self, dim, use_bias=True):
super().__init__()
self.use_bias = use_bias
self.gamma = nn.Parameter(torch.zeros(1, dim, 1, 1))
if self.use_bias:
self.beta = nn.Parameter(torch.zeros(1, dim, 1, 1))
def forward(self, x):
Gx = torch.norm(x, p=2, dim=(-1, -2), keepdim=True)
Nx = Gx / (Gx.mean(dim=1, keepdim=True) + 1e-6)
if self.use_bias:
return (self.gamma * Nx + 1) * x + self.beta
else:
return (self.gamma * Nx + 1) * x
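Since `gamma` (and `beta`) are zero-initialized, GRN starts out as the identity mapping, which makes it safe to drop into a residual block without disturbing early training. A quick sketch of the forward at initialization:

```python
import torch

dim = 8
x = torch.randn(2, dim, 4, 4)
gamma = torch.zeros(1, dim, 1, 1)   # zero-initialized, as in GRN above
beta = torch.zeros(1, dim, 1, 1)

Gx = torch.norm(x, p=2, dim=(-1, -2), keepdim=True)   # global response per channel
Nx = Gx / (Gx.mean(dim=1, keepdim=True) + 1e-6)       # divisive normalization
out = (gamma * Nx + 1) * x + beta
assert torch.allclose(out, x)                         # identity at initialization
```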
class DilatedReparamBlock(nn.Module):
"""
Dilated Reparam Block proposed in UniRepLKNet (https://github.com/AILab-CVC/UniRepLKNet)
We assume the inputs to this block are (N, C, H, W)
"""
def __init__(self, channels, kernel_size, deploy, use_sync_bn=False, attempt_use_lk_impl=True):
super().__init__()
self.lk_origin = get_conv2d(channels, channels, kernel_size, stride=1,
padding=kernel_size//2, dilation=1, groups=channels, bias=deploy,
attempt_use_lk_impl=attempt_use_lk_impl)
self.attempt_use_lk_impl = attempt_use_lk_impl
# Default settings. We did not tune them carefully. Different settings may work better.
if kernel_size == 19:
self.kernel_sizes = [5, 7, 9, 9, 3, 3, 3]
self.dilates = [1, 1, 1, 2, 4, 5, 7]
elif kernel_size == 17:
self.kernel_sizes = [5, 7, 9, 3, 3, 3]
self.dilates = [1, 1, 2, 4, 5, 7]
elif kernel_size == 15:
self.kernel_sizes = [5, 7, 7, 3, 3, 3]
self.dilates = [1, 1, 2, 3, 5, 7]
elif kernel_size == 13:
self.kernel_sizes = [5, 7, 7, 3, 3, 3]
self.dilates = [1, 1, 2, 3, 4, 5]
elif kernel_size == 11:
self.kernel_sizes = [5, 7, 5, 3, 3, 3]
self.dilates = [1, 1, 2, 3, 4, 5]
elif kernel_size == 9:
self.kernel_sizes = [5, 7, 5, 3, 3]
self.dilates = [1, 1, 2, 3, 4]
elif kernel_size == 7:
self.kernel_sizes = [5, 3, 3, 3]
self.dilates = [1, 1, 2, 3]
elif kernel_size == 5:
self.kernel_sizes = [3, 3]
self.dilates = [1, 2]
else:
raise ValueError('Dilated Reparam Block requires kernel_size >= 5')
if not deploy:
self.origin_bn = get_bn(channels, use_sync_bn)
for k, r in zip(self.kernel_sizes, self.dilates):
self.__setattr__('dil_conv_k{}_{}'.format(k, r),
nn.Conv2d(in_channels=channels, out_channels=channels, kernel_size=k, stride=1,
padding=(r * (k - 1) + 1) // 2, dilation=r, groups=channels,
bias=False))
self.__setattr__('dil_bn_k{}_{}'.format(k, r), get_bn(channels, use_sync_bn=use_sync_bn))
def forward(self, x):
if not hasattr(self, 'origin_bn'): # deploy mode
return self.lk_origin(x)
out = self.origin_bn(self.lk_origin(x))
for k, r in zip(self.kernel_sizes, self.dilates):
conv = self.__getattr__('dil_conv_k{}_{}'.format(k, r))
bn = self.__getattr__('dil_bn_k{}_{}'.format(k, r))
out = out + bn(conv(x))
return out
def merge_dilated_branches(self):
if hasattr(self, 'origin_bn'):
origin_k, origin_b = fuse_bn(self.lk_origin, self.origin_bn)
for k, r in zip(self.kernel_sizes, self.dilates):
conv = self.__getattr__('dil_conv_k{}_{}'.format(k, r))
bn = self.__getattr__('dil_bn_k{}_{}'.format(k, r))
branch_k, branch_b = fuse_bn(conv, bn)
origin_k = merge_dilated_into_large_kernel(origin_k, branch_k, r)
origin_b += branch_b
merged_conv = get_conv2d(origin_k.size(0), origin_k.size(0), origin_k.size(2), stride=1,
padding=origin_k.size(2)//2, dilation=1, groups=origin_k.size(0), bias=True,
attempt_use_lk_impl=self.attempt_use_lk_impl)
merged_conv.weight.data = origin_k
merged_conv.bias.data = origin_b
self.lk_origin = merged_conv
self.__delattr__('origin_bn')
for k, r in zip(self.kernel_sizes, self.dilates):
self.__delattr__('dil_conv_k{}_{}'.format(k, r))
self.__delattr__('dil_bn_k{}_{}'.format(k, r))
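The default branch settings above must satisfy one constraint for `merge_dilated_into_large_kernel` to work: each branch's equivalent non-dilated size r*(k-1)+1 may not exceed the large kernel it is merged into, otherwise the padding in the merge would be negative. A stdlib-only check over the defaults:

```python
# effective receptive field of each dilated branch vs. its large kernel
settings = {
    19: ([5, 7, 9, 9, 3, 3, 3], [1, 1, 1, 2, 4, 5, 7]),
    17: ([5, 7, 9, 3, 3, 3], [1, 1, 2, 4, 5, 7]),
    15: ([5, 7, 7, 3, 3, 3], [1, 1, 2, 3, 5, 7]),
    13: ([5, 7, 7, 3, 3, 3], [1, 1, 2, 3, 4, 5]),
    11: ([5, 7, 5, 3, 3, 3], [1, 1, 2, 3, 4, 5]),
    9: ([5, 7, 5, 3, 3], [1, 1, 2, 3, 4]),
    7: ([5, 3, 3, 3], [1, 1, 2, 3]),
    5: ([3, 3], [1, 2]),
}
for large_k, (kernel_sizes, dilates) in settings.items():
    for k, r in zip(kernel_sizes, dilates):
        assert r * (k - 1) + 1 <= large_k, (large_k, k, r)
```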
class ResDWConv(nn.Conv2d):
'''
Depthwise conv with residual connection
'''
def __init__(self, dim, kernel_size=3):
super().__init__(dim, dim, kernel_size=kernel_size, padding=kernel_size//2, groups=dim)
def forward(self, x):
x = x + super().forward(x)
return x
class ContMixBlock(nn.Module):
'''
A plug-and-play implementation of the ContMix module with an FFN layer
Paper: https://arxiv.org/abs/2502.20087
'''
def __init__(self,
dim=64,
kernel_size=7,
smk_size=5,
num_heads=2,
mlp_ratio=4,
res_scale=False,
ls_init_value=None,
drop_path=0,
norm_layer=LayerNorm2d,
use_gemm=False,
deploy=False,
use_checkpoint=False,
**kwargs):
super().__init__()
'''
Args:
kernel_size: kernel size of the main ContMix branch, default is 7
smk_size: kernel size of the secondary ContMix branch, default is 5
num_heads: number of dynamic kernel heads, default is 2
mlp_ratio: ratio of mlp hidden dim to embedding dim, default is 4
res_scale: whether to use residual layer scale, default is False
ls_init_value: layer scale init value, default is None
drop_path: drop path rate, default is 0
norm_layer: normalization layer, default is LayerNorm2d
use_gemm: whether to use iGEMM implementation for large kernel conv, default is False
deploy: whether to use deploy mode, default is False
use_checkpoint: whether to use grad checkpointing, default is False
**kwargs: other arguments
'''
mlp_dim = int(dim*mlp_ratio)
self.kernel_size = kernel_size
self.res_scale = res_scale
self.use_gemm = use_gemm
self.smk_size = smk_size
self.num_heads = num_heads * 2
head_dim = dim // self.num_heads
self.scale = head_dim ** -0.5
self.use_checkpoint = use_checkpoint
self.dwconv1 = ResDWConv(dim, kernel_size=3)
self.norm1 = norm_layer(dim)
self.weight_query = nn.Sequential(
nn.Conv2d(dim, dim//2, kernel_size=1, bias=False),
nn.BatchNorm2d(dim//2),
)
self.weight_key = nn.Sequential(
nn.AdaptiveAvgPool2d(7),
nn.Conv2d(dim, dim//2, kernel_size=1, bias=False),
nn.BatchNorm2d(dim//2),
)
self.weight_value = nn.Sequential(
nn.Conv2d(dim, dim, kernel_size=1, bias=False),
nn.BatchNorm2d(dim),
)
self.weight_proj = nn.Conv2d(49, kernel_size**2 + smk_size**2, kernel_size=1)
self.fusion_proj = nn.Sequential(
nn.Conv2d(dim, dim, kernel_size=1, bias=False),
nn.BatchNorm2d(dim),
)
self.lepe = nn.Sequential(
DilatedReparamBlock(dim, kernel_size=kernel_size, deploy=deploy, use_sync_bn=False, attempt_use_lk_impl=use_gemm),
nn.BatchNorm2d(dim),
)
self.se_layer = SEModule(dim)
self.gate = nn.Sequential(
nn.Conv2d(dim, dim, kernel_size=1, bias=False),
nn.BatchNorm2d(dim),
nn.SiLU(),
)
self.proj = nn.Sequential(
nn.BatchNorm2d(dim),
nn.Conv2d(dim, dim, kernel_size=1),
)
self.dwconv2 = ResDWConv(dim, kernel_size=3)
self.norm2 = norm_layer(dim)
self.mlp = nn.Sequential(
nn.Conv2d(dim, mlp_dim, kernel_size=1),
nn.GELU(),
ResDWConv(mlp_dim, kernel_size=3),
GRN(mlp_dim),
nn.Conv2d(mlp_dim, dim, kernel_size=1),
)
self.ls1 = LayerScale(dim, init_value=ls_init_value) if ls_init_value is not None else nn.Identity()
self.ls2 = LayerScale(dim, init_value=ls_init_value) if ls_init_value is not None else nn.Identity()
self.drop_path = DropPath(drop_path) if drop_path > 0 else nn.Identity()
self.get_rpb()
def get_rpb(self):
self.rpb_size1 = 2 * self.smk_size - 1
self.rpb1 = nn.Parameter(torch.empty(self.num_heads, self.rpb_size1, self.rpb_size1))
self.rpb_size2 = 2 * self.kernel_size - 1
self.rpb2 = nn.Parameter(torch.empty(self.num_heads, self.rpb_size2, self.rpb_size2))
nn.init.trunc_normal_(self.rpb1, std=0.02)
nn.init.trunc_normal_(self.rpb2, std=0.02)
@torch.no_grad()
def generate_idx(self, kernel_size):
rpb_size = 2 * kernel_size - 1
idx_h = torch.arange(0, kernel_size)
idx_w = torch.arange(0, kernel_size)
idx_k = ((idx_h.unsqueeze(-1) * rpb_size) + idx_w).view(-1)
return (idx_h, idx_w, idx_k)
def apply_rpb(self, attn, rpb, height, width, kernel_size, idx_h, idx_w, idx_k):
"""
RPB implementation directly borrowed from https://tinyurl.com/mrbub4t3
"""
num_repeat_h = torch.ones(kernel_size, dtype=torch.long)
num_repeat_w = torch.ones(kernel_size, dtype=torch.long)
num_repeat_h[kernel_size//2] = height - (kernel_size-1)
num_repeat_w[kernel_size//2] = width - (kernel_size-1)
bias_hw = (idx_h.repeat_interleave(num_repeat_h).unsqueeze(-1) * (2*kernel_size-1)) + idx_w.repeat_interleave(num_repeat_w)
bias_idx = bias_hw.unsqueeze(-1) + idx_k
bias_idx = bias_idx.reshape(-1, int(kernel_size**2))
bias_idx = torch.flip(bias_idx, [0])
rpb = torch.flatten(rpb, 1, 2)[:, bias_idx]
rpb = rpb.reshape(1, int(self.num_heads), int(height), int(width), int(kernel_size**2))
return attn + rpb
def reparm(self):
for m in self.modules():
if isinstance(m, DilatedReparamBlock):
m.merge_dilated_branches()
def _forward_inner(self, x):
input_resolution = x.shape[2:]
B, C, H, W = x.shape
x = self.dwconv1(x)
identity = x
x = self.norm1(x)
gate = self.gate(x)
lepe = self.lepe(x)
is_pad = False
if min(H, W) < self.kernel_size:
is_pad = True
if H < W:
size = (self.kernel_size, int(self.kernel_size / H * W))
else:
size = (int(self.kernel_size / W * H), self.kernel_size)
x = F.interpolate(x, size=size, mode='bilinear', align_corners=False)
H, W = size
query = self.weight_query(x) * self.scale
key = self.weight_key(x)
value = self.weight_value(x)
query = rearrange(query, 'b (g c) h w -> b g c (h w)', g=self.num_heads)
key = rearrange(key, 'b (g c) h w -> b g c (h w)', g=self.num_heads)
weight = einsum(query, key, 'b g c n, b g c l -> b g n l')
weight = rearrange(weight, 'b g n l -> b l g n').contiguous()
weight = self.weight_proj(weight)
weight = rearrange(weight, 'b l g (h w) -> b g h w l', h=H, w=W)
attn1, attn2 = torch.split(weight, split_size_or_sections=[self.smk_size**2, self.kernel_size**2], dim=-1)
rpb1_idx = self.generate_idx(self.smk_size)
rpb2_idx = self.generate_idx(self.kernel_size)
attn1 = self.apply_rpb(attn1, self.rpb1, H, W, self.smk_size, *rpb1_idx)
attn2 = self.apply_rpb(attn2, self.rpb2, H, W, self.kernel_size, *rpb2_idx)
attn1 = torch.softmax(attn1, dim=-1)
attn2 = torch.softmax(attn2, dim=-1)
value = rearrange(value, 'b (m g c) h w -> m b g h w c', m=2, g=self.num_heads)
if has_natten:
x1 = na2d_av(attn1, value[0], kernel_size=self.smk_size)
x2 = na2d_av(attn2, value[1], kernel_size=self.kernel_size)
else:
pad1 = self.smk_size // 2
pad2 = self.kernel_size // 2
H_o1 = H - 2 * pad1
W_o1 = W - 2 * pad1
H_o2 = H - 2 * pad2
W_o2 = W - 2 * pad2
v1 = rearrange(value[0], 'b g h w c -> b (g c) h w')
v2 = rearrange(value[1], 'b g h w c -> b (g c) h w')
v1 = F.unfold(v1, kernel_size=self.smk_size).reshape(B, -1, H_o1, W_o1)
v2 = F.unfold(v2, kernel_size=self.kernel_size).reshape(B, -1, H_o2, W_o2)
v1 = F.pad(v1, (pad1, pad1, pad1, pad1), mode='replicate')
v2 = F.pad(v2, (pad2, pad2, pad2, pad2), mode='replicate')
v1 = rearrange(v1, 'b (g c k) h w -> b g c h w k', g=self.num_heads, k=self.smk_size**2, h=H, w=W)
v2 = rearrange(v2, 'b (g c k) h w -> b g c h w k', g=self.num_heads, k=self.kernel_size**2, h=H, w=W)
x1 = einsum(attn1, v1, 'b g h w k, b g c h w k -> b g h w c')
x2 = einsum(attn2, v2, 'b g h w k, b g c h w k -> b g h w c')
x = torch.cat([x1, x2], dim=1)
x = rearrange(x, 'b g h w c -> b (g c) h w', h=H, w=W)
if is_pad:
x = F.adaptive_avg_pool2d(x, input_resolution)
x = self.fusion_proj(x)
x = x + lepe
x = self.se_layer(x)
x = gate * x
x = self.proj(x)
if self.res_scale:
x = self.ls1(identity) + self.drop_path(x)
else:
x = identity + self.drop_path(self.ls1(x))
x = self.dwconv2(x)
if self.res_scale:
x = self.ls2(x) + self.drop_path(self.mlp(self.norm2(x)))
else:
x = x + self.drop_path(self.ls2(self.mlp(self.norm2(x))))
return x
def forward(self, x):
if self.use_checkpoint and x.requires_grad:
x = checkpoint(self._forward_inner, x, use_reentrant=False)
else:
x = self._forward_inner(x)
return x
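When the feature map is smaller than the dynamic kernel, `_forward_inner` upsamples it so that the shorter side equals `kernel_size` while roughly preserving the aspect ratio, guaranteeing at least one full attention window. A minimal pure-Python sketch of that rule (the helper name `resize_for_kernel` is ours, not part of the repo):

```python
def resize_for_kernel(H, W, kernel_size):
    # Mirrors the is_pad branch: if both sides already fit, do nothing;
    # otherwise set the shorter side to kernel_size and scale the longer
    # side proportionally (truncated to int, as in the original code).
    if min(H, W) >= kernel_size:
        return (H, W)
    if H < W:
        return (kernel_size, int(kernel_size / H * W))
    return (int(kernel_size / W * H), kernel_size)
```

After the attention step, `F.adaptive_avg_pool2d` restores the original resolution, so the resize is transparent to the rest of the block.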
if __name__ == '__main__':
from timm.utils import random_seed
random_seed(6)
x = torch.randn(1, 64, 32, 32).cuda()
model = ContMixBlock(dim=64,
num_heads=2,
kernel_size=13,
smk_size=5,
mlp_ratio=4,
res_scale=True,
ls_init_value=1,
drop_path=0,
norm_layer=LayerNorm2d,
use_gemm=True,
deploy=False,
use_checkpoint=False)
print(model)
model.cuda()
model.eval()
y = model(x)
print(y.shape)
# Reparameterize the model; more details can be found at:
# https://github.com/AILab-CVC/UniRepLKNet/tree/main
model.reparm()
z = model(x)
# Show the relative difference between the original and reparameterized outputs
print((z - y).abs().sum() / y.abs().sum())
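The index arithmetic in `apply_rpb` maps every query position along an axis to one offset into the (2k-1)-wide relative-position-bias table, clamping at the borders via `repeat_interleave`: edge positions walk through the edge offsets while every interior position reuses the center offset. A pure-Python sketch of the per-axis mapping (the helper name `rpb_bias_rows` is hypothetical, not repo code):

```python
def rpb_bias_rows(height, kernel_size):
    # reps matches num_repeat_h in apply_rpb: each of the k offsets is used
    # once, except the center offset, which covers all interior positions.
    reps = [1] * kernel_size
    reps[kernel_size // 2] = height - (kernel_size - 1)
    out = []
    for offset, n in zip(range(kernel_size), reps):
        out.extend([offset] * n)
    return out
```

The full `bias_idx` is then built from the row and column variants of this mapping combined as `rows * (2 * kernel_size - 1) + cols`, with `idx_k` added to gather one k*k window of biases per query position.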
================================================
FILE: models/overlock.py
================================================
'''
This is an official implementation of the OverLoCK model proposed in the paper:
https://arxiv.org/abs/2502.20087
'''
import torch
import timm
import torch.distributed
import torch.nn.functional as F
from torch import nn
from einops import rearrange, einsum
from natten.functional import na2d_av
from mmengine.runner import load_checkpoint
from torch.utils.checkpoint import checkpoint
from timm.models.layers import DropPath, to_2tuple
from timm.models.registry import register_model
def get_conv2d(in_channels,
out_channels,
kernel_size,
stride,
padding,
dilation,
groups,
bias,
attempt_use_lk_impl=True):
kernel_size = to_2tuple(kernel_size)
if padding is None:
padding = (kernel_size[0] // 2, kernel_size[1] // 2)
else:
padding = to_2tuple(padding)
need_large_impl = kernel_size[0] == kernel_size[1] and kernel_size[0] > 5 and padding == (kernel_size[0] // 2, kernel_size[1] // 2)
if attempt_use_lk_impl and need_large_impl:
print('---------------- trying to import iGEMM implementation for large-kernel conv')
try:
from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM
print('---------------- found iGEMM implementation ')
except ImportError:
DepthWiseConv2dImplicitGEMM = None
print('---------------- found no iGEMM. use original conv. follow https://github.com/AILab-CVC/UniRepLKNet to install it.')
if DepthWiseConv2dImplicitGEMM is not None and need_large_impl and in_channels == out_channels \
and out_channels == groups and stride == 1 and dilation == 1:
print(f'===== iGEMM Efficient Conv Impl, channels {in_channels}, kernel size {kernel_size} =====')
return DepthWiseConv2dImplicitGEMM(in_channels, kernel_size, bias=bias)
return nn.Conv2d(in_channels, out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
bias=bias)
def get_bn(dim, use_sync_bn=False):
if use_sync_bn:
return nn.SyncBatchNorm(dim)
else:
return nn.BatchNorm2d(dim)
def fuse_bn(conv, bn):
conv_bias = 0 if conv.bias is None else conv.bias
std = (bn.running_var + bn.eps).sqrt()
return conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1), bn.bias + (conv_bias - bn.running_mean) * bn.weight / std
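`fuse_bn` folds a BatchNorm into the preceding conv. Per output channel this is plain scalar algebra; a minimal sketch (the function name `fuse_bn_scalar` is ours, not part of the repo) that mirrors the formula above:

```python
import math

def fuse_bn_scalar(w, b, gamma, beta, mean, var, eps=1e-5):
    # y = gamma * (w*x + b - mean) / std + beta  ==  w' * x + b'
    std = math.sqrt(var + eps)
    return w * gamma / std, beta + (b - mean) * gamma / std

w2, b2 = fuse_bn_scalar(w=2.0, b=0.5, gamma=1.5, beta=0.1, mean=0.5, var=4.0, eps=0.0)
# For any input x, gamma*(w*x + b - mean)/std + beta now equals w2*x + b2.
```

This is the same algebra `fuse_bn` applies with tensors: the weight scaling is broadcast over each output channel's kernel, and the conv bias (0 when absent) is folded into the BN shift.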
def convert_dilated_to_nondilated(kernel, dilate_rate):
identity_kernel = torch.ones((1, 1, 1, 1)).to(kernel.device)
if kernel.size(1) == 1:
# This is a DW kernel
dilated = F.conv_transpose2d(kernel, identity_kernel, stride=dilate_rate)
return dilated
else:
# This is a dense or group-wise (but not DW) kernel
slices = []
for i in range(kernel.size(1)):
dilated = F.conv_transpose2d(kernel[:,i:i+1,:,:], identity_kernel, stride=dilate_rate)
slices.append(dilated)
return torch.cat(slices, dim=1)
def merge_dilated_into_large_kernel(large_kernel, dilated_kernel, dilated_r):
large_k = large_kernel.size(2)
dilated_k = dilated_kernel.size(2)
equivalent_kernel_size = dilated_r * (dilated_k - 1) + 1
equivalent_kernel = convert_dilated_to_nondilated(dilated_kernel, dilated_r)
rows_to_pad = large_k // 2 - equivalent_kernel_size // 2
merged_kernel = large_kernel + F.pad(equivalent_kernel, [rows_to_pad] * 4)
return merged_kernel
def stem(in_chans=3, embed_dim=96):
return nn.Sequential(
nn.Conv2d(in_chans, embed_dim//2, kernel_size=3, stride=2, padding=1, bias=False),
nn.BatchNorm2d(embed_dim//2),
nn.GELU(),
nn.Conv2d(embed_dim//2, embed_dim//2, kernel_size=3, padding=1, bias=False),
nn.BatchNorm2d(embed_dim//2),
nn.GELU(),
nn.Conv2d(embed_dim//2, embed_dim, kernel_size=3, stride=2, padding=1, bias=False),
nn.BatchNorm2d(embed_dim),
nn.GELU(),
nn.Conv2d(embed_dim, embed_dim, kernel_size=3, padding=1, bias=False),
nn.BatchNorm2d(embed_dim)
)
def downsample(in_dim, out_dim):
return nn.Sequential(
nn.Conv2d(in_dim, out_dim, kernel_size=3, stride=2, padding=1, bias=False),
nn.BatchNorm2d(out_dim),
)
class SEModule(nn.Module):
def __init__(self, dim, red=8, inner_act=nn.GELU, out_act=nn.Sigmoid):
super().__init__()
inner_dim = max(16, dim // red)
self.proj = nn.Sequential(
nn.AdaptiveAvgPool2d(1),
nn.Conv2d(dim, inner_dim, kernel_size=1),
inner_act(),
nn.Conv2d(inner_dim, dim, kernel_size=1),
out_act(),
)
def forward(self, x):
x = x * self.proj(x)
return x
class LayerScale(nn.Module):
def __init__(self, dim, init_value=1e-5):
super().__init__()
self.weight = nn.Parameter(torch.ones(dim, 1, 1, 1)*init_value,
requires_grad=True)
self.bias = nn.Parameter(torch.zeros(dim), requires_grad=True)
def forward(self, x):
x = F.conv2d(x, weight=self.weight, bias=self.bias, groups=x.shape[1])
return x
class LayerNorm2d(nn.LayerNorm):
def __init__(self, dim):
super().__init__(normalized_shape=dim, eps=1e-6)
def forward(self, x):
x = rearrange(x, 'b c h w -> b h w c')
x = super().forward(x)
x = rearrange(x, 'b h w c -> b c h w')
return x.contiguous()
class GRN(nn.Module):
""" GRN (Global Response Normalization) layer
Originally proposed in ConvNeXt V2 (https://arxiv.org/abs/2301.00808)
This implementation is more efficient than the original (https://github.com/facebookresearch/ConvNeXt-V2)
We assume the inputs to this layer are (N, C, H, W)
"""
def __init__(self, dim, use_bias=True):
super().__init__()
self.use_bias = use_bias
self.gamma = nn.Parameter(torch.zeros(1, dim, 1, 1))
if self.use_bias:
self.beta = nn.Parameter(torch.zeros(1, dim, 1, 1))
def forward(self, x):
Gx = torch.norm(x, p=2, dim=(-1, -2), keepdim=True)
Nx = Gx / (Gx.mean(dim=1, keepdim=True) + 1e-6)
if self.use_bias:
return (self.gamma * Nx + 1) * x + self.beta
else:
return (self.gamma * Nx + 1) * x
class DilatedReparamBlock(nn.Module):
"""
Dilated Reparam Block proposed in UniRepLKNet (https://github.com/AILab-CVC/UniRepLKNet)
We assume the inputs to this block are (N, C, H, W)
"""
def __init__(self, channels, kernel_size, deploy, use_sync_bn=False, attempt_use_lk_impl=True):
super().__init__()
self.lk_origin = get_conv2d(channels, channels, kernel_size, stride=1,
padding=kernel_size//2, dilation=1, groups=channels, bias=deploy,
attempt_use_lk_impl=attempt_use_lk_impl)
self.attempt_use_lk_impl = attempt_use_lk_impl
# Default settings. We did not tune them carefully. Different settings may work better.
if kernel_size == 19:
self.kernel_sizes = [5, 7, 9, 9, 3, 3, 3]
self.dilates = [1, 1, 1, 2, 4, 5, 7]
elif kernel_size == 17:
self.kernel_sizes = [5, 7, 9, 3, 3, 3]
self.dilates = [1, 1, 2, 4, 5, 7]
elif kernel_size == 15:
self.kernel_sizes = [5, 7, 7, 3, 3, 3]
self.dilates = [1, 1, 2, 3, 5, 7]
elif kernel_size == 13:
self.kernel_sizes = [5, 7, 7, 3, 3, 3]
self.dilates = [1, 1, 2, 3, 4, 5]
elif kernel_size == 11:
self.kernel_sizes = [5, 7, 5, 3, 3, 3]
self.dilates = [1, 1, 2, 3, 4, 5]
elif kernel_size == 9:
self.kernel_sizes = [5, 7, 5, 3, 3]
self.dilates = [1, 1, 2, 3, 4]
elif kernel_size == 7:
self.kernel_sizes = [5, 3, 3, 3]
self.dilates = [1, 1, 2, 3]
elif kernel_size == 5:
self.kernel_sizes = [3, 3]
self.dilates = [1, 2]
else:
raise ValueError('Dilated Reparam Block requires kernel_size >= 5')
if not deploy:
self.origin_bn = get_bn(channels, use_sync_bn)
for k, r in zip(self.kernel_sizes, self.dilates):
self.__setattr__('dil_conv_k{}_{}'.format(k, r),
nn.Conv2d(in_channels=channels, out_channels=channels, kernel_size=k, stride=1,
padding=(r * (k - 1) + 1) // 2, dilation=r, groups=channels,
bias=False))
self.__setattr__('dil_bn_k{}_{}'.format(k, r), get_bn(channels, use_sync_bn=use_sync_bn))
def forward(self, x):
if not hasattr(self, 'origin_bn'): # deploy mode
return self.lk_origin(x)
out = self.origin_bn(self.lk_origin(x))
for k, r in zip(self.kernel_sizes, self.dilates):
conv = self.__getattr__('dil_conv_k{}_{}'.format(k, r))
bn = self.__getattr__('dil_bn_k{}_{}'.format(k, r))
out = out + bn(conv(x))
return out
def merge_dilated_branches(self):
if hasattr(self, 'origin_bn'):
origin_k, origin_b = fuse_bn(self.lk_origin, self.origin_bn)
for k, r in zip(self.kernel_sizes, self.dilates):
conv = self.__getattr__('dil_conv_k{}_{}'.format(k, r))
bn = self.__getattr__('dil_bn_k{}_{}'.format(k, r))
branch_k, branch_b = fuse_bn(conv, bn)
origin_k = merge_dilated_into_large_kernel(origin_k, branch_k, r)
origin_b += branch_b
merged_conv = get_conv2d(origin_k.size(0), origin_k.size(0), origin_k.size(2), stride=1,
padding=origin_k.size(2)//2, dilation=1, groups=origin_k.size(0), bias=True,
attempt_use_lk_impl=self.attempt_use_lk_impl)
merged_conv.weight.data = origin_k
merged_conv.bias.data = origin_b
self.lk_origin = merged_conv
self.__delattr__('origin_bn')
for k, r in zip(self.kernel_sizes, self.dilates):
self.__delattr__('dil_conv_k{}_{}'.format(k, r))
self.__delattr__('dil_bn_k{}_{}'.format(k, r))
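`merge_dilated_branches` zero-pads each branch's equivalent kernel up to the nominal size, which only works because every branch satisfies r*(k-1)+1 <= kernel_size. A quick pure-Python check over a few of the configurations above (the `CONFIGS` dict is our transcription of the settings listed in `__init__`, not repo code):

```python
# (kernel_sizes, dilates) pairs transcribed from DilatedReparamBlock.__init__
CONFIGS = {
    13: ([5, 7, 7, 3, 3, 3], [1, 1, 2, 3, 4, 5]),
    7:  ([5, 3, 3, 3],       [1, 1, 2, 3]),
    5:  ([3, 3],             [1, 2]),
}

def equivalent_size(k, r):
    # A k x k conv with dilation r covers an r*(k-1)+1 receptive field,
    # matching equivalent_kernel_size in merge_dilated_into_large_kernel.
    return r * (k - 1) + 1

for large_k, (ks, rs) in CONFIGS.items():
    assert all(equivalent_size(k, r) <= large_k for k, r in zip(ks, rs))
```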
class CTXDownsample(nn.Module):
def __init__(self, dim, h_dim):
super().__init__()
self.x_proj = nn.Sequential(
nn.Conv2d(dim, h_dim, kernel_size=3, stride=2, padding=1, bias=False),
nn.BatchNorm2d(h_dim)
)
self.h_proj = nn.Sequential(
nn.Conv2d(h_dim//4, h_dim//4, kernel_size=3, stride=2, padding=1, bias=False),
nn.BatchNorm2d(h_dim//4)
)
def forward(self, x, ctx):
x = self.x_proj(x)
ctx = self.h_proj(ctx)
return (x, ctx)
class ResDWConv(nn.Conv2d):
'''
Depthwise convolution with residual connection
'''
def __init__(self, dim, kernel_size=3):
super().__init__(dim, dim, kernel_size=kernel_size, padding=kernel_size//2, groups=dim)
def forward(self, x):
x = x + super().forward(x)
return x
class RepConvBlock(nn.Module):
def __init__(self,
dim=64,
kernel_size=7,
mlp_ratio=4,
ls_init_value=None,
res_scale=False,
drop_path=0,
norm_layer=LayerNorm2d,
use_gemm=False,
deploy=False,
use_checkpoint=False):
super().__init__()
self.res_scale = res_scale
self.use_checkpoint = use_checkpoint
mlp_dim = int(dim*mlp_ratio)
self.dwconv = ResDWConv(dim, kernel_size=3)
self.proj = nn.Sequential(
norm_layer(dim),
DilatedReparamBlock(dim, kernel_size=kernel_size, deploy=deploy, use_sync_bn=False, attempt_use_lk_impl=use_gemm),
nn.BatchNorm2d(dim),
SEModule(dim),
nn.Conv2d(dim, mlp_dim, kernel_size=1),
nn.GELU(),
ResDWConv(mlp_dim, kernel_size=3),
GRN(mlp_dim),
nn.Conv2d(mlp_dim, dim, kernel_size=1),
DropPath(drop_path) if drop_path > 0 else nn.Identity(),
)
self.ls = LayerScale(dim, init_value=ls_init_value) if ls_init_value is not None else nn.Identity()
def forward_features(self, x):
x = self.dwconv(x)
if self.res_scale:
x = self.ls(x) + self.proj(x)
else:
drop_path = self.proj[-1]
x = x + drop_path(self.ls(self.proj[:-1](x)))
return x
def forward(self, x):
if self.use_checkpoint and x.requires_grad:
x = checkpoint(self.forward_features, x, use_reentrant=False)
else:
x = self.forward_features(x)
return x
class DynamicConvBlock(nn.Module):
def __init__(self,
dim=64,
ctx_dim=32,
kernel_size=7,
smk_size=5,
num_heads=2,
mlp_ratio=4,
ls_init_value=None,
res_scale=False,
drop_path=0,
norm_layer=LayerNorm2d,
is_first=False,
is_last=False,
use_gemm=False,
deploy=False,
use_checkpoint=False,
**kwargs):
super().__init__()
ctx_dim = ctx_dim // 4
out_dim = dim + ctx_dim
mlp_dim = int(dim*mlp_ratio)
self.kernel_size = kernel_size
self.res_scale = res_scale
self.use_gemm = use_gemm
self.smk_size = smk_size
self.num_heads = num_heads * 2
head_dim = dim // self.num_heads
self.scale = head_dim ** -0.5
self.is_first = is_first
self.is_last = is_last
self.use_checkpoint = use_checkpoint
if not is_first:
self.x_scale = LayerScale(ctx_dim, init_value=1)
self.h_scale = LayerScale(ctx_dim, init_value=1)
self.dwconv1 = ResDWConv(out_dim, kernel_size=3)
self.norm1 = norm_layer(out_dim)
self.fusion = nn.Sequential(
nn.Conv2d(out_dim, out_dim, kernel_size=3, padding=1, groups=out_dim),
nn.BatchNorm2d(out_dim),
nn.GELU(),
nn.Conv2d(out_dim, dim, kernel_size=1),
GRN(dim),
)
self.weight_query = nn.Sequential(
nn.Conv2d(dim, dim//2, kernel_size=1, bias=False),
nn.BatchNorm2d(dim//2),
)
self.weight_key = nn.Sequential(
nn.AdaptiveAvgPool2d(7),
nn.Conv2d(ctx_dim, dim//2, kernel_size=1, bias=False),
nn.BatchNorm2d(dim//2),
)
self.weight_proj = nn.Conv2d(49, kernel_size**2 + smk_size**2, kernel_size=1)
self.dyconv_proj = nn.Sequential(
nn.Conv2d(dim, dim, kernel_size=1, bias=False),
nn.BatchNorm2d(dim),
)
self.lepe = nn.Sequential(
DilatedReparamBlock(dim, kernel_size=kernel_size, deploy=deploy, use_sync_bn=False, attempt_use_lk_impl=use_gemm),
nn.BatchNorm2d(dim),
)
self.se_layer = SEModule(dim)
self.gate = nn.Sequential(
nn.Conv2d(dim, dim, kernel_size=1, bias=False),
nn.BatchNorm2d(dim),
nn.SiLU(),
)
self.proj = nn.Sequential(
nn.BatchNorm2d(dim),
nn.Conv2d(dim, out_dim, kernel_size=1),
)
self.dwconv2 = ResDWConv(out_dim, kernel_size=3)
self.norm2 = norm_layer(out_dim)
self.mlp = nn.Sequential(
nn.Conv2d(out_dim, mlp_dim, kernel_size=1),
nn.GELU(),
ResDWConv(mlp_dim, kernel_size=3),
GRN(mlp_dim),
nn.Conv2d(mlp_dim, out_dim, kernel_size=1),
)
self.ls1 = LayerScale(out_dim, init_value=ls_init_value) if ls_init_value is not None else nn.Identity()
self.ls2 = LayerScale(out_dim, init_value=ls_init_value) if ls_init_value is not None else nn.Identity()
self.drop_path = DropPath(drop_path) if drop_path > 0 else nn.Identity()
self.get_rpb()
def get_rpb(self):
self.rpb_size1 = 2 * self.smk_size - 1
self.rpb1 = nn.Parameter(torch.empty(self.num_heads, self.rpb_size1, self.rpb_size1))
self.rpb_size2 = 2 * self.kernel_size - 1
self.rpb2 = nn.Parameter(torch.empty(self.num_heads, self.rpb_size2, self.rpb_size2))
nn.init.zeros_(self.rpb1)
nn.init.zeros_(self.rpb2)
@torch.no_grad()
def generate_idx(self, kernel_size):
rpb_size = 2 * kernel_size - 1
idx_h = torch.arange(0, kernel_size)
idx_w = torch.arange(0, kernel_size)
idx_k = ((idx_h.unsqueeze(-1) * rpb_size) + idx_w).view(-1)
return (idx_h, idx_w, idx_k)
def apply_rpb(self, attn, rpb, height, width, kernel_size, idx_h, idx_w, idx_k):
"""
RPB implementation directly borrowed from https://tinyurl.com/mrbub4t3
"""
num_repeat_h = torch.ones(kernel_size, dtype=torch.long)
num_repeat_w = torch.ones(kernel_size, dtype=torch.long)
num_repeat_h[kernel_size//2] = height - (kernel_size-1)
num_repeat_w[kernel_size//2] = width - (kernel_size-1)
bias_hw = (idx_h.repeat_interleave(num_repeat_h).unsqueeze(-1) * (2*kernel_size-1)) + idx_w.repeat_interleave(num_repeat_w)
bias_idx = bias_hw.unsqueeze(-1) + idx_k
bias_idx = bias_idx.reshape(-1, int(kernel_size**2))
bias_idx = torch.flip(bias_idx, [0])
rpb = torch.flatten(rpb, 1, 2)[:, bias_idx]
rpb = rpb.reshape(1, int(self.num_heads), int(height), int(width), int(kernel_size**2))
return attn + rpb
def _forward_inner(self, x, h_x, h_r):
input_resolution = x.shape[2:]
B, C, H, W = x.shape
B, C_h, H_h, W_h = h_x.shape
if not self.is_first:
h_x = self.x_scale(h_x) + self.h_scale(h_r)
x_f = torch.cat([x, h_x], dim=1)
x_f = self.dwconv1(x_f)
identity = x_f
x_f = self.norm1(x_f)
x = self.fusion(x_f)
gate = self.gate(x)
lepe = self.lepe(x)
is_pad = False
if min(H, W) < self.kernel_size:
is_pad = True
if H < W:
size = (self.kernel_size, int(self.kernel_size / H * W))
else:
size = (int(self.kernel_size / W * H), self.kernel_size)
x = F.interpolate(x, size=size, mode='bilinear', align_corners=False)
x_f = F.interpolate(x_f, size=size, mode='bilinear', align_corners=False)
H, W = size
query, key = torch.split(x_f, split_size_or_sections=[C, C_h], dim=1)
query = self.weight_query(query) * self.scale
key = self.weight_key(key)
query = rearrange(query, 'b (g c) h w -> b g c (h w)', g=self.num_heads)
key = rearrange(key, 'b (g c) h w -> b g c (h w)', g=self.num_heads)
weight = einsum(query, key, 'b g c n, b g c l -> b g n l')
weight = rearrange(weight, 'b g n l -> b l g n').contiguous()
weight = self.weight_proj(weight)
weight = rearrange(weight, 'b l g (h w) -> b g h w l', h=H, w=W)
attn1, attn2 = torch.split(weight, split_size_or_sections=[self.smk_size**2, self.kernel_size**2], dim=-1)
rpb1_idx = self.generate_idx(self.smk_size)
rpb2_idx = self.generate_idx(self.kernel_size)
attn1 = self.apply_rpb(attn1, self.rpb1, H, W, self.smk_size, *rpb1_idx)
attn2 = self.apply_rpb(attn2, self.rpb2, H, W, self.kernel_size, *rpb2_idx)
attn1 = torch.softmax(attn1, dim=-1)
attn2 = torch.softmax(attn2, dim=-1)
value = rearrange(x, 'b (m g c) h w -> m b g h w c', m=2, g=self.num_heads)
x1 = na2d_av(attn1, value[0], kernel_size=self.smk_size)
x2 = na2d_av(attn2, value[1], kernel_size=self.kernel_size)
x = torch.cat([x1, x2], dim=1)
x = rearrange(x, 'b g h w c -> b (g c) h w', h=H, w=W)
if is_pad:
x = F.adaptive_avg_pool2d(x, input_resolution)
x = self.dyconv_proj(x)
x = x + lepe
x = self.se_layer(x)
x = gate * x
x = self.proj(x)
if self.res_scale:
x = self.ls1(identity) + self.drop_path(x)
else:
x = identity + self.drop_path(self.ls1(x))
x = self.dwconv2(x)
if self.res_scale:
x = self.ls2(x) + self.drop_path(self.mlp(self.norm2(x)))
else:
x = x + self.drop_path(self.ls2(self.mlp(self.norm2(x))))
if self.is_last:
return (x, None)
else:
l_x, h_x = torch.split(x, split_size_or_sections=[C, C_h], dim=1)
return (l_x, h_x)
def forward(self, x, h_x, h_r):
if self.use_checkpoint and x.requires_grad:
x = checkpoint(self._forward_inner, x, h_x, h_r, use_reentrant=False)
else:
x = self._forward_inner(x, h_x, h_r)
return x
class OverLoCK(nn.Module):
'''
An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
https://arxiv.org/abs/2502.20087
'''
def __init__(self,
depth=[2, 2, 2, 2],
sub_depth=[4, 2],
in_chans=3,
embed_dim=[96, 192, 384, 768],
kernel_size=[7, 7, 7, 7],
mlp_ratio=[4, 4, 4, 4],
sub_mlp_ratio=[4, 4],
sub_num_heads=[4, 8],
ls_init_value=[None, None, 1, 1],
res_scale=True,
smk_size=5,
deploy=False,
use_gemm=True,
use_ds=True,
drop_rate=0,
drop_path_rate=0,
norm_layer=LayerNorm2d,
projection=1024,
num_classes=1000,
use_checkpoint=[0, 0, 0, 0],
):
super().__init__()
fusion_dim = embed_dim[-1] + embed_dim[-1]//4
self.num_classes = num_classes
self.num_features = self.embed_dim = embed_dim
self.patch_embed1 = stem(in_chans, embed_dim[0])
self.patch_embed2 = downsample(embed_dim[0], embed_dim[1])
self.patch_embed3 = downsample(embed_dim[1], embed_dim[2])
self.patch_embed4 = downsample(embed_dim[2], embed_dim[3])
self.high_level_proj = nn.Conv2d(embed_dim[-1], embed_dim[-1]//4, kernel_size=1)
self.patch_embedx = CTXDownsample(embed_dim[2], embed_dim[3])
dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depth) + sum(sub_depth))]
self.blocks1 = nn.ModuleList()
self.blocks2 = nn.ModuleList()
self.blocks3 = nn.ModuleList()
self.blocks4 = nn.ModuleList()
self.sub_blocks3 = nn.ModuleList()
self.sub_blocks4 = nn.ModuleList()
for i in range(depth[0]):
self.blocks1.append(
RepConvBlock(
dim=embed_dim[0],
kernel_size=kernel_size[0],
mlp_ratio=mlp_ratio[0],
ls_init_value=ls_init_value[0],
res_scale=res_scale,
drop_path=dpr[i],
norm_layer=norm_layer,
use_gemm=use_gemm,
deploy=deploy,
use_checkpoint=(i<use_checkpoint[0]),
)
)
for i in range(depth[1]):
self.blocks2.append(
RepConvBlock(
dim=embed_dim[1],
kernel_size=kernel_size[1],
mlp_ratio=mlp_ratio[1],
ls_init_value=ls_init_value[1],
res_scale=res_scale,
drop_path=dpr[i+depth[0]],
norm_layer=norm_layer,
use_gemm=use_gemm,
deploy=deploy,
use_checkpoint=(i<use_checkpoint[1]),
)
)
for i in range(depth[2]):
self.blocks3.append(
RepConvBlock(
dim=embed_dim[2],
kernel_size=kernel_size[2],
mlp_ratio=mlp_ratio[2],
ls_init_value=ls_init_value[2],
res_scale=res_scale,
drop_path=dpr[i+sum(depth[:2])],
norm_layer=norm_layer,
use_gemm=use_gemm,
deploy=deploy,
use_checkpoint=(i<use_checkpoint[2]),
)
)
for i in range(depth[3]):
self.blocks4.append(
RepConvBlock(
dim=embed_dim[3],
kernel_size=kernel_size[3],
mlp_ratio=mlp_ratio[3],
ls_init_value=ls_init_value[3],
res_scale=res_scale,
drop_path=dpr[i+sum(depth[:3])],
norm_layer=norm_layer,
use_gemm=use_gemm,
deploy=deploy,
use_checkpoint=(i<use_checkpoint[3]),
)
)
for i in range(sub_depth[0]):
self.sub_blocks3.append(
DynamicConvBlock(
dim=embed_dim[2],
ctx_dim=embed_dim[-1],
kernel_size=kernel_size[2],
num_heads=sub_num_heads[0],
pool_size=7,
mlp_ratio=sub_mlp_ratio[0],
ls_init_value=ls_init_value[2],
res_scale=res_scale,
drop_path=dpr[i+sum(depth)],
norm_layer=norm_layer,
smk_size=smk_size,
use_gemm=use_gemm,
deploy=deploy,
is_first=(i==0),
use_checkpoint=(i<use_checkpoint[2]),
)
)
for i in range(sub_depth[1]):
self.sub_blocks4.append(
DynamicConvBlock(
SYMBOL INDEX (222 symbols across 11 files)
FILE: detection/models/overlock.py
function get_conv2d (line 22) | def get_conv2d(in_channels,
function get_bn (line 61) | def get_bn(dim, use_sync_bn=False):
function fuse_bn (line 68) | def fuse_bn(conv, bn):
function convert_dilated_to_nondilated (line 73) | def convert_dilated_to_nondilated(kernel, dilate_rate):
function merge_dilated_into_large_kernel (line 87) | def merge_dilated_into_large_kernel(large_kernel, dilated_kernel, dilate...
function stem (line 97) | def stem(in_chans=3, embed_dim=96):
function downsample (line 113) | def downsample(in_dim, out_dim):
class SEModule (line 120) | class SEModule(nn.Module):
method __init__ (line 121) | def __init__(self, dim, red=8, inner_act=nn.GELU, out_act=nn.Sigmoid):
method forward (line 132) | def forward(self, x):
class LayerScale (line 138) | class LayerScale(nn.Module):
method __init__ (line 139) | def __init__(self, dim, init_value=1e-5):
method forward (line 145) | def forward(self, x):
class LayerNorm2d (line 150) | class LayerNorm2d(nn.LayerNorm):
method __init__ (line 151) | def __init__(self, dim):
method forward (line 154) | def forward(self, x):
class GRN (line 161) | class GRN(nn.Module):
method __init__ (line 167) | def __init__(self, dim, use_bias=True):
method forward (line 174) | def forward(self, x):
class DilatedReparamBlock (line 184) | class DilatedReparamBlock(nn.Module):
method __init__ (line 189) | def __init__(self, channels, kernel_size, deploy, use_sync_bn=False, a...
method forward (line 233) | def forward(self, x):
method merge_dilated_branches (line 243) | def merge_dilated_branches(self):
class CTXDownsample (line 264) | class CTXDownsample(nn.Module):
method __init__ (line 265) | def __init__(self, dim, h_dim):
method forward (line 277) | def forward(self, x, ctx):
class ResDWConv (line 283) | class ResDWConv(nn.Conv2d):
method __init__ (line 287) | def __init__(self, dim, kernel_size=3):
method forward (line 290) | def forward(self, x):
class RepConvBlock (line 295) | class RepConvBlock(nn.Module):
method __init__ (line 297) | def __init__(self,
method forward_features (line 332) | def forward_features(self, x):
method forward (line 344) | def forward(self, x):
class DynamicConvBlock (line 354) | class DynamicConvBlock(nn.Module):
method __init__ (line 355) | def __init__(self,
method get_rpb (line 458) | def get_rpb(self):
method generate_idx (line 468) | def generate_idx(self, kernel_size):
method apply_rpb (line 476) | def apply_rpb(self, attn, rpb, height, width, kernel_size, idx_h, idx_...
method _forward_inner (line 493) | def _forward_inner(self, x, h_x, h_r):
method forward (line 575) | def forward(self, x, h_x, h_r):
class OverLoCK (line 583) | class OverLoCK(nn.Module):
method __init__ (line 588) | def __init__(self,
method _init_weights (line 775) | def _init_weights(self, m):
method _convert_sync_batchnorm (line 784) | def _convert_sync_batchnorm(self):
method forward_pre_features (line 788) | def forward_pre_features(self, x):
method forward_base_features (line 807) | def forward_base_features(self, x):
method forward_sub_features (line 820) | def forward_sub_features(self, x, ctx):
method forward_features (line 846) | def forward_features(self, x):
method forward (line 854) | def forward(self, x):
function overlock_xt (line 862) | def overlock_xt(pretrained=False, pretrained_cfg=None, **kwargs):
function overlock_t (line 884) | def overlock_t(pretrained=False, pretrained_cfg=None, **kwargs):
function overlock_s (line 906) | def overlock_s(pretrained=False, pretrained_cfg=None, **kwargs):
function overlock_b (line 928) | def overlock_b(pretrained=None, pretrained_cfg=None, **kwargs):
FILE: detection/test.py
function parse_args (line 22) | def parse_args():
function main (line 109) | def main():
FILE: detection/train.py
function parse_args (line 26) | def parse_args():
function main (line 120) | def main():
FILE: models/contmix.py
function get_conv2d (line 22) | def get_conv2d(in_channels,
function get_bn (line 61) | def get_bn(dim, use_sync_bn=False):
function fuse_bn (line 68) | def fuse_bn(conv, bn):
function convert_dilated_to_nondilated (line 73) | def convert_dilated_to_nondilated(kernel, dilate_rate):
function merge_dilated_into_large_kernel (line 87) | def merge_dilated_into_large_kernel(large_kernel, dilated_kernel, dilate...
class SEModule (line 97) | class SEModule(nn.Module):
method __init__ (line 98) | def __init__(self, dim, red=8, inner_act=nn.GELU, out_act=nn.Sigmoid):
method forward (line 109) | def forward(self, x):
class LayerScale (line 114) | class LayerScale(nn.Module):
method __init__ (line 115) | def __init__(self, dim, init_value=1e-5):
method forward (line 121) | def forward(self, x):
class LayerNorm2d (line 128) | class LayerNorm2d(nn.LayerNorm):
method __init__ (line 129) | def __init__(self, dim):
method forward (line 132) | def forward(self, x):
class GRN (line 140) | class GRN(nn.Module):
method __init__ (line 146) | def __init__(self, dim, use_bias=True):
method forward (line 154) | def forward(self, x):
class DilatedReparamBlock (line 164) | class DilatedReparamBlock(nn.Module):
method __init__ (line 169) | def __init__(self, channels, kernel_size, deploy, use_sync_bn=False, a...
method forward (line 213) | def forward(self, x):
method merge_dilated_branches (line 223) | def merge_dilated_branches(self):
class ResDWConv (line 244) | class ResDWConv(nn.Conv2d):
method __init__ (line 248) | def __init__(self, dim, kernel_size=3):
method forward (line 251) | def forward(self, x):
class ContMixBlock (line 256) | class ContMixBlock(nn.Module):
method __init__ (line 261) | def __init__(self,
method get_rpb (line 357) | def get_rpb(self):
method generate_idx (line 366) | def generate_idx(self, kernel_size):
method apply_rpb (line 373) | def apply_rpb(self, attn, rpb, height, width, kernel_size, idx_h, idx_...
method reparm (line 389) | def reparm(self):
method _forward_inner (line 394) | def _forward_inner(self, x):
method forward (line 488) | def forward(self, x):
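The `fuse_bn` helper indexed above (models/contmix.py, line 68) presumably implements the standard conv+BatchNorm folding identity used at re-parameterization time. A minimal pure-Python sketch of that arithmetic on scalar per-channel values (the repo's helper operates on PyTorch weight tensors; the scalar form and the example numbers here are illustrative, not taken from the source):

```python
import math

def fuse_bn_params(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm(gamma, beta, mean, var) into the preceding conv.

    BN computes y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta,
    so scaling the conv weight and shifting its bias reproduces it exactly.
    """
    std = math.sqrt(var + eps)
    w_fused = w * gamma / std
    b_fused = beta + (b - mean) * gamma / std
    return w_fused, b_fused

# Check: the fused params match conv -> BN on an arbitrary input x
x = 3.0
conv = 2.0 * x + 0.5                                   # conv with w=2.0, b=0.5
bn = 1.5 * (conv - 0.2) / math.sqrt(0.09) + 0.1        # gamma=1.5, mean=0.2, var=0.09, beta=0.1
wf, bf = fuse_bn_params(2.0, 0.5, 1.5, 0.1, 0.2, 0.09, eps=0.0)
fused = wf * x + bf                                    # equals bn
```

The same per-channel scale/shift applies elementwise to a real `Conv2d` weight of shape `(out, in, kH, kW)`.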
FILE: models/overlock.py
function get_conv2d (line 18) | def get_conv2d(in_channels,
function get_bn (line 57) | def get_bn(dim, use_sync_bn=False):
function fuse_bn (line 64) | def fuse_bn(conv, bn):
function convert_dilated_to_nondilated (line 69) | def convert_dilated_to_nondilated(kernel, dilate_rate):
function merge_dilated_into_large_kernel (line 83) | def merge_dilated_into_large_kernel(large_kernel, dilated_kernel, dilate...
function stem (line 93) | def stem(in_chans=3, embed_dim=96):
function downsample (line 109) | def downsample(in_dim, out_dim):
class SEModule (line 116) | class SEModule(nn.Module):
method __init__ (line 117) | def __init__(self, dim, red=8, inner_act=nn.GELU, out_act=nn.Sigmoid):
method forward (line 128) | def forward(self, x):
class LayerScale (line 134) | class LayerScale(nn.Module):
method __init__ (line 135) | def __init__(self, dim, init_value=1e-5):
method forward (line 141) | def forward(self, x):
class LayerNorm2d (line 146) | class LayerNorm2d(nn.LayerNorm):
method __init__ (line 147) | def __init__(self, dim):
method forward (line 150) | def forward(self, x):
class GRN (line 157) | class GRN(nn.Module):
method __init__ (line 163) | def __init__(self, dim, use_bias=True):
method forward (line 170) | def forward(self, x):
class DilatedReparamBlock (line 180) | class DilatedReparamBlock(nn.Module):
method __init__ (line 185) | def __init__(self, channels, kernel_size, deploy, use_sync_bn=False, a...
method forward (line 229) | def forward(self, x):
method merge_dilated_branches (line 239) | def merge_dilated_branches(self):
class CTXDownsample (line 260) | class CTXDownsample(nn.Module):
method __init__ (line 261) | def __init__(self, dim, h_dim):
method forward (line 273) | def forward(self, x, ctx):
class ResDWConv (line 279) | class ResDWConv(nn.Conv2d):
method __init__ (line 283) | def __init__(self, dim, kernel_size=3):
method forward (line 286) | def forward(self, x):
class RepConvBlock (line 291) | class RepConvBlock(nn.Module):
method __init__ (line 293) | def __init__(self,
method forward_features (line 328) | def forward_features(self, x):
method forward (line 340) | def forward(self, x):
class DynamicConvBlock (line 350) | class DynamicConvBlock(nn.Module):
method __init__ (line 351) | def __init__(self,
method get_rpb (line 454) | def get_rpb(self):
method generate_idx (line 464) | def generate_idx(self, kernel_size):
method apply_rpb (line 472) | def apply_rpb(self, attn, rpb, height, width, kernel_size, idx_h, idx_...
method _forward_inner (line 489) | def _forward_inner(self, x, h_x, h_r):
method forward (line 570) | def forward(self, x, h_x, h_r):
class OverLoCK (line 578) | class OverLoCK(nn.Module):
method __init__ (line 583) | def __init__(self,
method _init_weights (line 757) | def _init_weights(self, m):
method reparam (line 766) | def reparam(self):
method forward_pre_features (line 771) | def forward_pre_features(self, x):
method forward_base_features (line 784) | def forward_base_features(self, x):
method forward_sub_features (line 797) | def forward_sub_features(self, x, ctx):
method forward_features (line 814) | def forward_features(self, x):
method forward (line 822) | def forward(self, x):
function _cfg (line 834) | def _cfg(url=None, **kwargs):
function overlock_xt (line 849) | def overlock_xt(pretrained=False, pretrained_cfg=None, **kwargs):
function overlock_t (line 872) | def overlock_t(pretrained=False, pretrained_cfg=None, **kwargs):
function overlock_s (line 895) | def overlock_s(pretrained=False, pretrained_cfg=None, **kwargs):
function overlock_b (line 918) | def overlock_b(pretrained=None, pretrained_cfg=None, **kwargs):
function overlock_xt_reparam (line 951) | def overlock_xt_reparam(pretrained=False, pretrained_cfg=None, **kwargs):
function overlock_t_reparam (line 964) | def overlock_t_reparam(pretrained=False, pretrained_cfg=None, **kwargs):
function overlock_s_reparam (line 977) | def overlock_s_reparam(pretrained=False, pretrained_cfg=None, **kwargs):
function overlock_b_reparam (line 989) | def overlock_b_reparam(pretrained=False, pretrained_cfg=None, **kwargs):
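The `DilatedReparamBlock` entries above rely on two helpers, `convert_dilated_to_nondilated` and `merge_dilated_into_large_kernel`, whose names suggest the standard large-kernel re-parameterization trick: a k×k conv with dilation d is equivalent to a (d·(k−1)+1)-sized conv whose kernel has zeros between the original taps, so each dilated branch can be zero-padded to the large kernel's size and summed into it at deploy time. A sketch of that arithmetic on plain 2-D lists (the repo's versions work on conv weight tensors; function names here are simplified stand-ins):

```python
def dilated_to_nondilated(kernel, d):
    """Spread a k x k kernel onto a (d*(k-1)+1)^2 grid, zeros between taps."""
    k = len(kernel)
    size = d * (k - 1) + 1
    out = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            out[i * d][j * d] = kernel[i][j]
    return out

def merge_into_large(large, small):
    """Zero-pad `small` to the size of `large` (centered) and add."""
    pad = (len(large) - len(small)) // 2
    out = [row[:] for row in large]
    for i in range(len(small)):
        for j in range(len(small)):
            out[i + pad][j + pad] += small[i][j]
    return out

# A 3x3 kernel with dilation 2 occupies a 5x5 grid...
eq = dilated_to_nondilated([[1, 2, 3], [4, 5, 6], [7, 8, 9]], d=2)
# ...and is then absorbed (centered) into a 7x7 large kernel.
merged = merge_into_large([[0.0] * 7 for _ in range(7)], eq)
```

This is why `merge_dilated_branches` can collapse all dilated small-kernel branches into a single large-kernel conv with identical outputs.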
FILE: segmentation/mmseg_custom/align_resize.py
class AlignResize (line 10) | class AlignResize(object):
method __init__ (line 13) | def __init__(self,
method random_select (line 44) | def random_select(img_scales):
method random_sample (line 62) | def random_sample(img_scales):
method random_sample_ratio (line 89) | def random_sample_ratio(img_scale, ratio_range):
method _random_scale (line 115) | def _random_scale(self, results):
method _align (line 153) | def _align(self, img, size_divisor, interpolation=None):
method _resize_img (line 162) | def _resize_img(self, results):
method _resize_seg (line 191) | def _resize_seg(self, results):
method __call__ (line 207) | def __call__(self, results):
method __repr__ (line 225) | def __repr__(self):
FILE: segmentation/models/overlock.py
function get_conv2d (line 19) | def get_conv2d(in_channels,
function get_bn (line 58) | def get_bn(dim, use_sync_bn=False):
function fuse_bn (line 65) | def fuse_bn(conv, bn):
function convert_dilated_to_nondilated (line 70) | def convert_dilated_to_nondilated(kernel, dilate_rate):
function merge_dilated_into_large_kernel (line 84) | def merge_dilated_into_large_kernel(large_kernel, dilated_kernel, dilate...
function stem (line 94) | def stem(in_chans=3, embed_dim=96):
function downsample (line 110) | def downsample(in_dim, out_dim):
class SEModule (line 117) | class SEModule(nn.Module):
method __init__ (line 118) | def __init__(self, dim, red=8, inner_act=nn.GELU, out_act=nn.Sigmoid):
method forward (line 129) | def forward(self, x):
class LayerScale (line 135) | class LayerScale(nn.Module):
method __init__ (line 136) | def __init__(self, dim, init_value=1e-5):
method forward (line 142) | def forward(self, x):
class LayerNorm2d (line 147) | class LayerNorm2d(nn.LayerNorm):
method __init__ (line 148) | def __init__(self, dim):
method forward (line 151) | def forward(self, x):
class GRN (line 158) | class GRN(nn.Module):
method __init__ (line 164) | def __init__(self, dim, use_bias=True):
method forward (line 171) | def forward(self, x):
class DilatedReparamBlock (line 181) | class DilatedReparamBlock(nn.Module):
method __init__ (line 186) | def __init__(self, channels, kernel_size, deploy, use_sync_bn=False, a...
method forward (line 230) | def forward(self, x):
method merge_dilated_branches (line 240) | def merge_dilated_branches(self):
class CTXDownsample (line 261) | class CTXDownsample(nn.Module):
method __init__ (line 262) | def __init__(self, dim, h_dim):
method forward (line 274) | def forward(self, x, ctx):
class ResDWConv (line 280) | class ResDWConv(nn.Conv2d):
method __init__ (line 284) | def __init__(self, dim, kernel_size=3):
method forward (line 287) | def forward(self, x):
class RepConvBlock (line 292) | class RepConvBlock(nn.Module):
method __init__ (line 294) | def __init__(self,
method forward_features (line 329) | def forward_features(self, x):
method forward (line 341) | def forward(self, x):
class DynamicConvBlock (line 351) | class DynamicConvBlock(nn.Module):
method __init__ (line 352) | def __init__(self,
method get_rpb (line 455) | def get_rpb(self):
method generate_idx (line 465) | def generate_idx(self, kernel_size):
method apply_rpb (line 473) | def apply_rpb(self, attn, rpb, height, width, kernel_size, idx_h, idx_...
method _forward_inner (line 490) | def _forward_inner(self, x, h_x, h_r):
method forward (line 572) | def forward(self, x, h_x, h_r):
class OverLoCK (line 580) | class OverLoCK(nn.Module):
method __init__ (line 585) | def __init__(self,
method _init_weights (line 766) | def _init_weights(self, m):
method _convert_sync_batchnorm (line 776) | def _convert_sync_batchnorm(self):
method forward_pre_features (line 780) | def forward_pre_features(self, x):
method forward_base_features (line 799) | def forward_base_features(self, x):
method forward_sub_features (line 812) | def forward_sub_features(self, x, ctx):
method forward_features (line 835) | def forward_features(self, x):
method forward (line 843) | def forward(self, x):
function overlock_xt (line 851) | def overlock_xt(pretrained=False, pretrained_cfg=None, **kwargs):
function overlock_t (line 873) | def overlock_t(pretrained=False, pretrained_cfg=None, **kwargs):
function overlock_s (line 895) | def overlock_s(pretrained=False, pretrained_cfg=None, **kwargs):
function overlock_b (line 917) | def overlock_b(pretrained=None, pretrained_cfg=None, **kwargs):
FILE: segmentation/test.py
function parse_args (line 20) | def parse_args():
function main (line 75) | def main():
FILE: segmentation/train.py
function parse_args (line 22) | def parse_args():
function main (line 81) | def main():
FILE: train.py
function get_args_parser (line 73) | def get_args_parser():
function main (line 333) | def main(args):
function train_one_epoch (line 813) | def train_one_epoch(epoch, model, loader, optimizer, loss_fn, args,
function validate (line 1006) | def validate(model, loader, loss_fn, args, amp_autocast=suppress, log_su...
FILE: validate.py
function validate (line 117) | def validate(args):
function main (line 276) | def main():
function write_results (line 339) | def write_results(results_file, results):
Condensed preview — 64 files, each showing path, character count, and a content snippet.
[
{
"path": "LICENSE.md",
"chars": 11357,
"preview": " Apache License\n Version 2.0, January 2004\n "
},
{
"path": "README.md",
"chars": 7365,
"preview": "# [[CVPR 2025 Oral] OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels](https://a"
},
{
"path": "detection/configs/_base_/datasets/coco_detection.py",
"chars": 1748,
"preview": "# dataset settings\ndataset_type = 'CocoDataset'\ndata_root = '/grp01/cs_yzyu/dataset/coco2017/'\nimg_norm_cfg = dict(\n "
},
{
"path": "detection/configs/_base_/datasets/coco_instance.py",
"chars": 1775,
"preview": "# dataset settings\ndataset_type = 'CocoDataset'\ndata_root = '/grp01/cs_yzyu/dataset/coco2017/'\nimg_norm_cfg = dict(\n "
},
{
"path": "detection/configs/_base_/default_runtime.py",
"chars": 368,
"preview": "checkpoint_config = dict(interval=1)\n# yapf:disable\nlog_config = dict(\n interval=50,\n hooks=[\n dict(type='T"
},
{
"path": "detection/configs/_base_/models/cascade_mask_rcnn_r50_fpn.py",
"chars": 6949,
"preview": "# model settings\nmodel = dict(\n type='CascadeRCNN',\n backbone=dict(\n type='ResNet',\n depth=50,\n "
},
{
"path": "detection/configs/_base_/models/cascade_mask_rcnn_r50_fpn_crowdhuman.py",
"chars": 6449,
"preview": "# model settings\nmodel = dict(\n type='CascadeRCNN',\n backbone=dict(\n type='ResNet',\n depth=50,\n "
},
{
"path": "detection/configs/_base_/models/cascade_rcnn_r50_fpn.py",
"chars": 6325,
"preview": "# model settings\nmodel = dict(\n type='CascadeRCNN',\n backbone=dict(\n type='ResNet',\n depth=50,\n "
},
{
"path": "detection/configs/_base_/models/fast_rcnn_r50_fpn.py",
"chars": 2060,
"preview": "# model settings\nmodel = dict(\n type='FastRCNN',\n backbone=dict(\n type='ResNet',\n depth=50,\n "
},
{
"path": "detection/configs/_base_/models/faster_rcnn_r50_caffe_c4.py",
"chars": 3694,
"preview": "# model settings\nnorm_cfg = dict(type='BN', requires_grad=False)\nmodel = dict(\n type='FasterRCNN',\n backbone=dict("
},
{
"path": "detection/configs/_base_/models/faster_rcnn_r50_caffe_dc5.py",
"chars": 3479,
"preview": "# model settings\nnorm_cfg = dict(type='BN', requires_grad=False)\nmodel = dict(\n type='FasterRCNN',\n backbone=dict("
},
{
"path": "detection/configs/_base_/models/faster_rcnn_r50_fpn.py",
"chars": 3632,
"preview": "# model settings\nmodel = dict(\n type='FasterRCNN',\n backbone=dict(\n type='ResNet',\n depth=50,\n "
},
{
"path": "detection/configs/_base_/models/mask_rcnn_convnext_fpn.py",
"chars": 4182,
"preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n\n# All rights reserved.\n\n# This source code is licensed under the l"
},
{
"path": "detection/configs/_base_/models/mask_rcnn_r50_caffe_c4.py",
"chars": 4061,
"preview": "# model settings\nnorm_cfg = dict(type='BN', requires_grad=False)\nmodel = dict(\n type='MaskRCNN',\n backbone=dict(\n "
},
{
"path": "detection/configs/_base_/models/mask_rcnn_r50_fpn.py",
"chars": 4054,
"preview": "# model settings\nmodel = dict(\n type='MaskRCNN',\n backbone=dict(\n type='ResNet',\n depth=50,\n "
},
{
"path": "detection/configs/_base_/models/retinanet_r50_fpn.py",
"chars": 1767,
"preview": "# model settings\nmodel = dict(\n type='RetinaNet',\n backbone=dict(\n type='ResNet',\n depth=50,\n "
},
{
"path": "detection/configs/_base_/models/rpn_r50_caffe_c4.py",
"chars": 1788,
"preview": "# model settings\nmodel = dict(\n type='RPN',\n backbone=dict(\n type='ResNet',\n depth=50,\n num_s"
},
{
"path": "detection/configs/_base_/models/rpn_r50_fpn.py",
"chars": 1807,
"preview": "# model settings\nmodel = dict(\n type='RPN',\n backbone=dict(\n type='ResNet',\n depth=50,\n num_s"
},
{
"path": "detection/configs/_base_/models/ssd300.py",
"chars": 1734,
"preview": "# model settings\ninput_size = 300\nmodel = dict(\n type='SingleStageDetector',\n backbone=dict(\n type='SSDVGG'"
},
{
"path": "detection/configs/_base_/schedules/schedule_1x.py",
"chars": 319,
"preview": "# optimizer\noptimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)\noptimizer_config = dict(grad_clip=N"
},
{
"path": "detection/configs/_base_/schedules/schedule_3x.py",
"chars": 320,
"preview": "# optimizer\noptimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)\noptimizer_config = dict(grad_clip=N"
},
{
"path": "detection/configs/maskrcnn_overlock/mask_rcnn_overlock_b_in1k_fpn_1x_coco.py",
"chars": 1197,
"preview": "_base_ = [\n '../_base_/models/mask_rcnn_r50_fpn.py',\n '../_base_/datasets/coco_instance.py',\n '../_base_/schedu"
},
{
"path": "detection/configs/maskrcnn_overlock/mask_rcnn_overlock_b_in1k_fpn_3x_coco.py",
"chars": 2985,
"preview": "_base_ = [\n '../_base_/models/mask_rcnn_r50_fpn.py',\n '../_base_/datasets/coco_instance.py',\n '../_base_/schedu"
},
{
"path": "detection/configs/maskrcnn_overlock/mask_rcnn_overlock_s_in1k_fpn_1x_coco.py",
"chars": 1197,
"preview": "_base_ = [\n '../_base_/models/mask_rcnn_r50_fpn.py',\n '../_base_/datasets/coco_instance.py',\n '../_base_/schedu"
},
{
"path": "detection/configs/maskrcnn_overlock/mask_rcnn_overlock_s_in1k_fpn_3x_coco.py",
"chars": 2985,
"preview": "_base_ = [\n '../_base_/models/mask_rcnn_r50_fpn.py',\n '../_base_/datasets/coco_instance.py',\n '../_base_/schedu"
},
{
"path": "detection/configs/maskrcnn_overlock/mask_rcnn_overlock_t_in1k_fpn_1x_coco.py",
"chars": 1197,
"preview": "_base_ = [\n '../_base_/models/mask_rcnn_r50_fpn.py',\n '../_base_/datasets/coco_instance.py',\n '../_base_/schedu"
},
{
"path": "detection/configs/maskrcnn_overlock/mask_rcnn_overlock_t_in1k_fpn_3x_coco.py",
"chars": 2985,
"preview": "_base_ = [\n '../_base_/models/mask_rcnn_r50_fpn.py',\n '../_base_/datasets/coco_instance.py',\n '../_base_/schedu"
},
{
"path": "detection/models/__init__.py",
"chars": 23,
"preview": "from .overlock import *"
},
{
"path": "detection/models/overlock.py",
"chars": 33888,
"preview": "'''\nThis is an official implementation of OverLoCK model proposed in the paper: \nhttps://arxiv.org/abs/2502.20087\n'''\nim"
},
{
"path": "detection/readme.md",
"chars": 3438,
"preview": "# Applying OverLoCK to Object Detection and Instance Segmentation\n\n## 1. Requirements\n\n```\npip install mmcv-full==1.7.2 "
},
{
"path": "detection/scripts/dist_test.sh",
"chars": 225,
"preview": "#!/usr/bin/env bash\nCONFIG=$1\nCHECKPOINT=$2\nGPUS=$3\nPORT=$((RANDOM+10000))\n\nPYTHONPATH=\"$(dirname $0)/..\":$PYTHONPATH \\\n"
},
{
"path": "detection/scripts/dist_train.sh",
"chars": 200,
"preview": "#!/usr/bin/env bash\nCONFIG=$1\nGPUS=$2\nPORT=$((RANDOM+10000))\n\nPYTHONPATH=\"$(dirname $0)/..\":$PYTHONPATH \\\ntorchrun --npr"
},
{
"path": "detection/test.py",
"chars": 10012,
"preview": "import argparse\nimport os\nimport os.path as osp\nimport time\nimport warnings\n\nimport mmcv\nimport torch\nfrom mmcv import C"
},
{
"path": "detection/train.py",
"chars": 10789,
"preview": "import argparse\nimport copy\nimport os\nimport os.path as osp\nimport time\nimport warnings\n\nimport mmcv\nimport torch\nimport"
},
{
"path": "models/__init__.py",
"chars": 69,
"preview": "from .overlock import overlock_xt, overlock_t, overlock_s, overlock_b"
},
{
"path": "models/contmix.py",
"chars": 21059,
"preview": "'''\r\nThis is a plug-and-play implementation of ContMix block in the paper:\r\nhttps://arxiv.org/abs/2502.20087\r\n'''\r\nimpor"
},
{
"path": "models/overlock.py",
"chars": 35650,
"preview": "'''\nThis is an official implementation of OverLoCK model proposed in the paper: \nhttps://arxiv.org/abs/2502.20087\n'''\nim"
},
{
"path": "scripts/train_b_model.sh",
"chars": 391,
"preview": "#!/usr/bin/env bash\npython3 -m torch.distributed.launch \\\n--master_port=$((RANDOM+8888)) \\\n--nproc_per_node=8 \\\ntrain.py"
},
{
"path": "scripts/train_s_model.sh",
"chars": 391,
"preview": "#!/usr/bin/env bash\npython3 -m torch.distributed.launch \\\n--master_port=$((RANDOM+8888)) \\\n--nproc_per_node=8 \\\ntrain.py"
},
{
"path": "scripts/train_t_model.sh",
"chars": 392,
"preview": "#!/usr/bin/env bash\npython3 -m torch.distributed.launch \\\n--master_port=$((RANDOM+8888)) \\\n--nproc_per_node=8 \\\ntrain.py"
},
{
"path": "scripts/train_xt_model.sh",
"chars": 393,
"preview": "#!/usr/bin/env bash\npython3 -m torch.distributed.launch \\\n--master_port=$((RANDOM+8888)) \\\n--nproc_per_node=8 \\\ntrain.py"
},
{
"path": "segmentation/configs/_base_/datasets/ade20k.py",
"chars": 2114,
"preview": "# dataset settings\ndataset_type = 'ADE20KDataset'\ndata_root = '/grp01/cs_yzyu/dataset/ADEChallengeData2016/'\nimg_norm_cf"
},
{
"path": "segmentation/configs/_base_/default_runtime.py",
"chars": 321,
"preview": "# yapf:disable\nlog_config = dict(\n interval=50,\n hooks=[\n dict(type='TextLoggerHook', by_epoch=False),\n "
},
{
"path": "segmentation/configs/_base_/models/fpn_r50.py",
"chars": 1198,
"preview": "# copied from mmsegmentaion official config\n# https://github.com/open-mmlab/mmsegmentation/blob/master/configs/_base_/mo"
},
{
"path": "segmentation/configs/_base_/models/upernet_r50.py",
"chars": 1434,
"preview": "# copied from mmsegmentation official config\n# https://github.com/open-mmlab/mmsegmentation/blob/master/configs/_base_/m"
},
{
"path": "segmentation/configs/_base_/models/upernet_transnext.py",
"chars": 982,
"preview": "# model settings\nnorm_cfg = dict(type='SyncBN', requires_grad=True)\nmodel = dict(\n type='EncoderDecoder',\n pretrai"
},
{
"path": "segmentation/configs/_base_/schedules/schedule_160k.py",
"chars": 382,
"preview": "# optimizer\noptimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)\noptimizer_config = dict()\n# learnin"
},
{
"path": "segmentation/configs/_base_/schedules/schedule_20k.py",
"chars": 379,
"preview": "# optimizer\noptimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)\noptimizer_config = dict()\n# learnin"
},
{
"path": "segmentation/configs/_base_/schedules/schedule_40k.py",
"chars": 379,
"preview": "# optimizer\noptimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)\noptimizer_config = dict()\n# learnin"
},
{
"path": "segmentation/configs/_base_/schedules/schedule_80k.py",
"chars": 379,
"preview": "# optimizer\noptimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)\noptimizer_config = dict()\n# learnin"
},
{
"path": "segmentation/configs/overlock/upernet_overlock_base_ade20k_8xb2.py",
"chars": 1581,
"preview": "_base_ = [\n '../_base_/models/upernet_r50.py',\n '../_base_/datasets/ade20k.py',\n '../_base_/default_runtime.py'"
},
{
"path": "segmentation/configs/overlock/upernet_overlock_small_ade20k_8xb2.py",
"chars": 1580,
"preview": "_base_ = [\n '../_base_/models/upernet_r50.py',\n '../_base_/datasets/ade20k.py',\n '../_base_/default_runtime.py'"
},
{
"path": "segmentation/configs/overlock/upernet_overlock_tiny_ade20k_8xb2.py",
"chars": 1582,
"preview": "_base_ = [\n '../_base_/models/upernet_r50.py',\n '../_base_/datasets/ade20k.py',\n '../_base_/default_runtime.py'"
},
{
"path": "segmentation/mmseg_custom/__init__.py",
"chars": 37,
"preview": "from .align_resize import AlignResize"
},
{
"path": "segmentation/mmseg_custom/align_resize.py",
"chars": 9291,
"preview": "import mmcv\nimport numpy as np\nfrom mmseg.datasets.builder import PIPELINES\n\n# from IPython import embed\n# from numpy im"
},
{
"path": "segmentation/models/__init__.py",
"chars": 23,
"preview": "from .overlock import *"
},
{
"path": "segmentation/models/overlock.py",
"chars": 33507,
"preview": "'''\nThis is an official implementation of OverLoCK model proposed in the paper: \nhttps://arxiv.org/abs/2502.20087\n'''\nim"
},
{
"path": "segmentation/readme.md",
"chars": 2547,
"preview": "# Applying OverLoCK to Semantic Segmentation \n\n## 1. Requirements\n\n```\npip install mmcv-full==1.7.2 --no-cache-dir\npip"
},
{
"path": "segmentation/scripts/dist_test.sh",
"chars": 225,
"preview": "#!/usr/bin/env bash\nCONFIG=$1\nCHECKPOINT=$2\nGPUS=$3\nPORT=$((RANDOM+10000))\n\nPYTHONPATH=\"$(dirname $0)/..\":$PYTHONPATH \\\n"
},
{
"path": "segmentation/scripts/dist_train.sh",
"chars": 200,
"preview": "#!/usr/bin/env bash\nCONFIG=$1\nGPUS=$2\nPORT=$((RANDOM+10000))\n\nPYTHONPATH=\"$(dirname $0)/..\":$PYTHONPATH \\\ntorchrun --npr"
},
{
"path": "segmentation/test.py",
"chars": 5946,
"preview": "import warnings\nimport argparse\nimport os\n\nimport mmcv\nimport torch\nfrom mmcv.parallel import MMDataParallel, MMDistribu"
},
{
"path": "segmentation/train.py",
"chars": 6949,
"preview": "import argparse\nimport warnings\nimport copy\nimport os\nimport os.path as osp\nimport time\n\nimport mmcv\nimport torch\nfrom m"
},
{
"path": "train.py",
"chars": 51864,
"preview": "#!/usr/bin/env python3\n\"\"\" ImageNet Training Script\n\nThis is intended to be a lean and easily modifiable ImageNet traini"
},
{
"path": "validate.py",
"chars": 15322,
"preview": "#!/usr/bin/env python3\n\"\"\" ImageNet Validation Script\n\nThis is intended to be a lean and easily modifiable ImageNet vali"
}
]
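The condensed preview above is a JSON array of records with `path`, `chars`, and `preview` fields. A small sketch of consuming such a listing, e.g. to rank files by size (the three records inlined here are a subset copied from the listing; in practice the whole array would be loaded from the extraction file):

```python
import json

# A made-up subset of the listing, using its actual field names and values.
listing = json.loads("""
[
  {"path": "train.py", "chars": 51864},
  {"path": "models/overlock.py", "chars": 35650},
  {"path": "models/contmix.py", "chars": 21059}
]
""")

largest = max(listing, key=lambda f: f["chars"])
total = sum(f["chars"] for f in listing)
by_size = sorted(listing, key=lambda f: f["chars"], reverse=True)
```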
About this extraction
This page contains the full source code of the LMMMEng/OverLoCK GitHub repository, extracted and formatted as plain text: 64 files (334.9 KB), approximately 86.9k tokens, and a symbol index with 222 extracted functions, classes, methods, constants, and types. Extracted by GitExtract, built by Nikandr Surkov.